Quite a bit of work has been done on uScript in the last 10 years. Here is where all of the research is kept!

Making History: an Emergent System for the Systematic Accrual of Transcriptions of Historic Manuscripts

The Emergent Transcriptions Initiative on Sourceforge

Transcription Assistant API

Current research to further the project

Document Image Analysis toolbox: a paper describing algorithms for analyzing a document, with performance metrics for each of the tasks presented.

“UW-ISL Document Image Analysis Toolbox: An Experimental Environment”

J. Liang, R. Rogers, R.M. Haralick, and I.T. Phillips.


Survey of 40 different methods

Reactions: Useful analysis of a wide variety of methods, including simulation of noisy documents. The paper concludes that clustering-based methods and locally adaptive methods are best for document binarization, and it finds the Kittler method to be the most effective.
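
For reference, here is a minimal sketch of what a Kittler-style threshold might look like. It assumes the "Kittler method" in the survey refers to Kittler and Illingworth's minimum-error thresholding, which models ink and paper as two Gaussian populations and picks the gray level that minimizes a classification-error criterion. The function name, the flat list of pixel values, and the 256 gray levels are illustrative assumptions, not anything taken from the survey or from uScript code.

```python
import math


def kittler_threshold(pixels, levels=256):
    """Return the gray level t that minimizes the minimum-error criterion.

    Pixels <= t form one class (e.g. ink), pixels > t the other (e.g. paper).
    `pixels` is a flat list of integers in the range [0, levels).
    Returns None if no valid split exists (e.g. a constant image).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = float(len(pixels))

    best_t, best_j = None, math.inf
    for t in range(levels - 1):
        lo, hi = hist[:t + 1], hist[t + 1:]
        n1, n2 = sum(lo), sum(hi)
        if n1 == 0 or n2 == 0:
            continue  # one class is empty, so there is nothing to compare
        p1, p2 = n1 / total, n2 / total
        # Class means and variances; index g in `hi` is gray level g + t + 1.
        mu1 = sum(g * c for g, c in enumerate(lo)) / n1
        mu2 = sum((g + t + 1) * c for g, c in enumerate(hi)) / n2
        var1 = sum(c * (g - mu1) ** 2 for g, c in enumerate(lo)) / n1
        var2 = sum(c * (g + t + 1 - mu2) ** 2 for g, c in enumerate(hi)) / n2
        if var1 <= 0 or var2 <= 0:
            continue  # the logarithm of a zero variance is undefined
        # J(t) = 1 + 2(P1 ln s1 + P2 ln s2) - 2(P1 ln P1 + P2 ln P2),
        # written with variances, so 2 ln(sigma) becomes ln(variance).
        j = (1.0
             + p1 * math.log(var1) + p2 * math.log(var2)
             - 2.0 * (p1 * math.log(p1) + p2 * math.log(p2)))
        if j < best_j:
            best_t, best_j = t, j
    return best_t


# Synthetic example: dark "ink" values near 40, light "paper" values near 200.
sample = [38, 40, 42] * 10 + [198, 200, 202] * 10
t = kittler_threshold(sample)
ink_mask = [1 if p <= t else 0 for p in sample]
```

On the synthetic sample the chosen threshold falls in the gap between the two clusters, so every dark value is labeled ink. A single global threshold like this still assumes the ink and paper populations look the same everywhere on the page.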

Irregular Pyramid method, from ICDAR 2003

Reactions: A conceptually interesting technique. It performs both segmentation and binarization directly from the grayscale image, and it binarizes using both locally adaptive and clustering methods. The exact details of the scheme are difficult to follow, as the English in the document needs some work.

From ICDAR 2001

Binarization using a Coplanar Filter

Solution to binarization based on noise reduction in the source image. Reactions: Doesn’t sound like what we need. The noise they focus on is not similar to what seems to be our primary problem: some words being lighter than others, or obscured in stained areas of the document.
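
To make the faint-word and stained-area problem concrete, here is a small sketch of a locally adaptive threshold, the other family of methods the survey favors. It is a generic local-mean rule, not the coplanar filter from this paper and not any specific surveyed algorithm. The function name, the window radius, and the offset are all hypothetical choices for illustration.

```python
def local_mean_binarize(image, radius=2, offset=10):
    """Mark a pixel as ink (1) when it is darker than the mean of its
    (2 * radius + 1)-wide square neighborhood by more than `offset`.

    `image` is a list of rows of gray values. Windows are clipped at the
    image border rather than padded.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            window = [image[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            mean = sum(window) / len(window)
            if image[r][c] < mean - offset:
                out[r][c] = 1
    return out


# Synthetic page: clean paper (220) on the left, stained paper (120) on the right.
page = [[220] * 5 + [120] * 5 for _ in range(5)]
page[2][2] = 150  # a faint stroke on clean paper
page[2][7] = 60   # a dark stroke inside the stain
mask = local_mean_binarize(page)
```

In this example no single global threshold works: any level high enough to catch the faint stroke at 150 also turns the stained paper at 120 into ink. The local rule catches both strokes because each is compared only with its own surroundings. One known weakness of this simple form is that pixels just inside a sharp stain boundary can be marked as ink, so the window size and offset would need tuning on real manuscripts.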

Neural-Based approach to binarization