Previous MQP Reports: 2003 | 2007 | 2008 Paper
The Emergent Transcriptions Initiative on Sourceforge
Transcription Assistant API
Metadata
- The Dublin Core Metadata Initiative
- DCMI Tools
- The International Council on Archives (ICA): the organization responsible for the ISAD(G) metadata standard
Archives:
Current research to further the project:
Document Image Analysis toolbox--A paper describing algorithms for analyzing a document and performance metrics for each of the tasks presented.
"UW-ISL Document Image Analysis Toolbox: An Experimental Environment"
J. Liang, R. Rogers, R. M. Haralick, and I. T. Phillips.
Binarization:
Survey of 40 different methods
Reactions: Useful analysis of a wide variety of methods. Includes simulation of noisy documents. Conclusions drawn in paper: clustering-based methods and locally adaptive methods are best for document binarization. Kittler method found to be most effective.
Irregular Pyramid method, from ICDAR 2003
Reactions: Conceptually interesting technique, because it performs both segmentation and binarization directly from the greyscale image, and it does so using locally adaptive and clustering methods for binarization. The exact details of the scheme are difficult to follow, as the English in the document needs some work.
From ICDAR 2001
Binarization using a Coplanar Filter
Solution to binarization based on noise reduction in the source image. Reactions: Doesn't sound like it's what we need--the noise they focus on is not similar to what seems to be our primary problem, which is some words being lighter than others, or obscured in stained areas of the document.
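As a rough illustration of the clustering-style approach the survey above found most effective, here is a minimal Python sketch of Kittler-Illingworth minimum error thresholding. It follows the standard textbook formulation, not code from the surveyed paper or from this project, and the names kittler_threshold and binarize are hypothetical. It assumes 8-bit greyscale values with dark ink on a light page.

```python
import math


def kittler_threshold(pixels, levels=256):
    """Return the grey level t that minimises the Kittler-Illingworth
    criterion J(t). Values <= t form one class, values > t the other.
    Returns None if no valid split exists (e.g. a flat image)."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    total = float(len(pixels))

    best_t, best_j = None, math.inf
    for t in range(levels - 1):
        n1 = sum(hist[: t + 1])
        n2 = total - n1
        if n1 == 0 or n2 == 0:
            continue  # one class is empty, so there is no split here
        p1, p2 = n1 / total, n2 / total
        mu1 = sum(g * hist[g] for g in range(t + 1)) / n1
        mu2 = sum(g * hist[g] for g in range(t + 1, levels)) / n2
        var1 = sum(hist[g] * (g - mu1) ** 2 for g in range(t + 1)) / n1
        var2 = sum(hist[g] * (g - mu2) ** 2 for g in range(t + 1, levels)) / n2
        if var1 <= 0 or var2 <= 0:
            continue  # log of a zero variance is undefined
        # J(t) = 1 + 2*(P1*ln(s1) + P2*ln(s2)) - 2*(P1*ln(P1) + P2*ln(P2))
        j = (1.0
             + 2.0 * (p1 * math.log(math.sqrt(var1))
                      + p2 * math.log(math.sqrt(var2)))
             - 2.0 * (p1 * math.log(p1) + p2 * math.log(p2)))
        if j < best_j:
            best_t, best_j = t, j
    return best_t


def binarize(rows, t):
    """Map a greyscale image (list of rows) to 1 for ink (<= t), 0 for page."""
    return [[1 if v <= t else 0 for v in row] for row in rows]
```

Note that this computes a single global threshold, so on its own it would not address words that are lighter than others or that sit in stained areas. A locally adaptive variant would apply the same idea within a window around each pixel.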
Last Updated: April 9, 2008
Copyright © 2008 Uscript, "www.uscript.org". All Rights Reserved.
WPI logo appears copyright © 1995-2008, Worcester Polytechnic Institute. All Rights Reserved.