As soon as researchers began bringing laptops into archive reading rooms, they ran up against the limitations of plain word processors for the laborious work of transcription and began to clamor for custom-designed tools that would make the job as easy as possible. To this day, the most common way to transcribe a manuscript is to type its text into a digital file: a historian sits at a computer in the archive’s reading room with the open manuscript nearby and uses a word processor, such as Microsoft Word, to type what they read.

This method has many drawbacks. Modern word processors lack support for manuscript-specific styles (e.g., erased text, words written in a different hand, abbreviations), and it is difficult to keep track of multiple files related to several sources, possibly from different archives. Moreover, after the historian has finished transcribing a manuscript and returned it to the shelves, another person could start transcribing that very same manuscript the next day, with no way of knowing that the work has already been done.

Early Single-User Windows Applications

In the late 1990s, Professor Reinhold Mueller (University of Venice), Dr. Giovanni Caniato (Venice State Archives), and Dr. Stefano Piasentini, a historical author and expert user of the Venice State Archives, proposed that Prof. Fabio Carrera of the Worcester Polytechnic Institute (WPI) produce an easy-to-use application that would facilitate the painstaking job of historical transcription. With this goal in mind, Prof. Carrera and his colleague, Prof. Stanley Selkow (WPI, Department of Computer Science), began sponsoring undergraduate research projects at WPI in 1998. The initial versions of the application were stand-alone Windows programs: enhanced word processors, customized for historical transcribers to include some of the missing features described above.

The initial application, which has evolved into a key component of uScript, was called Transcription Assistant (TA). It began to take shape over a five-year period (1998-2003), as users’ needs were gradually translated into working computer tools. In 2003, the Transcription Assistant was ported from Visual Basic™ to Java™.

Online Collaborative Emergent Transcription Applications

In 2004, inspired by Steven Johnson’s Emergence, the system was redesigned as a true emergent system, in which the original Transcription Assistant tool became merely the user interface (and, in a sense, the “hook”) through which scholars are enticed into sharing their transcriptions with other researchers in a never-ending, self-propagating, and self-correcting virtuous cycle. A web interface was added, and a smart auto-boxing feature was developed that employs genetic algorithms to automatically detect word boundaries in the manuscript images.

In 2007, we decided to port the entire system to a browser-based application built with the Google Web Toolkit and relying on AJAX, the technique that underlies shared online platforms like Google Docs. The name “uScript” was adopted for the system during the course of this latest redesign.

Publications and Dissemination

Several papers and reports on uScript have been published to date. Each WPI student group that completed a research project on the topic produced a final report, and Fabio Carrera wrote a conceptual design paper that was presented at the IEEE 8th International Conference on Document Analysis and Recognition in 2005.
