Transcription Assistant

The Transcription Assistant (TA) is an open-source Java application that runs on the end user’s machine on the Java Virtual Machine, supported by the Java Application Interface. With the TA, scholars will be able to create a project for each paper or research topic. A project will include several manuscript pages from a variety of collections that together contribute to the development of a historical paper on a specific subject.

For each manuscript used in a project, after an XPG image or an MML transcription is downloaded, the transcriber will use the TA to transcribe all or part of the document. Once a transcription is finished, transcribers are required to submit it back to the archive so that it can be made available to other transcribers, a necessary aspect of the successful functioning of uScript as a collaborative and ever-improving collection of manuscript transcriptions.

The TA is designed to greatly facilitate the process of transcription. Its main screen is split into two windows (vertically or horizontally, according to the user’s preference). One window shows the manuscript image; the other shows the transcription, either as a positionally accurate print preview, as word-processable text, or in MML format. Automatically detected word boxes (determined upon import into the Archive Assistant) are also shown to the user. We are currently working on making manual box correction a rewardable feature of uScript as well, tying it to a user’s credit level.

Once each word has been boxed, the user can begin the actual process of transcription. Un-transcribed boxes are red. Clicking the next box to be transcribed reveals a text field where the transcription can be typed. To emulate a word processor, the user can move to the next box simply by hitting the space bar. Once a transcription has been typed in, its box turns green, and the transcribed text is entered into the transcription MML and appears in its exact relative location in the preview window.
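The box-by-box workflow described above can be sketched as a small state model. This is only an illustrative sketch, not the TA's actual code: the class names `WordBox` and `TranscriptionSession`, the pixel-coordinate fields, and the method `commitAndAdvance` are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one word box on the manuscript image.
class WordBox {
    final int x, y, width, height;   // assumed: box position and size in image pixels
    private String transcription;    // null until the user has typed something

    WordBox(int x, int y, int width, int height) {
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    boolean isTranscribed() {
        return transcription != null && !transcription.isEmpty();
    }

    // Red while un-transcribed, green once text has been entered.
    String color() {
        return isTranscribed() ? "green" : "red";
    }

    void setTranscription(String text) {
        this.transcription = text;
    }

    String getTranscription() {
        return transcription;
    }
}

// Hypothetical session that walks through the boxes in reading order.
class TranscriptionSession {
    private final List<WordBox> boxes;
    private int current = 0;

    TranscriptionSession(List<WordBox> boxes) {
        this.boxes = new ArrayList<>(boxes);
    }

    WordBox currentBox() {
        return boxes.get(current);
    }

    // Called when the user hits the space bar: store the typed text
    // in the current box, then move to the next box if there is one.
    void commitAndAdvance(String typed) {
        currentBox().setTranscription(typed.trim());
        if (current < boxes.size() - 1) {
            current++;
        }
    }

    // Number of boxes still shown in red.
    long remaining() {
        return boxes.stream().filter(b -> !b.isTranscribed()).count();
    }
}
```

In this sketch the color is derived from the stored text rather than kept as a separate flag, so a box cannot be green without a transcription.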

The current version of the TA allows the user to right-click on a box to annotate the transcription. A primary form of annotation differentiates graphics or symbols from text. If a box is tagged as an “image” or “symbol”, a cropped piece of the manuscript will be copied into the preview window (and into the underlying MML). Currently, we provide the following other types of annotations for text boxes: (i) manuscript annotations such as for stricken text or corrected text; (ii) tagging of abbreviations; (iii) identification of numbers; and (iv) identification of handwriting changes (different author). We foresee adding more of these annotations, such as the tagging of currency and marginalia, as well as second-order tags to identify proper names, names of places, professions, dates and the like.
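One way to picture the annotation types listed above is as a set of tags attached to each box. The following is a minimal sketch under that assumption; the names `BoxAnnotation`, `AnnotatedBox`, and `rendersAsImageCrop` are hypothetical and do not come from the TA itself.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical enumeration of the annotation types named in the text.
enum BoxAnnotation {
    IMAGE,
    SYMBOL,
    STRICKEN_TEXT,
    CORRECTED_TEXT,
    ABBREVIATION,
    NUMBER,
    HANDWRITING_CHANGE
}

// Hypothetical box that can carry any combination of annotations.
class AnnotatedBox {
    private final Set<BoxAnnotation> annotations = EnumSet.noneOf(BoxAnnotation.class);

    void tag(BoxAnnotation annotation) {
        annotations.add(annotation);
    }

    boolean has(BoxAnnotation annotation) {
        return annotations.contains(annotation);
    }

    // Boxes tagged as image or symbol are shown as a crop of the
    // manuscript in the preview instead of as typed text.
    boolean rendersAsImageCrop() {
        return has(BoxAnnotation.IMAGE) || has(BoxAnnotation.SYMBOL);
    }
}
```

Using a set rather than a single field lets one box be, for example, both an abbreviation and stricken text, and further tags could be added to the enumeration later.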

The one-to-one pairing of word boxes to text transcriptions is broken only by abbreviations, wherein a single word box can be exploded into more than one transcription word. In any case, this pervasive one-to-one correspondence allows the accrual of handwriting-recognition capabilities, which are planned for future versions of the software. We foresee that when users experience difficulties in transcribing a specific word, they will be able to ask for help by hitting a help key.
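The pairing of boxes to words, including the abbreviation exception, might be represented as follows. This is a sketch under stated assumptions: `BoxTranscription`, its `boxId` field, and the `flatten` helper are hypothetical names, and the expansion of “&c.” into two words is only an illustrative example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pairing of one word box with its transcription words.
class BoxTranscription {
    final int boxId;            // assumed identifier of the box on the page
    final List<String> words;   // one word normally, several for an expanded abbreviation

    BoxTranscription(int boxId, String... words) {
        this.boxId = boxId;
        this.words = new ArrayList<>(Arrays.asList(words));
    }

    // True when the box maps to exactly one transcription word.
    boolean isOneToOne() {
        return words.size() == 1;
    }

    // Joins the words of all boxes, in the order given, into running text.
    static String flatten(List<BoxTranscription> boxes) {
        List<String> all = new ArrayList<>();
        for (BoxTranscription box : boxes) {
            all.addAll(box.words);
        }
        return String.join(" ", all);
    }
}
```

For example, a box holding the abbreviation “&c.” could be stored as `new BoxTranscription(2, "et", "cetera")`, while every ordinary box carries a single word.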

Using the manuscript metadata for a bounded search (limited to manuscripts that are likely to have the “same hand”), the system will be able to pattern-match the handwriting in the box where the user is having trouble against a storehouse of boxes from previously completed transcriptions from the same source, yielding transcription suggestions ranked by how closely they match. The user will thus be able to pick the suggestion that best fits the sentence being transcribed. We plan to explore the possibility of making this advanced capability available “for a fee” in order to fuel our incentive program. We want to entice users to submit completed transcriptions in order to get the credit they need, so they can later spend their banked credits to “buy” services like this “transcription help”.
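The bounded search and ranking described above could look roughly like the sketch below. Since this capability is only planned, everything here is an assumption: the `scribeId` field stands in for whatever metadata identifies the “same hand”, the feature vectors and the distance-based `similarity` score are placeholders for a real handwriting matcher, and the class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical box taken from a previously completed transcription.
class StoredBox {
    final String scribeId;       // assumed metadata standing in for the "same hand"
    final String transcription;
    final double[] features;     // assumed numeric description of the handwriting

    StoredBox(String scribeId, String transcription, double[] features) {
        this.scribeId = scribeId;
        this.transcription = transcription;
        this.features = features;
    }
}

// A suggested transcription with its match score (higher is better).
class Suggestion {
    final String text;
    final double score;

    Suggestion(String text, double score) {
        this.text = text;
        this.score = score;
    }
}

class SuggestionRanker {
    // Restricts the search to boxes with the same scribeId, scores each
    // candidate, and returns at most `limit` suggestions, best first.
    static List<Suggestion> suggest(double[] query, String scribeId,
                                    List<StoredBox> store, int limit) {
        List<Suggestion> ranked = new ArrayList<>();
        for (StoredBox box : store) {
            if (!box.scribeId.equals(scribeId)) {
                continue; // outside the bounded search
            }
            ranked.add(new Suggestion(box.transcription, similarity(query, box.features)));
        }
        ranked.sort(Comparator.comparingDouble((Suggestion s) -> s.score).reversed());
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }

    // Placeholder score: 1 / (1 + Euclidean distance) over the shared
    // leading components, so identical vectors score exactly 1.0.
    static double similarity(double[] a, double[] b) {
        double sum = 0.0;
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return 1.0 / (1.0 + Math.sqrt(sum));
    }
}
```

Filtering by metadata before scoring keeps the expensive comparison confined to the boxes most likely to share the hand, which is the point of the bounded search.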

Once a transcription has been completed, the user can save it and/or upload it back to the originating archive server for credit accounting. After several manuscripts have been worked on, the user can also save the project and wrap things up for the day.