Overview: Recording the Index Cards

Part of the project is to create digital versions of as many of the physical index cards as possible. To achieve both high volume and high quality in the digital index cards, a process had to be designed that handles the different sources of information.

To date, the Bildindex contains about 300,000 physical index cards, which mainly reference pictures and books. Each digital index card contains textual information about the referenced item (picture, book, etc.) and offers a preview of it (a thumbnail of a picture, an abstract of a text, etc.). The data for an index card is therefore compiled from several sources:

  • The physical index cards of the Bildindex contain both textual information and, in the case of pictures, contact prints. They are the first source of information to be included in the WEL.
  • Some of the slides and photo negatives from which the contact prints were created are still available. They form a better basis for digitizing than the prints.
  • Some preliminary work has been carried out in the context of other projects to make the information available electronically. This primarily covers the scanning of pictures. The data from these older projects is included in the WEL.
  • A fair amount of work on bibliographies and picture collections is being done throughout the world. The plan is to make some of those indexes available to the WEL.
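
The way a digital index card combines text and a preview from several sources can be sketched as a small record type. This is only an illustration under stated assumptions: the names DigitalIndexCard, ImageSource, and best_preview are hypothetical and are not taken from the WEL software. The only rule encoded here is the one stated above, that a slide or negative is a better basis for digitizing than a contact print.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, Optional


class ImageSource(IntEnum):
    """Origin of a preview image; a higher value is a better basis for digitizing."""
    CONTACT_PRINT = 1      # print attached to the physical index card
    SLIDE_OR_NEGATIVE = 2  # original slide or photo negative, where still available


@dataclass
class DigitalIndexCard:
    """Hypothetical record for one digital index card."""
    card_id: str
    # Free-form textual information, e.g. {"title": "...", "artist": "..."}.
    text: Dict[str, str] = field(default_factory=dict)
    # At most one image file per source.
    previews: Dict[ImageSource, str] = field(default_factory=dict)

    def best_preview(self) -> Optional[str]:
        """Return the image from the best available source, or None if there is none."""
        if not self.previews:
            return None
        return self.previews[max(self.previews)]


card = DigitalIndexCard(card_id="example-1")
card.previews[ImageSource.CONTACT_PRINT] = "print.jpg"
card.previews[ImageSource.SLIDE_OR_NEGATIVE] = "slide.jpg"
chosen = card.best_preview()  # the slide wins over the contact print
```

Keeping every available image and choosing at read time, rather than overwriting, means a card recorded from a contact print can be upgraded later if the slide or negative turns up.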

Gathering data from these sources of information poses a number of problems:

  • The information on the physical index cards is handwritten and follows no schema. This rules out automatic text recognition (OCR).
  • The physical index cards have become worn over the years. For this reason, scanning whole cards with an automatic paper feed does not work.
  • The contact prints are non-uniform in size, color, and quality. Their position varies from card to card (sometimes they are even on the back of the card). Only some of them can be used to produce thumbnails.
  • The information on the physical index cards is often incomplete and sometimes outdated. Additional research is needed to improve its quality.

Several scanning technologies have been evaluated. The constraints imposed by the physical index cards led to the following decisions:

  • The textual information is recorded manually. A software tool has been developed to assist the researcher in doing so.
  • The contact prints are photographed, and the resulting slides are scanned. The scanning can be done semi-automatically. The image quality is later improved by hand.
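
The recording path for a contact print described above (photograph, scan, manual touch-up) can be sketched as a simple sequence of stages. The stage names and helper functions below are assumptions made for illustration; they do not describe the actual software tool.

```python
from enum import Enum


class PrintStage(Enum):
    """Hypothetical stages a contact print passes through, in order."""
    ON_CARD = "on card"            # still only on the physical index card
    PHOTOGRAPHED = "photographed"  # a slide of the print exists
    SCANNED = "scanned"            # the slide has been scanned
    RETOUCHED = "retouched"        # image quality improved by hand


# Enum members iterate in definition order, which is the processing order.
_ORDER = list(PrintStage)


def next_stage(stage: PrintStage) -> PrintStage:
    """Return the stage that follows `stage`; the last stage has no successor."""
    position = _ORDER.index(stage)
    if position == len(_ORDER) - 1:
        raise ValueError(f"{stage.value!r} is the final stage")
    return _ORDER[position + 1]


def is_semi_automatic(stage: PrintStage) -> bool:
    """Only the scanning of slides is described as semi-automatic."""
    return stage is PrintStage.SCANNED


stage = PrintStage.ON_CARD
history = [stage]
while stage is not PrintStage.RETOUCHED:
    stage = next_stage(stage)
    history.append(stage)
```

Tracking the stage per print makes it easy to see which prints still await the manual touch-up that follows scanning.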

Software Systems Institute Welmaster