Part of the project is the creation of digital versions of as many of the (physical) index cards as
possible. To achieve a high level of both quantity and quality of the digital index cards, a process
had to be designed which deals with the different sources of information.
To date, the Bildindex contains about 300,000 (physical) index cards referencing mainly pictures
and books. The (digital) index cards contain textual information about the referenced item
(picture, book, etc.) and offer a preview of it (thumbnail of picture, abstract of text, etc.). The data for
an index card is thus compiled from different sources:
- The physical index cards of the Bildindex contain both textual information and contact prints (in the
case of pictures). They are the first source of information to be incorporated into the WEL.
- Some of the slides and photo negatives from which the contact prints were created are still
available. They form a better basis for digitizing than the prints.
- Some preliminary work has been carried out in the context of other projects to make the information
available electronically. This consists primarily of scanning pictures. The data from these older
projects is incorporated into the WEL.
- A fair amount of work on bibliographies and picture collections is being done throughout the world.
It is planned to make some of those indexes available to the WEL.
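
The way a digital index card is compiled from several sources can be illustrated with a small
sketch. It is not the data model of the WEL: the names DigitalIndexCard, ImageSource and best_image
are hypothetical, and the position of earlier scans in the ranking is an assumption. Only the
preference for negatives over contact prints follows from the text above.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, Optional


class ImageSource(IntEnum):
    """Hypothetical ranking of image sources; a higher value is preferred."""
    CONTACT_PRINT = 1  # print attached to the physical index card
    EARLIER_SCAN = 2   # scan from an older project (rank is an assumption)
    NEGATIVE = 3       # original slide or photo negative


@dataclass
class DigitalIndexCard:
    """Illustrative record for one digital index card."""
    card_id: str
    # Manually recorded textual information, e.g. {"title": "..."}.
    text: Dict[str, str] = field(default_factory=dict)
    # Available image files, keyed by the source they were digitized from.
    images: Dict[ImageSource, str] = field(default_factory=dict)

    def best_image(self) -> Optional[str]:
        """Return the file from the most preferred source, or None."""
        if not self.images:
            return None
        return self.images[max(self.images)]


card = DigitalIndexCard(card_id="example-1")
card.images[ImageSource.CONTACT_PRINT] = "print.tif"
card.images[ImageSource.NEGATIVE] = "negative.tif"
print(card.best_image())  # prints "negative.tif"
```

Keeping the source next to each image file means a better source that turns up later, such as a
rediscovered negative, can replace the contact print as the basis for the thumbnail without
touching the textual information.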
There are many problems associated with the gathering of data from the sources of
information:
- The information on the physical index cards is handwritten and follows no schema. This makes
automatic text recognition (OCR) impossible.
- The physical index cards have become worn over the years. This is why scanning whole cards using
automatic paper transport does not work.
- The contact prints are non-uniform in size, color, and quality. Their position on the index cards
varies from card to card (sometimes they are even on the back of the card). Only some of them can
be used to produce thumbnails.
- The information on the physical index cards is often incomplete and sometimes outdated. To improve
the quality of the information, additional research has to be done.
Several scanning technologies have been evaluated. The constraints imposed by the physical index
cards led to the following decisions:
- The textual information is recorded manually. A software tool has been developed to assist the
researcher in doing so.
- The contact prints are photographed, and the resulting slides are scanned. The scanning can be
done in a semi-automatic way. The image quality is improved by hand later on.