Kentuckiana Digital Library Production Guide version 2.0
1. About
As the Kentuckiana Digital Library has developed, standards
and best practices have been adopted to ensure the long
term viability and preservation of digital assets. This
guide outlines these standards and best practices employed
by the Kentuckiana Digital Library Production Center
at the University of Kentucky.
2. Project Planning
Proposal Form
Project planning for the Kentuckiana Digital Library
develops as a dialog between a contributing archive and
the central digitization center at the University of
Kentucky. Before the dialog begins, the contributing
archive completes a project proposal form. [download form]
Copyright Issues
Contributing archives are responsible for investigating
copyright issues in regard to digital assets. The following
digital assets management form is utilized when a collection is digitized for inclusion in the KDL. [download form]
A few good sites for investigating copyright issues:
Making of Modern Michigan Copyright FAQ
Colorado Digitization Program Copyright Resources
Peter Hirtle's Public Domain Chart
3. Text Encoding: An Introduction
Text encoding is the structuring of textual content
in digital format. Through the use of <tags>,
a given text can be described structurally in terms
of content and organization as well as style and presentation.
Currently, text encoding practices employed in the digital
library field are largely centered on the XML (eXtensible
Markup Language) standard. XML is an international standard
for the definition of device-independent, system-independent
methods of representing textual data in electronic form.
Because of this independence, raw XML files consist simply
of plain text, allowing for portability to future systems.
XML is not a markup language in and of itself but,
more specifically, a metalanguage: a set of rules and
procedures followed in the creation of a markup language.
This is often a confusing point, but one that is important
to make. XML has been used to create many markup languages,
including EAD (Encoded Archival Description), used to
mark up electronic archival finding aids, and TEI (Text
Encoding Initiative), used to mark up electronic
versions of texts.
Within the XML standard, Elements, Attributes and Document
Instances are defined through the use of a Document
Type Definition, often referred to as a .dtd file. This
file describes and defines the "valid" structuring
and use of <tags> for a given markup language.
Software can then be used to interpret the .dtd file
and facilitate appropriate presentation in a given interface
such as a web browser or through an application assisting
with the batch or manual encoding of a given class of
documents. One very common example of a software application
that uses a .dtd file is the web browser.
Through the use of the HTML document type definition,
web browsers interpret and display HTML encoded documents
for viewing.
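When checking encoded files, it helps to distinguish a well-formed document (every tag properly closed and nested) from a valid one (conforming to its .dtd file). The following is a minimal sketch in Python that checks well-formedness only. It uses the standard library's xml.etree.ElementTree module, which does not validate against a DTD. The function name is_well_formed is illustrative and is not part of any KDL tool.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML.

    This does NOT validate against a DTD; it only confirms that
    tags are properly closed and nested.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A closing tag written without its slash leaves the element unclosed.
print(is_well_formed("<eadid>1998ua003</eadid>"))  # True
print(is_well_formed("<eadid>1998ua003<eadid>"))   # False
```

Validation against the EAD .dtd file requires a validating parser and is a separate step.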
HTML is a fairly simple markup language. It originated
as an application of SGML, the predecessor of XML; XHTML
is its XML-compliant form. HTML deals mainly with
presentation rather than content description.
With current web browsers, in order to utilize more
sophisticated applications of XML, special software
is required to deal with the non-HTML document type
definitions and convert output to HTML for web browsers.
With the advent of XML, some new web browsers are currently
being equipped with an expanded range of functionality.
Instead of being restricted to a given set or only one
document type definition, web browsers are adopting
the capability to interpret any document type definition
created with the XML standard.
3.2 Markup Languages
Because resource formats vary, as do the time and
resources available to digitize them, more
than one markup language is often required for effective
digital library production and access. The Kentuckiana
Digital Library implements the standard XML-compliant text
encoding languages developed in the academic
digital library field.
Utilized DTDs (Document Type Definitions)
EAD v2.0 (Encoded Archival Description) for archival finding aids (official EAD Website)
TEI-Lite (Text Encoding Initiative) for additional full-text resources (official TEI Website)
3.3 Implementing EAD
The Encoded Archival Description was created by Daniel
Pitti at the University of California Berkeley in
1993. After five years of development and beta testing,
the Encoded Archival Description markup language was
officially accepted as a standard in 1998 by the Library
of Congress. Since its official release, EAD has quickly
become the standard of choice for the larger digital
library community due to its highly descriptive nature
and standards based XML(eXtensible Markup Language)
compliant structure. The EAD markup language has been
created to describe the content and structure of archival
finding aids. Creating these finding aids is an important
first step in establishing electronic access to archival
collections as well as building a sustainable digital
library. Once electronic finding aids are completed,
the items described within them can then be digitized
and linked to their descriptive data elements within
the finding aid, thus allowing searchable access points.
Because of the wide variety of formats for archival
finding aids, EAD allows for a large degree of flexibility.
Still, in a union environment such as KYVL, it is useful
to adopt a standard best practice guideline within the
EAD structure for encoding finding aids. This allows
for better searching as well as a degree of common look
and feel for users.
3.4 Using the EAD Web Template Generator
The EAD template generator was developed by Alvin Pollock
at the University of California Berkeley. Using the
template generator is fairly intuitive. However, keep
in mind that the generator is not intended to produce
a complete finding aid instance. You'll notice that
there is no container list portion to the template generator.
The program produces a finding aid instance down to
the container list. After the EAD code is generated,
this section of the finding aid must be constructed
and added to the generated EAD.
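Adding the container list to the generated EAD can be done by hand in a text editor. As one possible alternative, the sketch below uses Python's standard xml.etree.ElementTree module to append a separately encoded <dsc> section to the <archdesc> element. The function name append_container_list and the tiny input strings are illustrative assumptions, not output of the template generator.

```python
import xml.etree.ElementTree as ET


def append_container_list(generated_ead, dsc_fragment):
    """Insert a separately encoded <dsc> as the last child of <archdesc>."""
    root = ET.fromstring(generated_ead)
    archdesc = root.find("archdesc")
    if archdesc is None:
        raise ValueError("generated EAD has no <archdesc> element")
    archdesc.append(ET.fromstring(dsc_fragment))
    # ElementTree does not write the XML declaration or DOCTYPE here,
    # so those lines must be added back to the saved file.
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily shortened stand-ins for real files.
generated = "<ead><eadheader/><archdesc level='collection'><did/></archdesc></ead>"
container_list = "<dsc type='combined'><head>Container List</head></dsc>"
print(append_container_list(generated, container_list))
```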
Another important point to keep in mind is that the
generator is really only intended for conversion of
existing finding aids. The template does not allow you
to save a document in progress so you can return to
it later. A good approach is to create the content of
the finding aid and save it in Word or another word
processing software. Then, after the finding aid is
completed, open up the template generator and copy your
finding aid from Word and paste into the template generator.
Please keep in mind that our template generator currently
creates a finding aid as specified by our recommended
best practice. Individual institutions can add additional
code after generating the template output. Also, through
working with the central site, additional elements can
be added to individual institution templates.
Please consult the Template Generator Instruction page
put together by Alvin Pollock for more detailed information
on how the template generator places your input
into the actual EAD code.
http://sunsite.berkeley.edu/FindingAids/uc-ead/templates/intro.html
Click here to access the list of template generators
currently available for KYVL institutions.
Aside from contact information and specified public
identifiers, these templates produce encoded EAD finding
aids with the same structure. Once completed, the cgi
output can be saved as a text file.
Please contact the Kentuckiana Digital Library Project
Manager to establish a template generator for your institution.
EAD Template Generator
Template Generators for the following
institutions are currently available.
3.5 EAD Structure with Recommended Best Practice
Through the development process for our EAD template
generator, several best practice guidelines have been
consulted. These include the EAD Application Guidelines,
the EAD Tag Library, our consultant's prepared report,
and the RLG Recommended Application Guidelines for EAD.
Due to this consideration, our template is very close
to the best practice examples represented in the above
mentioned resources, especially the RLG guidelines developed
by the RLG EAD Advisory Group.
Here, an overview of the structure for an EAD finding
aid is outlined with the minimum number of required
elements recommended for the Kentuckiana Digital Library
included as well as information on producing an EAD
container list. A sample finding aid constructed with
our EAD template generator is also included here. Please
note that except for the Container List section, specific
EAD elements are not defined in this document. These
are defined in the EAD Tag Library, now available online
for EAD v2.0.
Please note that this document does not attempt to
replicate information in official EAD publications or
in supporting EAD best practice recommendations such
as those specified by the RLG Guidelines and the report
prepared by the KYVL EAD Consultant. However, these
sources have been consulted and given close attention
in order to establish our recommended best practice
within the framework of emerging national best practice
standards for EAD. Also, this document is not meant
to dictate local encoding practice. Additional markup
specified by local standards and finding aid content
may be added to the finding aid generated via the KYVL
EAD template generator. In this way, the KYVL recommendation
can be thought of as a common starting point for all
finding aids in the state. It is strongly recommended,
however, that any additional markup added outside the
template generator be constructed based upon the recommendations
outlined in the EAD Application Guidelines currently
unavailable online.
In order to assign file names to finding aids, please
consult section 4.8 Assigning File Names in Section
4: Metadata Guidelines.
3.5 EAD Outline
EAD
    Header
        EADID
        File Description (Title, dates, etc.)
        Profile Description (Creation date, language, etc.)
    Front Matter
        Title Page
        Date
        Publisher
        Copyright
    Archival Description
        DID (Title, origination, physical description, etc.)
        Administrative Info (Restrictions, preferred citation, etc.)
        Biographical History
            Marked-up Paragraph (repeatable)
            -or-
            -or-
        Controlled Access Terms
        Scope and Contents Note
        DSC – Analytic Description
        -or-
        DSC – In-Depth Description
        -or-
3.6 Sample Finding Aid With Recommended Structure and Content
note: Red text is directional and not part of the finding
aid mark-up. Structure and content recommendations developed
by Lisa Carter, Tom Roscoe and Eric Weig.
[Table of Contents Order]
Descriptive Summary
Administrative Information
Biographical Sketch
Summary of Significant Events
Controlled Access Terms
Scope and Contents
Related Material
Series Description
Container List
<?xml version="1.0" encoding="utf-8"
standalone="no"?>
<!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD
ead.dtd (Encoded Archival Description (EAD) Version
2002)//EN" "ead.dtd">
<ead>
<eadheader langencoding="iso639-2b" findaidstatus="unverified-full-draft">
<eadid type="XML catalog">1998ua003</eadid>
<filedesc>
<titlestmt>
<titleproper>Guide to the Margaret I. King(1879-1966)
Papers, <date>1893-1966</date> (University
of Kentucky. University Libraries. Director's Office.)</titleproper>
<author>Processed by Deborah Whalen; Machine-readable
finding aid created by Deborah Whalen.</author>
</titlestmt>
<publicationstmt>
<publisher>University of Kentucky Audio-Visual
Archives</publisher>
</publicationstmt>
</filedesc>
<profiledesc><creation>Machine-readable
finding aid derived from folder labels by rekeying.
Date of source: <date>March 1999</date>
Processed by Deborah Whalen, <date>March 1999</date>;
Finding aid encoded by Deborah Whalen, Special Collections
and Archives, University of Kentucky Libraries, <date>March
1999</date>.
</creation>
<langusage>Description is in <language>English</language></langusage>
</profiledesc>
<revisiondesc>
<change>
<date>January 2004</date>
<item>1998ua003 converted from EAD 1.0 to 2002
by Eric Weig.</item>
</change>
</revisiondesc>
</eadheader>
[Title Page Section]
Title: Guide to the Margaret
I. King Papers, 1893-1966
Papers could also be Collection or Records, etc. depending
on the finding aid
Processed by: Use this to credit
collection processor (person who created the original
finding aid). Any unknown people should be simply designated
as "Staff" followed by any known people (this
is assuming that the known people are more recently
involved than the unknowns). People should be listed
in the order of contribution.
Date Completed: Refers only to
date the processing was completed, not including EAD
generation. If there are two dates, they should both
be listed. In some cases, it would be valuable to list
when the processing was initially completed as well
as when processing was updated following the latest
standards. *Add a revision history section for revision
dates.
<frontmatter>
<titlepage>
<titleproper>Guide to the Margaret I. King Papers,
<date>1893-1966</date></titleproper>
<num>Collection number: 1998UA003</num>
<publisher>Special Collections and Archives<lb/><extptr
show="embed" entityref="ukseal"/><lb/>University
of Kentucky Libraries.<lb/>
Lexington, Kentucky
</publisher>
<list type="simple">
<head>Contact Information</head>
<item>Special Collections and Archives</item>
<item>University Archives and Records Program</item>
<item>University of Kentucky</item>
<item>Margaret I. King Library</item>
<item>Lexington, Kentucky</item>
<item>40506</item>
<item>Phone: (606) 257-8372</item>
<item>Fax: (606) 257-6311</item>
<item>Email: <extref href="mailto:uarp@lsv.uky.edu">uarp@lsv.uky.edu</extref></item>
<item>URL: <extref href="http://www.uky.edu/Libraries/Special/uarp/">http://www.uky.edu/Libraries/Special/uarp/</extref></item>
</list>
<list type="simple">
<item>Processed by Deborah Whalen</item>
<item>Date Completed: <date>March 1999</date></item><item>Encoded
by Deborah Whalen</item>
</list>
<p>Copyright 1999 University of Kentucky. All
rights reserved.</p>
</titlepage>
</frontmatter>
[Descriptive Summary]
Title: (marc field 245) Margaret I. King Papers, 1893-1966
Don't use any initial articles or birth and death dates.
Collection Number: 1998UA003
List all accession numbers covered by the finding aid.
List primary collection first and other included collections
second.
Creator: (marc field 100) King,
Margaret I., 1879-1966
Use birth and death dates, but not collection dates.
Extent: (marc field 300) cubic ft./linear ft.: # of
boxes and/or folders 9 cubic feet
Enter additional specifications if available. Often
noted as an approximate number of items in an exact
number of boxes. Can distinguish between boxes of content,
as in: 25 boxes of papers, 3 boxes of photographs. List
extent of papers first and then other materials. Materials
should be described well enough that you can tell if
you have the whole collection. Individual items should
only be listed separately, if they exist separately
and are not considered contents of a box, such as a
scrapbook.
Repository:
University of Kentucky Libraries, Special Collections
and Archives, Lexington, KY 40506-0039
<archdesc level="collection" langmaterial="en">
<did>
<head>Descriptive Summary</head>
<unittitle>Margaret I. King Papers, <unitdate
type="inclusive">1893-1966</unitdate>
(University of Kentucky. University Libraries. Director's
Office.)</unittitle>
<origination>
<corpname>University of Kentucky. University Libraries.
Director's Office. </corpname><persname>King,
Margaret I., 1879-1966</persname>
</origination>
<physdesc><extent>9 cubic feet</extent></physdesc>
<repository>
<corpname>University of Kentucky Libraries</corpname>
<address><addressline>Special Collections
and Archives, Lexington, KY 40506-0039</addressline></address>
</repository>
</did>
[Administrative Information Section]
Access: Recommended default is
- Collection is open to researchers by appointment.
Can be used to designate a collection that is stored
off-site or has other access restrictions such as the
collection is closed until a specified date.
Use Restrictions: Recommended
default is - Copyright has not been assigned to the
University of Kentucky. Can be used to detail how materials
in the collections can be reused or permissions that
need to be acquired before using the collection.
Preferred Citation:
[Identification of item], Margaret I. King Papers, 1893-1966,
1998UA003, Special Collections and Archives, University
of Kentucky, Lexington
Processing Information: This
area should be used to describe the processing of the
collection. While individuals who processed the collection
and the date could be listed here, remember they are
noted on the Title Page. This area should be used to
detail any additional and/or unusual aspects of the
processing, such as that the processing is incomplete
or that the photographs were removed, etc.
Acquisitions Information: This
area should be used to describe how the materials were
acquired by the repository. For example, who donated
the collection, etc.
<descgrp type="admin">
<accessrestrict>
<p>This collection is comprised of University
of Kentucky records, created and maintained in the course
of university business. It is open for research in accordance
with the Kentucky Open Records Act (KRS 61.870-884).</p>
</accessrestrict>
<userestrict>
<p>This collection is comprised of University
of Kentucky records, created and maintained in the course
of university business. The University of Kentucky holds
the copyright for materials created in the course of
business by University of Kentucky employees. Copyright
for all other materials has not been assigned to the
University of Kentucky.</p>
</userestrict>
<prefercite>
<p>[Identification of item], Margaret I. King
Papers, 1893-1966, 1998UA003, Special Collections and
Archives, University of Kentucky, Lexington</p>
</prefercite>
<acqinfo>
<p>Some of the materials in the Margaret I. King
Papers were acquired in the 1960s. Other materials have
no acquisition records. The accession number, 1998UA003,
was assigned to these materials in 1998.</p>
</acqinfo>
</descgrp>
[Biographical Sketch]
This area should be used to provide
history and background information regarding the subject
or originator of the collection. It can also be used
to note the significance of the collection and to place
it in context.
<bioghist>
<p>"She has built the library up from one
that could be housed in a single room to a library that
now contains more than 400,000 volumes and is fourth
or fifth in size among the libraries of the South. It
would be impossible to estimate the value of her contribution
to the University of Kentucky." (Board of Trustees
Minutes 6/25/1948:48).</p>
<p>This is how President Donovan described Margaret
I. King in 1948. As the University's first librarian,
King played a vital role in the development and growth
of the library at the University of Kentucky.</p>
<p>Margaret Isadora King was born in Lexington,
Kentucky, on September 1, 1879, to Gilbert Hinds and
Elizabeth K. King. She earned her Bachelor of Arts from
the Agricultural and Mechanical College of Kentucky
(University of Kentucky) in 1898, and did clerical work
in the Lexington law firm of Allen and Bronston from
1899 to 1905.</p>
<p>In 1905, King began her long career at the
University of Kentucky by serving as secretary to President
James K. Patterson. She became involved with the library
when President Patterson asked her to organize the University's
first library in 1909. While organizing the library,
she continued as secretary to the president until she
was named the University's first librarian in 1912.</p>
<p>During her career as librarian of the University,
King continued her education. She performed some graduate
work at the University of Michigan, and in 1929, she
earned her Bachelor of Science in Librarianship from
Columbia University.</p>
<p>Some of King's professional activities included
the following: serving as the Kentucky Library Association
president from 1926 to 1927, serving as a trustee for
the Lexington Public Library for many years, and directing
the survey of Kentucky libraries from 1936 to 1938 for
the American Library Association's Survey of Research
Materials in Southern Libraries. King's development
of library methods courses eventually led to the establishment
of a Department of Library Science at the University
of Kentucky.</p>
<p>In 1948, after King had directed the University's
library for 39 years, the Board of Trustees voted to
name the library the Margaret I. King Library. This
was a special honor since the Board rarely names buildings
in honor of those still living.</p>
<p>Although King retired as librarian in 1949,
she continued to perform some work for the library at
the University of Kentucky. She died in Lexington on
April 13, 1966.</p>
</bioghist>
[Controlled Access Terms]
Use Library of Congress Subject
Headings or Keywords to describe the collection. List
most relevant keywords first.
<controlaccess>
<list type="simple">
<item><subject>King, Margaret I. (Margaret
Isadora), 1879-1966.</subject></item>
<item><subject source="lcsh">University
of Kentucky--Libraries.</subject></item>
<item><subject source="lcsh">University
of Kentucky--History--20th century.</subject></item>
<item><subject source="lcsh">Libraries--Kentucky.</subject></item>
<item><subject source="lcsh">Library
administration.</subject></item>
<item><subject source="lcsh">Libraries--History--20th
century.</subject></item>
</list>
</controlaccess>
[Scope and Contents]
This area should be used to describe,
in brief, the content of the collection. Through the
use of organization and arrangement elements, this section
is also used to describe the structure of the finding
aid/collection, e.g. "Organized into the following
series:". Can be used to describe the filing sequence,
e.g. alphabetical, chronological. Organization is used
for a broad description of how the whole collection
was organized, and Arrangement is used as a narrower
description of the filing sequence.
<scopecontent>
<p>This collection consists of materials from
1893 to 1966 relating to Margaret I. King's personal
and professional activities and to the development and
growth of the library at the University of Kentucky.
Items in this collection document the history of the
University of Kentucky in general, and the history of
the University's library in particular. This collection
also includes materials relating to other libraries
in Kentucky and to the development of libraries in Kentucky.</p>
<p>The collection is divided into the following
three series: Alphabetical File Series, Artifact Series,
and Photograph Series.</p>
</scopecontent>
[Related Material]
This area should be used to reference
relationships to materials that are not contained in
the finding aid, such as an e-book at the same institution,
a reference to a book about the subject of the finding
aid or a url to a web site about the subject and/or
origination of the finding aid.
<relatedmaterial>
<p></p>
<p></p>
</relatedmaterial>
<arrangement>
<p>The Alphabetical File Series consists of folders
which are arranged in alphabetical order by topic. These
topics mostly relate to Margaret I. King's activities
and to the development of the library at the University
of Kentucky. The materials in these folders include
clippings, correspondence, presentations by King, and
programs from conferences and events.</p>
<p>Personal or biographical materials which relate
to Margaret I. King are listed under "King."
These materials include King's class notes, her memoirs
of President James K. Patterson, and her correspondence
with people such as Frances Jewell McVey and Glanville
Terrell.</p>
<p>Correspondence which relates to a specific
topic is included with that topic. Correspondence with
certain people, such as President Frank L. McVey, is
included under that person's name. Other correspondence
is included in "Correspondence - General."
The first folder of this general correspondence contains
letters from 1909 to 1918. The remaining general correspondence
folders contain letters from 1918 to 1950 and are arranged
in alphabetical order by the name of the organization
or person sending the letter.</p>
</arrangement>
[Container Description]
List and describe the series
and subseries into which the collection has been
organized. Use Series Titles in original finding aid or within
the collection as the headers. Note the box or boxes
which include that series. Add any information beyond
the Series Title that will better direct a patron to
a group of cohesive materials.
<dsc type="combined">
<!--The following must be generated outside of the
EAD template generator-->
<head>Container List</head>
<c01 level="series">
<did>
<unittitle type="series">1st Series
Title, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
<scopecontent>
<p>Several topics in this series include: the
dedications of libraries at the University of Kentucky,
gifts to the library at the University of Kentucky,
plans for state-wide library development, and the activities
of the library and the university during World War I
and World War II. Other topics include the procedures
of various library departments, library orientation
for freshmen, and the business and social activities
of the library staff.</p>
<p>Several organizations which are represented
in this collection include the American Library Association,
the Kentucky Department of Library and Archives, the
Kentucky Library Association, and the Lexington Public
Library.</p>
</scopecontent>
<c02 level="subseries">
<did>
<unittitle type="subseries">1st Subseries
of 1st Series Title, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
<c03 level="item">
<did>
<container label="box" type="box">1</container>
<container label="folder" type="folder">1</container>
<unittitle>container contents, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
</c03>
<c03 level="item">
<did>
<container label="box" type="box">1</container>
<container label="folder" type="folder">2</container>
<unittitle>container contents, <unitdate type="inclusive">dates</unitdate></unittitle>
<dao show="new" actuate="user"
href="href for digitized item"><daodesc>[view
image]</daodesc>
</dao>
</did>
</c03>
</c02>
<c02 level="subseries">
<did><unittitle type="subseries">2nd
Subseries of 1st Series Title, <unitdate type="inclusive">dates
</unitdate></unittitle>
</did>
<c03 level="item">
<did>
<container label="box" type="box">1</container>
<container label="folder" type="folder">3</container>
<unittitle> container contents, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
</c03>
<c03 level="item">
<did>
<container label="box" type="box">1</container>
<container label="folder" type="folder">4</container>
<unittitle>container contents, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
</c03>
</c02>
</c01>
</dsc>
</archdesc>
</ead>
3.7 Encoding the EAD Finding Aid Container List
The web template generator for marking up EAD Finding
Aids currently does not facilitate markup for the "Container
List" portion of the EAD finding aid.
Because of the complexity and number of potential variants
in container lists, these examples taken from the sample
finding aid serve as a general recommendation. See the
EAD Tag Library and the EAD Application Guidelines for
more examples.
What is a Container List?
A "Container List" for an archival collection
is the detailed listing of the collection's contents
and their organization via storage container types.
A "Container List" in EAD is defined through
the <dsc>(Description of Subordinate Components)
element.
There are three approaches to the <dsc> element
in EAD. These approaches are defined through the
<dsc> type attribute:
"analyticover" (a container list made up of
series descriptions only),
"in-depth" (a container list containing a hierarchical
description of the components of a collection, including
box, folder, and other locations; this may also be called a
"boxlist," "handlist," or "calendar," but for a similar
look and feel for users, "Container List" is the
recommended title), and
"combined" (a mixed series description and
container list).
The accepted practice for KYVL in regard to the <dsc>
element is to have one <dsc> section of type "combined".
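A quick way to confirm that a finished finding aid follows this practice is to list the type attribute of every <dsc> element it contains. The sketch below is one way to do so with Python's standard library. The function names dsc_types and follows_kyvl_dsc_practice are illustrative assumptions, and the sample string is a shortened stand-in for a real finding aid.

```python
import xml.etree.ElementTree as ET


def dsc_types(ead_text):
    """Return the type attribute of every <dsc> element, in document order."""
    root = ET.fromstring(ead_text)
    return [dsc.get("type") for dsc in root.iter("dsc")]


def follows_kyvl_dsc_practice(ead_text):
    """True when the finding aid has exactly one <dsc>, typed "combined"."""
    return dsc_types(ead_text) == ["combined"]


sample = (
    "<ead><archdesc><dsc type='combined'>"
    "<head>Container List</head></dsc></archdesc></ead>"
)
print(dsc_types(sample))                  # ['combined']
print(follows_kyvl_dsc_practice(sample))  # True
```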
The Component <c0x> Element
To create a container list, the use of the
Component element in EAD is essential. Container lists
often consist of nested series. Describing these
series with a nested structure is achieved through the
use of the <c0x> elements. Within the current
EAD standard, the Component element can be nested
to 12 levels (<c01> through <c12>). NOTE: When nesting
occurs, as in the example below, a <c0x> element cannot
be closed until all of its sub-<c0x> elements have
been defined for the series. In the example below,
the <c01> and <c02> Component
tags are not closed (</c02></c01>) until
the last <c03> component element has been defined
for the series.
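Because the number in each Component tag is meant to match its nesting depth, a misnumbered component is easy to introduce when a container list is typed by hand. The sketch below shows one way to flag such components with Python's standard library. It assumes numbered components only (<c01> through <c12>), and the function name misnumbered_components is illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Matches the numbered component tags c01 through c12.
COMPONENT_TAG = re.compile(r"^c(0[1-9]|1[0-2])$")


def misnumbered_components(element, depth=0):
    """Return tags of numbered components whose number differs from their nesting depth."""
    problems = []
    for child in element:
        match = COMPONENT_TAG.match(child.tag)
        if match:
            level = depth + 1
            if int(match.group(1)) != level:
                problems.append(child.tag)
            problems.extend(misnumbered_components(child, level))
        else:
            # Non-component wrappers such as <did> do not add a level.
            problems.extend(misnumbered_components(child, depth))
    return problems


good = ET.fromstring("<dsc><c01><c02><c03/></c02></c01></dsc>")
bad = ET.fromstring("<dsc><c01><c03/></c01></dsc>")
print(misnumbered_components(good))  # []
print(misnumbered_components(bad))   # ['c03']
```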
The <container> Element
The Container element is used to define the storage
medium for the archival items. The following choices
are offered with EAD:
carton
box
folder
reel
frame
oversize
reel-frame
volume
map-case
box-folder
page
folio
othertype
These units are specified through the type attribute
for the <container> tag and further indicated
by the use of the label attribute as in the example
below. Note: It is recommended that the <container
type="box"> element be repeated beside
each <container type="folder"> element.
It is also recommended that the <container> element
be used only for "item" level component elements.
Example:
<dsc type="combined">
<head>Container List</head>
<c01 level="series">
<did>
<unittitle type="series">1st Series
Title, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
<scopecontent>
<p></p>
</scopecontent>
<c02 level="subseries">
<did>
<unittitle type="subseries">1st Subseries
of 1st Series Title <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
<scopecontent>
<p></p>
</scopecontent>
<c03 level="item">
<did>
<container label="box" type="box">1</container>
<container label="folder" type="folder">1</container>
<unittitle>container contents<unitdate type="inclusive">dates</unitdate></unittitle>
</did>
</c03>
</c02>
</c01>
</dsc>
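For long container lists, item-level components like the one in the example above can be generated rather than typed. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. It repeats the box container beside each folder container, as recommended. The function name build_item and its parameters are illustrative assumptions and are not part of the template generator.

```python
import xml.etree.ElementTree as ET


def build_item(tag, box, folder, title, dates):
    """Build an item-level component with a box and a folder <container>."""
    item = ET.Element(tag, {"level": "item"})
    did = ET.SubElement(item, "did")
    # label and type carry the same value, as in the example above.
    for kind, number in (("box", box), ("folder", folder)):
        container = ET.SubElement(did, "container", {"label": kind, "type": kind})
        container.text = str(number)
    unittitle = ET.SubElement(did, "unittitle")
    unittitle.text = title + ", "
    unitdate = ET.SubElement(unittitle, "unitdate", {"type": "inclusive"})
    unitdate.text = dates
    return item


item = build_item("c03", 1, 1, "container contents", "dates")
print(ET.tostring(item, encoding="unicode"))
```

Building the elements through a library rather than by string concatenation guarantees that every opened tag is closed.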
The <unittitle> Title of the Unit Element
The <unittitle> element gives a title to the
material being described at all levels of the finding
aid. In the container list, these titles can be specified
for specific items as well as for series and subseries
outlined. When specifying a series title or subseries
title, the type attribute is used to specify "series"
or "subseries".
Example:
<dsc type="combined">
<head>Container List</head>
<c01 level="series">
<did>
<unittitle type="series">1st Series
Title, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
<scopecontent>
<p></p>
</scopecontent>
<c02 level="subseries">
<did>
<unittitle type="subseries">1st Subseries
of 1st Series Title, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
<scopecontent>
<p></p>
</scopecontent>
<c03 level="item">
<did>
<container label="box" type="box">1</container>
<container label="folder" type="folder">1</container>
<unittitle>container contents, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
</c03>
</c02>
</c01>
</dsc>
The <unitdate> Date of the Unit Element
The <unitdate> element specifies a year, month,
or day of the described materials at all levels of the
finding aid. The date may be in the form of text or
numbers, and may consist of a single date or range of
dates. In the container list, it is recommended that
the <unitdate> element be used within the <unittitle>
element with the type attribute specifying "inclusive".
Example:
<dsc type="combined">
<head>Container List</head>
<c01 level="series">
<did>
<unittitle type="series"> 1st Series
Title, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
<scopecontent>
<p></p>
</scopecontent>
<c02 level="subseries">
<did>
<unittitle type="subseries">1st Subseries
of 1st Series Title, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
<scopecontent>
<p></p>
</scopecontent>
<c03 level="item">
<did>
<container label="box" type="box">1</container>
<container label="folder" type="folder">1</container>
<unittitle>container contents, <unitdate type="inclusive">dates</unitdate></unittitle>
</did>
</c03>
</c02>
</c01>
Using the <dao> (Digital Archival Object)
Element
Once the material in an EAD finding aid has been described
down through the Container List, the material can be
digitized and linked from the Container List in
the form of thumbnails, icons, or hot-links that point
to the digitized items. This is achieved with the <dao>
(Digital Archival Object) element. In the working
example below, for Series 1, Box 1, Folder 2, the
<dao> element is used with the href attribute,
which gives the URL for the digitized object.
Example:
<c03 level="item">
<did>
<container label="box" type="box">1</container>
<container label="folder" type="folder">2</container>
<unittitle>container contents, <unitdate type="inclusive">dates</unitdate></unittitle>
<dao show="new" actuate="user"
href="http://kdl.kyvl.org/images/kukav/1998ua003/0024.jpg">
<daodesc>View Image</daodesc>
</dao>
</did>
</c03>
Using the <extref> (External Reference) Element
(for web links)
The <extref> element is used in the EAD Container
List to link to another finding aid or other web address
and/or to link to a secondary application that handles
the navigation of a manuscript or other multi-page resource
in the archival collection. An example of using the
<extref> element is shown below.
Example:
<c03 level="item">
<did>
<container label="box" type="box">1</container>
<container label="folder" type="folder">2</container>
<unittitle>
<extref href="http://www.website.edu">container
contents</extref>
<unitdate type="inclusive">dates</unitdate></unittitle>
</did>
</c03>
Database Approach to EAD Container Lists
An alternative approach to constructing EAD Container
Lists is to use a database to record the information
using content fields that can then be encoded using
EAD elements. The actual encoding is automatically generated
through database output in the form of a report or a
delimited ASCII text file.
The Central site has successfully converted both FileMakerPro
and Access databases to EAD container lists and is happy
to assist project sites developing this approach to
encoding the container lists.
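As a rough illustration of the database approach, the following Python sketch turns one exported record into a <c03> item like those in the examples above. It is only a sketch under stated assumptions: the field names (box, folder, title, dates, url) and the sample title and dates are hypothetical, and the Central site's actual conversion tools are not shown in this guide.

```python
import xml.etree.ElementTree as ET


def build_item(row):
    """Build a <c03 level="item"> element from one database row (a dict)."""
    c03 = ET.Element("c03", level="item")
    did = ET.SubElement(c03, "did")

    # One <container> per physical container, labelled as in the examples above.
    for kind in ("box", "folder"):
        container = ET.SubElement(did, "container", label=kind, type=kind)
        container.text = str(row[kind])

    # The title text is followed by an inclusive <unitdate> nested inside it.
    unittitle = ET.SubElement(did, "unittitle")
    unittitle.text = row["title"] + ", "
    unitdate = ET.SubElement(unittitle, "unitdate", type="inclusive")
    unitdate.text = row["dates"]

    # Optional link to a digitized object, mirroring the <dao> example.
    if row.get("url"):
        dao = ET.SubElement(did, "dao", show="new", actuate="user", href=row["url"])
        ET.SubElement(dao, "daodesc").text = "View Image"

    return c03


# Hypothetical record; only the URL is taken from the <dao> example above.
row = {"box": 1, "folder": 2, "title": "Correspondence", "dates": "1901-1905",
       "url": "http://kdl.kyvl.org/images/kukav/1998ua003/0024.jpg"}
item_xml = ET.tostring(build_item(row), encoding="unicode")
print(item_xml)
```

Rows without a url value simply produce an item with no <dao> element, so the same routine can serve collections that are only partly digitized.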
3.8 TEI (Text Encoding Initiative)
The Kentuckiana Digital Library Production Center has
developed automated markup applications that facilitate
the production of digital page image archives with page
scanning from originals or microfilm and high-level
OCR (Optical Character Recognition) text underlying
the page images for full-text searching. The XML markup
utilized is the Text Encoding Initiative's TEI.2 or
TEI-Lite document type definition, created to deal with
a wide variety of textual formats including books, journals,
poetry and original manuscripts.
With any full-text encoding project, best practice
often depends upon the format and structure of the
specific material to be encoded. Best practice guidelines
for the TEI markup language are available in TEI
Text Encoding in Libraries: Draft Guidelines for Best
Encoding Practices, Version 1.0 (July 30, 1999), prepared
by LeeEllen Friedland, Library of Congress; Nancy Kushigian,
University of California, Davis; Christina Powell, University
of Michigan; David Seaman, University of Virginia; Natalia
Smith, University of North Carolina at Chapel Hill; and
Perry Willett, Indiana University. This guide establishes
recommendations for encoding with TEI-Lite and defines
5 levels of encoding based upon the proposed use of
the encoded text and the amount of time and funding
available for a given project.
Levels 1-2 can be encoded via automated processes, and
levels 1-4 require no expert knowledge of content, so
texts can be converted and encoded without the assistance
of content experts and enriched with more markup at any
time. Level 5, in contrast, requires scholarly analysis.
Recommendations for Levels 1-4 are intended for projects wishing to
create encoded electronic text with structural markup,
but minimal semantic or content markup.
4. Item Level Metadata: An Introduction
Metadata is commonly referred to as "data about data".
More specifically, metadata is the structured description
of an object or collection of objects through the use
of specific data elements. Another common description
for metadata is "cataloging data". In the
creation of digital libraries, effective metadata is
essential for the presentation, discovery and navigation
of digital archival objects.
By allowing for a structured description, metadata
offers searchable access points (title, author, subject
terms) for discovery systems as well as parsable data
for applications developed for the migration and/or
reformatting of data. This is an important consideration:
metadata is used in multiple contexts, current and
expected, and is often reformatted for presentation
in a variety of formats, currently ranging from database
structures and OAI (Open Archives Initiative) records
to tagged XML document headers.
Traditionally, metadata has mainly been created
in the form of MARC bibliographic records. Although
MARC is very robust and well defined as a standard,
the overhead and level of expertise needed to catalog
with MARC is a considerable stumbling block for many
digital library projects whose participants have little,
if any, MARC cataloging experience. To provide a more
streamlined approach to describing digital resources,
the Dublin Core Element Set, comprising 15 elements
for describing networked resources, has emerged
as the likely candidate for adoption as a preferred
standard for describing digital library resources.
4.1 The Dublin Core: An Item Level Metadata Element
Set
The Dublin Core metadata element set grew from efforts
spearheaded by OCLC (Online Computer Library Center Inc.)
in 1995. Its focus is centered on one basic set of metadata
elements selected and refined by a group of experts
from the national and international library and information
science communities. The set consists of fifteen elements
used to describe digital networked resources, categorized
into the following three groups:
Content: Title, Subject, Description, Source,
Language, Relation (to another resource), Coverage (spatial
or temporal characteristics of intellectual content)
Intellectual Property: Creator, Publisher, Contributor,
Rights
Instantiation: Date, Type (such as archival
finding aid, electronic text, etc.), Format (of data,
to identify software and hardware required for use),
Identifier (URL).
4.2 Why Use the Dublin Core?
The Dublin Core Element Set is intended to allow metadata
implementors to strike a balance between ease of implementation
and the production of metadata records that facilitate
effective resource discovery. The Dublin Core's simplicity
allows implementation by non-catalogers. At the same
time, the Dublin Core is extensible, allowing for the
incorporation of more sophisticated description standards.
By establishing and promoting a common, easily understood
core set of elements to describe digital networked resources,
the Dublin Core will facilitate searching across discipline
boundaries as it is adopted more widely.
The Dublin Core Element Set can be converted to MARC
and other common bibliographic record formats.
Most importantly, with an international developmental
scope including active participation and support in
over 20 countries in North America, Europe, Australia,
and Asia, the Dublin Core has been established as the
primary candidate for the establishment of a formal
standard for describing digital networked resources.
4.3 Overall Guidelines for Data Entry
Punctuation
Unless the resource includes punctuation or the element
definition includes specific guidelines for punctuation,
do not add it.
Symbols and Abbreviations
Do not use symbols to abbreviate unless they are taken
from the source or the Element Definition specifies.
For example, for an uncertain date for either the Date
Digital or Date Original element, the '?' symbol is
used. Use abbreviations if they are taken from the source,
or are accepted as common and easily understood.
Capitalization
Capitalization is taken from the source or follows
general grammatical standards. Exclude initial articles.
Only acronyms warrant the use of all caps.
Keywords, Topics and Subject Terms
The Dublin Core encourages combining both subject terms
taken from a controlled vocabulary such as the Library
of Congress Subject Headings, and keywords assigned
by the record creator. Keywords also include the KYVL
Kentuckiana Topics list designed to allow broad topic
access to the digital collection. List most specific
terms first and broader terms last.
Questions to Ask When Entering a Record
- Are you entering information into the record for
an entire collection or for an individual item?
- Is the information comprising the record useful
for resource discovery?
- Is the content of the element known with certainty
or readily available from existing databases or information
sources? If not, can you provide an educated, informed
guess that will not be misleading?
- If you are emphasizing the attributes of the original
object (not the digital surrogate) in the record,
have you included this information in the correct
element fields? Have you included meaningful information
about the digital surrogate in the appropriate element
fields?
4.4 Digital Library Metadata Dictionary
The Official Dublin Core Metadata Element Set Guidelines
specify that all of the 15 metadata elements are optional.
However, for the sake of effective resource discovery,
the KYVL specifies optional, recommended, and required
elements. The following record structure lists 9 required
Dublin Core metadata elements, 4 recommended Dublin
Core metadata elements, and 2 optional Dublin Core metadata
elements.
The following specifications serve as guidelines for
creating item level metadata for digital archival objects
within EAD finding aids. Definitions listed for the
Dublin Core elements are borrowed from the Official
Dublin Core Guidelines available on the Web at: http://purl.org/dc/documents/wd-guide-current.htm
4.5 Elements
Required Item Level Dublin Core Elements for EAD
Items
NOTE: Elements marked with a "*" are inherent within the broader sections of an EAD instance. Therefore, these elements do not need to be input at the item level.
Element: Title
Required: Yes
MARC: 245|a
Use more than Once: Yes, when more than one title exists
(for example, primary and secondary titles, or variant
titles) and resource discovery is enhanced by the addition
of an alternate title or titles supplied by the source.
Scheme: Free Text
Definition: A name given to the resource, usually by
the Creator or Publisher. Typically, a title will be
a name by which the resource is formally known.
Use Guidelines:
1. Input titles and subtitles using the punctuation
appearing on the source.
2. For items or collections not given a title or name,
select a descriptive term or short phrase. Such a term
or short phrase may be derived from other descriptive
fields within the record.
3. Use "untitled" as an entry only when the
resource was deliberately given this as its formal title.
Element: Date
Required: Yes
MARC: 260|c
Use more than Once: Yes
Scheme: ISO8601
Modifier: Digital
Definition: The date when the digital version of the
resource was created. Recommended best practice for
encoding the date value is defined in a profile of ISO
8601 [W3CDTF] and follows the YYYY-MM-DD format.
Use Guidelines:
1. Date format: YYYY-MM-DD as defined in the W3CDTF profile of ISO 8601, http://www.w3.org/TR/NOTE-datetime.
2. Use a dash '-' to separate dates.
3. Use a question mark '?' before the date if the date
is not definite. Use 'ca' before the date to indicate
an approximation.
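The Date guidelines above lend themselves to a mechanical check before records are delivered. The Python sketch below is one possible reading of them, not a KDL tool: it accepts a single date in YYYY, YYYY-MM, or YYYY-MM-DD form (the granularities W3CDTF allows), optionally preceded by '?' or 'ca'. The function name and the qualifier labels are assumptions made for illustration, and date ranges are not handled.

```python
import re
from datetime import date

# Optional '?' or 'ca' prefix, then YYYY, YYYY-MM, or YYYY-MM-DD.
DATE_PATTERN = re.compile(
    r"^(?P<prefix>\?|ca)?\s*(?P<y>\d{4})(?:-(?P<m>\d{2})(?:-(?P<d>\d{2}))?)?$"
)


def check_date(value):
    """Return (qualifier, normalized date string) or raise ValueError."""
    match = DATE_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"not a YYYY-MM-DD style date: {value!r}")
    year, month, day = match.group("y"), match.group("m"), match.group("d")
    # Let the standard library reject impossible dates such as 2001-02-30.
    date(int(year), int(month or 1), int(day or 1))
    # Hypothetical labels for the three cases named in the guidelines.
    qualifier = {"?": "uncertain", "ca": "approximate", None: "exact"}[match.group("prefix")]
    normalized = "-".join(part for part in (year, month, day) if part)
    return qualifier, normalized


print(check_date("2003-04-17"))
print(check_date("?2001-05"))
print(check_date("ca 1998"))
```

Values in other layouts, such as 04/17/2003, raise ValueError, which makes them easy to flag for correction in the spreadsheet.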
Element: Identifier
Required: Yes
MARC: 856, 020, 022...
Use more than Once: Yes
Scheme: URL, URN
Definition: A string or number used to uniquely identify
the resource in a given context. Examples for networked
resources include URLs and URNs (when implemented).
Recommended best practice is to identify the resource
by means of a string or number conforming to a formal
identification system, in this case a Uniform Resource
Locator (URL) or Persistent Uniform Resource Locator
(PURL). This field is hotlinked in the Kentuckiana Digital
Library Database record.
Use Guidelines: Enter the URL or PURL for the digital
object in this field.
Element: Source *
Required: Yes
MARC: 534|n
Use more than Once: Yes, but not recommended.
Scheme: Free Text, Accession No., Control No., Call
No., ISBN, ISSN, FPI
Definition: A reference to a resource from which the
present resource is derived. While it is generally recommended
that elements contain information about the present
resource only, this element contains metadata for the
second resource when it is considered important for
discovery of the present resource. Source is not applicable
if the present resource is in its original form. The
present resource may be derived from the Source resource
in whole or in part. Recommended best practice is to
reference the resource by means of a string or number
conforming to a formal identification system.
Use Guidelines:
1. Include local call number, local control number,
or accession number, etc.
2. The Description field is
used for other information describing the Source resource.
Element: Publisher *
Required: Yes
MARC: 260|b
Use more than Once: Yes
Scheme: Free Text
Modifier: Personal or Corporate Name
Definition: The entity responsible for making the resource
available in its present form, such as a publishing
house, a university, or a corporate entity.
Use Guidelines:
1. List multiple publishers in separate fields.
2. Use the Publisher element to indicate the appropriate KCVL institution.
Element: Language *
Required: Yes
MARC: 546|a
Use more than Once: Yes
Scheme: ISO639
Definition: A language of the intellectual content of
the resource.
Use Guidelines:
1. The content of the
Language field should be taken from RFC 1766 [RFC1766],
which builds on the two-letter language codes of the
ISO 639 standard [ISO639].
Element: Rights *
Required: Yes
MARC: 506|a
Use more than Once: Yes
Scheme: Free Text
Definition: An identifier that links to a rights management
statement, or an identifier that links to a service
providing information about rights management for the
resource. Information about rights held in and over
the resource. Rights information often encompasses Intellectual
Property Rights (IPR), Copyright, and various Property
Rights. This field is hotlinked in the Kentuckiana Digital
Library Database.
Use Guidelines:
1. Establish a generic
textual statement describing the rights management statement
for your digital resources on the Internet. If restrictions
exist, supply an alternate URL indicating how to contact
the appropriate library faculty for specifics on using
the resource.
Element: Resource Type
Required: Yes
MARC: 655|a
Use more than Once: Yes
Scheme: Text, Image, Sound, Dataset
Definition: The nature or genre of the content of the
resource. Comment: Type includes terms describing general
categories, functions, genres, or aggregation levels
for content. To describe the physical or digital manifestation
of the resource, use the Format element.
Use Guidelines:
1. Select appropriate type. Options: Audio File,
Electronic Text, Photograph, Video File
Element: Format
Required: Yes
MARC: 856
Use more than Once: Yes
Scheme: Free Text
Definition: The physical or digital manifestation of
the resource. Comment: Typically, Format may include
the media-type or dimensions of the resource. Format
may be used to determine the software, hardware or other
equipment needed to display or operate the resource.
Examples of dimensions include size and duration. Recommended
best practice is to select a value from a controlled
vocabulary.
Recommended Item Level Dublin Core Elements
Element: Creator
Required: Recommended
MARC: 1xx, 7xx
Use more than Once: When more than one Creator exists
and the inclusion of the additional Creator(s) enhances
resource discovery.
Scheme: Free Text
Modifier: Personal or Corporate Name
Definition: The person or organization primarily responsible
for creating the intellectual content of the resource.
For example, authors in the case of written documents,
artists, photographers, or illustrators in the case
of visual resources. Examples of a Creator include a
person, an organization, or a service.
Use Guidelines:
1. Creators should be listed separately in the same
order that they appear on the source.
2. Personal names should be listed surname or family
name first, forename or given name, middle name or initial,
suffix, prefix. When in doubt, give the name just as
it appears on the source. Add known birth and death
dates.
3. Use full corporate names. The entry element is the
full name of the business or organization excluding
initial articles.
Examples:
PERSONAL NAMES
King, Margaret Isadora, 1979-1949
Fitzgerald, F. Scott (Francis Scott), 1896-1940
Hemingway, Ernest, 1899-1961
Berry, Wendell, 1934-
Burton, Alonzo Carroll
CORPORATE NAMES
Digital Imaging, Inc.
Kentucky Art Museum
Warner Brothers Company
National Resource Center for Family Services
Colonization Society of Kentucky
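The name conventions above can be applied consistently with a small helper. The Python sketch below is illustrative only: the function names are hypothetical, it assumes the surname and given names are already known as separate values, and it recognizes only the English initial articles "the", "a", and "an".

```python
def format_personal_name(surname, given, birth=None, death=None):
    """Format a personal name surname-first, with life dates when known."""
    entry = f"{surname}, {given}"
    if birth is not None:
        # An open-ended range (e.g. "1934-") is used when no death date is known.
        entry += f", {birth}-{death if death is not None else ''}"
    return entry


INITIAL_ARTICLES = ("the ", "a ", "an ")


def format_corporate_name(name):
    """Return the full corporate name with any initial article removed."""
    lowered = name.lower()
    for article in INITIAL_ARTICLES:
        if lowered.startswith(article):
            return name[len(article):]
    return name


print(format_personal_name("Hemingway", "Ernest", 1899, 1961))
print(format_personal_name("Berry", "Wendell", 1934))
print(format_personal_name("Burton", "Alonzo Carroll"))
print(format_corporate_name("The Colonization Society of Kentucky"))
```

The four calls reproduce entries in the style of the examples above; when a name does not split cleanly into these parts, the guideline to give the name just as it appears on the source still applies.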
Element: Subject
Required: Recommended
MARC: 6xx
Use more than Once: Yes
Scheme: Keyword, LCSH (Library of Congress Subject Headings),
TGM I (Thesaurus for Graphic Materials I), TGM II (Thesaurus
for Graphic Materials II)
Definition: The topic of the content of the resource.
Typically, a Subject will be expressed as keywords,
key phrases or formal classification subject terms that
describe a topic of the resource. Recommended best practice
is to select a value from LCSH and/or TGM in addition
to selected keywords.
Use Guidelines:
1. Keywords and Subjects may come from other Dublin
Core fields defined for the resource.
2. Enter person or organization for Subject as outlined
under Creator element.
3. Try to be as specific as possible with the underlying
focus on aiding resource discovery. If using a keyword,
use the most significant or unique words first, with
more general words for broad description used as necessary.
Use terms found on or about the item.
4. Use subject strings if able. This is strongly encouraged.
They may be taken from an alternate record that already
exists (for instance, an existing MARC record), or they
can be created for the Dublin Core record.
Element: Coverage
Required: Optional
MARC: 654|a
Use more than Once: Yes
Qualifier: Spatial or Temporal
Definition: The spatial or temporal characteristics
of the intellectual content of the resource. Spatial
coverage refers to a physical region (e.g., celestial
sector) using place names or coordinates (e.g., longitude
and latitude). Temporal coverage refers to what the
resource is about rather than when it was created or
made available (the latter belonging in the Date element).
Temporal coverage is typically specified using named
time periods (e.g., Neolithic) or the same date/time
format as recommended for the Date element.
Use Guidelines:
1. Select terms from a subject heading list or thesaurus
to identify place names (e.g., Getty Thesaurus of Geographic
Names, Library of Congress Subject Headings, etc.)
2. Use free text to input B.C.E. dates.
3. Enter range of dates on the same line and use a dash
(-) to separate dates.
4. Some time periods are not adequately described using
a date format, such as Jurassic Period or the Dark Ages.
In this case, give the text form of the time period
(e.g., Jurassic Period). Select terms from a subject
heading list or thesaurus to identify these time periods
(e.g., Library of Congress Subject Headings).
5. If the date is uncertain, use a question mark (?) following
the date to indicate it is an approximate date. If the
date is estimated, use "ca" prior to the date
to indicate estimation.
6. It is important to make the distinction between temporal
Coverage, source Date, and Date. For example, the temporal
coverage of a photograph of an art object is the date
of the art object, and the date of the photograph is
the Date Original. The date the photograph was digitized
is the information entered into the Date Digital element.
Element: Description
Required: Recommended
MARC: 5xx
Use more than Once: Yes
Scheme: Free Text
Modifier: Abstract, Free Text
Definition: A textual description of the content of
the resource, including abstracts in the case of document-like
objects or content descriptions in the case of visual
resources. Description may include but is not limited
to: an abstract, table of contents, reference to a graphical
representation of content or a free-text account of
the content. Can also include information describing
related resources and information describing the source.
Use Guidelines:
1. Enter natural language descriptive text, remarks,
and comments about the object taken from the item, or
provided by the record creator.
2. Be as brief as possible, but not at the expense of
a rich description for resource discovery. A few sentences
or paragraphs is a good average.
3. Include additional information such as measurements
of a depicted object, description, provenance, etc.
as long as this information is not included in other
elements.
Element: Contributor
Required: Optional
MARC: 1xx, 7xx
Use more than Once: Yes
Qualifier: Personal or Corporate Name
Definition: A person or organization not specified in
a Creator element who has made significant intellectual
contributions to the resource but whose contribution
is secondary to any person or organization specified
in a Creator element (for example, editor, transcriber,
encoder, finding aid preparer and illustrator).
Use Guidelines:
1. Enter personal and corporate names in same format
as Creator element.
2. Do not specify role (e.g., editor,
translator, etc.) of the Contributor. Use the Description
field to tie this information together within the record.
Element: Relation
Required: Recommended when a relationship exists.
MARC: 787|n, 787|o
Use more than Once: Yes
Modifier: Yes.
Scheme: URL URN
Definition: An identifier of a second resource that
holds a specific relationship to the present resource.
This element permits links between related resources
and resource descriptions to be indicated. Examples
include an edition of a work (IsVersionOf), a translation
of a work (IsBasedOn), and an item from an archival
collection with finding aid (IsPartOf). Recommended
best practice is to reference the resource by means
of a string or number conforming to a formal identification
system. This field is hotlinked in the Kentuckiana Digital
Library Database.
Use Guidelines:
1. For relation to
an existing finding aid or other resource, place the
identifier for the finding aid in the relation identifier
field. Include a description of the relation(s) in this
field.
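The required and recommended designations in this dictionary can be checked mechanically before records are submitted. The following Python sketch is only an illustration: the grouping follows the section headings and the "*" note above, while the record layout (a dict keyed by element name) and the function name are assumptions.

```python
# Grouping taken from the section headings of this dictionary.
REQUIRED = ("Title", "Date", "Identifier", "Source", "Publisher",
            "Language", "Rights", "Resource Type", "Format")
RECOMMENDED = ("Creator", "Subject", "Description", "Relation")
OPTIONAL = ("Coverage", "Contributor")

# Elements marked "*" are inherited from broader sections of the EAD
# instance, so they are not expected at the item level.
INHERITED = ("Source", "Publisher", "Language", "Rights")


def check_record(record):
    """Return (missing_required, missing_recommended) for an item-level record."""
    def is_empty(name):
        value = record.get(name)
        return value is None or not str(value).strip()

    missing_required = [n for n in REQUIRED if n not in INHERITED and is_empty(n)]
    missing_recommended = [n for n in RECOMMENDED if is_empty(n)]
    return missing_required, missing_recommended


# Hypothetical item record with the Format value left blank.
record = {"Title": "Main Street parade", "Date": "2003-04-17",
          "Identifier": "64m1.0001", "Resource Type": "Photograph", "Format": ""}
missing_required, missing_recommended = check_record(record)
print(missing_required)
print(missing_recommended)
```

A missing recommended element is reported separately so that it can be treated as a prompt for review and not as an error.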
4.6 Creating Item Level Metadata
The following directions are intended for use by developers
locally digitizing material for inclusion in the KYVL
Kentuckiana Digital Library. These directions are used
in the central digitization center's workflow and are
intended to facilitate fast and accurate creation of
records by allowing project developers to gather metadata
in a straightforward and efficient manner. Once the
metadata has been created, the local spreadsheet file
should be delivered to the KYVL Kentuckiana Digital
Library manager for processing.
Simple Spreadsheet Approach
Required Software: Microsoft Excel
The simple spreadsheet approach to gathering metadata
for Kentuckiana Digital Library projects is meant to
capture all the essential data that is unique to specific
individual items. Data that is consistent for an entire
range of records is added later, automatically, as
the data is re-formed into XML EAD (Encoded Archival
Description) finding aid Container List items. Examples
of consistent data elements that can be added later,
or that find their place in earlier sections
of an EAD Finding Aid, are the Rights, Language and
Publisher elements.
Basic Spreadsheet Layout

- Rows represent item records
- A separate spreadsheet should be used for each project.
A project should be designated by a unique collection/accession
number, so that each separate spreadsheet holds data
related to only one collection/accession number.
- The identifier field should specify the unique id
for the digital object. This will also be used as
the file name for the digital object itself. When
re-forming the data, this information is used to build
the URL for the digital resource. This also allows
digital conversion to proceed separately from the metadata
creation, with the two products coming together seamlessly
in the end result.
- The identifier in the database records should be
no more than 8 characters.
- Example: digital photographs from a collection would
be listed as accession + sequential number: 64m1.0001,
64m1.0002, ... When the photographs are scanned, the
resulting files should be stored simply as sequential
numbers inside a directory that specifies the accession
number, so on the server, in the above example, the
files would be in /images/kukwf/64m1/0001.jpg, 0002.jpg,
...
- Additional Information: Along with each spreadsheet
file, the central site needs to know the following
information concerning the data: Publisher, Collection
Title, Collection Number.
- How to Handle Multiple Subjects: Always create the
subject as the last cell in a row. Place multiple
subjects in the same row, in the cells extending to
the right. ...
- How to Handle other Multiple Fields: Most of the
fields in Dublin Core can be repeated. For example,
you may have more than one creator for a resource.
However, aside from the Subject field, most of the
other Dublin Core elements will only need to be repeated
occasionally. To compensate for this, if needed, simply
create a new column, or however many are required,
to the right of the field that needs to be repeated.
Label the new columns with Dublin Core element name
+ number: Creator2, Creator3, etc.
- Series and Subseries Columns: When a new series
and subseries begins, a new row should be created.
As shown in the example graphic above, the new row
holding the series and subseries references should
not contain any other information.
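The layout rules above can be sketched in Python as follows. This is not the central site's processing code: it assumes the spreadsheet has been exported as CSV with a header row, the sample rows are hypothetical, and series and subseries rows are not handled. It shows how an identifier maps to a file path, how numbered columns such as Creator2 fold back into one element, and how trailing cells are read as multiple subjects.

```python
import csv
import io


def image_path(identifier, repository):
    """Map an identifier such as '64m1.0001' to '/images/<repository>/64m1/0001.jpg'."""
    accession, sequence = identifier.rsplit(".", 1)
    return f"/images/{repository}/{accession}/{sequence}.jpg"


def read_items(csv_text):
    """Parse exported rows into dicts that map element names to lists of values."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    subject_col = header.index("Subject")  # Subject is the last labelled column
    items = []
    for row in body:
        record = {}
        for name, value in zip(header[:subject_col], row):
            if not value:
                continue
            # Creator2, Creator3, ... are repeats of the Creator element.
            base = name.rstrip("0123456789")
            record.setdefault(base, []).append(value)
        # Every non-empty cell from the Subject column rightward is a subject.
        record["Subject"] = [cell for cell in row[subject_col:] if cell]
        items.append(record)
    return items


# Hypothetical export of a two-item spreadsheet.
sample = (
    "Identifier,Title,Creator,Creator2,Subject\n"
    '64m1.0001,Main Street parade,"Doe, Jane","Roe, Richard",Parades,Streets\n'
    '64m1.0002,Courthouse,"Doe, Jane",,Courthouses\n'
)
items = read_items(sample)
for item in items:
    print(image_path(item["Identifier"][0], "kukwf"), item["Creator"], item["Subject"])
```

Keeping every element as a list, even when it holds a single value, means repeated and unrepeated fields can be written out by the same code when the data is re-formed into Container List items.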
The central production center has also converted various database formats to EAD. For an example of this type of approach, please take a look at the
Conversion of Microsoft Access Databases into EAD-Encoded Finding Aids
document from the UC Berkeley site.
5. Digital Library Imaging: An Introduction
Digital imaging is a field of study that includes digital
photography, scanning, composition and manipulation
of digital images. What is a digital image? It is binary
code defining the digital representation of an actual
image or item such as a photograph or book page. The
binary code (Computer code represented by a series
of bits, the smallest unit of computer data, indicated
by a 1 or a 0.) defines tiny segments of the digital
image called pixels. These pixels are assigned color
characteristics within a given color space, and aligned
on a grid of columns and rows that can be viewed on
a computer monitor or printed onto paper using a computer
printer. Through the practice of image capture, the
number of pixels created for a given area of a scanned
item as well as the number of colors or shades of gray
that are used to define the color characteristics of
each pixel are specified. Image capture is then achieved
through the use of a scanning device such as a flatbed
scanner or digital camera. The scanning device shines
bright light onto or through the item, and tiny light-sensitive
sensors called diodes detect the intensity of the light
reflected or transmitted. The scanning device then
converts these light intensity readings to binary code
for each individual pixel comprising a digital image.
Once the scanning device has completed this process
and constructed a digital image, it can be stored as
a computer file and manipulated through the use of digital
image editing software.
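The relationship between pixel count, color depth, and file size described above comes down to simple arithmetic. The Python sketch below works it through for a hypothetical 8 x 10 inch original scanned at 600 pixels per inch. These figures are chosen only for illustration and are not KDL requirements, and file format overhead such as headers is ignored.

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Pixel grid (columns, rows) for an item of the given size in inches."""
    return round(width_in * ppi), round(height_in * ppi)


def uncompressed_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Approximate size of the raw pixel data, ignoring file headers."""
    width_px, height_px = pixel_dimensions(width_in, height_in, ppi)
    return width_px * height_px * bits_per_pixel // 8


# Hypothetical scan: 8 x 10 inch original at 600 pixels per inch.
print(pixel_dimensions(8, 10, 600))        # (4800, 6000)
print(uncompressed_bytes(8, 10, 600, 24))  # 86400000 bytes in 24-bit color
print(uncompressed_bytes(8, 10, 600, 8))   # 28800000 bytes in 8-bit gray scale
```

The same scan is three times larger in 24-bit color than in 8-bit gray scale, which is one reason storage needs to be planned before a scanning project begins.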
In relation to digital libraries, digital imaging plays
a central role in the creation of digital archival objects.
Unfortunately, at this time there is no single set of
guidelines or accepted standards for determining the
level of image quality required in the creation of digital-image
databases. Also, as with other aspects of digital library
production, dealing with technology that is in a constant
state of flux presents unique challenges and pitfalls.
It is therefore important to define desired quality
guidelines based on current and expected use. These
guidelines should be established and followed to produce
consistent and expected results.
What is the scanned item's expected use? Should the
image be available in a printable format? Does the expected
use warrant a high resolution on-screen version? Is
the item being scanned for long term use? The answers
to most of these questions can differ from one item
to another. Current technology standards may dictate
answers, especially in the case of printable formats.
Monetary budgets may also dictate answers. However,
one question that can be answered universally at this
point is whether or not the digital items for the Kentuckiana
Digital Library are being scanned for long term use.
A digital library is in the business of establishing
sustainable access to collected resources and since
a large part of the cost in building a digital library
is associated with the initial digital imaging work,
it's best to approach this activity with long term usability
in mind.
5.1 Archival Imaging
What is really meant by the term "archival"
in the context of digital imaging for libraries is often
a point of confusion. Although
best efforts are made to represent the original in a
digital format with visual characteristics as close
to the original as possible, the current state of digital
imaging technology is not capable of faithfully duplicating
original material. The term "archival", when
used in relation to digital imaging, does not refer to
the creation of an exact digital replica of the original.
In this context, the term refers to
the practice of digital imaging for current
use and long-term viability of the digital images.
5.2 "Master" and "Deliverable"
Images
One of the most important concepts to consider when
utilizing these guidelines is the concept of "master"
and "deliverable" image files. The "master"
image is captured for offline storage, scanned at high
resolution, and saved in TIFF format, either uncompressed
or with lossless compression. This will be a large file in
terms of size and is preferably stored on CD-ROM or
large hard disk with backup.
Alternatively, "deliverable" images, sometimes
called "derivatives", are lower quality and
derived from the "master" image, typically
through the use of a batch image processing software
application. "Deliverable" images are saved
with a lossy compression scheme to achieve acceptable
file sizes for current network access within the digital
library infrastructure. For effective levels of access,
several "deliverable" versions of a "master"
image may be required.
5.3 Implementation Overview
When establishing guidelines for imaging, the Kentuckiana
Digital Library worked with an imaging consultant and
sought to find a reasonable consensus among the multitude
of imaging practices adopted by other digital library
projects to provide a succinct and clear best practice
for our digital imaging efforts. In doing so, the following
overall implementation guidelines were identified.
- Adopt practice of creating high resolution "master"
image files used to produce lower resolution "deliverable"
image files for standard network delivery to the public.
- Calibrate scanning equipment before beginning any
scanning project using standard photographic targets.
- Make only minimal changes to a scanned object to be
saved as a "master" image file. It is best
to avoid having to make any adjustments. This allows
for a more streamlined workflow in terms of guidelines
for student scanning technicians and more importantly,
lends consistency to the collection of images. This
consistency facilitates recording administrative metadata
for images as well as scripting of batch processing
software to produce "deliverable" images.
- Capture "master" files using 24-bit color
rather than 8-bit gray scale when there is any color
information in the original documents. When in doubt,
use 24-bit color.
- Most if not all of the "deliverable" images
will end up on the Web. Therefore, except for hi-resolution
versions of the images, these files should have a
target file size limit of 200K, compressed to avoid
especially slow performance.
- Establish specific minimum standards for imaging
and follow them. These not only include specific resolutions
for various types of material, but also include specific
screen size requirements for effective access and
use of "deliverable" digital images.
- Always run test scans for quality before moving
into full production for a scanning project.
- Save an ASCII text file for each scanned batch describing
the image capture procedures utilized. This file should
list bit depth, color space, dpi and file type. Store
this file with the "master" images.
- When expected use warrants it, create multiple "deliverable"
images for access.
5.4 Important Concepts
Understanding the following concepts is essential to
establishing proper scanning practice for a digital
library project.
Resolution (dots per inch)
DPI = dots per inch = the unit used to measure resolution.
- Spatial Resolution
Spatial resolution describes what a printer can print,
a scanner can scan, and a monitor can display. In printers
and scanners, resolution is measured in dots per inch
(dpi): the number of pixels a device can fit in an inch
of space. It is the physical resolution at which a device
can capture an image, and the term is used most frequently
in reference to optical scanners and digital cameras.
- Interpolated Resolution
This term indicates the resolution that the device
can yield through interpolation -- the process of
generating intermediate values based on known values.
For example, a scanner may offer an optical resolution
of 300 dpi, but an interpolated resolution of up to
4,800 dpi. This means that the scanner can actually
capture 90,000 pixels per square inch. Then, based
on the values of these pixels, it can add 15 additional
pixels in-between each pair of known values to yield
an interpolated higher resolution.
The finer the detail, the higher the resolution required
to capture it faithfully, because the higher the dpi,
the more information is recorded in the file. Higher
resolution also makes it possible to enlarge a detail
in the image.
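The arithmetic behind the interpolation example can be checked with a short Python sketch. The function name and the dictionary keys are illustrative only, and the sketch assumes the interpolated resolution is a whole multiple of the optical resolution, as it is in the 300 dpi and 4,800 dpi example.

```python
def interpolation_summary(optical_dpi: int, interpolated_dpi: int) -> dict:
    """Relate a scanner's optical resolution to its interpolated resolution.

    Assumption for this sketch: interpolated_dpi is a whole multiple
    of optical_dpi.
    """
    if interpolated_dpi % optical_dpi != 0:
        raise ValueError("interpolated_dpi must be a whole multiple of optical_dpi")
    factor = interpolated_dpi // optical_dpi
    return {
        # Pixels the device really captures in one square inch.
        "captured_pixels_per_square_inch": optical_dpi ** 2,
        # Pixels generated between each pair of captured neighbors.
        "added_pixels_between_neighbors": factor - 1,
    }


summary = interpolation_summary(300, 4800)
print(summary)
```

Only the captured pixels carry information recorded from the original; the added pixels are computed from their neighbors.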
Color/Bit Depth
Color depth, also referred to as bit depth, measures
the number of bits of color data which are stored for
each pixel; the greater the bit depth, the greater the
number of gray scale or color tones that can be represented
and the larger the file size. Common bit depths are:
1 bit bitonal (black and white) is a usable color depth
for select textual information where the original is
clean and free of defects that will affect the quality
of the scan. 1 bit bitonal is also used in scanning
textual information from microfilm. It is recommended
that project managers wishing to use 1 bit color depth
run sample scans to determine acceptable quality. Also,
consider that a "master" image can be captured
at 8-bit gray scale or 24-bit color and then converted
to a 1-bit bitonal "deliverable" copy.
8 bit color depth is not suitable for digital masters
and is not recommended for use.
8 bit gray scale is used for select items, generally
black and white photographs that have no color characteristics,
and microfilm. If there are color characteristics, even
apparent from the age and natural deterioration of the
photograph or other material, 24-bit color is recommended
in order to capture the image as true to the original
as possible.
24-bit color is recommended whenever there is color
information in the original item. Although the "master"
scan will be larger in file size vs. other color depths
such as 8 bit gray scale, with JPEG compression, 24-bit
"deliverable" color images usually are no
larger than the JPEG gray scale "deliverable"
image.
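The relationship between bit depth, the number of tones, and file size can be sketched in Python as follows. The function names and the 8 x 10 inch, 400 dpi example are assumptions made for illustration, and the size counts raw pixel data only, so a real uncompressed file will be slightly larger once headers are added.

```python
def tone_count(bit_depth: int) -> int:
    """Number of distinct gray or color values a pixel can hold."""
    return 2 ** bit_depth


def uncompressed_bytes(width_in: float, height_in: float, dpi: int, bit_depth: int) -> int:
    """Approximate size of the raw pixel data for a scan, in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bit_depth
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical 8 x 10 inch original scanned at 400 dpi.
for depth in (1, 8, 24):
    size = uncompressed_bytes(8, 10, 400, depth)
    print(f"{depth}-bit: {tone_count(depth)} tones, {size} bytes")
```

In this example the 24-bit scan holds three times as much pixel data as the 8-bit gray scale scan of the same original.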
Color Space
Color Space defines the palette of colors used to create
the color of each pixel in a digital image. For screen
images, the RGB (red, green, blue) color space is used.
With a 24-bit color depth utilizing an RGB color space,
2 to the power of 24, or more than 16 million, unique
colors are possible. Each of these colors is a result
of combining the colors red, green and blue.
It is best to communicate only one color space to the
end user to facilitate optimal rendering of all images
across all platforms and devices. Since the majority
of access to image files will be via a monitor using
an RGB color space, using RGB as a default color space
is recommended. Images in RGB will display reasonably
well even on uncalibrated monitors.5
Alternatively, printers use the CMYK (cyan, magenta,
yellow, and black) color space model that is based on
the absorbing quality of ink printed on paper.
File Compression
In order to serve digital image files over a network
such as the World Wide Web, they must be compressed
so that download times are acceptable for users and
file sizes are acceptable for the available storage space.
The smaller the file, the better; a benchmark of 200K
is recommended as a file size limit for non-hi resolution
"deliverable" images. The following file formats
are listed and described in terms of their use.
GIF (Graphics Interchange Format): This compression
format is only recommended for use in creating thumbnails
and 1 bit bitonal (black and white) images. For other
image types, JPEG provides superior compression.
JPEG (Joint Photographic Experts Group): The most effective
of the compressed file formats listed here for most image
types. JPEG is recommended for use in creating medium
and hi resolution images for Web delivery.
PDF (Portable Document Format): This is a format created
by Adobe Systems. It is a compressed standard for digital
images, mostly of textual information.
PDF files require users to have the Adobe Acrobat Reader
software installed as a web browser helper application
on their computer. This software is free and easy to
install. The benefit of the PDF format is the user's
ability to re-size and print individual pages.
5.5 Best Practice for Image Capture and Formatting
In the production of "master" image files,
the intent is to produce a high quality image that will
serve as the source file for the production of present
and future "deliverable" image files. The
thinking here is that the higher the resolution, the longer
the half-life of the "master" image in terms
of its usefulness. There is a degree of consensus to
establish 600dpi as the preferred resolution level for
capture of any document size and type. The justification
for this is that 600dpi is sufficient to capture extremely
small text legibly and can produce a high-quality publication
at double life-size.2 The drawback here is that capture
time is longer and file sizes become very large and
difficult to handle without a well equipped PC. Due
to this fact, 600dpi is recommended for KDL only where
the required hardware, software and human resources
are available or the material to be scanned requires
600dpi for the effective capture of fine detail.
Due to the wide range of material types to be scanned
in archival collections, and because capturing at 600dpi
is not always feasible, it is difficult to assign one capture
resolution to all document types and sizes. The KDL
has relied on the experience of our imaging consultant
and lessons learned in other projects to establish a
range of recommended resolutions for "master"
image files based on the size of the original material.
The recommended resolutions are specified by item size
and listed in the Resolution Coverage Table below. This
table has been modified from one devised by Howard Besser
for the California Digital Library. Although this table
will serve as the best approach for the majority of
imaging jobs, it is always useful to do a test scan
or two and check for quality, especially with pictorial
material. If you scan at 300dpi or 400dpi by default,
you may find that scanning pictorial material with a
particularly fine level of detail is done more effectively
at a higher resolution such as 600dpi.
The range of minimum, default, and high resolution
settings is offered to allow individual institutions
to find the resolution that works best for their local
level of technological and human resources. Best efforts
should be made to scan at the default resolution setting.
Scanning at less than the minimum is not considered
"archival", that is, appropriate for long-term sustainability.
5.6 Resolution Table for Master Images (8-bit grey scale and 24-bit color)
| Long dimension of original (in inches) | Minimum Resolution Setting (dpi) | Default Resolution Setting (dpi) | High Resolution Setting (dpi) |
| 11.5 (digital camera or flat bed scanner) | 300 | 400 | 600 |
| 15.5 (digital camera and/or large format flat bed scanner) | 300 | 400 | 450 |
| 23 (digital camera) | 300 | 300 | 300 |
| 35 (digital camera) | 200 | 200 | 200 |
| 46 (digital camera) | 150 | 150 | 150 |
As a rule when scanning items, check that the long
pixel dimension is at least 3,000 pixels when scanned
at the chosen resolution setting.
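The table lookup and the 3,000 pixel check can be sketched in Python as follows. This is a minimal illustration, not a KDL tool. It assumes each size in the table is the upper bound for originals covered by that row, and all function and variable names are hypothetical.

```python
# (maximum long dimension in inches, minimum dpi, default dpi, high dpi),
# transcribed from the Resolution Table for Master Images.
# Assumption: each size is the upper bound for the originals in that row.
RESOLUTION_TABLE = [
    (11.5, 300, 400, 600),
    (15.5, 300, 400, 450),
    (23.0, 300, 300, 300),
    (35.0, 200, 200, 200),
    (46.0, 150, 150, 150),
]

MIN_LONG_PIXELS = 3000


def recommended_dpi(long_inches: float) -> dict:
    """Return the minimum, default, and high dpi settings for an original."""
    for max_inches, minimum, default, high in RESOLUTION_TABLE:
        if long_inches <= max_inches:
            return {"minimum": minimum, "default": default, "high": high}
    raise ValueError("original is longer than the largest size in the table")


def long_pixel_dimension(long_inches: float, dpi: int) -> int:
    """Pixel length of the long side when scanned at the given dpi."""
    return round(long_inches * dpi)


def meets_pixel_rule(long_inches: float, dpi: int) -> bool:
    """Check the rule that the long side should be at least 3,000 pixels."""
    return long_pixel_dimension(long_inches, dpi) >= MIN_LONG_PIXELS


settings = recommended_dpi(6)
print(settings, meets_pixel_rule(6, settings["default"]))
```

For a small original, such as one 6 inches long, the default setting of 400 dpi yields only 2,400 pixels, so the check fails and the high setting of 600 dpi is the better choice.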
5.7 Image Sizing Table for Deliverable Images
| Thumbnail | Access Image | Hi-Resolution Image |
| 150-200 pixels across the long dimension | 640x480; 800x600 | 1024x768; 1280x1024; 1200 pixels across the long dimension; 1000 - 5000 pixels across the long dimension |
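Sizing a "deliverable" to a target long dimension while keeping the proportions of the "master" can be sketched in Python as follows. The function name and the 4000 x 3200 pixel master are assumptions made for illustration.

```python
def fit_long_dimension(width_px: int, height_px: int, target_long_px: int) -> tuple:
    """Scale pixel dimensions so the longer side equals target_long_px."""
    long_side = max(width_px, height_px)
    new_width = max(1, round(width_px * target_long_px / long_side))
    new_height = max(1, round(height_px * target_long_px / long_side))
    return (new_width, new_height)


# Hypothetical 4000 x 3200 pixel master image.
for label, target in (("thumbnail", 200), ("access", 800), ("hi-resolution", 1200)):
    print(label, fit_long_dimension(4000, 3200, target))
```

The resulting dimensions can then be passed to whatever batch image processing application produces the derived files.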
5.8 File Format Table for Master and Deliverable Images
| Master Image | Deliverable Thumbnail | Deliverable Access Image | Deliverable Hi-Resolution Image |
| Uncompressed TIFF or TIFF with lossless compression | JPEG, GIF | JPEG, GIF | JPEG, PDF |
Lossless compression: Process that reduces the storage space needed
for an image file without loss of data. If a digital
image that has undergone lossless compression is
decompressed, it will be identical to the digital
image before it was compressed. Document images
(i.e., in black and white, with a great deal of
white space) undergoing lossless compression can
often be reduced to one-tenth their original size;
continuous-tone images under lossless compression
can seldom be reduced to one-half or one-third their
original size.
5.9 Media Formats
A variety of original material can be found in archival
collections. For this reason, the first step in imaging
or scanning an item is to decide how to approach the
object in relation to its physical format. The following
material types are outlined with general guidelines
for practice.
Textual Documents
The "master" image may be most appropriately
captured in 1-bit bitonal, 8-bit grey scale or 24-bit
color depending on the color characteristics of the
original. The use of a photographic target is recommended.
"Deliverable" images are compressed JPEG.
An alternate "deliverable" sometimes utilized
to save storage space is a 1-bit bitonal, web optimized
GIF.
Pictorial Items
Among archival material, pictorial items such as photographs
can present many challenges due to the fine level of
detail often present in the original. It is useful to
perform tests with photographs and other pictorial material.
Try to find the finest detail in the image and see if
this comes through effectively with current minimum
or default dpi settings. It is also recommended that
photographic targets be used in the scanning process
for pictorial items.
Maps and Oversized Records
Use 8-bit grey scale or 24-bit color depth. 300 dpi is
sufficient for the "master" image. These oversized
items are currently best served as "deliverables"
using Lizardtech's MrSID image format and server application (installed
on the digilib.kyvl.org machine).
Graphic Records and Materials
These can include line drawings and artistic illustrations.
Smaller graphic materials no greater than 8.5x11 inches in
size should be scanned in the same manner as Pictorial
Items. Larger graphic materials should be scanned in
the same manner as Maps and Oversized Records.
5.10 "Master" Image Quality
Image quality is a complex issue. There are many subjective
aspects to defining image quality when comparing a digital
image to the original. This makes an absolute assurance
method improbable. It is expected that as digital imaging
technology progresses, enhanced software will provide
methodology for precise image quality standards. However,
there are standard quality control issues that can currently
be addressed to establish a consistent and acceptable
level of image quality.
- Properly calibrate the computer monitor attached
to the scanning device. The monitor's manual should provide
good information on this process. Also, the Denver
Public Library's Western History Photography Collection
Site provides an excellent online, step by step process
to calibrate your monitor. http://photoswest.org/calib.htm
- Properly calibrate the scanning device using photographic
targets and device manuals.
- Select appropriate color depth based on color characteristics
of the original.
- Select appropriate resolution based on size of the
original as well as the level of fine detail apparent.
Perform test scans to assure that your "master"
image files are captured properly.
- Check for blur, moiré patterns, and color characteristics
compared to the original, especially when scanning photographs
with fine details.
- Also check the test scans for proper resolution
setting. The quality of this original image has a
major impact on the quality of the "deliverable"
images derived from it.
- Save "master" images with lossless compression
format (TIFF).
Additionally, the following are important aspects
to consider when producing "master" images.
- Moiré Patterns: Striped or checkered patterns
appearing in a scanned halftone image. (Halftone is a
printing process by which images are rendered by hundreds
of tiny dots. Halftone images are commonly found in
newspapers.) Your scanner should allow for correction
at the time of scanning through the "descreening"
function. Retrospective fixes via image editing
software do not work well.
- Deskewing: Correcting a scanned item that is not
aligned correctly. It is best to align the item before
scanning. In some cases this may not be possible and
image editing software will need to be used to
align the captured image.
- Cropping: The "master" image should
capture to the edges of the original material.
Cropping is used to cut out any white space around
the scanned document.
5.11 "Deliverable" Image Quality
Assuming that the "master" image was captured
properly, the quality of the derived "deliverable"
images is affected by resolution, size, and compression.
The key tradeoff in defining an appropriate level of
image quality for "deliverable" images is
the balancing of compressed file size and resulting
storage requirements with image quality.3
The following are important aspects to consider when
producing "deliverable" images.
Artifacts
Visual effects introduced into a digital image as a
result of image compression. These are seen as blurry,
wavy lines molded around details in the image. The higher
the compression level, the greater the artifactual effects.
Compression Level:
Image editing software allows for setting the level
of compression. This is usually set on a scale from
1 to 10. For deliverable images, medium compression
levels 4, 5, or 6 are recommended.
Sharpening:
Producing "deliverable" images from a "master"
image involves re-sizing the image to an appropriate
scale for screen presentation. A side effect is a blurry
image. Using the sharpening function available in many
image editing software applications such as Photoshop
can compensate for the effect and produce more usable
images. The following is a general set of rules to follow
in using the unsharp mask function.
Amount: 100-200%
Radius: 1 to 2 pixels
Threshold: 2 to 8 levels
Levels of Access: For appropriate access to digital
images, more than one version of the image may need
to be offered. The following are recommended for the
Kentuckiana Digital Library.
Thumbnail Copy (contained in a database record, finding
aid container list, etc.)
Access Copy (medium resolution, fits onto computer
screen)
Optional Hi-Resolution Copy (enhanced access to level
of detail)
Optional PDF Copy (allow users to print a copy)
5.12 Administrative Image Metadata
The following information should be saved as an ASCII
file and stored with the "master" image files
it describes.
How? bit depth, color space, dpi
When? Date in ISO 8601 [W3CDTF] format, which follows the pattern YYYY-MM-DD
Who?
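A capture record of this kind can be sketched in Python as follows. This is one possible layout, not a KDL-specified format. The field labels, the function name, and the reading of "Who?" as the person who performed the scan are all assumptions.

```python
from datetime import date


def capture_record(bit_depth: int, color_space: str, dpi: int,
                   file_type: str, scanned_on: date, scanned_by: str) -> str:
    """Build a plain ASCII description of how a batch was captured.

    The field labels are illustrative; the guide does not prescribe them.
    """
    lines = [
        f"bit depth: {bit_depth}",
        f"color space: {color_space}",
        f"dpi: {dpi}",
        f"file type: {file_type}",
        f"date: {scanned_on.isoformat()}",  # ISO 8601, YYYY-MM-DD
        f"scanned by: {scanned_by}",
    ]
    text = "\n".join(lines) + "\n"
    text.encode("ascii")  # raises UnicodeEncodeError if any character is not ASCII
    return text


record = capture_record(24, "RGB", 400, "TIFF", date(2004, 5, 17), "scanning technician")
print(record)
```

The returned text can be written with `open(path, "w", encoding="ascii")` into the same directory as the "master" images it describes.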
5.13 Assigning File Names
It is important for the central site to have a consistent
method for naming files so that such a method can be
specified in outsourcing contracts and so that over
time file names across institutions will not conflict
and can be managed within the context of batch processing
for migration to future formats. The question of file
naming conventions was addressed by all three of our
consultants. Unfortunately, there are no set standards
for this. The best advice from our consultants in terms
of strategy was "be consistent" and preferably
include a unique identifier in the file name that relates
to the source. With this in mind, the Kentuckiana Digital
Library specifies the following file naming implementation
for individual institutions to adopt when creating files
for the Kentuckiana Digital Library.
In relation to naming files, the term "handle"
is often used to describe a unique identifier for a
specific resource. Generally, a handle is comprised
of two main parts, the first being a naming authority
name and the second being a string unique to that naming
authority.8 In the context of KYVL, the naming authority
part of the handle should specify a particular KYVL
institution such as Morehead State University. Instead
of specifying the naming authority in full, an abbreviated
reference is used. This reference is based on a global
naming authority, for our purpose, the institution's
OCLC Institution Code. This part of the handle will
be specified by the centralized file management structure
as in the following example and referred to as the GLOBAL
ID.
Deliverable image files held by the central site for
Kentucky State University will live under: http://digilib.kcvl.org/images/kys/
The second half of the handle is then supplied through
referencing a unique string within the specified naming
authority. This is based on an accession, control or
other unique numbering system used by the specified
naming authority and is referred to as the ITEM ID.
A third part of the handle involves structural metadata
to specify the order in which a group of related digital
files is arranged. This is given via sequential numbering
of the digital archival objects and can be specified
within an archival collection from start to finish,
or within specific groups of files such as individual
page images for a manuscript. For specific resources,
alternate structural metadata may also be required.
5.14 File Naming Implementation
- All directories and filenames will be in lower case.
- Filenames will be assigned following these conventions:
- Within a document, increment the number by 1
for each sequential page [no random numbers or
names]
- Recommended Basic Structure Naming File Directories
and Digital Archival Object Files
unique institution code(OCLC Institution Code)
+ / + accession # OR
unique institution code(OCLC Institution Code)
+ / + accession # + / + item # in sequence
- Digital Archival Objects with EAD
OCLC Institution Code + / + accession # + / +
item # in sequence
- Example: The second photograph in the Doris
Ulmann Photograph Collection (96PA104) where there
are 154 frames total: kuk/96pa104/002.ext
- Example: Third slide of 506 in the "Turtles"
series of Barbour(Roger W.) Photograph Collection
where there are a total of 8 series with "Turtles"
being the 2nd, and a total of 1389 slides in the
entire collection: kmm/92bp/0506.ext
- Note: When producing digital image files, alternate
versions of the image files for thumbnails, pdf
and hi-resolution should be given the same file
name but placed into the following sub-directories.
thumbs
hi-res
pdf
- EAD Finding Aids: OCLC Institution Code + /
+ accession#
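The naming conventions above can be sketched as a small Python helper. This is an illustration under stated assumptions, not a KDL tool. It assumes the item number is zero-padded to the number of digits in the collection's total item count, which is consistent with 002 of 154 and 0506 of 1389 in the examples, and that the thumbs, hi-res and pdf sub-directories sit directly under the accession directory. The function name and its parameters are hypothetical, and "tif" and "jpg" stand in for the ".ext" placeholder.

```python
from pathlib import PurePosixPath

# Sub-directories named in the guide for alternate versions of an image.
DERIVATIVE_SUBDIRS = {"thumbs", "hi-res", "pdf"}


def object_path(institution_code: str, accession: str, sequence: int,
                total_items: int, extension: str, version: str = None) -> str:
    """Build a lower-case path: institution code / accession / item number."""
    if not 1 <= sequence <= total_items:
        raise ValueError("sequence must be between 1 and total_items")
    if version is not None and version not in DERIVATIVE_SUBDIRS:
        raise ValueError("version must be one of: thumbs, hi-res, pdf")
    # Assumption: pad to the number of digits in the total item count.
    width = len(str(total_items))
    filename = f"{sequence:0{width}d}.{extension}"
    parts = [institution_code, accession]
    if version is not None:
        parts.append(version)
    parts.append(filename)
    return str(PurePosixPath(*parts)).lower()


print(object_path("KUK", "96PA104", 2, 154, "tif"))
print(object_path("KMM", "92BP", 506, 1389, "jpg", version="thumbs"))
```

Lower-casing the whole path enforces the rule that all directories and filenames are in lower case, whatever case the institution code or accession number is supplied in.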
5.15 OCLC Institution Codes for KYVL Institutions
These codes are used for institution naming authority
within our file naming structure.
Ashland Community College: KUA
Boone County Public Library: KYB
Centre College: KCC
Eastern Kentucky University: KEU
Filson Club Historical Society: KTN
Georgetown College: KGG
Kentucky Department for Libraries and Archives: KSL
Kentucky Historical Society: KNU
Kentucky State University: KSU
Lexington Community College: KUT
Lexington Public Library: KYL
Louisville Free Public Library: KLP
Morehead State University: KMM
Northern Kentucky University: KHN
Southeast Community College: KUS
Transylvania University: KTU
Union College: KUC
University of Kentucky: KUK
University of Louisville: KLG
Western Kentucky University: KNV
5.16 Safe Handling of Archival Material
An important part of the digital imaging process that
cannot be overlooked is the proper approach to handling
archival material in relation to its digital capture.
The following minimum guidelines are recommended to
preserve the integrity of the source material used during
the digital image capture process. More complete guidelines
for study are available from the Library of Congress's
in-house course handouts entitled 'Criteria for Selecting
Items for Conservation Treatment before Digital Scanning'
and 'Care and Handling of Library Materials for Digital
Scanning'.
Minimum Guidelines
- No food or drink in work space.
- Wash your hands before handling materials.
- Clear work area before working with originals.
- Keep sharp items and pens/markers away from original
material.
- When original material is not being scanned, it
should be covered and stored in a secure place.
- Wear cotton gloves to prevent transfer of skin oils
to original material.
- Pages of a book should be turned carefully. Conservators
recommend lifting the upper corner of the page, then
using one's whole hand to support the page as it's
turned.
- For books, do not open at an angle greater than
120 degrees. This entails the use of an overhead scanner
with book cradle or a book edge scanner.
- Flat paper items that are oversize should be placed
between two rigid boards to be flipped over. Turning
a large item over should never be attempted without
additional support. This operation often requires
two people.
- To pick up a single-sheet paper original, use a
corner of paper inserted under the edge.
- Keep materials in order to minimize handling.
- Do not flex item when turning it over.
- Unfold items carefully. Do not unfold items unless
they have been identified as non-brittle.
- Brittle materials may need a polyester support such
as Mylar in order to be handled safely and scanned
at all.
- Repair of damaged originals should be done before
digital capture by a conservator or other trained
professional.
NOTE: For additional reading,
visit the Cornell Online Tutorial "Digital Imaging: Moving Theory
into Practice" online at: http://www.library.cornell.edu/preservation/tutorial/contents.html
Last Update: May 2004