White Paper: Usefulness of OAI-Enabled Search Portal
to Cultural Heritage Material for K-12 Educators

by Sarah L. Shreeves and Christine Kirkham

Introduction
Research Question and Literature Review
Methodology
Results
Conclusions
References
Appendix One: Classroom Assignment
Appendix Two: Interview Guide for Focus Groups

 

Introduction

The Illinois project sought to demonstrate the viability of the search and retrieval of aggregated metadata harvested using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). It also attempted to document the benefits of this approach to harvesting and to examine usage patterns. This paper documents an effort to understand how such a portal to aggregated metadata might be used by a specific group of users.

The Illinois repository is accessed through a search portal called the UIUC Digital Gateway to Cultural Heritage Materials[1]. The repository contains approximately 1.1 million original metadata records. The portal uses the XPAT indexing and search tools developed by the Digital Library Extension Service (DLXS) at the University of Michigan. As of December 2002, we had collected metadata from 39 providers, including museums, archives, libraries, historical societies, consortiums, and digital libraries. The aggregated metadata describes an array of cultural heritage resources held by more than 500 institutions worldwide. Some resources exist in digital formats, such as .JPG images. Other resources exist only in analog format and are represented digitally through the metadata.

The common schema used for metadata stored in the repository is Dublin Core (DC). Approximately half of the participating institutions are registered OAI data providers whose records are harvested directly from their own servers. The non–OAI-registered providers delivered “data dumps” of metadata, which became the core of surrogate provider sites implemented at Illinois and used only for harvest by this project. Also included in the repository were item-level metadata records derived from more than 8,000 Encoded Archival Description (EAD) finding aids (describing mostly analog resources). When a custom algorithm was applied, these EAD files generated more than 1.5 million records, bringing the total number of item-level DC records to approximately 2.5 million (Prom 2003).
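To make the harvesting step concrete, the following is a minimal sketch of how a service provider might request Dublin Core records from an OAI data provider and read titles out of the response. It is an illustration only, not the Illinois project's harvester: the base URL `http://example.org/oai` and the sample record are placeholders, and the helper names are hypothetical. The `ListRecords` verb, the `oai_dc` metadata prefix, and the `resumptionToken` argument come from the OAI-PMH specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of the unqualified Dublin Core elements carried inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH data provider.

    When a resumption token is supplied it is sent as the only argument
    besides the verb, as the protocol requires.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Return the text of every dc:title element in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iter("{%s}title" % DC_NS)]


# Hypothetical response fragment, used here instead of a live network call.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Letters from a WWI soldier</dc:title>
          <dc:type>text</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://example.org/oai"))
print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would fetch each URL, store the records, and repeat the request with the returned resumption token until the provider stops issuing one.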

 

Research Question and Literature Review

We chose to focus on K-12 educators because we were interested in how cultural heritage materials might be utilized in a classroom setting. Our research question was:

What is the usefulness of an OAI service provider search portal to aggregated cultural heritage material for K-12 educators?

Related questions include: How did educators make choices about which resources to use? Did they pay attention to the institution a resource was coming from? Did educators use the decomposed EAD finding aids included in the portal?

Little has been written on how users interact with collections of aggregated metadata. However, with the advent of the OAI-PMH, several projects and institutions have begun to look at the challenges of aggregating metadata for both internal processes and end-users. Hagedorn (2003) conducted transaction log analysis, user testing, and a user survey for the University of Michigan OAIster service provider. She notes that end-users (scholars and researchers) often did not know where to begin looking for online information (online journals and reference sources). Arms et al. (2003), Cole et al. (2002), and Shreeves et al. (2003) note how variations in metadata authoring practices challenge service providers’ abilities to build consistently searchable systems. Ward (2003) analyzes the use of Dublin Core by OAI data providers. Variations in metadata studied by these authors include which elements and vocabularies are used, the granularity of objects described, and the depth of description. The Illinois project developed a variety of strategies to minimize these disparities, including indexing and presenting metadata by type of material (image, text, physical object, etc.) and applying a normalization vocabulary to the Date, Coverage, and Type elements.
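As an illustration of what normalizing a Date element can involve, the sketch below maps a free-text date value onto a quarter-century label of the kind offered as date ranges on the search screen (for example, 1900-1924). This is an assumption about one plausible approach, not the normalization vocabulary the Illinois project actually applied; the function name and the sample values are hypothetical.

```python
import re


def quarter_century(date_value):
    """Return a label such as '1900-1924' for the first four-digit year
    (1000-2099) found in a free-text date value, or None if there is none."""
    match = re.search(r"\b(1[0-9]{3}|20[0-9]{2})\b", date_value)
    if match is None:
        return None
    year = int(match.group(1))
    start = year - (year % 25)  # round down to the start of the 25-year span
    return f"{start}-{start + 24}"


# Heterogeneous date strings of the kind different providers might supply.
for value in ["ca. 1917", "1862-05-03", "undated"]:
    print(repr(value), "->", quarter_century(value))
```

Values with no recognizable year are left unassigned rather than guessed, which keeps the normalized index from asserting more than the source metadata does.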

Because our study was focused on K-12 educators, we conducted a literature search to gain a better understanding of how educators look for digital primary sources and how they choose digital sources for use in the classroom. We discovered that the literature focuses on pre-packaged lesson plans and general-use web search engines. Many articles simply offered lists of useful web sites.

VanFossen and Shiveley (2000) describe three ways for educators to access primary sources: textbooks, which often include duplicates of primary source material; commercial reproductions (sometimes called “jackdaws”), which may include suggestions for lesson plans; and the Internet. They suggest using prepared lessons, such as those found on the Library of Congress American Memory web site[2] or using standard search engines, such as Yahoo! or Google, to create one’s own lessons.

While VanFossen and Shiveley do not speak to the question of authority, Kobrin (2001) writes that sites are most useful to use in a history classroom when they have been vetted by a historian. He suggests that History Matters[3] as well as the Library of Congress[4] and the National Archives[5] web sites are “safe, secure, informative, and always accurate.” Warren (2001) describes using digital jackdaws provided by the National Archives in a high school history classroom. Lee (2002) notes that “educators and historians must closely evaluate digital historical resources before using them.”

Websites with digitized primary sources that had been interpreted by historians or curators and placed in a historical context were highly valued by educators. In the Digital Cultural Heritage Community Project at the University of Illinois at Urbana-Champaign (1999-2000)[6], educators, librarians, museum curators, and archivists worked together to identify primary source materials from local museum, library, and archives collections to digitize for use in 3rd-, 4th-, and 5th-grade classrooms. Participants sought primary sources that could be linked to curriculum units and Illinois state learning standards. Bennett et al. (2000) note that it was sometimes difficult to match local collections to the broader scope of curriculum units. Educators wanted access to national digitized artifacts, and they relied heavily on curator or archivist interpretations (Bennett and Sandore, 2000; Bennett and Jones, 2001).

Bennett and Jones (2001) conclude: “We need to spend more time making what we digitize useful for teachers and students and less time worrying about getting in on the web.” Supporting this view, Gilliland-Swetland, Kafai, and Landis (1999) note that “a comparatively small amount of primary source material, if appropriately selected, described, and contextualized” [emphasis theirs] can be adequate for use in the classroom.

VanFossen and Shiveley (2000) note that while textbooks and jackdaws undergo a vetting and editorial process and provide context, often the material found on the Internet has not. “The selection of one’s own primary source documents from the Internet also presents the task of packaging the material in a contextually accurate manner.” In addition to interpretation and context, Lee (2002) notes that clarity and a commitment to K-12 education are important.

 

Methodology

Our user population consisted of 23 college students training to become K-12 social studies teachers in an honors-level curriculum and instruction course. We chose this group because (1) we did not have the resources to identify and study working educators; (2) the professor of this class was eager to participate; and (3) these users were comfortable using the Internet. They were assigned to use the UIUC Digital Gateway to Cultural Heritage Materials to find primary sources for a lesson plan on a specific social sciences topic, and then submit short papers about their experience. (See Appendix One: Classroom Assignment.)

For purposes of this test, we created a duplicate portal for use by these students and we provided them with a unique URL. This enabled us to conduct a transaction log analysis after the test. Before beginning the assignment, users were introduced to the concept of metadata aggregation and were informed that the search portal would provide pointers to digital content held elsewhere. They were also told that some records refer to analog resources.

After the students completed the assignment, we conducted focus group interviews. (See Appendix Two: Interview Guide for Focus Groups.) These interviews were taped, transcribed, and coded. We also received copies of the students’ papers (with names removed); however, because the papers reiterated comments made during focus groups, we did not code them.

 

Results

In this section we present our analysis of this user group’s experience using this search portal. Our conclusions are specific to this study and cannot be generalized.

Focus Groups and Papers

In both the focus group interviews and the papers, the students expressed frustration with the portal. First, despite their prior introduction to the nature of a portal to aggregated metadata, in practice the users expected all records to point directly to corresponding digital objects. They reported disappointment when records referred to analog resources, and their dismay was exacerbated by the large number of item-level records derived from EAD files, which by definition describe analog resources. Thus, a user who selected a search result labeled “letters from a WWI soldier” might find that the record referred to the holding institution’s finding aid instead of to the letters themselves. The test group was unanimous in finding the inclusion of finding aids exasperating and unhelpful.

An additional point of confusion for users was the fact that one category of material, called Audio, included no sound files. Rather, the 934 items bearing this label were finding aids housed at the Vincent Voice Library at Michigan State University Libraries[7]. The Format/Quantity of Material field for these items contained the phrase “audio files: digital sound recording,” so they had been indexed in the Audio category. Users expected to use this category to locate and play sound files and felt that the labeling was misleading.
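The mislabeling described above is easy to reproduce with any rule that assigns a category from words in a format field. The sketch below assumes simple keyword matching; the source says only that the phrase in the Format/Quantity of Material field led to indexing under Audio, so the rule, the function name, and the other category labels are hypothetical.

```python
def material_category(format_value):
    """Assign a browse category from keywords in a record's format field.

    Hypothetical rule: it looks only at the words used to describe the
    material, not at whether the record links to a playable file.
    """
    text = format_value.lower()
    if "audio" in text or "sound recording" in text:
        return "Audio"
    if "photograph" in text or "image" in text:
        return "Image"
    return "Other"


# Format phrase reported for the Vincent Voice Library finding aids.
finding_aid_format = "audio files: digital sound recording"
print(material_category(finding_aid_format))
```

Because the rule sees only the description, a finding aid that describes sound recordings is indistinguishable from a record that points to an actual sound file.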

Users reported a significant slowing of their efforts when a pointer (active link) within a record went to a top-level or intermediate web page on which users might have to resubmit their request using the institution’s own search engine. These users believed that the inclusion of a live URL in a search result that did not immediately display the digital object of interest worked, in effect, like an online “bait and switch” operation, and they were vocal in their disapproval.

Variations in controlled vocabularies and disparities in the use of DC had resulted in widespread inconsistencies in the harvested metadata. As a result, the Illinois team had decided to enable greater recall by making the default search a keyword search on all fields. Not unexpectedly, keyword searches produced vast quantities of unsorted results. The lack of a ranking feature in search results exacerbated the difficulty of identifying useful resources and resulted in the test group feeling overwhelmed and being unable to make good use of search results.

In an attempt to address the known limitations of keyword searching, the team had also provided an advanced search screen that included typical search-refinement features, such as restricting to specific fields. Transaction log analysis revealed that the test group rarely made use of these fields. In addition, the interviews and papers revealed that users who attempted to refine their searches were unfamiliar with metadata fields like Format and did not know how to specify entries for them.

Another result was the discovery that users accorded equal credibility to all contributing collections. They reported that they made no decisions about which items to examine based on the name of the holding institutions.

We also found that feelings of frustration around failed searches were directed at the search portal rather than at individual institutions providing metadata. Users held the portal responsible for the usability of its aggregated metadata — even when that metadata originated elsewhere and remained largely outside the control of the Illinois project.

Sometimes, as might be predicted, the lack of a common controlled vocabulary among metadata providers was problematic. For example, one user’s search for “Asian American Working Class” found no results, even though the repository contained records for a large collection of photographs from an early 20th century Japanese-American community.

Finally, the users reported being surprised at what was not contained in the repository. They apparently expected the content to be more universal and more predictable than the nature of this research allowed it to be. The repository’s content was dictated by the metadata available to be harvested using the OAI-PMH and by which institutions were willing to share their metadata. As a result, coverage of cultural heritage topics was variable and unpredictable. Despite the vastness of the aggregated metadata, some users found that there simply were no matches for particular topics.

In general, the test group reported that the portal would not be useful for K-12 educators. The inclusion of finding aids and other “dead ends,” the unpredictable content of the repository, and the lack of ranking or sorting for results were key factors in their decision. Many users commented that they had better luck finding online primary sources by searching with Google or Yahoo! or by going directly to one of the provider sites, such as the Library of Congress American Memory Project.

Transaction Log Analysis

Although we are aware of the limitations of transaction log analysis, we found that site usage data helped to supplement the qualitative data we gathered.

Sessions and Searches

Twenty-three users accessed the site 120 times during the test period. During these 120 sessions, they performed 555 searches. These searches were almost evenly split between the simple Keyword Search screen (268) and the Advanced Search screen (287). (However, most users of the Advanced Search screen did not utilize the Advanced Search features.)
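The averages and shares implied by these counts can be worked out directly. The snippet below only restates the figures reported above; no additional data is assumed.

```python
# Counts reported in the transaction log analysis.
users, sessions, searches = 23, 120, 555
simple, advanced = 268, 287

# The two search screens account for every search.
assert simple + advanced == searches

searches_per_session = searches / sessions
sessions_per_user = sessions / users
simple_share = 100 * simple / searches
advanced_share = 100 * advanced / searches

print(f"{searches_per_session:.3f} searches per session")
print(f"{sessions_per_user:.1f} sessions per user")
print(f"{simple_share:.1f}% simple, {advanced_share:.1f}% advanced")
```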

Figure 1 — Original Simple Search screen


Figure 2 — Original Advanced Search screen


 
Refining searches

Overall, these users did not exploit the advanced search interface for limiting searches by fields. Boolean operators associated with four text boxes and dropdown lists allowed users to limit searches by particular metadata fields, namely, Title, Author or Artist, Subject or Description, Publisher, Format, Type, and Language. Default selections are as shown in Figure 2.

Users selected the Title field in a total of 186 searches; however, Title was selected in only 10 searches where it was not pre-selected by default. Author/Artist was selected in 218 searches but was selected only once in a list in which it was not the default. Table 1 illustrates how often each field was used.

Table 1 — Use of fields on Advanced Search screen

Field                 Default selection?   # of searches   Selected when not a default?
Title                 Yes                  186             10
Author/Artist         Yes                  218             1
Subject/Description   Yes                  223             11
Publisher             No                   10
Format                No                   5
Type                  No                   12
Language              No                   0

 

Boolean operators

On the Advanced Search screen, users could combine search terms using from one to three Boolean operator lists. (The fourth Boolean operator was non-functional.) The first operator was set to AND and used in 241 searches. In the same operator, OR was never used and NOT was used in one search. In the second operator, AND was selected and used in 231 searches. In the third operator, AND was used in 229 searches. Neither OR nor NOT was used in the second or third operator. On the Simple Search screen, AND was automatically inserted between words unless the user typed a phrase in quotation marks (much as Google operates).
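The Simple Search behavior described above (an implicit AND between words, with quoted phrases kept intact) can be sketched as follows. This is an illustrative reconstruction with hypothetical function names, not the portal's XPAT query code.

```python
import re


def simple_search_terms(query):
    """Split a query into terms, keeping each quoted phrase as one term."""
    tokens = re.findall(r'"([^"]+)"|(\S+)', query)
    return [phrase or word for phrase, word in tokens]


def to_boolean(query):
    """Join the terms with AND, re-quoting any multi-word phrase."""
    terms = simple_search_terms(query)
    return " AND ".join(f'"{t}"' if " " in t else t for t in terms)


print(to_boolean("Asian American Working Class"))
print(to_boolean('"civil war" letters'))
```

Under this rule every word of an unquoted query must appear in a record for it to match, so each added word narrows the result set.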

Online access to digital objects

Although users were vocal about their preference for quickly retrieving direct links to primary material, fewer than half of all searches (226) included the Online Access Only switch, which by default was not selected. Users reported confusion about the wording of this option, which may account for the many searches in which it was not selected.

On the occasions when users selected Online Access Only and retrieved online finding aids, they believed that the option malfunctioned. Therefore, most users reported that they abandoned their efforts to restrict searches to what they considered primary materials. This situation in large part was responsible for the students’ overwhelming failure to locate material they felt was useable in the classroom.

Limiting by date

In 140 searches (over 37 sessions), a user selected a date range. Table 2 shows the number of searches in which each date range was selected. By default, no date range was selected, so each instance indicates a deliberate selection by the user.

Table 2 — Number of searches limited to specific date ranges*

Date Range           Times selected
20th Century         53
1975-1999            63
1950-1974            74
1925-1949            63
1900-1924            69
19th Century         8
1875-1899            8
1850-1874            11
1825-1849            21
1800-1824            8
16th Century         1
6th – 10th Century   3
0 – 50th Century     1
BCE                  2

* Note: The 18th Century, 17th Century, 15th Century, 14th Century, 13th Century, 12th Century, and 11th Century options were never selected.

 

Redirects

In each retrieved record, users could select a link labeled Online Access Available, which acted as a redirect, taking the user away from the Gateway portal to an individual collection’s own web site. Once again, these users interpreted the phrase “online access” to mean that the link would take them to a digital object, such as a photograph. In reality, the redirect links took users to all types of materials, from photographs and text to finding aids. This was a source of frustration for users, who believed the Online Access Available option was not operational.

Online Access Available links were clicked 216 times. Of the 120 sessions, the redirect link was selected directly from the short record 89 times. In the remaining 31 sessions, the user first displayed the full record, then clicked the redirect link. Table 3 below shows the number of times users were redirected to particular collections.

Table 3 – Number of redirects by collection

Domain                       Collection                                    Times users linked to it
aim25.ac.uk                  Archives in London and the M25 Area*          41
spurlock                     Spurlock Museum                               26
oasis.harvard.edu            Harvard University Libraries                  24
bigbird.lib.umn.edu          University of Minnesota Libraries             21
umich.edu                    University of Michigan                        21
oac.cdlib.org                Online Archive of California                  18
pl.lib.uchicago.edu          University of Chicago Library                 9
ohiohistory.org              Ohio Historical Society                       7
search.tpl.lib.wa.us         Tacoma Public Library                         6
hdl.loc.gov                  Library of Congress                           6
alliance.librarysystem.com   Illinois State Library                        5
ftp.archive                  The Open Video Project                        5
radcliff.edu                 Harvard University Libraries                  4
davidrumsey.org              David Rumsey Map Collection                   3
helios.dli.utk.edu           University of Tennessee Special Collections   3
libtext.library.wisc.edu     University of Wisconsin-Madison Library       3
diglib.lib.utk.edu           University of Tennessee Special Collections   2
lib.msu.edu                  Michigan State University Libraries           2
archives.state.co.us         Colorado State Libraries                      1
digital.lib.umn.edu          University of Minnesota Libraries             1
ibiblio.org                  IBIBLIO.org                                   1
lib.uiowa.edu                Iowa Women's Archives                         1
library.ppld.org             Pikes Peak Library District                   1
lcweb.loc.gov                Library of Congress                           1
memory.loc.gov               American Memory Project                       1
mnhs.org                     Minnesota Historical Society                  1
open-video.org               The Open Video Project                        1

* Note: The Archives in London and the M25 Area web site received a disproportionate number of redirects because it appeared first in search results. Focus group interviews indicate that many users never looked beyond the first couple of screens of search results.
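Several collections in Table 3 appear under more than one domain, so a per-collection view requires summing rows. The sketch below does this with the table's values copied as listed. Grouping is by the collection name exactly as printed, so Library of Congress and American Memory Project stay separate. The function name is hypothetical.

```python
from collections import Counter

# (domain, collection, redirects) rows copied from Table 3.
REDIRECTS = [
    ("aim25.ac.uk", "Archives in London and the M25 Area", 41),
    ("spurlock", "Spurlock Museum", 26),
    ("oasis.harvard.edu", "Harvard University Libraries", 24),
    ("bigbird.lib.umn.edu", "University of Minnesota Libraries", 21),
    ("umich.edu", "University of Michigan", 21),
    ("oac.cdlib.org", "Online Archive of California", 18),
    ("pl.lib.uchicago.edu", "University of Chicago Library", 9),
    ("ohiohistory.org", "Ohio Historical Society", 7),
    ("search.tpl.lib.wa.us", "Tacoma Public Library", 6),
    ("hdl.loc.gov", "Library of Congress", 6),
    ("alliance.librarysystem.com", "Illinois State Library", 5),
    ("ftp.archive", "The Open Video Project", 5),
    ("radcliff.edu", "Harvard University Libraries", 4),
    ("davidrumsey.org", "David Rumsey Map Collection", 3),
    ("helios.dli.utk.edu", "University of Tennessee Special Collections", 3),
    ("libtext.library.wisc.edu", "University of Wisconsin-Madison Library", 3),
    ("diglib.lib.utk.edu", "University of Tennessee Special Collections", 2),
    ("lib.msu.edu", "Michigan State University Libraries", 2),
    ("archives.state.co.us", "Colorado State Libraries", 1),
    ("digital.lib.umn.edu", "University of Minnesota Libraries", 1),
    ("ibiblio.org", "IBIBLIO.org", 1),
    ("lib.uiowa.edu", "Iowa Women's Archives", 1),
    ("library.ppld.org", "Pikes Peak Library District", 1),
    ("lcweb.loc.gov", "Library of Congress", 1),
    ("memory.loc.gov", "American Memory Project", 1),
    ("mnhs.org", "Minnesota Historical Society", 1),
    ("open-video.org", "The Open Video Project", 1),
]


def redirects_by_collection(rows):
    """Sum redirect counts across all domains that share a collection name."""
    totals = Counter()
    for _domain, collection, count in rows:
        totals[collection] += count
    return totals


totals = redirects_by_collection(REDIRECTS)
for collection, count in totals.most_common(5):
    print(f"{count:3d}  {collection}")
```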

 

Conclusions

A clear and obvious finding of our work is that, while the OAI-PMH itself is readily implemented, the challenges posed by large amounts of heterogeneous metadata are significant. Certainly the application of more sophisticated pre-processing tools as well as robust, scalable search tools and ranking of results would make the portal a more effective tool for users. Other options include the development of thematic exhibits (based on human and/or machine analysis of metadata) that would offer glimpses into the range and type of materials available, and offering users the ability to annotate individual records to highlight particularly useful resources. Providing a quick-browse feature to give users a preview of what is — and is not — available in the portal would make it more useful to educators. In general, the interface challenge extends to any tools that help adjust user expectations.

The inclusion of EAD finding aids and their decomposed item-level records was an obstacle for these users. They did not understand why the records were included and were confused by opaque labels, such as “Box 23.” Several members of the test group commented that finding aids may be useful for researchers or scholars but not for educators. As a result of the test, we eliminated EAD records from the UIUC Digital Gateway to Cultural Heritage Materials. We are currently investigating the creation of a portal specific to EAD finding aids that will enable further research into the use of EAD with OAI-PMH.

In addition, the tests led to several changes in the interface. We combined the simple and advanced search screens, improved labeling, and combined several resource-type categories into a simpler set of options (see Figure 3). We also attempted to clarify for users which resources offered direct online access and which did not (see Figure 4).

Figure 3 — Revised search screen


The single Online Access Available link in search results was replaced by two more specifically worded links: (1) View Item, applied to resources that are directly viewable online from the search result, and (2) Learn more about this item, applied to results that would lead the user to a collection’s web site or to descriptive information about the resource.

Figure 4 — Revised wording in search results


Unfortunately, time did not allow us to conduct a second assessment of the portal’s usefulness once these changes were made. However, we continue to work with OAI-enabled aggregations of metadata, and we expect that future research will build on the baseline work done here.


References

Arms, W.Y. et al. “A Case Study in Metadata Harvesting: the NSDL” Library Hi Tech, vol. 21, no. 2, 2003, pp. 228-237.

Bennett, N.A. et al. “Integration of Primary Resource Materials into Elementary School Curricula” Papers: Museums and the Web 2000 (MW 2000). Accessed online on July 10, 2003, at http://www.archimuse.com/mw2000/papers/bennett/bennett.html.

Bennett, N.A. and T. Jones. “Building a Web-Based Collaborative Database – Does it Work?” Papers: Museums and the Web 2001 (MW 2001). Accessed online on July 10, 2003, at http://www.archimuse.com/mw2001/papers/bennett/bennett.html.

Cole, T.W. et al. “Now That We’ve Found the ‘Hidden Web’ What Can We Do With It? The Illinois Open Archives Initiative Metadata Harvesting Experience” Museums and the Web 2002: Selected Papers from an Int’l Conf. (MW 2002), Archives & Museum Informatics, Toronto, 2002, pp. 63-72. Accessed online on July 10, 2003, at http://www.archimuse.com/mw2002/papers/cole/cole.html.

Gilliland-Swetland, A.J., Y. Kafai, and W.E. Landis. “Integrating Primary Sources into the Elementary School Classroom: A Case Study” Archivaria, vol. 48, 1999, pp. 89-116.

Hagedorn, K. “OAIster: A ‘No Dead Ends’ OAI Service Provider” Library Hi Tech, vol. 21, no. 2, 2003, pp. 170-181.

Kobrin, D. “Using History Matters with a Ninth Grade Class” The History Teacher, vol. 34, no. 3, 2001, pp. 339-343.

Lee, J.K. “Digital History in the History/Social Studies Classroom” The History Teacher, vol. 35, no. 4, 2002, pp. 503-517.

Prom, C.J. “Reengineering Archival Access Through the OAI Protocols” Library Hi Tech, vol. 21, no. 2, 2003, pp. 199-209.

Shreeves, S.L., J. Kaczmarek, and T.W. Cole. “Harvesting Cultural Heritage Metadata Using the OAI Protocol” Library Hi Tech, vol. 21, no. 2, 2003, pp. 159-169.

VanFossen, P.J. and J.M. Shiveley. “Using the Internet to Create Primary Source Teaching Packets” The Social Studies, vol. 91, no. 6, 2000, pp. 244-252.

Ward, J. “A Quantitative Analysis of Unqualified Dublin Core Metadata Element Set Usage within Data Providers Registered with the Open Archives Initiative” Proceedings: 2003 Joint Conference on Digital Libraries (JCDL 2003), Institute of Electrical and Electronics Engineers, Inc., Los Alamitos, CA, 2003, pp. 315-317.

Warren, W.J. “Using the World Wide Web for Primary Source Research in Secondary History Classes” in History.edu: Essays on Teaching with Technology, ed. D.A. Trinkle and S.A. Merriman, M.E. Sharpe, Armonk, NY, 2001.


Appendix One: Classroom Assignment

Primary Source Evaluation: Open Archive Initiative (OAI)

The Open Archive Initiative (OAI) is a search site project focusing specifically on cultural heritage materials, which is currently being developed at the University of Illinois at Urbana-Champaign. The UIUC-OAI site is one of seven privately sponsored research sites, and the only one dealing specifically with cultural heritage materials. This site currently has numerous partners with more than a million artifacts accessible to date. The UIUC-OAI site allows for accessing artifacts without going directly to individual institutional homepages. The UIUC-OAI also provides the user with an opportunity to complete a more refined search and selection of cultural heritage objects without utilizing broad search engines such as google.com.

While various initiatives like the UIUC-OAI are seeking to develop, implement, and evaluate the usefulness of such sites, there remains little research on how educators – and specifically, how student teachers – conceptualize and utilize such sites. The purpose of this project is to provide you with an opportunity to use this site, to evaluate the process as well as an opportunity to communicate with the project team as they seek to improve the purpose and use of the site. Your evaluation of this site will aid in further development of the site and its usefulness for educators.

You may consider such a task to be daunting, depending on the degree to which you feel comfortable working with search engines. And to evaluate a site without a specific purpose in mind may well be a fruitless activity. In order to consider and critique such questions and issues concerning (1) the process of navigating the site; and (2) the actual object availability, you will be required to identify two themes that you will include in your unit project. You will then select suitable primary source materials from the site. How you incorporate the sources into your unit is solely up to you and your group. However, in selecting the source, you need to consider what you think the source provides that other available classroom sources (i.e. textbooks, handouts, lectures, etc.) cannot. You will also have to consider how rich the source is, how you will utilize the information presented about the source and its inclusion in a learning activity. And finally, you will have to provide a listing of questions that focuses specifically on student learning from the source.

So, the questions to consider as you work through this project will focus on the UIUC-OAI site and should include the following:

You will be evaluated on the degree in which you can engage an argument about the site, the cogency of the argument, and recommendations. As well, you will be expected to select, provide a rationale, develop and evaluate the use of the primary sources.


Appendix Two: Interview Guide for Focus Groups

TELL ME ABOUT SOME OF YOUR SEARCHES.

IN THE LIST OF RESULTS, HOW DID YOU DECIDE WHAT TO LOOK AT FIRST?

HOW HELPFUL IS THIS SITE FOR PUTTING TOGETHER A LESSON PLAN?

ARE THERE OTHER PLACES WHERE YOU COULD FIND THE SAME TYPE OF MATERIAL?

OVERALL, HOW USEFUL IS THIS SITE FOR TEACHERS?

WHAT WOULD MAKE THIS SITE MORE USEFUL FOR YOU?


Footnotes

[1] The revised UIUC Digital Gateway to Cultural Heritage Materials portal can be found at: http://nergal.grainger.uiuc.edu/cgi/b/bib/bib-idx/

[2] Library of Congress American Memory Project Learning Page: http://memory.loc.gov/ammem/ndlpedu/index.html

[3] History Matters: http://www.historymatters.gmu.edu

[4] Library of Congress American Memory Project: http://memory.loc.gov/

[5] U.S. National Archives and Records Administration: http://www.archives.gov

[6] Digital Cultural Heritage Community: http://images.library.uiuc.edu/projects/DCHC/

[7] MSU Vincent Voice Library: http://www.lib.msu.edu/vincent/

 

 

 

 

This page last updated on August 22, 2003.
