The Open Language Archives Community: An infrastructure for distributed archiving of language resources

Gary Simons, Steven Bird
DOI: https://doi.org/10.48550/arXiv.cs/0306040
2003-06-10
Abstract: New ways of documenting and describing language via electronic media coupled with new ways of distributing the results via the World-Wide Web offer a degree of access to language resources that is unparalleled in history. At the same time, the proliferation of approaches to using these new technologies is causing serious problems relating to resource discovery and resource creation. This article describes the infrastructure that the Open Language Archives Community (OLAC) has built in order to address these problems. Its technical and usage infrastructures address problems of resource discovery by constructing a single virtual library of distributed resources. Its governance infrastructure addresses problems of resource creation by providing a mechanism through which the language-resource community can express its consensus on recommended best practices.
Computation and Language, Digital Libraries
What problem does this paper attempt to address?
The paper addresses two groups of problems:

1. **Problems in resource discovery**:
   - Electronic resources may exist without being indexed by search engines. Even when they are indexed, the user's search terms may not match the indexed terms, so the user cannot find the resources they need.
   - Users who reach a relevant data resource may lack the tools or guidance needed to use it effectively.
2. **Problems in resource creation**:
   - Data providers use many different formats and methods, which leaves researchers unsure how to prepare materials for online publication.
   - Tool development is scattered in multiple directions, and no single approach offers a complete, easy-to-use set of tools.
   - Because hardware and software have short life cycles, new resources risk becoming inaccessible. Without measures for the long-term preservation of electronic information resources, many of today's resources will become almost unusable within 10 years.

To solve these problems, the paper describes the infrastructure built by the Open Language Archives Community (OLAC), which has three parts:

- **Technical infrastructure**: Solves the problem of discovering distributed resources by constructing a single virtual library. OLAC builds on the Open Archives Initiative (OAI) metadata harvesting protocol so that metadata is compatible and can be shared among institutions.
- **Usage infrastructure**: Defines how human participants interact with the machinery, including a unified search engine, a controlled vocabulary server, and an online metadata editor (ORE) that make it easier for more individuals and institutions to participate.
- **Governance infrastructure**: Promotes community consensus by formulating standards and best-practice recommendations that support the persistence and portability of language resources. It also defines the organizational structure and operating process of OLAC so that community members can cooperate and communicate effectively.

In summary, the paper aims to solve the problems that arise in discovering and creating language resources by constructing the OLAC infrastructure, thereby improving the availability and persistence of those resources.
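
As an illustration of the harvesting model behind the technical infrastructure, the following Python sketch shows how a service provider might combine records from several archives into one searchable catalog. It is a minimal sketch under stated assumptions, not OLAC's implementation: the archive URLs, the sample records, and the function names are hypothetical, the responses are canned strings rather than live network requests, and the records use the generic `oai_dc` (Dublin Core) format defined by OAI-PMH 2.0 rather than OLAC's own metadata format.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"  # OAI-PMH 2.0 namespace
DC_NS = "http://purl.org/dc/elements/1.1/"       # Dublin Core element namespace


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the ListRecords request a harvester would send to one archive."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text, archive):
    """Extract identifier, title, and language from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        records.append({
            "archive": archive,
            "identifier": rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier"),
            "title": rec.findtext(f".//{{{DC_NS}}}title"),
            "language": rec.findtext(f".//{{{DC_NS}}}language"),
        })
    return records


def search(catalog, term):
    """Case-insensitive title search over the merged catalog."""
    return [r for r in catalog if term.lower() in (r["title"] or "").lower()]


# Hypothetical canned response template standing in for a real archive's reply.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>{identifier}</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>{title}</dc:title>
          <dc:language>{language}</dc:language>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Two hypothetical archives, keyed by their (made-up) OAI base URLs.
responses = {
    "http://archive-a.example.org/oai": SAMPLE.format(
        identifier="oai:archive-a.example.org:1",
        title="Sample wordlist", language="en"),
    "http://archive-b.example.org/oai": SAMPLE.format(
        identifier="oai:archive-b.example.org:7",
        title="Sample grammar sketch", language="fr"),
}

# Harvest each archive and merge the results into one "virtual library".
catalog = []
for base_url, xml_text in responses.items():
    catalog.extend(parse_records(xml_text, archive=base_url))
```

Calling `search(catalog, "wordlist")` then queries both archives' records at once, which is the sense in which harvested metadata forms a single virtual library. A real harvester would fetch each `list_records_url(...)` over HTTP and would also need to handle resumption tokens and protocol errors, both omitted here.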