In "Google's Book Search: A Disaster for Scholars," Geoffrey Nunberg examines the limitations of Google Book Search's metadata, which he calls "a train wreck: a mishmash wrapped in a muddle wrapped in a mess."
Category: Metadata
Authority Control for Repositories: Names Project: Final Report
JISC has released the Names Project: Final Report.
Here's an excerpt:
The Names Project began in July 2007. It was funded to investigate requirements for a name authority service for UK repositories. Prototype name authority software has been developed as part of this work and a number of connections have been made with UK stakeholders and with international projects working in a similar space.
Plugins to Import E-Print Metadata from arXiv into an EPrints Repository
The IncReASe (Increasing Repository Content through Automation and Services) project has released four plugins to facilitate importing e-print metadata from arXiv into an EPrints repository.
Here's an excerpt from the plugins' Web page:
Potentially, content in arXiv could provide a "quick win" for repository population. No arXiv depositor we have talked with to date has objected to our importing their work into WRRO [White Rose Research Online]. From discussions with arXiv users, we are assuming that local deposit in WRRO with a "push" of data to arXiv may be difficult to achieve—we'd need to demonstrate some clear benefit to the depositor. arXiv serves its community well. A more likely model may be that arXiv users continue to deposit as now but IRs "harvest" data from arXiv (or perhaps arXiv will develop a facility to push material into local IRs).
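To make the harvesting model concrete, here's a minimal sketch of pulling Dublin Core records from arXiv's public OAI-PMH interface (the endpoint URL and set name follow arXiv's standard conventions, but treat this as an illustration of the technique, not the IncReASe plugins themselves):

```python
# Minimal OAI-PMH harvest of Dublin Core records from arXiv.
# Illustrative sketch only -- not the IncReASe EPrints plugins.
import urllib.request
import xml.etree.ElementTree as ET

OAI = "http://export.arxiv.org/oai2"
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def harvest(set_spec="cs", max_records=5):
    """Fetch a few oai_dc records from one arXiv set, yielding (title, creators)."""
    url = f"{OAI}?verb=ListRecords&metadataPrefix=oai_dc&set={set_spec}"
    with urllib.request.urlopen(url) as resp:
        tree = ET.parse(resp)
    for i, record in enumerate(tree.iter("{http://www.openarchives.org/OAI/2.0/}record")):
        if i >= max_records:
            break
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        creators = [c.text for c in record.findall(".//dc:creator", NS)]
        yield title, creators

if __name__ == "__main__":
    for title, creators in harvest():
        print(title, "--", "; ".join(creators))
```

A real importer would also page through OAI-PMH resumption tokens and map the records into EPrints' own schema.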
"Saying What We Do—Doing What We Say: Preservation Issues (Metadata and Otherwise) in Institutional Repositories"
Sarah L. Shreeves has self-archived her presentation "Saying What We Do—Doing What We Say: Preservation Issues (Metadata and Otherwise) in Institutional Repositories" in IDEALS.
Streamline Integrating Repository Function with Work Practice: Tools to Facilitate Personal E-Administration, Final Report v1.3
JISC has released Streamline Integrating Repository Function with Work Practice: Tools to Facilitate Personal E-Administration, Final Report v1.3.
Here's an excerpt:
The tools developed include an automatic metadata generation tool that completes as much of the metadata as possible, from documentation associated with a learning object, including suggesting key words to the user; and resource discovery tools, which recommend additional resources based on closeness of objects to the original search results. In addition, we contributed to a variety of widgets, developed with the PERSoNA project, to demonstrate the use of social networking tools to promote sharing of resources through the repository.
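The report doesn't reproduce the tool's internals, but keyword suggestion of this kind often starts as a simple term-frequency heuristic over the associated documentation. Here's a toy sketch of that idea (my illustration, not the Streamline code):

```python
# Naive keyword suggestion by term frequency, ignoring common stopwords.
# A sketch of the general technique only -- not the Streamline tool itself.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for",
             "on", "with", "that", "this", "are", "as", "by", "be", "it"}

def suggest_keywords(text, n=5):
    """Return the n most frequent non-stopword terms as candidate keywords."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(n)]

print(suggest_keywords(
    "Repository metadata generation: the tool suggests metadata keywords "
    "from documentation associated with a learning object."))
```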
“RKBExplorer: Repositories, Linked Data and Research Support”
Hugh Glaser, Ian Millard, and Les Carr have self-archived "RKBExplorer: Repositories, Linked Data and Research Support" in the ECS EPrints Repository.
Here's an excerpt:
RKBExplorer (http://rkbexplorer.com/) is a system for publishing Linked Data to Semantic Web standards, also providing a browser that allows users to explore this interlinked Web of Data, primarily in the domain of scientific endeavour. As part of the activity, we have harvested the metadata from a number of the larger ePrints repositories into http://eprints.rkbexplorer.com, and republished it as Linked Data. This allows the RKBExplorer browser to present a unified view of these repositories and related data from other sources such as dblp and dbpedia (a Semantic Web version of Wikipedia). Users can thus investigate concepts related to the ePrints people and articles, such as related people, projects and institutions.
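Linked Data published this way can be consumed with any RDF toolkit. Here's a minimal sketch using the third-party rdflib package against a DBpedia resource (an illustration of consuming Linked Data generally, not of RKBExplorer's own interfaces):

```python
# Fetch and inspect a Linked Data resource with rdflib (pip install rdflib).
# Illustrates consuming Linked Data generally -- not RKBExplorer's own API.
from rdflib import Graph

g = Graph()
# DBpedia publishes an RDF description of each resource at a stable URI;
# rdflib negotiates an RDF serialization and parses it into a graph.
g.parse("http://dbpedia.org/resource/Semantic_Web")

# Print a few (predicate, object) pairs describing the resource.
for i, (s, p, o) in enumerate(g):
    if i >= 10:
        break
    print(p, "->", o)
```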
“Repurposing ProQuest Metadata for Batch Ingesting ETDs into an Institutional Repository”
Shawn Averkamp and Joanna Lee have published "Repurposing ProQuest Metadata for Batch Ingesting ETDs into an Institutional Repository" in the latest issue of the Code4Lib Journal.
Here's an excerpt:
This article describes the workflow used by the University of Iowa Libraries to populate their institutional repository and their catalog with the data collected by ProQuest UMI Dissertation Publishing during the submission of students' theses and dissertations. Repurposing the metadata from ProQuest allowed the University of Iowa Libraries to streamline the process for ingesting theses and dissertations into their institutional repository. The article includes a discussion of the benefits and limitations of the workflow described.
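As a rough illustration of the crosswalk at the heart of such a workflow, here's a sketch mapping ProQuest-style ETD metadata to simple Dublin Core. The element names follow the general ProQuest pattern but should be checked against the actual schema; this is not the Iowa code:

```python
# Sketch of crosswalking ProQuest-style ETD metadata to simple Dublin Core.
# Element names below (DISS_title etc.) follow the general ProQuest pattern
# but are illustrative; check the actual schema before relying on them.
import xml.etree.ElementTree as ET

PROQUEST_XML = """
<DISS_submission>
  <DISS_description>
    <DISS_title>Example Thesis Title</DISS_title>
  </DISS_description>
  <DISS_authorship>
    <DISS_author><DISS_surname>Doe</DISS_surname><DISS_fname>Jane</DISS_fname></DISS_author>
  </DISS_authorship>
</DISS_submission>
"""

def to_dublin_core(xml_text):
    """Map a ProQuest-style submission record to a flat Dublin Core dict."""
    root = ET.fromstring(xml_text)
    return {
        "dc:title": root.findtext(".//DISS_title"),
        "dc:creator": "{}, {}".format(
            root.findtext(".//DISS_surname"), root.findtext(".//DISS_fname")),
        "dc:type": "thesis",
    }

print(to_dublin_core(PROQUEST_XML))
```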
Vocabulary Mapping Framework Announced
A cooperative project, the Vocabulary Mapping Framework, is mapping major metadata standards (CIDOC CRM, DCMI, DDEX, DOI, FRBR, MARC21, LOM, ONIX, and RDA).
Here's an excerpt from the press release:
The new vocabulary is not intended as a replacement for any existing standards, but as an aid to interoperability, whether automatic or human-mediated. The expanded Framework will include mappings of terms from code lists or allowed value sets in the existing standards to the RDA/ONIX vocabulary, enabling the computation of "best fit" mappings between any pairing of standards. . . .
The work will result in:
- a mapping of vocabularies from the source standards to support the building of crosswalks and transformations between any of them;
- a definitive reference set which editors can draw on when creating and developing standards;
- a downloadable RDF/OWL ontology to support the interchange of metadata content between these major standards, which will be useful to enable automated reuse of metadata from different sources and schemas, to improve the quality and access and reduce the cost of metadata;
- a governance scheme to oversee further development.
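The "best fit" computation rests on routing every term through the hub vocabulary instead of maintaining pairwise crosswalks between all the standards. Here's a toy sketch of that pivot idea (the term mappings are invented shorthand; the real VMF matrix is far richer):

```python
# Toy pivot-vocabulary mapping: each standard maps its terms to a hub
# vocabulary, and pairwise crosswalks are derived by composing the maps.
# Terms below are invented shorthand; the real VMF matrix is far richer.

TO_HUB = {
    "marc21": {"100": "creator"},   # MARC 100 (main entry, personal name)
    "onix": {"A01": "creator"},     # ONIX contributor role A01 (by author)
}
FROM_HUB = {
    "dcmi": {"creator": "dc:creator"},
}

def crosswalk(term, source, target):
    """Map a term from one standard to another via the hub vocabulary."""
    hub_concept = TO_HUB[source][term]
    return FROM_HUB[target][hub_concept]

print(crosswalk("100", "marc21", "dcmi"))   # -> dc:creator
print(crosswalk("A01", "onix", "dcmi"))     # -> dc:creator
```

With n standards this needs only 2n maps to and from the hub, rather than n(n-1) pairwise crosswalks.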
Creating Catalogues: Bibliographic Records in a Networked World
The Research Information Network has released Creating Catalogues: Bibliographic Records in a Networked World.
Here's an excerpt from the announcement:
Against this background the RIN report: Creating Catalogues: bibliographic records in a networked world, is a very timely overview of the whole process of bibliographic record production for printed and electronic books, and for scholarly journals and journal articles. This report follows the production of these data from publisher through a range of intermediaries to the end user. Whilst there are pressures to make these data more freely available, each player in the process has its own motivations and business models in creating, adding to, using or re-using bibliographic data, all of which need to be considered.
We find that there would be considerable benefits if libraries, along with other organisations in the supply chain, were to operate more at the network level but that there are significant barriers in the way of making significant moves in that direction.
Creating Catalogues cannot attempt to solve all the problems in the way of making bibliographic data more freely available for re-use and innovation, or of eliminating wasteful duplication of effort. Our objective is to clarify the key issues and to stimulate debate on possible ways forward. Creating Catalogues provides a number of key recommendations and the RIN will work with the academic library community and other key stakeholders in the supply chain to raise awareness and understanding of the issues raised in this report, of the benefits to be achieved by moving to new models, and of how we might overcome the barriers to achieving them.
Webcast: FRBR: Things You Should Know. . .
The Library of Congress has released the FRBR: Things You Should Know. . . Webcast presented by Barbara Tillett.
Here's an excerpt from the description:
This presentation for non-catalogers is intended to present basic concepts and benefits of using the FRBR conceptual model (Functional Requirements for Bibliographic Records) in resource discovery systems.
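For readers new to the model, FRBR's Group 1 entities (Work, Expression, Manifestation, Item) form a simple hierarchy, which can be sketched as plain data structures (my illustration, not anything from the webcast):

```python
# The FRBR Group 1 hierarchy as plain data classes: a Work is realized
# through Expressions, embodied in Manifestations, exemplified by Items.
from dataclasses import dataclass, field

@dataclass
class Item:            # a single physical or digital copy
    location: str

@dataclass
class Manifestation:   # a particular publication (edition, format)
    format: str
    items: list = field(default_factory=list)

@dataclass
class Expression:      # a particular realization (e.g., a translation)
    language: str
    manifestations: list = field(default_factory=list)

@dataclass
class Work:            # the abstract intellectual creation
    title: str
    expressions: list = field(default_factory=list)

hamlet = Work("Hamlet", [
    Expression("en", [Manifestation("print", [Item("Main Library, PR2807")])]),
    Expression("fr", [Manifestation("ebook", [Item("ebook platform")])]),
])
print(hamlet.title, "has", len(hamlet.expressions), "expressions")
```

A FRBR-aware discovery system can then collocate all editions and translations of a work under one result instead of scattering them.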
IR Deposit Using Embedded Document Metadata: Deposit Plait: Final Report
JISC has released Deposit Plait: Final Report.
Here's an excerpt:
The aim of the Deposit Plait project was to examine the potential for easing the deposit of journal articles into institutional repositories by making use of any metadata embedded within the document properties of the document being deposited. . . .
The first stage of the project was to see how easy it is to extract this metadata. The target file formats that the project worked with were the Open Document Format (as created by OpenOffice), OpenXML (as created by Microsoft Office 2007), and .doc files (as created by versions of Microsoft Office from 97 to 2003). There are standard open source software libraries that can extract both standard and custom metadata fields from each of these file formats.
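For OpenXML files, for example, the core document properties live in a docProps/core.xml part inside the .docx zip container, so extraction needs only the standard library. Here's a sketch of that first stage (an illustration, not the Deposit Plait code):

```python
# Extract embedded Dublin Core-style properties from an OpenXML (.docx)
# file: the package is a zip whose docProps/core.xml part holds the
# title, creator, etc. A sketch, not the Deposit Plait implementation.
import sys
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def docx_metadata(path):
    """Return the title/creator/subject embedded in a .docx file."""
    with zipfile.ZipFile(path) as z:
        root = ET.fromstring(z.read("docProps/core.xml"))
    return {
        "title": root.findtext("dc:title", default="", namespaces=NS),
        "creator": root.findtext("dc:creator", default="", namespaces=NS),
        "subject": root.findtext("dc:subject", default="", namespaces=NS),
    }

if __name__ == "__main__":
    print(docx_metadata(sys.argv[1]))
```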
The second stage of the project was to see how easy it is to use extracted metadata as search terms in order to search for a more complete metadata record. Where the item being deposited into the repository has been in existence for some time (a 'retrospective deposit'), the metadata found can be used to perform a search. Different search methods were implemented as examples, including using search APIs and screen scraping from search services. Whilst the method works well, there are the normal licensing issues to consider, including whether licences cover the user for this type of metadata re-use.
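Any bibliographic search API can play that second-stage role. As one illustration of the pattern (not a service the report names), here's a sketch querying CrossRef's present-day public REST API with an extracted title:

```python
# Use an extracted title as a search key against a bibliographic API to
# find a fuller metadata record. CrossRef's public REST API is used here
# purely as an example of the pattern the project describes.
import json
import urllib.parse
import urllib.request

def search_by_title(title):
    """Return the best-matching CrossRef work record for a title, if any."""
    url = ("https://api.crossref.org/works?rows=1&query.bibliographic="
           + urllib.parse.quote(title))
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    items = data["message"]["items"]
    return items[0] if items else None

match = search_by_title("Repurposing ProQuest Metadata for Batch Ingesting ETDs")
if match:
    print(match.get("DOI"), "--", match.get("title"))
```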
The project concluded by creating an online demonstration system. In contrast to a normal repository deposit, where the user enters metadata and then uploads a file, this system requires the user to first upload a file. The metadata is extracted, and the user is allowed to choose which (one or more) of the fields to use as the basis of a search. The search is then initiated and matching records returned. The user can then pick and choose fields from the results to 'plait' together their final metadata record.
Library of Congress Makes ID.LOC.GOV Authorities and Vocabularies Service Publicly Available
The Library of Congress has made its ID.LOC.GOV authorities and vocabularies service publicly available.
Here's an excerpt from the announcement:
The Library of Congress has opened its ID.LOC.GOV web service, Authorities and Vocabularies, with the Library of Congress Subject Headings (LCSH) as the initial offering. The primary goal of this service is to enable machines to programmatically access data at the Library of Congress but the web interface also provides simple user access. We view this service as a step toward exposing and interconnecting vocabulary and thesaurus data via URLs. For LCSH, we are fortunate to have been able to link terms to a similar service provided in Europe for RAMEAU, a French subject heading vocabulary closely coordinated with LCSH.
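Here's a minimal sketch of that programmatic access: resolving an LCSH heading by label and fetching a machine-readable description. The /label/ lookup path and JSON content negotiation reflect the service's conventions as I understand them; verify against the current ID.LOC.GOV documentation before depending on them:

```python
# Look up an LCSH heading on ID.LOC.GOV by label, then read a JSON
# description of the concept. The /label/ path and JSON content
# negotiation reflect the service's conventions as I understand them;
# check the current documentation before relying on the specifics.
import json
import urllib.parse
import urllib.request

def lcsh_lookup(label):
    """Resolve an LCSH label to its concept URI and machine-readable data."""
    url = ("https://id.loc.gov/authorities/subjects/label/"
           + urllib.parse.quote(label))
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:   # follows the redirect
        return resp.url, json.load(resp)

uri, data = lcsh_lookup("Metadata")
print(uri)
```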
OCLC Releases Networking Names Report
OCLC has released the Networking Names report.
Here's an excerpt from the press release:
This report identifies the necessary components of a "Cooperative Identities Hub" that would address the problem space in the research community and have the most impact across different target audiences.
The fifteen members of the RLG Partnership Networking Names Advisory Group developed fourteen use case scenarios around academic libraries and scholars, archivists and archival users, and institutional repositories that provide the context in which different communities would benefit from aggregating information about persons and organizations, corporate and government bodies, and families, and making it available on a network level.
The report summarizes the group's recommendations on the functions and attributes needed to support the use case scenarios.
Advancing the State of the Art in Distributed Digital Libraries: Accomplishments of and Lessons Learned from the Digital Library Federation Aquifer Metadata Working Group
DRAFT: TEI Text Encoding in Libraries: Guidelines for Best Encoding Practices
DRAFT: TEI Text Encoding in Libraries: Guidelines for Best Encoding Practices is now available for comment until May 6, 2009.
Here's an excerpt from the comment survey:
The revised "TEI Text Encoding in Libraries: Guidelines for Best Encoding Practices," currently in draft form, contain updated versions of the widely adopted encoding 'levels'—from fully automated conversion to content analysis and scholarly encoding. They also contain a substantially revised section on the TEI Header, designed to support interoperability between text collections and the use of complementary metadata schemas such as MARC and METS. The new Guidelines also reflect an organizational shift. Originally authored by the DLF-sponsored TEI Task Force, the current revision work is a partnership between members of the Task Force and the TEI Libraries SIG. As a result of this partnership, responsibility for the Guidelines will migrate to the SIG, allowing closer work with the TEI Consortium as a whole and a stronger basis for advocating for the needs of libraries in future TEI releases.
OECD: We Need Publishing Standards for Datasets and Data Tables
OECD has released We Need Publishing Standards for Datasets and Data Tables.
Here's an excerpt:
Datasets are a significant part of the scholarly record and are being published more and more frequently, either formally or informally. Many publishers are beginning to link to them from their journals and authors are trying to cite them in their articles. Librarians would like a way to manage them alongside other publications. In short, they need to be integrated into the scholarly information system so that authors, readers and librarians can use, find and manage them as easily as they do working papers, journal articles and books.
In this paper, OECD is proposing some standards for citing and bibliographic management of datasets and data tables. OECD is currently building a new online publishing platform which will host working papers, journals, books, tables and datasets. Due to be launched in mid-2009, this platform will use the standards proposed above. Librarians will be offered MARC 21 records for datasets, alongside records for OECD books and periodicals. Users of the platform will be invited to download citations for datasets and tables in a form compatible with popular bibliographic management systems. All the DOIs for the datasets and tables will be deposited with CrossRef, ready for other publishers to use.
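Once dataset DOIs are deposited with CrossRef, the DOI system's general machinery applies to them; for example, a formatted citation can be requested from the resolver by content negotiation. A sketch (using CrossRef's well-known test DOI, not an OECD dataset DOI):

```python
# Ask the DOI resolver for a formatted citation via content negotiation,
# a general feature of the CrossRef/DataCite DOI infrastructure. The DOI
# below is CrossRef's well-known test DOI, purely for illustration.
import urllib.request

def format_citation(doi, style="apa"):
    """Fetch a human-readable citation for a DOI in the given CSL style."""
    req = urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": f"text/x-bibliography; style={style}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

print(format_citation("10.5555/12345678"))
```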
OCLC: A Symposium for Publishers and Librarians
OCLC has released presentations and other documents related to its recent event, A Symposium for Publishers and Librarians.
Here's an excerpt from the symposium report:
On March 18th and 19th representatives from libraries, the publisher supply chain and organizations supporting these communities met at OCLC's Conference Center in Dublin, Ohio to discuss metadata needs, practices, lifecycle and economics across the communities and to explore opportunities for change.
“Name Authority Control in Institutional Repositories”
Dorothea Salo has self-archived "Name Authority Control in Institutional Repositories" in MINDS@UW.
Here's an excerpt:
Neither the standards nor the software underlying institutional repositories anticipated performing name authority control on widely disparate metadata from highly unreliable sources. Without it, though, both machines and humans are stymied in their efforts to access and aggregate information by author. Many organizations are awakening to the problems and possibilities of name authority control, but without better coordination, their efforts will only confuse matters further. Local heuristics-based name-disambiguation software may help those repository managers who can implement it. For the time being, however, most repository managers can only control their own name lists as best they can after deposit while they advocate for better systems and services.
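A local heuristics-based approach usually begins with nothing fancier than normalizing name strings so obvious variants collapse to a single key. Here's a toy sketch of that first step (real disambiguation needs much more, e.g., affiliations, dates, and coauthor networks):

```python
# Toy name-key heuristic: normalize author strings so obvious variants
# ("John A. Smith", "Smith, John A.", "smith, j a") collapse to one key.
# Real disambiguation needs far more evidence than the string itself.
import re

def name_key(raw):
    """Reduce a name string to a crude surname:initials matching key."""
    raw = raw.strip().lower()
    if "," in raw:                      # "surname, forenames"
        surname, forenames = [p.strip() for p in raw.split(",", 1)]
    else:                               # "forenames surname"
        parts = raw.split()
        surname, forenames = parts[-1], " ".join(parts[:-1])
    initials = "".join(w[0] for w in re.findall(r"[a-z]+", forenames))
    return f"{surname}:{initials}"

for variant in ["John A. Smith", "Smith, John A.", "smith, j a"]:
    print(variant, "->", name_key(variant))
```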
CrossRef’s Geoffrey Bilder on Author Identifiers
Gobbledygook has interviewed CrossRef's Geoffrey Bilder about author identifiers.
Here's an excerpt:
Of course, lots of the same issues can be raised with CrossRef, right? What guarantees that CrossRef won't become evil and co-opt all of our identities? This, of course, is the big fear underlying the knee-jerk reaction against "centralized systems" in favor of "distributed systems". The problem with this, as I mentioned in the FriendFeed thread, is that my personal and unfashionable observation is that "distributed" begets "centralized." For every distributed service created, we've then had to create a centralized service to make it usable again (ICANN, Google, Pirate Bay, CrossRef, DOAJ, ticTocs, WorldCat, etc.). This gets us back to square one and makes me think the real issue is: how do you make the centralized system that eventually emerges accountable? This is, of course, a social issue more than a technical issue, and involves making sure that whatever entity emerges has clearly defined data portability policies and a "living will" that attempts to guarantee that the service can be run in perpetuity, even if by another organization. For the record, I don't think adopting the slogan "don't be evil" is enough ;).
“On the Communication of Scientific Results: The Full-Metadata Format”
Moritz Riede, Rico Schueppel, Kristian O. Sylvester-Hvid, et al. have self-archived "On the Communication of Scientific Results: The Full-Metadata Format" in arXiv.
Here's an excerpt:
In this paper, we introduce a scientific format for text-based data files, which facilitates storing and communicating tabular data sets. The so-called Full-Metadata Format builds on the widely used INI-standard and is based on four principles: readable self-documentation, flexible structure, fail-safe compatibility, and searchability. As a consequence, all metadata required to interpret the tabular data are stored in the same file, allowing for the automated generation of publication-ready tables and graphs and the semantic searchability of data file collections. The Full-Metadata Format is introduced on the basis of three comprehensive examples. The complete format and syntax are given in the appendix.
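Since the format builds on the INI standard, a standard config parser gets you surprisingly far. Here's a rough sketch of the single-file idea (the section and key names are invented for illustration; the paper defines the actual FMF syntax):

```python
# A rough sketch of the single-file idea: INI-style metadata sections
# followed by the tabular data itself. Section and key names here are
# invented for illustration; the paper defines the real FMF syntax.
import configparser

SAMPLE = """\
[*reference]
title: Solar cell IV measurement
creator: Jane Doe

[*data definitions]
voltage: V [V]
current: I [mA]

[*data]
0.0\t12.1
0.1\t11.8
"""

# Split off the data block, then let configparser handle the metadata.
header, _, table = SAMPLE.partition("[*data]\n")
config = configparser.ConfigParser()
config.read_string(header)

metadata = dict(config["*reference"])
columns = list(config["*data definitions"].keys())
rows = [tuple(map(float, line.split())) for line in table.strip().splitlines()]

print(metadata["title"], columns, rows)
```

Because the column definitions and provenance travel with the numbers, the file remains interpretable on its own, which is the format's central point.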
Indiana University Digital Library Program Releases IN Harmony Sheet Music Cataloging Tool
The Indiana University Digital Library Program has released the IN Harmony Sheet Music Cataloging Tool.
Here's an excerpt from the tool's page:
The IN Harmony Sheet Music Cataloging Tool is an open source tool developed by the Indiana University Digital Library Program with funding from the Institute of Museum and Library Services as part of the IN Harmony: Sheet Music From Indiana project. It has been designed to assist libraries, archives, museums, and individual collectors in describing their sheet music collections in a robust and standards-based way. This is a production system of the Indiana University Digital Library Program and was used to catalog more than 10,000 pieces of sheet music for the IN Harmony project.
The tool collects descriptive metadata about sheet music and exports it in the MODS, simple Dublin Core, and OAI-PMH Static Repository formats.
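As an illustration of the simplest of those export targets, a simple Dublin Core record can be serialized in a few lines (a generic sketch, not the tool's exporter; the sample values describe a real Paul Dresser song of the sort IN Harmony catalogs):

```python
# Build a simple (unqualified) Dublin Core XML record -- a generic
# sketch of the simplest export target, not the IN Harmony exporter.
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

def dc_record(fields):
    """Serialize (element, value) pairs as a simple Dublin Core record."""
    root = ET.Element("record")
    for name, value in fields:
        el = ET.SubElement(root, f"{{{DC}}}{name}")
        el.text = value
    return ET.tostring(root, encoding="unicode")

print(dc_record([
    ("title", "My Gal Sal"),
    ("creator", "Dresser, Paul"),
    ("date", "1905"),
    ("type", "notated music"),
]))
```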
“Aligning METS with the OAI-ORE Data Model”
Jerome P. McDonough has made "Aligning METS with the OAI-ORE Data Model" available in IDEALS.
Here's an excerpt:
The Open Archives Initiative Object Reuse and Exchange (OAI-ORE) specifications provide a flexible set of mechanisms for transferring complex data objects between different systems. In order to serve as an exchange syntax, OAI-ORE must be able to support the import of information from localized data structures serving various communities of practice. In this paper, we examine the Metadata Encoding & Transmission Standard (METS) and the issues that arise when trying to map from a localized structural metadata schema into the OAI-ORE data model and serialization syntaxes.
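On the ORE side, the target of such a mapping is essentially an ore:Aggregation that ore:aggregates the object's component resources, described by a resource map. Here's a minimal sketch with the third-party rdflib package (a generic resource map, not the paper's METS alignment):

```python
# Build a minimal OAI-ORE resource map with rdflib (pip install rdflib):
# an ore:Aggregation aggregating two component resources. A generic
# illustration of the ORE data model, not McDonough's METS mapping.
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

ORE = Namespace("http://www.openarchives.org/ore/terms/")

g = Graph()
g.bind("ore", ORE)

rem = URIRef("http://example.org/rem/1")          # the resource map
agg = URIRef("http://example.org/aggregation/1")  # the aggregation

g.add((rem, ORE.describes, agg))
g.add((agg, RDF.type, ORE.Aggregation))
g.add((agg, ORE.aggregates, URIRef("http://example.org/files/page1.tif")))
g.add((agg, ORE.aggregates, URIRef("http://example.org/files/page2.tif")))

print(g.serialize(format="turtle"))
```

The difficulty McDonough examines is that METS structural maps carry nested, ordered hierarchy that this flat aggregation model does not directly express.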
Special Issue of Library Trends on Institutional Repositories
The latest issue of Library Trends (57, no. 2, Fall 2008) is about institutional repositories.
Here are the articles (links are to article preprints):
- "Introduction: Institutional Repositories: Current State and Future"
- "Innkeeper at the Roach Motel"
- "Institutional Repositories in the UK: The JISC Approach"
- "Strategies for Institutional Repository Development: A Case Study of Three Evolving Initiatives"
- "Perceptions and Experiences of Staff in the Planning and Implementation of Institutional Repositories"
- "Institutional Repositories and Research Data Curation in a Distributed Environment"
- "At the Watershed: Preparing for Research Data Management and Stewardship at the University of Minnesota Libraries"
- "Case Study in Data Curation at Johns Hopkins University"
- "Describing Scholarly Works with Dublin Core: A Functional Approach"
- "The 'Wealth of Networks' and Institutional Repositories: MIT, DSpace, and the Future of the Scholarly Commons"
- "Leveraging Short-term Opportunities to Address Long-term Obligations: A Perspective on Institutional Repositories and Digital Preservation Programs"
- "Shedding Light on the Dark Data in the Long Tail of Science"
Andy Powell on Persistent URIs and Digital Repositories
In “How Uncool? Repository URIs. . .,” Andy Powell analyzes URI structure in 107 repositories to determine whether their items’ URIs are likely to be persistent.
Here's an excerpt:
So what is an uncool URI? An uncool URI is one that is unlikely to be persistent, typically because the person who first assigned it didn’t think hard enough about likely changes in organisational structure, policy or technology and the impact that changes in those areas might have on the persistence of the URI into the future.
Automatic Metadata Generation for Repositories: MetaTools: Final Report
JISC has released MetaTools: Final Report.
Here's an excerpt from the announcement:
Automatic metadata generation has sometimes been posited as a solution to the 'metadata bottleneck' that repositories and portals are facing as they struggle to provide resource discovery metadata for a rapidly growing number of new digital resources. Unfortunately there is no registry or trusted body of documentation that rates the quality of metadata generation tools or identifies the most effective tool(s) for any given task.
The aim of the first stage of the project was to remedy this situation by developing a framework for evaluating tools used for the purpose of generating Dublin Core metadata. . . .
A test program was then implemented using metrics from the framework. It evaluated the quality of metadata generated from 1) Web pages (html) and 2) scholarly works (pdf) by four of the more widely-known metadata generation tools—Data Fountains, DC-dot, SamgI, and the Yahoo! Term Extractor. . . .
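Tools of this kind typically start from whatever explicit hints a page offers, its <title> and <meta> tags, before falling back to content analysis. Here's a minimal sketch of that first step (a generic illustration, not one of the evaluated tools):

```python
# Generate rudimentary Dublin Core from an HTML page's explicit hints
# (<title> and <meta name=...> tags) with the standard-library parser.
# A generic first step, not any of the tools the project evaluated.
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.dc = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name", "").lower() in (
                "description", "keywords", "author"):
            self.dc[attrs["name"].lower()] = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.dc["title"] = self.dc.get("title", "") + data

page = """<html><head><title>MetaTools Report</title>
<meta name="author" content="UKOLN">
<meta name="keywords" content="metadata, repositories"></head></html>"""

p = MetaExtractor()
p.feed(page)
print(p.dc)
```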
It was found that the output from Data Fountains was generally superior to that of the other tools that the project tested. But the output from all of the tools was considered disappointing and markedly inferior to the quality of metadata that Tonkin and Muller report PaperBase has extracted from scholarly works. Overall, the prospects for generating high-quality metadata for scholarly works appear to be brighter because of their more predictable layout. . . .
In the third stage of the project, SOAP and RESTful Web Service interfaces were developed for three metadata generation tools—Data Fountains, SamgI, and Kea. This had a dual purpose. Firstly, the creation of an optimal metadata record usually requires the merging of output from several tools, each of which, until now, had to be invoked separately because of the ad hoc nature of their interfaces. As Web services, they will be available for use in a network such as the Web with well-defined interfaces that are implementation-independent. These services will be exposed for use by clients without them having to be concerned with how the service will execute their requests. Repositories should be able to plug them into their own cataloguing environments and experiment with automatic metadata generation under more 'real-life' circumstances than hitherto. Secondly, and more importantly (in view of the relatively poor quality of current tools), they enabled the project to experiment with the use of a high-level ontology for describing metadata generation tools.