Metadata – Page 2 – DigitalKoans

"Twenty Years of Wikipedia in Scholarly Publications: A Bibliometric Network Analysis of the Thematic and Citation Landscape"

Results also show that the author collaboration network is very sparsely connected, indicating the absence of close collaboration among the authors in the field. Furthermore, results reveal that the Wikipedia research institutions’ collaboration network reflects a North–South divide as very limited cooperation occurs between developed and developing countries’ institutions. Finally, the multiple correspondence analysis applied to obtain the Wikipedia research conceptual map reveals the breadth, diversity, and intellectual thrust of Wikipedia’s scholarly publications.

https://doi.org/10.1007/s11135-023-01626-7

"Scaling Identifiers and Their Metadata to Gigascale: An Architecture to Tackle the Challenges of Volume and Variety"

Persistent identifiers are applied to an ever-increasing variety of research objects, including software, samples, models, people, instruments, grants, and projects, and there is a growing need to apply identifiers at a finer and finer granularity. Unfortunately, the systems developed over two decades ago to manage identifiers and the metadata describing the identified objects no longer scale. Communities working with physical samples have grappled with the challenges of the increasing volume, variety, and variability of identified objects for many years. To address this dual challenge, the IGSN 2040 project explored how metadata and catalogues for physical samples could be shared at the scale of billions of samples across an ever-growing variety of users and disciplines. In this paper, we focus on how we scale identifiers and their describing metadata to billions of objects and who the actors involved with this system are. Our analysis of these requirements resulted in the definition of a minimum viable product and the design of an architecture that not only addresses the challenges of increasing volume and variety but, more importantly, is easy to implement because it reuses commonly used Web components. Our solution is based on a Web architectural model that utilises Schema.org, JSON-LD, and sitemaps. Applying these commonly used architectural patterns on the internet allows us to not only handle increasing variety but also enable better compliance with the FAIR Guiding Principles.

http://doi.org/10.5334/dsj-2023-005
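
To make the Schema.org, JSON-LD, and sitemap pattern in the abstract above more concrete, here is a minimal Python sketch. It is not the IGSN 2040 implementation: the function names, the choice of the generic Schema.org type `Thing`, the example identifier, and the `example.org` URLs are all assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sample_jsonld(identifier, name, landing_page):
    """Build a minimal Schema.org description of one object as a JSON-LD dict.

    The property selection is a hypothetical minimum, not a published profile.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Thing",
        "@id": landing_page,
        "identifier": identifier,
        "name": name,
        "url": landing_page,
    }


def sitemap_xml(landing_pages):
    """Serialise a list of landing-page URLs as a sitemap <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in landing_pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical identifier and landing page, for illustration only.
    page = "https://samples.example.org/EXAMPLE0001"
    record = sample_jsonld("EXAMPLE0001", "Example rock sample", page)
    print(json.dumps(record, indent=2))
    print(sitemap_xml([page]))
```

In this arrangement each landing page would embed its own JSON-LD record, and the sitemap would tell harvesters which pages to visit, so a catalogue can grow by adding pages rather than by changing a central registry.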

"Geospatial Open Data Usage and Metadata Quality"

The Open Government Data (OGD) portals, thanks to the presence of thousands of geo-referenced datasets containing spatial information, are of extreme interest for any analysis or process relating to the territory. For this to happen, users must be enabled to access these datasets and reuse them. An element often considered as hindering the full dissemination of OGD data is the quality of their metadata. Starting from an experimental investigation conducted on over 160,000 geospatial datasets belonging to six national and international OGD portals, this work has as its first objective to provide an overview of the usage of these portals measured in terms of dataset views and downloads. Furthermore, to assess the possible influence of the quality of the metadata on the use of geospatial datasets, an assessment of the metadata for each dataset was carried out, and the correlation between these two variables was measured. The results obtained showed a significant underutilization of geospatial datasets and a generally poor quality of their metadata. In addition, only a weak correlation was found between the use and the quality of the metadata, too weak to assert with certainty that the latter is a determining factor of the former.

https://doi.org/10.3390/ijgi10010030

"Measuring the Concept of PID Literacy: User Perceptions and Understanding of Persistent Identifiers in Support of Open Scholarly Infrastructure"

The increasing centrality of persistent identifiers (PIDs) to scholarly ecosystems and the contribution they can make to the burgeoning ‘PID graph’ has the potential to transform scholarship. Despite their importance as originators of PID data, little is known about researchers’ awareness and understanding of PIDs, or their efficacy in using them. In this article we report on the results of an online interactive test designed to elicit exploratory data about researcher awareness and understanding of PIDs. This instrument was designed to explore recognition of PIDs and the extent to which researchers correctly apply PIDs within digital scholarly ecosystems, as well as measure researchers’ perceptions of PIDs. Our results reveal irregular patterns of PID understanding and certainty across all participants, though statistically significant disciplinary and academic job role differences were observed in some instances. Uncertainty and confusion were found to exist in relation to dominant schemes such as ORCID and DOIs, even when contextualized within real-world examples. We also show researchers’ perceptions of PIDs to be generally positive but that disciplinary differences can be noted, as well as higher levels of aversion to PIDs in specific use cases and negative perceptions where PIDs are measured on an ‘activity’ semantic dimension. This work therefore contributes to our understanding of academics’ ‘PID literacy’ and should inform those designing PID-centric scholarly infrastructures that a significant need for training and outreach to active researchers remains.

https://arxiv.org/abs/2211.07367

| Research Data Publication and Citation Bibliography | Research Data Sharing and Reuse Bibliography | Research Data Curation and Management Bibliography | Digital Scholarship |

"The Preprint Revolution — Implications for Bibliographic Databases"

In the box below, we present six recommendations for optimizing the indexing of preprints in bibliographic databases. As we will discuss later, implementing these recommendations requires close collaboration between bibliographic databases and other actors in the scholarly publishing system.

Recommendation 1: Cover all relevant preprint servers.

A bibliographic database should index preprints from all relevant preprint servers. A disciplinary database (e.g., PubMed and Europe PMC) should index preprints from all preprint servers relevant in a particular discipline. A multidisciplinary database (e.g., Dimensions, the Lens, Scopus, and Web of Science) should index preprints from all preprint servers across all disciplines.

Recommendation 2: Provide comprehensive preprint metadata.

A bibliographic database should provide metadata for preprints that is as comprehensive as metadata for journal articles. The metadata should at least include the title and abstract of a preprint, the names and affiliations of the authors, the reference list, and funding information. It should also include a version history.

Recommendation 3: Provide links between preprints and journal articles.

If an article has been published both on a preprint server and in a journal, a bibliographic database should provide a link between the preprint and the journal article. The link establishes that the preprint and the journal article are different versions of the same article. The preprint and the journal article belong to the same publication family.

Recommendation 4: Provide links between preprints and peer reviews.

If a preprint has been peer reviewed and the reviews have been made openly available, a bibliographic database should index the reviews and should provide links between the preprint and the reviews.

Recommendation 5: Provide deduplicated citation links between publication families.

A bibliographic database should provide deduplicated citation links at the level of publication families. If there are multiple citation links from publications in one publication family (e.g., from a preprint and from a journal article) to publications in another publication family, these citation links should be deduplicated.

Recommendation 6: Do not make arbitrary distinctions between publication types (preprints, journal articles, and others).

A bibliographic database should not make arbitrary distinctions between preprints, journal articles, and other publication types. A database may inform its users about relevant differences between publications of different types (e.g., whether publications have been peer reviewed or not), but otherwise it should treat all publications in the same way, regardless of their publication type.

bit.ly/3KtuWXl
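
Recommendation 5 above describes a small algorithm, which can be sketched in Python as follows. This is only one possible reading of the recommendation: the function name, the family labels, and the decision to drop citation links inside a single family are assumptions, not something the authors specify.

```python
def family_citations(citations, family_of):
    """Collapse publication-level citation links into unique family-level links.

    citations: iterable of (citing_publication, cited_publication) pairs.
    family_of: dict mapping a publication to its publication family;
               a publication with no entry is treated as its own family.
    """
    links = set()
    for citing, cited in citations:
        src = family_of.get(citing, citing)
        dst = family_of.get(cited, cited)
        # Assumption: a link between two versions of the same article is dropped.
        if src != dst:
            links.add((src, dst))
    return links


if __name__ == "__main__":
    # Hypothetical data: preprint P1 and journal article A1 form family F1,
    # preprint P2 and journal article A2 form family F2.
    family_of = {"P1": "F1", "A1": "F1", "P2": "F2", "A2": "F2"}
    citations = [("P1", "P2"), ("A1", "A2"), ("A1", "P2")]
    print(sorted(family_citations(citations, family_of)))  # [('F1', 'F2')]
```

The three publication-level links in the example collapse to a single family-level link, which is the deduplication the recommendation asks for.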

"NISO Publishes New Recommended Practice for Video and Audio Metadata"

NISO’s new Video and Audio Metadata Recommended Practice will help address these challenges, by providing a vocabulary that enables connectivity between existing standards covering key metadata elements: administrative (e.g., dates, versions, and identifiers); semantic (e.g., subject classifications and keywords); technical (e.g., media type, encoding, and bitrate); rights (e.g., rights owner, licensor, and embargo information); and accessibility (e.g., accessibility features and access).

bit.ly/3ImojVp

"Cluster Analysis of Open Research Data: A Case for Replication Metadata"

Research data are often released upon journal publication to enable result verification and reproducibility. For that reason, research dissemination infrastructures typically support diverse datasets coming from numerous disciplines, from tabular data and program code to audio-visual files. Metadata, or data about data, is critical to making research outputs adequately documented and FAIR. Aiming to contribute to the discussions on the development of metadata for research outputs, I conducted an exploratory analysis to determine how research datasets cluster based on what researchers organically deposit together. I use the content of over 40,000 datasets from the Harvard Dataverse research data repository as my sample for the cluster analysis. I find that the majority of the clusters are formed by single-type datasets, while in the rest of the sample, no meaningful clusters can be identified. For the result interpretation, I use the metadata standard employed by DataCite, a leading organization for documenting a scholarly record, and map existing resource types to my results. About 65% of the sample can be described with a single-type metadata (such as Dataset, Software or Report), while the rest would require aggregate metadata types. Though DataCite supports an aggregate type such as a Collection, I argue that a significant number of datasets, in particular those containing both data and code files (about 20% of the sample), would be more accurately described as a Replication resource metadata type. Such resource type would be particularly useful in facilitating research reproducibility.

http://www.ijdc.net/article/view/833

"Persistent Identifiers — Risks and Trust Related Issues Explored with New Knowledge Exchange Report and Case Studies"

As part of the work around Risks and Trust in pursuit of a well-functioning PID infrastructure for research, this Knowledge Exchange report examines the complex PID landscape within its six partner countries and beyond. The benefits of an efficient PID infrastructure, and how this is a precondition for research communities’ impending research agendas, are explained. The report provides an in-depth look at what can go wrong with an unreliable PID service.

https://www.knowledge-exchange.info/news/articles/2-2-23

"Designing Digital Discovery and Access Systems for Archival Description"

Archival description is often misunderstood by librarians, administrators, and technologists in ways that have seriously hindered the development of access and discovery systems. It is not widely understood that there is currently no off-the-shelf system that provides discovery and access to digital materials using archival methods. This article is an overview of the core differences between archival and bibliographic description, and discusses how to design access systems for born-digital and digitized materials using the affordances of archival metadata. It offers a custom indexer as a working example that adds the full text of digital content to an Arclight instance and argues that the extensibility of archival description makes it a perfect match for automated description. Finally, it argues that building archives-first discovery systems allows us to use our descriptive labor more thoughtfully, better enable digitization on demand, and overall make a larger volume of cultural heritage materials available online.

bit.ly/3DhKmcC

"Non-Fungible Token (NFT) in the Academic and Open Access Publishing Environment: Considerations towards Science-Friendly Scenarios"

The article describes the use and possible value creation of Non-Fungible Tokens (NFT) in the academic and open access publishing environment. It defines NFTs, describes disadvantages and possible solutions, especially in the intended scientific environment. An overview of existing NFT service providers from the publishing environment illustrates that there is not yet a suitable one for researchers. Accordingly, three possible scenarios are shown where NFT services could be located in a science-friendly way. One would be with library- or scholarly-led university presses, repositories, and other publication infrastructures (such as OJS or OMP). Another would be to use centralizing and channelling article submission platforms with which universities have contracts, such as ChronosHub. The third and broadest approach would be through Digital Object Identifier (DOI) registration agencies such as Crossref and DataCite, although complexities come into play here due to the triangular relationship with publishers registering DOIs (some of them having exclusive usage rights transferred to themselves). This complexity could be reduced by registering NFTs only for open access publications with a Creative Commons Attribution license. A summary and outlook provide an overview of open questions and initial starting points.

https://doi.org/10.3998/jep.2574

"Digital Books Wear Out Faster than Physical Books"

Mega-publishers are saying electronic books do not wear out, but this is not true at all. The Internet Archive processes and reprocesses the books it has digitized as new optical character recognition technologies come around, as new text understanding technologies open new analysis, as formats change from djvu to daisy to epub1 to epub2 to epub3 to pdf-a and on and on. It takes thousands of computer-months and programmer-years to do this work. This is what libraries have signed up for: our long-term custodial roles.

Also, the digital media they reside on change, too: from Digital Linear Tape to PATA hard drives to SATA hard drives to SSDs. If we do not actively tend our digital books, they become unreadable very quickly.

Then there is cataloging and metadata. If we do not keep up with the ever-changing expectations of digital learners, then our books will not be found. This is ongoing and expensive.

https://cutt.ly/VMTDGdL

"Clarivate and OCLC Settle Lawsuit"

Clarivate continues to deny OCLC’s allegations of wrongdoing and maintains that the issue lay between OCLC and its customers, who sought to co-create an efficient community platform for sharing of bibliographic records. Clarivate will not develop a record exchange system of MARC records that include records which OCLC has claimed are subject to its policy and contractual limitations. Clarivate will bear its own fees and costs.

https://cutt.ly/vN3RDOx
