AI Is Running Out of New Training Data: Consent in Crisis: The Rapid Decline of the AI Data Commons


General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge, we conduct the first, large-scale, longitudinal audit of the consent protocols for the web domains underlying AI training corpora. . . . Our longitudinal analyses show that in a single year (2023-2024) there has been a rapid crescendo of data restrictions from web sources, rendering ~5%+ of all tokens in C4, or 28%+ of the most actively maintained, critical sources in C4, fully restricted from use. For Terms of Service crawling restrictions, a full 45% of C4 is now restricted. If respected or enforced, these restrictions are rapidly biasing the diversity, freshness, and scaling laws for general-purpose AI systems.

https://tinyurl.com/4k56axzk

| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

Digital Initiatives Librarian at University of Utah


The Digital Library Services Division at the J. Willard Marriott Library seeks a detail-oriented and collaborative individual to create metadata for digital collections, manage our digital exhibits program, and share their metadata expertise within the library and with our digital exhibit partners. This person joins a team dedicated to creating descriptive metadata for the long-standing and innovative Digital Library program at the Marriott Library. The library also has engaging collaboration opportunities with Special Collections, our research data program, digital scholarship center, Digital Matters, and more.

Job Ad

| Digital Library Jobs |
| Electronic Resources Jobs |
| Library IT Jobs |
| Digital Scholarship |

"STM Statement Regarding Unlicensed Use of STM’s Members’ Content in the Training, Development, and Operation of AI Models"


The unlicensed use of STM’s members’ content in the training, development, and operation of AI models is of great concern to STM and to our members. Because STM’s members do not share a single jurisdiction, the particular actions and practices of a given AI developer with respect to a given domestic copyright law are too varied to enumerate here. However, regardless of legal nuances among jurisdictions, STM considers the conclusion to be the same — the collection of our members’ content and its use in AI training without authorization, compensation or attribution, amounts to infringement. We support the statements about third parties’ use of content in generative AI training and development that have been made by our sister organizations the International Publishers Association and the UK Publishers Association.

https://tinyurl.com/5n6zh9sy


Scholarly Publishing Librarian at University of Pittsburgh


Through a combination of instruction, consultation, and outreach activities, the position provides leadership and expertise within the University Library System and for researchers and authors across the University of Pittsburgh in scholarly publication, copyright, open-access publishing, and principles of open scholarship. The position also provides operational oversight for the library’s publishing platforms and associated services, including, but not limited to, Open Journal Systems (OJS) for journals and Omeka.net for user-generated digital collections and exhibits. In this capacity, the Scholarly Publishing Librarian communicates with users of these platforms and evaluates new publishing proposals.

Job Ad


"Google’s Wrong Answer to the Threat of AI — Stop Indexing Content"


"Google is no longer trying to index the entire web," writes Schmalbach [Vincent Schmalbach, SEO expert]. "In fact, it’s become extremely selective, refusing to index most content. This isn’t about content creators failing to meet some arbitrary standard of quality. Rather, it’s a fundamental change in how Google approaches its role as a search engine." The default setting from now on will be not to index content unless it is genuinely unique, authoritative and has ‘brand recognition’.

https://tinyurl.com/32t98fhu


Digital Collections Specialist at College of the Holy Cross


The Digital Collections Specialist primarily supports the digital activities of Archives & Distinctive Collections (ADC). Tasks include digitization, description, preservation and uploading content to various digital platforms. They also will assist with the creation of in-person and digital exhibits, contribute to the Holy Cross Libraries’ social media accounts as well as staff the Reading Room’s desk and respond to reference requests.

Job Ad


"Attitudes on Data Reuse Among Internal Medicine Residents"


Results: We surveyed a population of 162 residents, and 67 residents responded, representing a 41.36% response rate. Strong majorities of residents exhibited positive views of secondary data analysis. Moreover, in our sample, those with exposure to secondary data analysis research opined that secondary data analysis takes less time and is less difficult to conduct compared to the other residents without curricular exposure to secondary analysis.

Discussion: The survey reflects that residents believe secondary data analysis is worthwhile and this highlights opportunities for data librarians. As current residents matriculate into professional roles as clinicians, educators, and researchers, libraries have an opportunity to bolster support for data curation and education.

https://doi.org/10.5195/jmla.2024.1772


Web & Electronic Resources Librarian at Norwich University


Supports the mission of Norwich University by managing and maintaining the library’s website, providing access to the library’s electronic resources, and compiling usage and other statistical reports. Participates as a member of the team providing reference and instruction in a broad range of subject areas.

Job Ad


"The Societal Impact of Open Science: A Scoping Review"


Open Science (OS) aims, in part, to drive greater societal impact of academic research. Government, funder and institutional policies state that it should further democratize research and increase learning and awareness, evidence-based policy-making, the relevance of research to society’s problems, and public trust in research. Yet, measuring the societal impact of OS has proven challenging and synthesized evidence of it is lacking. This study fills this gap by systematically scoping the existing evidence of societal impact driven by OS and its various aspects, including Citizen Science (CS), Open Access (OA), Open/FAIR Data (OFD), Open Code/Software and others. Using the PRISMA Extension for Scoping Reviews and searches conducted in Web of Science, Scopus and relevant grey literature, we identified 196 studies that contain evidence of societal impact. The majority concern CS, with some focused on OA, and only a few addressing other aspects. Key areas of impact found are education and awareness, climate and environment, and social engagement. We found no literature documenting evidence of the societal impact of OFD and limited evidence of societal impact in terms of policy, health, and trust in academic research. Our findings demonstrate a critical need for additional evidence and suggest practical and policy implications.

https://doi.org/10.1098/rsos.240286


Associate Vice Provost for Collections & Scholarly Communications at University of Pennsylvania


The University of Pennsylvania Libraries invites applications for the position of Gershwind & Bennett Family Associate Vice Provost for Collections & Scholarly Communications. This senior strategic leadership role, reporting directly to the H. Carton Rogers III Vice Provost and Director of Libraries, is pivotal in overseeing a wide array of outward-facing services. These encompass academic and student engagement, research services, community engagement, collection strategy, scholarly communications, and the administration of eleven departmental libraries and centers that serve professional schools and specific subject areas.

Job Ad


"Appeals Court Hears Internet Archive Copyright Case "


At a lengthy June 28 hearing in New York, a three-judge panel of the U.S. Court of Appeals for the Second Circuit heard oral arguments in the Internet Archive’s appeal of a March 2023 court decision finding its program to scan and lend print library books to be copyright infringement. And while the court clearly appeared skeptical of the Internet Archive’s arguments, the panel was deeply engaged and well-prepared, peppering both sides with a wide array of questions.

https://tinyurl.com/4nkf3cdp


Electronic Resources Coordinator-CARLI (Remote)


The Electronic Resources Coordinator supports library resources and software serving the member libraries of the Consortium of Academic and Research Libraries of Illinois (CARLI). In support of I-Share (currently Alma and Primo VE), electronic resources, and other CARLI services, the Electronic Resources Coordinator works with CARLI’s member library staff as well as other CARLI staff to troubleshoot, support, and assist with projects, services, and non-routine projects related to electronic resources, as well as to develop continuing education programs and consortial meetings in support of CARLI’s strategic plan.

Job Ad


"ARL & CNI Release Deluxe Edition of AI-Influenced Future Scenarios for Research Environment"


This Deluxe Edition of the ARL/CNI AI Scenarios includes:

  • The Final Scenario Set: This final scenario set explores potential futures where AI plays a pivotal role, providing critical insights into the evolving challenges and opportunities for the research environment.
  • The Strategic Context Report: This report summarizes community feedback gathered through focus groups and interviews about an AI-influenced future for the research environment that were held in winter 2023–24 and spring 2024.
  • The Provocateur Interview Report: Featuring forward-thinking dialogues with industry leaders, these interviews challenge conventional wisdom and stimulate stretch thinking with regards to an AI-influenced future.

https://tinyurl.com/5n7xwc8c


Electronic Resources Librarian at Georgia Institute of Technology


Reporting to the Head of Technical Services, this position will serve as the Service Owner for E-Resource management and participate in strategic decision making as a core member of various internal and external committees. The Electronic Resources Librarian will provide expert advice regarding e-resource acquisition and management to inform and support the Georgia Tech Library collection development strategy and be a responsible steward.

Job Ad


"The Oligopoly of Academic Publishers Persists in Exclusive Database"


Global scholarly publishing has been dominated by a small number of publishers for several decades. We aimed to revisit the debate on corporate control of scholarly publishing by analyzing the relative shares of major publishers and smaller, independent publishers. Using the Web of Science, Dimensions and OpenAlex, we managed to retrieve twice as many articles indexed in Dimensions and OpenAlex, compared to the rather selective Web of Science. As a result of excluding smaller publishers, the ‘oligopoly’ of scholarly publishers persists, at least in appearance, according to the Web of Science. However, both Dimensions’ and OpenAlex’ inclusive indexing revealed the share of smaller publishers has been growing rapidly, especially since the onset of large-scale online publishing around 2000, resulting in a current cumulative dominance of smaller publishers. While the expansion of small publishers was most pronounced in the social sciences and humanities, the natural and medical sciences showed a similar trend. A major geographical divergence is also revealed, with some countries, mostly Anglo-Saxon and/or located in northwestern Europe, relying heavily on major publishers for the dissemination of their research, while others being relatively independent of the oligopoly, such as those in Latin America, northern Africa, eastern Europe and parts of Asia. The emergence of digital publishing, the reduction of expenses for printing and distribution and open-source journal management tools may have contributed to the emergence of small publishers, while the development of inclusive bibliometric databases has allowed for the effective indexing of journals and articles. We conclude that enhanced visibility to recently created, independent journals may favour their growth and stimulate global scholarly bibliodiversity.

https://arxiv.org/abs/2406.17893


Director, National Agricultural Library


  • Directs, oversees, and defends the development and execution of Agency program goals and resource requirements to the Department, OMB, and Congressional officials.
  • Facilitating cooperation and coordination for the agricultural libraries of colleges, universities, USDA, in conjunction with private industry and other research libraries.
  • Responsible for providing leadership and direction in the formulation, implementation and evaluation of the development and execution of broad programs of library and technical information services.
  • Overseeing the application of advanced computer and telecommunications technology for the worldwide collection, evaluation, and dissemination of specialized information in the agricultural and related sciences.

Job Ad


Research Data Alliance: Recommendations on Open Science Rewards and Incentives


Open Science contributes to the collective building of scientific knowledge and societal progress. However, academic research currently fails to recognise and reward efforts to share research outputs. Yet it is crucial that such activities be valued, as they require considerable time, energy, and expertise to make scientific outputs usable by others, as stated by the FAIR principles. To address this challenge, several bottom-up and top-down initiatives have emerged to explore ways to assess and credit Open Science activities (e.g., Research Data Alliance, RDA) and to promote the assessment of a broad spectrum of research outputs, including datasets and software (e.g., Coalition for Advancement of Research Assessment, CoARA). As part of the RDA-SHARC (SHAring Rewards and Credit) interest group, we have developed a set of recommendations to help implement various rewarding schemes at different levels. The recommendations target a broad range of stakeholders. For instance, institutions are encouraged to provide digital services and infrastructure, organise training and cover expenses associated with making data available for the community. The funders should establish policies requiring open access to data produced by funded research and provide corresponding support. The publishers should favour open peer-review models and open access to articles, data and software. Government policymakers should set up a comprehensive Open Science strategy, as recommended by UNESCO and followed by a growing number of countries. The present work details different measures that are proposed to the stakeholders. The need to include sharing activities in research evaluation schemes as an overarching mechanism to promote Open Science practices is specifically emphasised.

https://tinyurl.com/4rhk44mn


Systems & Web Services Librarian at Carroll College


  • Serving as the technical expert for the Library’s Integrated Library System (ILS); maintaining and enhancing its operations as well as applications that support its services;
  • Providing leadership in solving problems associated with the ILS and other services;
  • Developing and documenting internal processes and procedures to support operations within Library Infrastructure Systems, including but not limited to the ILS, EZProxy, Worldshare Interlibrary Loan (OCLC), the Institutional Repository (Carroll Scholars), and the Carroll Digital Archives;

Job Ad


Electronic Resources Librarian at East Stroudsburg University


  • Performs all management, coordination, collection development, and troubleshooting for the library’s electronic resources, including but not limited to databases, journals, and eBooks.
  • Working from East Stroudsburg University’s campus, the Electronic Resources Librarian is responsible for maintaining the integrated library system (Alma), the discovery layer (Primo), and other electronic subscriptions.
  • Implement, troubleshoot, and maintain consistent and reliable operation, delivery, and access to the library’s ILS system, discovery layer, databases, and aggregate electronic resources.

Job Ad


"Effects of Research Paper Promotion via ArXiv and X"


In the evolving landscape of scientific publishing, it is important to understand the drivers of high-impact research, to equip scientists with actionable strategies to enhance the reach of their work, and to understand trends in the use of modern scientific publishing tools to inform their further development. Here, we study trends in the use of early preprint publications and revisions on ArXiv and the use of X (formerly Twitter) for promotion of such papers in computer science and physics. We find that early submissions to ArXiv and promotion on X have soared in recent years. Estimating the effect that the use of each of these modern affordances has on the number of citations of scientific publications, we find that peer-reviewed conference papers in computer science that are submitted early to ArXiv gain on average 21.1±17.4 more citations, revised on ArXiv gain 18.4±17.6 more citations, and promoted on X gain 44.4±8 more citations in the first 5 years from an initial publication. In contrast, journal articles in physics experience comparatively lower boosts in citation counts, with increases of 3.9±1.1, 4.3±0.9, and 6.9±3.5 citations respectively for the same interventions. Our results show that promoting one’s work on ArXiv or X has a large impact on the number of citations, as well as the number of influential citations computed by Semantic Scholar, and thereby on the career of researchers. These effects are present also for publications in physics, but they are relatively smaller. The larger relative effect sizes, effects of promotion accumulating over time, and elevated unpredictability of the number of citations in computer science than in physics suggest a greater role of word-of-mouth spreading in computer science than in physics.

https://arxiv.org/abs/2401.11116v2


Electronic Resources and Collections Development Librarian at Ball State University


Perform activities related to developing and sustaining University Libraries’ electronic and physical resource collections to meet user needs, including material selection, deselection, preservation, evaluation and assessment; coordinate pre-order activities for new e-resource acquisition, including trials and licensing; serve as collection development liaison for assigned academic departments; support an inclusive, diverse, and collaborative work environment and service culture to advance the strategic directions of the University Libraries.

Job Ad


"A Real-World Test of Artificial Intelligence Infiltration of a University Examinations System: A ‘Turing Test’ Case Study"


The recent rise in artificial intelligence systems, such as ChatGPT, poses a fundamental problem for the educational sector. In universities and schools, many forms of assessment, such as coursework, are completed without invigilation. Therefore, students could hand in work as their own which is in fact completed by AI. Since the COVID pandemic, the sector has additionally accelerated its reliance on unsupervised ‘take home exams’. If students cheat using AI and this is undetected, the integrity of the way in which students are assessed is threatened. We report a rigorous, blind study in which we injected 100% AI written submissions into the examinations system in five undergraduate modules, across all years of study, for a BSc degree in Psychology at a reputable UK university. We found that 94% of our AI submissions were undetected. The grades awarded to our AI submissions were on average half a grade boundary higher than that achieved by real students. Across modules there was an 83.4% chance that the AI submissions on a module would outperform a random selection of the same number of real student submissions.

https://doi.org/10.1371/journal.pone.0305354
