You Better Be Good, You Better Not Copy

The Wall Street Journal reports that Attributor Corp "has begun testing a system to scan the billions of pages on the Web for clients’ audio, video, images and text—potentially making it easier for owners to request that Web sites take content down or provide payment for its use."

The company will use specialized digital fingerprinting technology in its copy detection service, which will become available in the first quarter of 2007. By the end of December, it will have about 10 billion Web pages in its detection index.

An existing competing service, Copyscape, offers both free and paid copy detection.

Source: Delaney, Kevin J. "Copyright Tool Will Scan Web For Violations." The Wall Street Journal, 18 December 2006, B1.

Hear Luminaries Interviewed at the 2006 Fall CNI Task Force Meeting

Matt Pasiewicz and CNI have made available digital audio interviews with a number of prominent attendees at the 2006 Fall CNI Task Force Meeting. Selected interviews are below. More are available on Pasiewicz’s blog.

Version 66, Scholarly Electronic Publishing Bibliography

Version 66 of the Scholarly Electronic Publishing Bibliography is now available. This selective bibliography presents over 2,830 articles, books, and other printed and electronic sources that are useful in understanding scholarly electronic publishing efforts on the Internet.

The SEPB URL has changed:


There is a mirror site at:

The Scholarly Electronic Publishing Weblog URL has also changed:


There is a mirror site at:

The SEPW RSS feed is unaffected.

Changes in This Version

The bibliography has the following sections (revised sections are marked with an asterisk):

Table of Contents

1 Economic Issues*
2 Electronic Books and Texts
2.1 Case Studies and History*
2.2 General Works*
2.3 Library Issues*
3 Electronic Serials
3.1 Case Studies and History*
3.2 Critiques
3.3 Electronic Distribution of Printed Journals*
3.4 General Works*
3.5 Library Issues*
3.6 Research*
4 General Works*
5 Legal Issues
5.1 Intellectual Property Rights*
5.2 License Agreements*
6 Library Issues
6.1 Cataloging, Identifiers, Linking, and Metadata*
6.2 Digital Libraries*
6.3 General Works*
6.4 Information Integrity and Preservation*
7 New Publishing Models*
8 Publisher Issues*
8.1 Digital Rights Management*
9 Repositories, E-Prints, and OAI*
Appendix A. Related Bibliographies
Appendix B. About the Author*
Appendix C. SEPB Use Statistics

Scholarly Electronic Publishing Resources includes the following sections:

Cataloging, Identifiers, Linking, and Metadata
Digital Libraries*
Electronic Books and Texts*
Electronic Serials
General Electronic Publishing
Repositories, E-Prints, and OAI*
SGML and Related Standards

Further Information about SEPB

The HTML version of SEPB is designed for interactive use. Each major section is a separate file. There are links to sources that are freely available on the Internet. It can be searched using a Google Search Engine. Whether the search results are current depends on Google’s indexing frequency.

In addition to the bibliography, the HTML document includes:

(1) Scholarly Electronic Publishing Weblog (biweekly list of new resources; also available by e-mail—see second URL—and RSS Feed—see third URL)

(2) Scholarly Electronic Publishing Resources (directory of over 270 related Web sites)

(3) Archive (prior versions of the bibliography)

The 2005 annual PDF file is designed for printing. The printed bibliography is over 210 pages long. The PDF file is over 560 KB.

Related Article

An article about the bibliography has been published in The Journal of Electronic Publishing:

Scholarly Electronic Publishing Weblog Update (12/18/06)

The latest update of the Scholarly Electronic Publishing Weblog (SEPW) is now available, which provides information about new scholarly literature and resources related to scholarly electronic publishing, such as books, journal articles, magazine articles, newsletters, technical reports, and white papers. Especially interesting are: The Complete Copyright Liability Handbook for Librarians and Educators, "Copyright Concerns in Online Education: What Students Need to Know," Digital Archiving: From Fragmentation to Collaboration, "Fixing Fair Use," "Mass Digitization of Books," MLA Task Force on Evaluating Scholarship for Tenure and Promotion, "Open Access: Why Should We Have It?," "Predictions for 2007," "Readers’ Attitudes to Self-Archiving in the UK," "The Rejection of D-Space: Selecting Theses Database Software at the University of Calgary Archives," "Taming the Digital Beast," and Understanding Knowledge as a Commons: From Theory to Practice.

The SEPW URL has changed. Use:


There is a mirror site at:

The RSS feed is unaffected.

For weekly updates about news articles, Weblog postings, and other resources related to digital culture (e.g., copyright, digital privacy, digital rights management, and Net neutrality), digital libraries, and scholarly electronic publishing, see the latest DigitalKoans Flashback posting.

Lessig’s Code: Version 2.0 Is Published

Lawrence Lessig’s Code: Version 2.0 is out. This update of the now-classic Code and Other Laws of Cyberspace was written using a Wiki, with Lessig editing and refining that digital text.

The resulting book is under a Creative Commons Attribution-ShareAlike 2.5 License.

It can be freely downloaded in PDF form. Later, the final version of the book will be available on a second Wiki.

MLA Task Force on Evaluating Scholarship for Tenure and Promotion Report

The MLA Task Force on Evaluating Scholarship for Tenure and Promotion has issued an important report. (The MLA is the Modern Language Association of America.)

Here’s some background on the report from its Executive Summary:

In 2004 the Executive Council of the Modern Language Association of America created a task force to examine current standards and emerging trends in publication requirements for tenure and promotion in English and foreign language departments in the United States. The council’s action came in response to widespread anxiety in the profession about ever-rising demands for research productivity and shrinking humanities lists by academic publishers, worries that forms of scholarship other than single-authored books were not being properly recognized, and fears that a generation of junior scholars would have a significantly reduced chance of being tenured. The task force was charged with investigating the factual basis behind such concerns and making recommendations to address the changing environment in which scholarship is being evaluated in tenure and promotion decisions.

The task force made 20 key recommendations, including:

3. The profession as a whole should develop a more capacious conception of scholarship by rethinking the dominance of the monograph, promoting the scholarly essay, establishing multiple pathways to tenure, and using scholarly portfolios. . . .

4. Departments and institutions should recognize the legitimacy of scholarship produced in new media, whether by individuals or in collaboration, and create procedures for evaluating these forms of scholarship. . . .

15. The task force encourages further study of the unfulfilled parts of its charge with respect to multiple submissions of manuscripts and comparisons of the number of books published by university presses between 1999 and 2005.

16. The task force recommends establishing concrete measures to support university presses. . . .

19. The task force encourages discussion of the current form of the dissertation (as a monograph-in-progress) and of the current trends in the graduate curriculum.

Creative Commons Web Site Makeover and CC Labs

The Creative Commons has redone its Web site using WordPress and added a new feature: CC Labs, which features development projects.

Current projects include the DHTML License Chooser, the Freedoms License Generator, and the Metadata Lab. (Consulting the Creative Commons Licenses page before using these tools will give you a preview of your license options.)

The symbols used to represent the CC licenses have changed. For example, here’s the Creative Commons Attribution-NonCommercial 2.5 License symbol.

Creative Commons License

Read more about these changes in Lawrence Lessig’s blog posting.

STARGATE Final Report and Tools

The STARGATE project has issued its final report. Here’s a brief summary of the project from the Executive Summary:

STARGATE (Static Repository Gateway and Toolkit) was funded by the Joint Information Systems Committee (JISC) and is intended to demonstrate the ease of use of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) Static Repository technology, and the potential benefits offered to publishers in making their metadata available in this way. This technology offers a simpler method of participating in many information discovery services than creating fully-fledged OAI-compliant repositories. It does this by allowing the infrastructure and technical support required to participate in OAI-based services to be shifted from the data provider (the journal) to a third party and allows a single third party gateway provider to provide intermediation for many data providers (journals).
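To make the gateway idea concrete, here is a minimal Python sketch of how a harvester might address a journal's static repository through a third-party gateway. It is an illustration only, not STARGATE code: the gateway and repository addresses are hypothetical, and the URL layout (gateway URL, a slash, then the static repository URL without its leading "http://") is assumed from the OAI static repository guidelines.

```python
from urllib.parse import urlencode

# Hypothetical addresses, chosen for illustration (not taken from the report).
GATEWAY_BASE = "http://gateway.example.org/oai"
STATIC_REPO = "journal.example.org/oai/static.xml"  # repository URL without "http://"


def gateway_request_url(verb, **params):
    """Build an OAI-PMH request URL that goes through a static repository gateway.

    Assumption: the per-repository base URL is the gateway URL, a slash,
    and the static repository URL without its scheme.
    """
    base = f"{GATEWAY_BASE}/{STATIC_REPO}"
    query = urlencode({"verb": verb, **params})
    return f"{base}?{query}"


# The journal only hosts one XML file; the gateway answers protocol requests.
list_records_url = gateway_request_url("ListRecords", metadataPrefix="oai_dc")
identify_url = gateway_request_url("Identify")
```

The point of the design is visible in the URL: the journal publishes a single static XML file, while the gateway host handles every OAI-PMH verb on its behalf.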

To support its work, the project developed tools and supporting documentation, which can be found below:

Details on Open Repositories 2007 Talks

Details about the Open Repositories 2007 conference sessions are now available, including keynotes, poster sessions, presentations, and user groups. For DSpace, EPrints, and Fedora techies, the user group sessions look like a must-attend, with talks by luminaries such as John Ockerbloom and MacKenzie Smith. The presentation sessions include talks by Andrew Treloar, Carl Lagoze and Herbert Van de Sompel, Leslie Johnston, and Simeon Warner, among other notables. Open Repositories 2007 will be held in San Antonio, January 23-26.

Hopefully, the conference organizers plan to make streaming audio and/or video files available post-conference, but PowerPoints, as was the case for Open Repositories 2006, would also be useful.

International Journal of Digital Curation Launched

The Digital Curation Centre has launched the International Journal of Digital Curation, which will be published twice a year in digital form (articles are PDF files). It is edited by Richard Waller, who also edits Ariadne. It is published by UKOLN at the University of Bath, using Open Journal Systems.

The journal is freely available. Although individual articles in the first volume do not have copyright statements, the Submissions page on the journal Web site has the following copyright statement:

Copyright for articles published in this journal is retained by the authors, with first publication rights granted to the University of Bath. By virtue of their appearance in this open access journal, articles are free to use, with proper attribution, in educational and other non-commercial settings.

The first issue includes "Digital Curation, Copyright, and Academic Research"; "Digital Curation for Science, Digital Libraries, and Individuals"; "Scientific Publication Packages—A Selective Approach to the Communication and Archival of Scientific Output"; and other articles.

Digital Preservation via Emulation at Koninklijke Bibliotheek

In a two-year (2005-2007) joint project with Nationaal Archief of the Netherlands, Koninklijke Bibliotheek is developing an emulation system that will allow digital objects in outmoded formats to be utilized in their original form. Regarding the emulation approach, the Koninklijke Bibliotheek says:

Emulation is difficult, the main reason why it is not applied on a large scale. Developing an emulator is complex and time-consuming, especially because the emulated environment must appear authentic and must function accurately as well. When future users are interested in the contents of a file, migration remains the better option. When it is the authentic look and feel and functionality of a file they are after, emulation is worth the effort. This can be the case for PDF documents or websites. For multimedia applications, emulation is in fact the only suitable permanent access strategy.

The paper "Modular Emulation as a Long-Term Preservation Strategy for Digital Objects" by J. R. van der Hoeven and H. N. van Wijngaarden provides an overview of the emulation approach.

In a related development, a message to padiforum-l on 11/17/06 by Remco Verdegem of the Nationaal Archief of the Netherlands reported on a recent Emulation Expert Meeting, which issued a statement noting the following advantages of emulation for digital preservation purposes:

  • It preserves and permits access to each digital artifact in its original form and format; it may be the only viable approach to preserving digital artifacts that have significant executable and/or interactive behavior.
  • It can preserve digital artifacts of any form or format by saving the original software environments that were used to render those artifacts. A single emulator can preserve artifacts in a vast range of arbitrary formats without the need to understand those formats, and it can preserve huge corpuses without ever requiring conversion or any other processing of individual artifacts.
  • It enables the future generation of surrogate versions of digital artifacts directly from their original forms, thereby avoiding the cumulative corruption that would result from generating each such future surrogate from the previous one.
  • If all emulators are written to run on a stable, thoroughly-specified "emulation virtual machine" (EVM) platform and that virtual machine can be implemented on any future computer, then all emulators can be run indefinitely.

Scholarly Electronic Publishing Weblog (11/20/06)

The latest update of the Scholarly Electronic Publishing Weblog (SEPW) is now available, which provides information about new scholarly literature and resources related to scholarly electronic publishing, such as books, journal articles, magazine articles, newsletters, technical reports, and white papers. Especially interesting are: "Author Addenda: An Examination of Five Alternatives"; "Building Preservation Environments with Data Grid Technology"; "Improving Access to Research Results: Six Points"; "Improving Access to Research Results: What’s in It for the Institution? Can We Make the Case?"; "Is There a Viable Business Model for Commercial Open Access Publishing?"; "Library Access to Scholarship"; "The Open Access Movement in China"; and "Standards-Based Interfaces for Harvesting and Obtaining Assets from Digital Repositories."

For weekly updates about news articles, Weblog postings, and other resources related to digital culture (e.g., copyright, digital privacy, digital rights management, and Net neutrality), digital libraries, and scholarly electronic publishing, see the latest DigitalKoans Flashback posting.

Under the Hood of PLoS ONE: The Open Source TOPAZ E-Publishing System

PLoS is building its innovative PLoS ONE e-journal, which will incorporate both traditional and open peer review, using the open source TOPAZ software. (For a detailed description of the PLoS ONE peer review process, check out "ONE for All: The Next Step for PLoS.")

What is TOPAZ? Its Web site doesn’t provide specifics, but "PLoS ONE—Technical Background" by Richard Cave does:

The core of TOPAZ is a digital information repository called Fedora (Flexible Extensible Digital Object Repository Architecture). Fedora is an Open Source content management application that supports the creation and management of digital objects. The digital objects contain metadata to express internal and external relationships in the repository, like articles in a journal or the text, images and video of an article. This relationship metadata can also be searched using semantic web query languages. Fedora is jointly developed by Cornell University’s computer science department and the University of Virginia Libraries.

The metastore Kowari will be used with Fedora to support Resource Description Framework (RDF) metadata within the repository.

The PLoS ONE web interface will be built with AJAX. Client-side APIs will create the community features (e.g. annotations, discussion threads, ratings, etc.) for the website. As more new features are available on the TOPAZ architecture, we will launch them on PLoS ONE.
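As a rough illustration of the relationship metadata described above, the sketch below models a few subject-predicate-object statements and queries them with simple pattern matching. It uses plain Python rather than the Fedora or Kowari APIs, and every identifier and predicate name in it is made up for the example.

```python
# Each statement is a (subject, predicate, object) triple, as in RDF.
# All identifiers and predicate names here are hypothetical.
TRIPLES = [
    ("article:1", "isMemberOf", "journal:one"),
    ("article:2", "isMemberOf", "journal:one"),
    ("image:7", "isPartOf", "article:1"),
    ("video:3", "isPartOf", "article:1"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# "Which objects are part of article 1?"
parts_of_article_1 = [s for s, _, _ in match(TRIPLES, predicate="isPartOf", obj="article:1")]

# "Which articles belong to the journal?"
journal_articles = [s for s, _, _ in match(TRIPLES, predicate="isMemberOf", obj="journal:one")]
```

A real triple store answers the same kind of pattern query through a query language and indexes instead of a linear scan over a list.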

There was a TOPAZ Wiki at PLoS. It’s gone, but its pages are still cached by Google. The Wiki suggests that TOPAZ is likely to support Atom/RSS feeds, full-text search, and OAI-PMH, among other possible features.

For information about other open source e-journal publishing systems, see "Open Source Software for Publishing E-Journals."

Results from the DSpace Community Survey

DSpace conducted an informal survey of its open source community in October 2006. Here are some highlights:

  • The vast majority of respondents (77.6%) used or planned to use DSpace for a university IR.
  • The majority of systems were in production (53.4%); pilot testing was second (35.3%).
  • Preservation and interoperability were the highest priority system features (61.2% each), followed by search engine indexing (57.8%) and open access to refereed articles (56.9%). (Percentage of respondents who rated these features "very important.") Only 5.2% thought that OA to refereed articles was unimportant.
  • The most common types of current IR content were refereed scholarly articles and theses/dissertations (55.2% each), followed by other (48.6%) and grey literature (47.4%).
  • The most popular types of content that respondents were planning to add to their IRs were datasets (53.4%), followed by audio and video (46.6% each).
  • The most frequently used type of metadata was customized Dublin Core (80.2%), followed by XML metadata (13.8%).
  • The most common update pattern was to regularly migrate to new versions; however, it took a "long time to merge in my customizations/configuration" (44.8%).
  • The most common types of modification were minor cosmetics (34.5%), new features (26.7%), and significant user interface customization (21.6%).
  • Only 30.2% were totally comfortable with editing/customizing DSpace; 56.9% were somewhat comfortable and 12.9% were not comfortable.
  • Plug-in use is light: for example, 11.2% use SRW/U, 8.6% use Manakin, and 5.2% use TAPIR (ETDs).
  • The most desired feature for the next version is a more easily customized user interface (17.5%), closely followed by improved modularity (16.7%).

For information about other recent institutional repository surveys, see "ARL Institutional Repositories SPEC Kit" and "MIRACLE Project’s Institutional Repository Survey."

QuickTime Videos and PowerPoints from the Transforming Scholarly Communication Symposium

When I was chairing the Scholarly Communications Public Relations Task Force at the UH Libraries, the task force initiated a series of projects to increase awareness of key issues on the UH campus under the name "Transforming Scholarly Communication": a Website, a Weblog, and a symposium.

I’m pleased to announce that both the PowerPoint presentations and the QuickTime videos of the symposium speeches are now available. Thanks again to our speaker panel for participating in this event.

Ray English, Director of Libraries at Oberlin College and Chair of the SPARC Steering Committee, kicked things off with a talk on "The Crisis in Scholarly Communication" (PowerPoint, QuickTime Video, and "Sites and Cites for the Struggle: A Selective Scholarly Communication Bibliography").

Next, Corynne McSherry, Staff Attorney at the Electronic Frontier Foundation and author of Who Owns Academic Work?: Battling for Control of Intellectual Property, spoke on "Copyright in Cyberspace: Defending Fair Use" (PowerPoint and QuickTime Video).

Finally, Peter Suber, Research Professor of Philosophy at Earlham College, Senior Researcher at the Scholarly Publishing and Academic Resources Coalition (SPARC), and the Open Access Project Director at Public Knowledge, discussed "What Is Open Access?" (PowerPoint and QuickTime Video).

Statement about My Resignation for Library Journal

Library Journal contacted me about my resignation. I declined an interview, but I did issue the statement below:

During my thirty-one-year career, I have always viewed myself as a technological change agent. In the current environment, academic libraries must make difficult resource allocation choices between maintaining print collections, supporting ever-growing collections of licensed electronic resources, and fostering new modes of scholarly communication. There is no universal "right" choice. Each library must realistically make its own decision about what the right mix of these activities is in light of unique local circumstances. At this stage of my life, I believe that I can best serve my particular passions in the realm of scholarly communication and digital libraries elsewhere, although I am grateful for the support I have received at the University of Houston Libraries from many colleagues, both past and present, and I am especially grateful to Robin N. Downes, former Director of the UH Libraries. For those interested in following my continued digital publishing activities, they can do so at

Scholarly Electronic Publishing Weblog (11/6/06)

The latest update of the Scholarly Electronic Publishing Weblog (SEPW) is now available, which provides information about new scholarly literature and resources related to scholarly electronic publishing, such as books, journal articles, magazine articles, newsletters, technical reports, and white papers. Especially interesting are: "Building an Information Infrastructure in the UK," "Considering a Marketing and Communications Approach for an Institutional Repository," "Creative Commons Licences in Higher and Further Education: Do We Care?," "Examining the Claims of Google Scholar as a Serious Information Source," "Fedora and the Preservation of University Records Project," "The Mandates of October," "The Need to Archive Blog Content," "No-Fee Open-Access Journals," "Risk Assessment and Copyright in Digital Libraries," and To Stand the Test of Time: Long-Term Stewardship of Digital Data Sets in Science and Engineering.

For weekly updates about news articles, Weblog postings, and other resources related to digital culture (e.g., copyright, digital privacy, digital rights management, and Net neutrality), digital libraries, and scholarly electronic publishing, see the latest DigitalKoans Flashback posting.

Scholarly Electronic Publishing Bibliography Changes

I have resigned my position as Assistant Dean for Digital Library Planning and Development at the University of Houston Libraries effective 1/31/07.

Effective immediately, there are several important changes to the Scholarly Electronic Publishing Bibliography (SEPB), Scholarly Electronic Publishing Resources (SEPR), and the Scholarly Electronic Publishing Weblog (SEPW) that users should be aware of:

1. These publications have been moved to my domain:

2. While the UH Libraries will archive SEPB versions up to version 64, no new versions will be published on their Website. If you maintain a catalog record for SEPB, I would ask that you update it with the new address. Next Monday’s SEPW will be published at the new site.

3. A transition version of SEPB (65) has been published at the new site. There are no content changes. This version simply makes a number of HTML coding adjustments needed for the new location. A Google Custom Search Engine replaces the prior search capability. Once Google starts indexing the new site, search results will be from that site.

4. The SEPW mailing list will be discontinued at the end of work today. You can continue to get an e-mail version from FeedBurner. I’m sorry for the inconvenience of your having to sign up again; all that is required is your e-mail address.

5. The SEPW RSS feed remains the same.

6. You can continue to follow my digital publishing activities at my domain and at DigitalKoans.

Thanks for your patience during this transition.

SEPW and SEPB Now Searchable Using a Google Custom Search Engine

The Scholarly Electronic Publishing Weblog is now searchable using a Google Custom Search Engine. The new search box is near the bottom of the Weblog’s home page.

The Scholarly Electronic Publishing Bibliography is also now searchable using a Google Custom Search Engine. This will be incorporated into a future version of SEPB. Only the bibliography sections of the document are searchable using this method (e.g., SEPW and SEPR are excluded).

Keep in mind that search results list the titles of bibliography section files or Weblog archive files, with a single representative hit shown from each file. To see all hits, click on the cached page, which shows the retrieved search term(s) in the file highlighted in yellow.

For those who might be interested in including these Google Custom Search Engines in their Web pages, see "Code for Bailey’s Google Search Engines."