OAIster Hits 10,000,000 Records

Excerpt from the press release:

We live in an information-driven world—one in which access to good information defines success. OAIster’s growth to 10 million records takes us one step closer to that goal.

Developed at the University of Michigan’s Library, OAIster is a collection of digital scholarly resources. OAIster is also a service that continually gathers these digital resources to remain complete and fresh. As global digital repositories grow, so do OAIster’s holdings.

Popular search engines don’t have the holdings OAIster does. They crawl web pages and index the words on those pages. It’s an outstanding technique for fast, broad information from public websites. But scholarly information, the kind researchers use to enrich their work, is generally hidden from these search engines.

OAIster retrieves these otherwise elusive resources by tapping directly into the collections of a variety of institutions using harvesting technology based on the Open Archives Initiative (OAI) Protocol for Metadata Harvesting. These can be images, academic papers, movies and audio files, technical reports, books, as well as preprints (unpublished works that have not yet been peer reviewed). By aggregating these resources, OAIster makes it possible to search across all of them and return the results of a thorough investigation of complete, up-to-date resources. . . .
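To make the harvesting step concrete, here is a minimal Python sketch of an OAI-PMH ListRecords exchange. The verb, the oai_dc metadata prefix, and the resumptionToken mechanism come from the OAI-PMH protocol itself. The base URL, the sample response, and the function names are hypothetical, and nothing here is meant to describe OAIster's actual harvester.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (base_url is a hypothetical endpoint)."""
    if resumption_token:
        # A resumption token is sent on its own, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_titles(xml_text):
    """Return (titles, next_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    titles = [el.text for el in root.iter(DC_NS + "title")]
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


# A hand-written sample page standing in for a real repository response.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical preprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

titles, next_token = parse_titles(SAMPLE)
print(list_records_url("http://example.org/oai"))
print(titles, next_token)
```

A harvester would repeat the request with each returned token until a page arrives with no token, which is how a service can collect a repository's full set of records rather than crawling its web pages.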

OAIster is good news for the digital archives that contribute material to open-access repositories. "[OAIster has demonstrated that]. . . OAI interoperability can scale. This is good news for the technology, since the proliferation is bound to continue and even accelerate," says Peter Suber, author of the SPARC Open Access Newsletter. As open-access repositories proliferate, they will be supported by a single, well-managed, comprehensive, and useful tool.

Scholars will find that searching in OAIster can provide better results than searching in web search engines. Roy Tennant, User Services Architect at the California Digital Library, offers an example: "In OAIster I searched ‘roma’ and ‘world war,’ then sorted by weighted relevance. The first hit nailed my topic—the persecution of the Roma in World War II. Trying ‘roma world war’ in Google fails miserably because Google apparently searches ‘Rome’ as well as ‘Roma.’ The ranking then makes anything about the Roma people drop significantly, and there is nothing in the first few screens of results that includes the word in the title, unlike the OAIster hit."

OAIster currently harvests 730 repositories from 49 countries on 6 continents. In three years, it has more than quadrupled in size and increased from 6.2 million to 10 million in the past year. OAIster is a project of the University of Michigan Digital Library Production Service.

Orphan Works Challenge Fails

The U.S. Court of Appeals for the Ninth Circuit has rejected the plaintiffs’ appeal in Kahle v. Gonzales, leaving the legal status of orphan works unchanged. The plaintiffs’ attorneys were Jennifer Stisa Granick, Lawrence Lessig, and Christopher Sprigman.

Eric Auchard’s article "U.S. Court Upholds Copyright Law on ‘Orphan Works’" gives an overview of the Ninth Circuit’s decision.

The opinion is also available. Here is an excerpt:

Plaintiffs appeal from the district court’s dismissal of their complaint. They allege that the change from an "opt-in" to an "opt-out" copyright system altered a traditional contour of copyright and therefore requires First Amendment review under Eldred v. Ashcroft, 537 U.S. 186, 221 (2003). They also allege that the current copyright term violates the Copyright Clause’s "limited Times" prescription. . . .

Arguments similar to Plaintiffs’ were presented to the Supreme Court in Eldred, which affirmed the constitutionality of the Copyright Term Extension Act against those attacks. The Supreme Court has already effectively addressed and denied Plaintiffs’ arguments. . . .

In March 2004, Plaintiffs Brewster Kahle, Internet Archive, Richard Prelinger, and Prelinger Associates, Inc. filed an amended complaint seeking declaratory judgment and injunctive relief. Brewster Kahle and Internet Archive have built an "Internet library" that offers free access to digitized audio, books, films, websites, and software. Richard Prelinger and Prelinger Associates make digital versions of "ephemeral" films available for free on the internet. Each Plaintiff provides, or intends to provide, access to works that allegedly have little or no commercial value but remain under copyright protection. The difficulty and expense of obtaining permission to place those works on the Internet is overwhelming; ownership of these "orphan" works is often difficult, and sometimes impossible, to ascertain. . . .

Plaintiffs also argue that they should be allowed to present evidence that the present copyright term violates the Copyright Clause’s "limited Times" prescription as the Framers would have understood it. That claim was not directly at issue in Eldred, though Justice Breyer discussed it extensively in his dissent. See Eldred, 537 U.S. at 243. Plaintiffs assert all existing copyrights are effectively perpetual. . . .

Both of Plaintiffs’ main claims attempt to tangentially relitigate Eldred. However, they provide no compelling reason why we should depart from a recent Supreme Court decision.

Creative Commons India to Launch on 1/26/07

Creative Commons India will launch on Friday, January 26.

From "Creative Commons Readies for India Launch":

Creative Commons-India’s project head Shishir K Jha, assistant professor at the IIT’s Shailesh J. Mehta School of Management, said the project would focus on three specific areas in India.

These are—centres of higher education like the seven IITs, regional technology institutes and management and other institutions. . . .

Creative Commons-India also plans to focus on non-profit and non-governmental organisations and corporates keen on adopting easier-to-share licences for the dissemination of their documents.

Scholarly Electronic Publishing Weblog Update (1/22/07)

The latest update of the Scholarly Electronic Publishing Weblog (SEPW) is now available. It provides information about new scholarly literature and resources related to scholarly electronic publishing, such as books, journal articles, magazine articles, newsletters, technical reports, and white papers. Especially interesting are: "Beyond Google: What Next for Publishing?"; "Copyright, Publishing, and Scholarship: The ‘Zwolle Group’ Initiative for the Advancement of Higher Education"; "Electronic Books and the Humanities: A Survey at the University of Denver"; "E-Prints and Journal Articles in Astronomy: A Productive Co-Existence"; "Evaluating Research Impact through Open Access to Scholarly Communication"; "If the Academic Library Ceased to Exist, Would We Have to Invent It?"; and Managing Digitization Activities.

For weekly updates about news articles, Weblog postings, and other resources related to digital culture (e.g., copyright, digital privacy, digital rights management, and Net neutrality), digital libraries, and scholarly electronic publishing, see the latest DigitalKoans Flashback posting.

2006 PACS Review Use Statistics

The Public-Access Computer Systems Review (PACS Review) was a freely available e-journal, which I founded in 1989. It allowed authors to retain their copyrights, and it had a liberal copyright policy for noncommercial use. Its last issue was published in 1998.

In 2006, there were 763,228 successful requests for PACS Review files, 2,091 average successful requests per day, 751,264 successful requests for pages, and 2,058 average successful requests for pages per day. (A request is for any type of file; a page request is for a content file, such as an HTML, PDF, or Word file). These requests came from 41,865 distinct host computers.
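As a quick check on the arithmetic, the per-day averages follow from dividing the annual totals by the 365 days of 2006. The short Python sketch below reproduces them; the pages-share and requests-per-host figures are derived here for illustration and were not part of the original log report.

```python
# Annual totals reported for 2006.
DAYS_IN_2006 = 365  # 2006 was not a leap year
file_requests = 763_228
page_requests = 751_264
distinct_hosts = 41_865

# Daily averages, rounded to whole requests as in the report.
files_per_day = round(file_requests / DAYS_IN_2006)
pages_per_day = round(page_requests / DAYS_IN_2006)

# Derived figures (not in the original report).
page_share_pct = round(100 * page_requests / file_requests, 1)
requests_per_host = round(file_requests / distinct_hosts, 1)

print(files_per_day, pages_per_day)       # 2091 2058
print(page_share_pct, requests_per_host)  # 98.4 18.2
```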

The requests came from 134 Internet domains. Leaving aside requests from unresolved numerical addresses, the top 15 domains were: .com (Commercial), .net (Networks), .edu (USA Higher Education), .cz (Czech Republic), .jp (Japan), .ca (Canada), .uk (United Kingdom), .au (Australia), .de (Germany), .nl (Netherlands), .org (Non Profit Making Organizations), .in (India), .my (Malaysia), .it (Italy), and .mx (Mexico). At the bottom were domains such as .ms (Montserrat), .fm (Micronesia), .nu (Niue), .ad (Andorra), and .az (Azerbaijan).

Rounded to the nearest thousand, there had previously been 3.5 million successful requests for PACS Review files.

This is the last time that use statistics will be reported for the PACS Review.

Fedora 2.2 Released

The Fedora Project has released version 2.2 of Fedora.

From the announcement:

This is a significant release of Fedora that includes a complete repackaging of the Fedora source and binary distribution so that Fedora can now be installed as a standalone web application (.war) in any web container. This is a first step in positioning Fedora to fit within a standard "enterprise system" environment. A new installer application makes it easy to setup and run Fedora. Fedora now uses Servlet Filters for authentication. To support digital object integrity, the Fedora repository can now be configured to calculate and store checksums for datastream content. This can be done globally, or on selected datastreams. The Fedora API also provides the ability to check content integrity based on checksums. The RDF-based Resource Index has been tuned for better performance. Also, a new high-performing triplestore, backed by Postgres, has been developed that can be plugged into the Resource Index. Fedora contains many other enhancements and bug fixes.
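The checksum idea in the announcement can be illustrated with a generic Python sketch: store a digest when content is ingested, then recompute it later to detect changes. This uses Python's standard hashlib module and is not Fedora's API; the function names and the sample datastream content are hypothetical.

```python
import hashlib


def compute_checksum(content: bytes, algorithm: str = "sha1") -> str:
    """Return the hex digest of the content using the named algorithm."""
    return hashlib.new(algorithm, content).hexdigest()


def verify_checksum(content: bytes, stored: str, algorithm: str = "sha1") -> bool:
    """Recompute the digest and compare it with the stored value."""
    return compute_checksum(content, algorithm) == stored


# Hypothetical datastream content and the checksum recorded at ingest time.
datastream = b"<dc:title>Example object</dc:title>"
stored_checksum = compute_checksum(datastream)

print(verify_checksum(datastream, stored_checksum))            # True
print(verify_checksum(datastream + b" ", stored_checksum))     # False
```

Even a one-byte change produces a different digest, which is what makes a stored checksum useful for integrity checks.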

ScientificCommons.org: Access to Over 13 Million Digital Documents

ScientificCommons.org is an initiative of the Institute for Media and Communications Management at the University of St. Gallen. It indexes both metadata and full-text from global digital repositories. It uses OAI-PMH to identify relevant documents. The full-text documents are in PDF, PowerPoint, RTF, Microsoft Word, and Postscript formats. After being retrieved from their original repository, the documents are cached locally at ScientificCommons.org. It has indexed about 13 million documents from over 800 repositories.

Here are some additional features from the About ScientificCommons.org page:

Identification of authors across institutions and archives: ScientificCommons.org identifies authors and assigns them their scientific publications across various archives. Additionally the social relations between the authors will be extracted and displayed. . . .

Semantic combination of scientific information: ScientificCommons.org structures and combines the scientific data to knowledge areas with Ontology’s. Lexical and statistical methods are used to identify, extract and analyze keywords. Based on this processes ScientificCommons.org classifies the scientific data and uses it e.g. for navigational and weighting purposes.

Personalization services: ScientificCommons.org offers the researchers the possibilities to inform themselves about new publications via our RSS Feed service. They can customize the RSS Feed to a special discipline or even to personalized list of keywords. Furthermore ScientificCommons.org will provide an upload service. Every researcher can upload his publication directly to ScientificCommons.org and assign already existing publications at ScientificCommons.org to his own researcher profile.

New UC Report: The Promise of Value-based Journal Prices and Negotiation

The University of California libraries have released The Promise of Value-based Journal Prices and Negotiation: A UC Report and View Forward.

Here is the report’s abstract:

In pursuit of their scholarly communication agenda, the University of California ten-campus libraries have posited and tested the case that a journal’s institutional price can and should be related to its value to the academic enterprise. We developed and tested a set of metrics that comprise "value-based pricing" of scholarly journals. The metrics are the measurable impact of the journal, the transparent measures of production costs, the institutionally-based contributions to the journal, such as editorial labor, and the transaction efficiencies from consortial purchases. Initial modeling and use of the approaches are promising, leading the libraries to employ and further develop the approaches and share their work to date with the larger community.

This excerpt from the press release provides further information:

The report describes a value-based approach that borrows from analysis done by Professors Ted Bergstrom (UC Santa Barbara) and R. Preston McAfee (Caltech) on journal cost-effectiveness (www.journalprices.com). The UC approach also includes suggestions for annual price increases that are tied to production costs; credits for institutionally-based contributions to the journal, such as editorial labor; and credits for business transaction efficiencies from consortial purchases.

Through the report the libraries ask how an explicit method can be established, validated, and communicated for aligning the purchase or license costs of scholarly journals with the value they contribute to the academy and the costs to create and deliver them. In addition to describing the work done to date, the report provides examples of potential cost savings and declares UC’s intention to pursue value-based prices in their negotiations with journal publishers. In addition, the report invites the academic community to work collectively to refine and improve these and other value-based approaches.

The Long Run

Enthusiasm about new technologies is essential to innovation. There needs to be some fire in the belly of change agents or nothing ever changes. Moreover, the new is always more interesting than the old, which creaks with familiarity. Consequently, when an exciting new idea seizes the imagination of innovators and, later, early adopters (using Rogers’ diffusion of innovations jargon), it is only to be expected that the initial rush of enthusiasm can sometimes dim the cold eye of critical analysis.

Let’s pick on Library 2.0 to illustrate the point, and, in particular, librarian-contributed content instead of user-contributed content. It’s an idea that I find quite appealing, but let’s set that aside for the moment.

Overcoming the technical challenges involved, academic library X sets up on-demand blogs and wikis for staff as both outreach and internal communication tools. There is an initial frenzy of activity, and a number of blogs and wikis are established. Subject specialists start blogging. Perhaps the pace is reasonable for most to begin with, although some fall by the wayside quickly, but over time, with a few exceptions, the postings become more erratic and the time between postings increases. It is unclear whether target faculty read the blogs in any great numbers. Internal blogs follow a similar pattern. Some wikis, both internal and external, are quickly populated, but then become frozen by inactivity; others remain blank; others flourish because they serve a vital need.

Is this a story of success, failure, or the grey zone in between?

The point is this. Successful publishing in new media such as blogs and wikis requires that these tools serve a real purpose and that their contributors make a consistent, steady, and never-ending effort. It also requires that the intended audience understand and regularly use the tools and that, until these new communication channels are well-established, the library vigorously promote them because there is a real danger that, if you build it, they will not come.

Some staff will blog their hearts out regardless of external reinforcement, but many will need to have their work acknowledged in some meaningful way, such as at evaluation, promotion, and tenure decision points. Easily understandable feedback about tool use, such as good blog-specific or wiki-specific log analysis, is important as well to give writers the sense that they are being read and to help them tailor their message to their audience.

On the user side, it does little good to say "Here’s my RSS feed" to a faculty member who doesn’t know what RSS is and couldn’t care less. Of course, some will be hip to RSS, but that may not be the majority. If the library wants RSS feeds to become part of a faculty member’s daily workflow, it is going to have to give that faculty member a good reason for it to be so, such as significant, identified RSS feed content in the faculty member’s field. Then, it is going to have to help the faculty member with the RSS transition by pointing out good RSS readers, providing tactful instruction, and offering ongoing assistance.

In spite of the feel-good glow of early success, it may be prudent not to declare victory too soon after making the leap into a major new technology. It’s a real accomplishment, but dealing with technical puzzles is often not the hardest part. The world of computers and code is a relatively ordered and predictable one; the world of humans is far more complex and unpredictable.

The real test of a new technology is in the long run: Is the innovation needed, viable, and sustainable? Major new technologies often require significant ongoing organizational commitments and a willingness to measure success and failure with objectivity and to take corrective action as required. Participative technologies such as Library 2.0 and institutional repositories also require motivating users as well as staff to make behavioral changes that persist long after the excitement of the new wears off.

Managing Digitization Activities, SPEC Kit 294

The Association of Research Libraries has published Managing Digitization Activities, SPEC Kit 294. The table of contents and executive summary are freely available.

Here are some highlights from the announcement:

This survey was distributed to the 123 ARL member libraries in February 2006. Sixty-eight libraries (55%) responded to the survey, of which all but two (97%) reported having engaged in digitization activities. Only one respondent reported having begun digitization activities prior to 1992; five other pioneers followed in 1992. From 1994 through 1998 there was a steady increase in the number of libraries beginning digital initiatives; 30 joined the pioneers at the rate of three to six a year. There was a spike of activity at the turn of the millennium that reached a high in 2000, when nine libraries began digital projects. Subsequently, new start-ups have slowed, with only an additional one to five libraries beginning digitization activities each year.

The primary factor that influenced the start up of digitization activities was the availability of grant funding (39 responses or 59%). Other factors that influenced the commencement of these activities were the addition of new staff with related skills (50%), staff receiving training (44%), the decision to use digitization as a preservation option (42%), and the availability of gift monies (29%). . . . .

Only four libraries reported that their digitization activities are solely ongoing functions; the great majority (60 or 91%) reported that their digitization efforts are a combination of ongoing library functions and discrete, finite projects.
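The percentages in these highlights use two different bases, which the Python sketch below makes explicit. The response rate is computed against all 123 member libraries and the digitization rate against the 68 respondents. The later figures appear to be computed against the 66 respondents that digitize; that base is an inference from the numbers, not something the announcement states.

```python
def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


members = 123             # ARL member libraries surveyed
respondents = 68          # libraries that responded
digitizing = 68 - 2       # "all but two" respondents digitize

print(pct(respondents, members))     # 55: response rate
print(pct(digitizing, respondents))  # 97: respondents that digitize
# Assumed base of 66 digitizing respondents for the remaining figures.
print(pct(39, digitizing))           # 59: grant funding as primary factor
print(pct(60, digitizing))           # 91: ongoing functions plus projects
```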

Notre Dame Institutional Digital Repository Phase I Final Report

The University of Notre Dame Libraries have issued a report about their year-long institutional repository pilot project. There is an abbreviated HTML version and a complete PDF version.

From the Executive Summary:

Here is the briefest of summaries regarding what we did, what we learned, and where we think future directions should go:

  1. What we did—In a nutshell we established relationships with a number of content groups across campus: the Kellogg Institute, the Institute for Latino Studies, Art History, Electrical Engineering, Computer Science, Life Science, the Nanovic Institute, the Kaneb Center, the School of Architecture, FTT (Film, Television, and Theater), the Gigot Center for Entrepreneurial Studies, the Institute for Scholarship in the Liberal Arts, the Graduate School, the University Intellectual Property Committee, the Provost’s Office, and General Counsel. Next, we collected content from many of these groups, "cataloged" it, and saved it into three different computer systems: DigiTool, ETD-db, and DSpace. Finally, we aggregated this content into a centralized cache to provide enhanced browsing, searching, and syndication services against the content.
  2. What we learned—We essentially learned four things: 1) metadata matters, 2) preservation now, not later, 3) the IDR requires dedicated people with specific skills, 4) copyright raises the largest number of questions regarding the fulfillment of the goals of the IDR.
  3. Where we are leaning in regards to recommendations—The recommendations take the form of a "Chinese menu" of options, and the options are be grouped into "meals." We recommend the IDR continue and include: 1) continuing to do the Electronic Theses & Dissertations, 2) writing and implementing metadata and preservation policies and procedures, 3) taking the Excellent Undergraduate Research to the next level, and 4) continuing to implement DigiTool. There are quite a number of other options, but they may be deemed too expensive to implement.

Blackwell Synergy Based on Literatum Goes Live

Blackwell Publishing has released a new version of Blackwell Synergy, which utilizes Atypon’s Literatum software.

From the press release:

Blackwell Synergy enables its users to search 1 million articles from over 850 leading scholarly journals across the sciences, social sciences, humanities and medicine. The redesign provides easier navigation, faster loading times and improved access to tools for researchers, as well as meeting the latest accessibility standards (ADA section 508 and W3C’s WAI-AA).

Recently, the University of Chicago Press picked Atypon as a technology partner to provide an e-publishing platform for its online journals.

OCLC Openly Informatics Link Evaluator for Firefox

OCLC Openly Informatics has announced a free link checking plug-in for Firefox called Link Evaluator.

Here is a brief description from the Link Evaluator page:

Link Evaluator is a Firefox extension designed to help users evaluate the availability of online resources linked to from a given Web page. When started, it automatically follows all links on the current page, and assesses the responses of each URL (link). . . .

After each link is checked, it is highlighted with a color based on the relative success of the result: green for fully successful, shades of yellow for partly successful, and red for unsuccessful.

It requires Mozilla Firefox version 1.5 (or later).
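The color scheme described above can be sketched as a simple mapping from a link's HTTP status code to a highlight color. The Python below is only an illustration: the thresholds and the function names are assumptions, and the extension's real scoring rules are not documented in the excerpt (it uses shades of yellow, which this sketch collapses into one).

```python
def link_color(status_code: int) -> str:
    """Map an HTTP status code to a highlight color (assumed thresholds)."""
    if 200 <= status_code < 300:
        return "green"   # fully successful response
    if 300 <= status_code < 400:
        return "yellow"  # partly successful, e.g. a redirect
    return "red"         # client or server error


def summarize(statuses: dict) -> dict:
    """Count how many links fall under each color."""
    counts = {"green": 0, "yellow": 0, "red": 0}
    for code in statuses.values():
        counts[link_color(code)] += 1
    return counts


# Hypothetical results for three links on a page.
results = {
    "http://example.org/a": 200,
    "http://example.org/b": 301,
    "http://example.org/c": 404,
}
print(summarize(results))
```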

digitalculturebooks

The University of Michigan Press and the Scholarly Publishing Office of the University of Michigan Library, working together as the Michigan Digital Publishing Initiative, have established digitalculturebooks, which offers free access to digital versions of its published works (print works are fee-based). The imprint focuses on "the social, cultural, and political impact of new media."

The objectives of the imprint are to:

  • develop an open and participatory publishing model that adheres to the highest scholarly standards of review and documentation;
  • study the economics of Open Access publishing;
  • collect data about how reading habits and preferences vary across communities and genres;
  • build community around our content by fostering new modes of collaboration in which the traditional relationship between reader and writer breaks down in creative and productive ways.

Library Journal Academic Newswire notes in its article about digitalculturebooks:

While press officials use the term "open access," the venture is actually more "free access" than open at this stage. Open access typically does not require permission for reuse, only a proper attribution. UM director Phil Pochoda told the LJ Academic Newswire that, while no final decision has been made, the press’s "inclination is to ask authors to request the most restrictive Creative Commons license" for their projects. That license, he noted, requires attribution and would not permit commercial use, such as using it in a subsequent for-sale product, without permission. The Digital Culture Books web site currently reads that "permission must be received for any subsequent distribution."

The imprint’s first publication is The Best of Technology Writing 2006.

(Prior postings about digital presses.)

Has Authorama.com "Set Free" 100 Public Domain Books from Google Book Search?

In a posting on Google Blogoscoped, Philipp Lenssen has announced that he has put up 100 public domain books from Google Book Search on Authorama.

Regarding his action, Lenssen says:

In other words, Google imposes restrictions on these books which the public domain does not impose*. I’m no lawyer, and maybe Google can print whatever guidelines they want onto those books. . . and being no lawyer, most people won’t know if the guidelines are a polite request, or legally enforceable terms**. But as a proof of concept—the concept of the public domain—I’ve now ‘set free’ 100 books I downloaded from Google Book Search by republishing them on my public domain books site, Authorama. I’m not doing this out of disrespect for the Google Books program (which I think is cool, and I’ll credit Google on Authorama) but out of respect for the public domain (which I think is even cooler).

Since Lenssen has retained Google’s usage guidelines in the e-books, it’s unclear how they have been "set free," in spite of the following statement on Authorama’s Books from Google Book Search page:

The following books were downloaded from Google Book Search and are made available here as public domain. You can download, republish, mix and mash these books, for private or public, commercial or non-commercial use.

Leaving aside the above statement, Lenssen’s action appears to violate the following Google usage guideline, where Google asks that users:

Make non-commercial use of the files We designed Google Book Search for use by individuals, and we request that you use these files for personal, non-commercial purposes.

However, in the above guideline, Google uses the word "request," which suggests voluntary, rather than mandatory, compliance. Google also requests attribution and watermark retention.

Maintain attribution The Google ‘watermark’ you see on each file is essential for informing people about this project and helping them find additional materials through Google Book Search. Please do not remove it.

Note the use of the word "please."

It’s not clear how to determine if Google’s watermark remains in the Authorama files, but, given the retention of the usage guidelines, it likely does.

So, do Google’s public domain books really need to be "set free"? In its usage guidelines, Google appears to make compliance requests, not compliance requirements. Are such requests binding or not? If so, the language could be clearer. For example, here’s a possible rewording:

Make non-commercial use of the files Google Book Search is for individual use only, and its files can only be used for personal, non-commercial purposes. All other use is prohibited.

Landmark Digital Humanities Book Is Now Freely Available

A Companion to Digital Humanities is now freely available in digital form.

This important 2004 book was edited by Susan Schreibman, Ray Siemens, and John Unsworth. It includes chapters by such notable experts as Howard Besser, Greg Crane, Susan Hockey, Willard McCarty, Allen H. Renear, Abby Smith, C. M. Sperberg-McQueen, John Unsworth, and Perry Willett (to name just a few).

Scholarly Electronic Publishing Weblog Update (1/8/07)

The latest update of the Scholarly Electronic Publishing Weblog (SEPW) is now available. It provides information about new scholarly literature and resources related to scholarly electronic publishing, such as books, journal articles, magazine articles, newsletters, technical reports, and white papers. Especially interesting are: "Eliminating E-Reserves: One Library’s Experience," "Jean-Noël Jeanneney’s Critique of Google: Private Sector Book Digitization and Digital Library Policy," "Open Access in 2006," Our Cultural Commonwealth: The Final Report of the American Council of Learned Societies Commission on Cyberinfrastructure for the Humanities & Social Sciences, "The Research University and Scholarly Publishing: The View from a Provost’s Office," "Self-Archiving and the Copyright Transfer Agreements of ISI-ranked Library and Information Science Journals," "Using the Audit Checklist for the Certification of a Trusted Digital Repository as a Framework for Evaluating Repository Software Applications," and "Why Digital Asset Management? A Case Study."

For weekly updates about news articles, Weblog postings, and other resources related to digital culture (e.g., copyright, digital privacy, digital rights management, and Net neutrality), digital libraries, and scholarly electronic publishing, see the latest DigitalKoans Flashback posting.

Bad Juju: Zombies and Botnets

You may not know it, but your home computer could be under a serious attack from botnets populated by zombie computers, and that spells trouble for your personal data.

According to a New York Times article ("Attack of the Zombie Computers Is Growing Threat"), ShadowServer is "now tracking more than 400,000 infected machines and about 1,450 separate I.R.C. control systems, which are called Command & Control servers." Moreover, it states that:

Computer security experts warn that botnet programs are evolving faster than security firms can respond and have now come to represent a fundamental threat to the viability of the commercial Internet. The problem is being compounded, they say, because many Internet service providers are either ignoring or minimizing the problem.

The New York Times piece offers some general advice about how to protect your computer. I’ll give you some quick specifics for PCs, using free programs.

First, let’s see how exposed your computer is to the Net. Go to Shields Up!, click on "Proceed" at the bottom of the page, click on "File Sharing," then click on "All Service Ports." If your computer doesn’t pass these tests, you’ll want to take remedial action.

Second, if you don’t have a software firewall, download and install the free version of Zone Alarm. Under "Firewall," set "Internet Zone Security" to "High."

Third, if you don’t have antivirus software, download and install AVG Anti-Virus Free Edition. Scan for viruses.

Fourth, if you don’t have antispyware software, download and install Ad-Aware SE Personal. Scan for spyware. Update and run it periodically.

Fifth (for DSL/cable users), if you really want to be safe and you don’t have a hardware firewall, buy one and block the IRC ports: 194 and 6660-7000.

Wasn’t that fun? Now, run Shields Up! again. Hopefully, all is well. If not, tweak.
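For readers who prefer to see the fifth step's port list spelled out, here is a small Python sketch that treats the IRC ports as inclusive ranges and tests whether a given port falls inside them. It is an illustration of the rule only; the names are hypothetical, and real firewalls have their own configuration screens or syntax.

```python
# IRC ports named in the fifth step, as inclusive (low, high) ranges.
BLOCKED_IRC_PORTS = [(194, 194), (6660, 7000)]


def is_blocked_irc_port(port: int) -> bool:
    """Return True if the port falls in one of the blocked IRC ranges."""
    return any(low <= port <= high for low, high in BLOCKED_IRC_PORTS)


for port in (80, 194, 6667, 7001):
    print(port, is_blocked_irc_port(port))
```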

Keep in mind that free versions of programs lack some features of paid ones. Also keep in mind that paid security suites often offer uneven protection across functions. While a single suite may cover every function, you may be better off mixing and matching single-function programs that are highly rated by PC Magazine, PC World, and similar publications. Programs from different vendors can interfere with each other, however, so experimentation may be needed to find the right mix.

Source: Markoff, John. "Attack of the Zombie Computers Is Growing Threat." The New York Times, 7 January 2007, 1, 16.