July 2007 – Page 2 – DigitalKoans

Metadata Extraction Tool Version 3.2

The National Library of New Zealand has released version 3.2 of its open-source Metadata Extraction Tool.

Written in Java and XML, the Metadata Extraction Tool has a Windows interface, and it runs under UNIX in command line mode. Batch processing is supported.

Here’s an excerpt from the project home page:

The Tool builds on the Library’s work on digital preservation, and its logical preservation metadata schema. It is designed to:

automatically extract preservation-related metadata from digital files

output that metadata in a standard format (XML) for use in preservation activities. . . .

The Metadata Extract Tool includes a number of ‘adapters’ that extract metadata from specific file types. Extractors are currently provided for:

Images: BMP, GIF, JPEG and TIFF.

Office documents: MS Word (version 2, 6), Word Perfect, Open Office (version 1), MS Works, MS Excel, MS PowerPoint, and PDF.

Audio and Video: WAV and MP3.

Markup languages: HTML and XML.

If a file type is unknown the tool applies a generic adapter, which extracts data that the host system ‘knows’ about any given file (such as size, filename, and date created).
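As a rough illustration of the generic-adapter idea described above, the following Python sketch collects the details a host system typically knows about any file and writes them as XML. It is not the Metadata Extraction Tool's code (the tool itself is written in Java), and the function name and XML element names are assumptions made for this example only.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def generic_metadata(path):
    """Return an XML element holding what the host system knows about a file."""
    info = os.stat(path)
    # Element names are illustrative; they are not the tool's actual schema.
    root = ET.Element("file")
    ET.SubElement(root, "filename").text = os.path.basename(path)
    ET.SubElement(root, "size").text = str(info.st_size)
    # A portable "date created" is not available on every platform,
    # so the last-modified time stands in for it here.
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    ET.SubElement(root, "modified").text = modified.isoformat()
    return root


if __name__ == "__main__":
    # Create a throwaway file so the example runs anywhere.
    with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as handle:
        handle.write(b"example")
        sample = handle.name
    try:
        print(ET.tostring(generic_metadata(sample), encoding="unicode"))
    finally:
        os.remove(sample)
```

A format-specific adapter would add fields such as image dimensions or document version on top of this baseline.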

The ticTOCs Project: Enhancing Table-of-Contents RSS Feeds

The goal of the JISC-funded ticTOCs Project is to greatly enhance access to and re-use of journal table-of-contents RSS feeds.

Here's an excerpt from ticTOCs in a Nutshell:

ticTOCs intends to be a catalyst for change by incorporating existing technology plus Web 2.0 concepts in the smart aggregation, recombination, synthesization, output and reuse of standardised journal Table of Contents (TOC) RSS feeds from numerous fragmented sources (journal publishers). These TOCs, and their content, will be presented in a personalisable and interactive web-based interface that requires little or no understanding, by the user, of the technical or procedural concepts involved. It has been called ticTOCs because in certain instances it will involve the selective ticking of appropriate TOCs, and also because ticTOCs is a memorable name, something which is important in today's online environment.

ticTOCs will incorporate:

A user-friendly web-based, AJAX enabled TOCosphere for the smart aggregation, personalisation, output and reuse of TOC RSS feeds and contents. It will allow users to discover, select, personalise, display, reuse and export (to bibliographic software).

Within this TOCosphere there will be a Directory of TOCs to allow easy selection by title, subject, ISSN, and so on.

Re-use of data: this will involve embedding TOCs and combined TOCs in research output showcases, gateways, VREs, websites, etc.

Easy links from a multitude of journals lists to ticTOCs using chicklet subscribe buttons.

Data gathered for analysis presents many possibilities.

Community networking possibilities, within the TOCosphere. . . .
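The aggregation step described in the excerpt can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the two feeds are made-up RSS 2.0 documents, and the function names (`parse_toc`, `aggregate`) are hypothetical rather than part of ticTOCs.

```python
import xml.etree.ElementTree as ET

# Two tiny, made-up RSS 2.0 feeds standing in for publisher TOC feeds.
FEED_A = """<rss version="2.0"><channel>
<title>Journal A</title>
<item><title>First article</title><link>http://example.org/a/1</link></item>
<item><title>Second article</title><link>http://example.org/a/2</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel>
<title>Journal B</title>
<item><title>Another article</title><link>http://example.org/b/1</link></item>
</channel></rss>"""


def parse_toc(feed_xml):
    """Return (journal_title, [(article_title, link), ...]) for one feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    journal = channel.findtext("title", default="")
    items = [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in channel.findall("item")
    ]
    return journal, items


def aggregate(feeds, selected):
    """Combine the entries of the feeds whose journal title was 'ticked'."""
    combined = []
    for feed_xml in feeds:
        journal, items = parse_toc(feed_xml)
        if journal in selected:
            for title, link in items:
                combined.append({"journal": journal, "title": title, "link": link})
    return combined


if __name__ == "__main__":
    for entry in aggregate([FEED_A, FEED_B], {"Journal A", "Journal B"}):
        print(entry["journal"], "|", entry["title"], "|", entry["link"])
```

A real service would fetch the feeds over HTTP and cope with the differences between RSS versions; both are left out here to keep the sketch self-contained.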

The ticTOCs Consortium consists of: the University of Liverpool Library (lead), Heriot-Watt University, CrossRef, ProQuest CSA, Emerald, RefWorks, MIMAS, Cranfield University, Nature Publishing Group, Institute of Physics, SAGE Publishers, Inderscience Publishers, DOAJ (Directory of Open Access Journals), Open J-Gate, and Intute.

Using the Open Archives Initiative Protocol for Metadata Harvesting

Libraries Unlimited has released Using the Open Archives Initiative Protocol for Metadata Harvesting by Timothy W. Cole and Muriel Foulonneau.

Here’s an excerpt from the publisher’s description:

Through a series of case studies, Cole and Foulonneau guide the reader through the process of conceiving, implementing and maintaining an OAI-compliant repository. Its applicability to both institutional archives and discipline based aggregators are covered, with equal attention paid to the technical and organizational aspects of creating and maintaining such repositories.

Urgent: Send a Message to Congress about the NIH Public Access Policy

Peter Suber has pointed out that ALA has an Action Alert that allows you to just fill in a form to send a message to your Congressional representatives about the NIH Public Access Policy.

Under "Compose Message" in the form, I suggest that you shorten the Subject to "Support the NIH Public Access Policy." As an "Issue Area" you might use "Budget" or "Health." Be sure to fill in your salutation and phone number; they are required to send an e-mail even though the form does not show them as required fields.

I’ve made slight modifications to the talking points and created a Web page so that the talking points can simply be cut and pasted into the "Editable text to" section of the form as the message.

ALA Weblogs and Creative Commons Licenses

The American Library Association and its divisions have launched a number of Weblogs in the last few years. What copyright provisions are these digital publications under? Do they use Creative Commons licenses?

As the list below shows, the vast majority of ALA Weblogs have no explicit copyright statement on their homepage. The absence of such a statement does not exempt the Weblogs from standard copyright provisions under U.S. law. They are copyrighted, but by whom? Unless ALA has a copyright transfer or work-for-hire agreement with Weblog authors, it appears that the author of each posting holds the copyright to that posting, and copyright permissions for uses of postings that exceed fair use would need to be obtained from their authors. (Some Weblogs have a single author.)

One ALA Weblog uses the standard ALA copyright statement (ALA Techsource), one is copyrighted under the name of the Weblog (ACRLog), one is under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States license (YALSA), and three others are under Creative Commons Attribution-NonCommercial-NoDerivs 2.5 licenses (District Dispatch, LITA Blog, and Office for Intellectual Freedom).

Thus, the vast majority of ALA Weblogs are under standard copyright provisions, one is under ALA’s more liberal copyright provisions, and a few are under Creative Commons Licenses that permit noncommercial use without further permission as long as it does not include the creation of derivative works.

AASLblog: No copyright statement.
ACRLog: Copyright ACRLog.
ACRL Podcasts: No copyright statement.
ALSC Blog: No copyright statement.
ALA Techsource: Copyright ALA Techsource (Copyright Statement and Release)
CentenniAL: No copyright statement.
COSWL Cause: No copyright statement.
Defining Digital Preservation: No copyright statement.
Digiblog: ALCTS and the Future of Tech Services: No copyright statement.
Digitization Principles: No copyright statement.
District Dispatch: Creative Commons Attribution-NonCommercial-NoDerivs 2.5.
Electronic Resources Interest Group: No copyright statement.
Emerging Leaders: No copyright statement.
GLBT-RT: No copyright statement.
The Green Kangaroo: No copyright statement.
Hectic Pace: No copyright statement.
ITTS Update: No copyright statement.
LA&M Editors Desk: No copyright statement.
LEADS from LAMA: No copyright statement.
Library Education: No copyright statement.
LITA Blog: Creative Commons Attribution-NonCommercial-NoDerivs 2.5 (LITA Blog Acceptable Use and Copyright Statement)
MemberBlog: No copyright statement.
Member Participation Task Force: No copyright statement.
Metadata Blog: No copyright statement.
Office for Intellectual Freedom: Creative Commons Attribution-NonCommercial-NoDerivs 2.5.
PLA Blog: No copyright statement.
Public Programs Post: No copyright statement.
Spectrum: No copyright statement.
Talking Reference & . . .: No copyright statement.
Web Planning Retreat: No copyright statement.
YALSA: Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States.

ACRLog Urgent Call for Action about NIH Policy Vote

An urgent call for action has been issued on ACRLog about upcoming House and Senate votes on Labor, Health and Human Services appropriations bills that will determine whether NIH-funded researchers are required to make their final manuscripts publicly accessible within twelve months of publication.

Here's an excerpt from the posting:

We need your help to keep the momentum going. The full House of Representatives and the full Senate will vote on their respective measures this summer. The House is expected to convene on Tuesday, July 17. We’re asking that you contact your US Representative and your US Senators by phone or fax as soon as possible and no later than Monday afternoon. Urge them to maintain the Appropriations Committee language. (Find talking points and contact info for your legislators in the ALA Legislative Action Center.) It is entirely possible that an amendment will be made on the floor of the House to delete the language in the NIH policy.

Want to know more? Listen to an interview with Heather Joseph of SPARC on the ALA Washington Office District Dispatch blog. Find background on the issue along with tips on communicating effectively with your legislators in the last two issues of ACRL’s Legislative Update and at the Alliance for Taxpayer Access website.

Peter Suber has issued a similar call on Open Access News. Here it is in full:

Tell Congress to support an OA mandate at the NIH

Let me take the unusual step of repeating a call to action from yesterday in case it got buried in the avalanche of news.

The House Appropriations Committee approved language establishing an OA mandate at the NIH. The full House is scheduled to vote on the appropriations bill containing that language on Tuesday, July 17.

Publishers are lobbying hard to delete this language. If you are a US citizen and support public access for publicly-funded research, please ask your representative to support this bill, and to oppose any attempt to amend or strike the language. Contact your representative now, before you forget.

Time is short. Offices are closed on the weekend, but emails and faxes will go through. Send an email or fax right now or telephone before Monday afternoon.

Because the Senate Appropriations Committee approved the same language in June, you should contact your Senators with the same message. But the vote by the full House is in three days, while the vote by the full Senate has not yet been scheduled.

For help in composing your message, see

the talking points from SPARC

the open letter to Congress from 26 US Nobel laureates in science

Then spread the word!

steve: The Art Museum Tagging Project

The steve project has developed open source tagging software for museums called steve tagger that runs on Linux, Macintosh, and Windows platforms (see the Steve Tagger 1.0 Install Guide). You can see how the tagging works at their live system site.

Here’s an excerpt from the About Steve pages that describes the project:

"Steve" is a collaborative research project exploring the potential for user-generated descriptions of the subjects of works of art to improve access to museum collections and encourage engagement with cultural content. We are a group of volunteers, primarily from art museums, who share a common interest in improving access to our collections. We are concerned about barriers to public access to online museum information. Participation in steve is open to anyone with a contribution to make to developing our collective knowledge, whether they formally represent a museum or not.

You can find out more about steve from the November 2006 "Social Tagging and Folksonomy: steve.museum and Access to Art" presentation and from other project documents on the Reference page.

Australian Framework and Action Plan for Digital Heritage Collections

The Collections Council of Australia Ltd. has released Australian Framework and Action Plan for Digital Heritage Collections, Version 0.C3 for comment.

Here's an excerpt from the document:

This is the Collections Council of Australia's plan to prepare an Australian framework for digital heritage collections. It brings together information shared by people working in archives, galleries, libraries and museums at a Summit on Digital Collections held in 2006. It proposes an Action Plan to address issues shared by the Australian collections sector in relation to current and future management of digital heritage collections.

Update on the DSpace Foundation

Michele Kimpton, Executive Director of the DSpace Foundation, gave a talk about the foundation at the DSpace UK & Ireland User Group meeting in early July.

Her PowerPoint presentation is now available.

Source: Lewis, Stuart. "Presentations from Recent DSpace UK & Ireland User Group Meeting," Unilever Centre for Molecular Informatics, Cambridge—Jim Downing, 11 July 2007.

Crazy Bosses

Let’s hope that you never have a crazy boss. But—take my word for it—they’re out there. If you feel that you’ve gone down the rabbit hole and are faced each day with a cross between the Mad Hatter and the Queen of Hearts, then, if you can’t just quit, you might want to pick up a copy of Stanley Bing’s Crazy Bosses.

You’ll meet a variety of crazy bosses, including the disaster hunter, the narcissist, the paranoid, the wimp, and, my personal favorite, the bully.

Bing offers humorous, but sage, advice for how to deal with these crazies.

Scholarly Electronic Publishing Weblog Update (7/11/07)

The latest update of the Scholarly Electronic Publishing Weblog (SEPW) is now available. It provides information about new scholarly literature and resources related to scholarly electronic publishing, such as books, journal articles, magazine articles, technical reports, and white papers.

Especially interesting are: "Content Recruitment for Institutional Repositories (IR's)," DSpace How-To Guide: Tips and Tricks for Managing Common DSpace Chores (Now Serving DSpace 1.4.2 and Manakin 1.1), "Going All the Way: How Hindawi Became an Open Access Publisher," "Library Access to Scholarship," "The OA Interviews: Stevan Harnad," "Open Access and Accuracy: Author-archived Manuscripts vs. Published Articles," "Problems and Opportunities (Blizzards and Beauty)," Report of the Sustainability Guidelines for Australian Repositories Project (SUGAR), "Society Publishing, the Internet and Open Access: Shifting Mission-Orientation from Content Holding to Certification and Navigation Services?," Towards an Open Source Repository and Preservation System: Recommendations on the Implementation of an Open Source Digital Archival and Preservation System and on Related Software Development, and "What a Difference a Publisher Makes."

For weekly updates about news articles, Weblog postings, and other resources related to digital culture (e.g., copyright, digital privacy, digital rights management, and Net neutrality), digital libraries, and scholarly electronic publishing, see the latest DigitalKoans Flashback posting.

Web/Web 2.0 Tools and Techniques

Here’s a list of a few overviews of Web/Web 2.0 tools and techniques that developers may find useful.

Obituary: Peter Lyman

Peter Lyman, former University Librarian at the University of California, Berkeley and professor emeritus at Berkeley’s School of Information, has died of brain cancer. He was 66 years old.

Here’s an excerpt from the press release:

In 2005, Lyman became the director of the Digital Youth Project, a three-year collaborative investigation founded by the John D. and Catherine T. MacArthur Foundation of how kids use digital media in their everyday lives—at home and in libraries, after-school programs and public places. . . .

Lyman was born in San Francisco in 1940. He earned a B.A. in philosophy from Stanford University in 1962, his M.A. in political science from UC Berkeley in 1963, and his Ph.D. in political science from Stanford in 1972.

He was one of the founders of James Madison College, a residential college at Michigan State University with a public policy focus and was a faculty member there from 1967 to 1987. He also was a visiting professor at Stanford and UC Santa Cruz.

In 1987 Lyman moved to the University of Southern California (USC), where he founded the Center for Scholarly Technology and served as its executive director. He also was associate dean for library technology at that university before becoming USC’s university librarian in 1991. At USC, he helped envision and oversee the creation of a new, technologically advanced undergraduate library.

He returned to UC Berkeley in 1994 to serve as the campus’s seventh university librarian until 1998. He also joined the School of Information Management & Systems (now the School of Information) as a professor in 1994. . . .

Lyman became an emeritus professor in 2006. He served on the editorial boards of numerous academic journals relating to information technology and society as well as on the board of directors of Sage Publications, the Council on Library and Information Resources, the Art History Information Project at the Getty Trust, and the Internet Archive.

Obituary: Martha E. Williams

Martha E. Williams, long-time editor of the Annual Review of Information Science and Technology and former President of the American Society for Information Science and Technology, has died at age 72.

The funeral home web site obituary has been posted on ASIS-L.

Code4Lib Journal Established

The newly established Code4Lib Journal has issued a call for papers.

Here’s an excerpt from the call:

The Code4Lib Journal (C4LJ) will provide a forum to foster community and share information among those interested in the intersection of libraries, technology, and the future.

Submissions are currently being accepted for the first issue of this promising new journal. Please submit articles, abstracts, or proposals for articles to c4lj-articles@googlegroups.com (a private list read only by C4LJ editors) by Friday, August 31, 2007. Publication of the first issue is planned for late December 2007.

Possible topics for articles include, but are not limited to:

* Practical applications of library technology. Both actual and hypothetical applications invited.
* Technology projects (failed, successful, proposed, or in-progress), how they were done, and challenges faced
* Case studies
* Best practices
* Reviews
* Comparisons of third party software or libraries
* Analyses of library metadata for use with technology
* Project management and communication within the library environment
* Assessment and user studies . . . .

The goal of the journal is to promote professional communication by minimizing the barriers to publication. While articles in the journal should be of a high quality, they need not follow any formal structure or guidelines. Writers should aim for the middle ground between, on the one hand, blog or mailing-list posts, and, on the other hand, articles in traditional journals. . . .

The Journal will be electronic only, and at least initially, edited rather than refereed. . . .

Code4Lib Journal Editorial Committee

Carol Bean
Jonathan Brinley
Edward Corrado
Tom Keays
Emily Lynema
Eric Lease Morgan
Ron Peterson
Jonathan Rochkind
Jodi Schneider
Dan Scott
Ken Varnum

Archivists’ Toolkit 1.1 Beta (v. 1.0.19) Released

The Archivists’ Toolkit 1.1 Beta (v. 1.0.19) has been released. This version can connect Archivists’ Toolkit clients to MySQL, MS SQLServer, and Oracle database backends.

For more information on the Archivists’ Toolkit, see the "Archivists’ Toolkit Beta 1.1 Released" DigitalKoans posting.

Publisher Mergers: Walter de Gruyter Buys K. G. Saur Verlag

In yet another scholarly publishing company merger, Walter de Gruyter has announced that it has acquired K. G. Saur Verlag, whose publishing programme has included that of Max Niemeyer Verlag since 2005.

Here’s an excerpt from the press release:

Walter de Gruyter GmbH & Co. KG has with immediate effect acquired the complete publishing programme of K. G. Saur Verlag GmbH, which since 2005 has also included the programme of Max Niemeyer Verlag. Through this acquisition Walter de Gruyter will become the market leader in the subject areas classical studies, philosophy, German studies, linguistics and English and Romance studies, as well as in library sciences and general library reference works.

For an analysis of the effect of publisher mergers on serials prices, see the works of Dr. Mark J. McCabe.

Index Data Releases Open Source Pazpar2 Z39.50 Client

Index Data has released Version 1.0.1 of Pazpar2, an open source Z39.50 client.

Here’s an excerpt from the press release:

Pazpar2 . . . can be viewed either as a high-performance metasearching middleware or a Z39.50 client with a webservice interface, depending on your perspective and needs. It is a fairly compact C program—a resident daemon—that incorporates the best we know how to do in terms of providing high performance, user-oriented federated searching. . . .

One cool thing it does is search many databases in parallel, and do it fast, without unduly loading up the user interface. . . It retrieves a set of records from each target, and performs merging, deduplication, ranking/sorting, and pulls browse facets from them. . . .

It doesn’t know anything about data models, so you can handle exotic data sources if you need to. . . you use XSLT to normalize data into an internal model—we provide examples for MARC21 and a DC-esque internal model, and configure ranking, facets, sorting, etc., from that. . . .
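The merging, deduplication, ranking and facet steps mentioned in the excerpt can be sketched in a few lines of Python. This is not Pazpar2's implementation (Pazpar2 is a C daemon configured with XSLT); the record fields, the duplicate key, and the "found in more targets ranks higher" rule are all assumptions chosen for illustration.

```python
from collections import Counter


def merge_results(result_sets):
    """Merge per-target record lists into ranked records plus author facets.

    result_sets maps a target name to a list of {"title", "author"} dicts
    that are assumed to be already normalized into one internal model.
    """
    merged = {}
    for target, records in result_sets.items():
        for record in records:
            # Hypothetical duplicate key: case-insensitive title and author.
            key = (record["title"].strip().lower(), record["author"].strip().lower())
            if key not in merged:
                merged[key] = {
                    "title": record["title"].strip(),
                    "author": record["author"].strip(),
                    "targets": [],
                }
            merged[key]["targets"].append(target)

    # Illustrative ranking: records held by more targets first, then by title.
    ranked = sorted(
        merged.values(),
        key=lambda entry: (-len(entry["targets"]), entry["title"].lower()),
    )
    # Facet: how many merged records each author has.
    facets = Counter(entry["author"] for entry in ranked)
    return ranked, facets


if __name__ == "__main__":
    sample = {
        "target1": [
            {"title": "Moby Dick", "author": "Melville"},
            {"title": "Walden", "author": "Thoreau"},
        ],
        "target2": [
            {"title": "moby dick ", "author": "Melville"},
            {"title": "Typee", "author": "Melville"},
        ],
    }
    ranked, facets = merge_results(sample)
    for entry in ranked:
        print(entry["title"], entry["targets"])
    print(dict(facets))
```

In the sample data the two "Moby Dick" records collapse into one entry that lists both targets, which is the effect deduplication is meant to have in a federated search.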

An Ecological Approach to Repository and Service Interactions

UKOLN and JISC CETIS have released An Ecological Approach to Repository and Service Interactions, Draft Version 0.9 for comment.

Here’s an excerpt from the "Not the Executive Summary" section:

This work began with the need to express something of how and why repositories and services interact. As a community we have well understood technical models and architectures that provide mechanisms for interoperability. The actual interactions that occur, however, are not widely understood and knowledge about them is not often shared. This is in part because we tend to share in the abstract through architectures and use cases, articulating interactions or connections requires an engagement with specific details. . . .

Ecology is the study of systems that are complex, dynamic, and full of interacting entities and processes. Although the nature of these interactions and processes may be highly detailed, a higher level view of them is accessible and intuitive. We think that ecology and the ecosystems it studies may offer a useful analogy to inform the task of understanding and articulating the interactions between users, repositories, and services and the information environments in which they take place. This report outlines some concepts from ecology that may be useful and suggests some definitions for a common conversation about the use of this metaphor.

We hope that this report suggests an additional way to conceptualise and analyse interactions and provide a common vocabulary for an ecological approach. It should as a minimum provoke and support some useful discussions about networks and communities.

British Library Licenses Turning the Pages Toolkit

The British Library has announced that it is now licensing its Turning the Pages Toolkit to libraries and museums. You can see the software in action at their Turning the Pages Web site.

Here’s an excerpt from the press release:

From today, libraries around the World will be able to license the award-winning Turning the Pages software used by the British Library to bring some of the world’s most rare and valuable books online.

Since its launch in 2004, Turning the Pages has grown to become one of the most popular resources at the British Library, allowing the Library to bring iconic treasures such as the Lindisfarne Gospels, Leonardo da Vinci’s Notebooks and Mercator’s Atlas of Europe online for everyone to see. With the launch of Turning the Pages 2.0, and a completely re-built software platform developed by Armadillo Systems, May 2007 also sees the launch of a new "toolkit" that allows other libraries and museums around the World to create their own Turning the Pages gallery. . . .

Michael Stocking, Managing Director of Armadillo Systems and developer of the Turning the Pages software said "As well as making it easy for our customers to create their own collections, we also wanted to enhance the Turning the Pages experience. We have migrated the software to a new platform that places the book in a 3-D environment so, as well as being able to examine the book as a piece of text, users can now also examine it as an object. They can now look at the book from different angles, zoom in and even look at two books, side-by-side."

Curation of Scientific Data: Challenges for Institutions and Their Repositories Podcast

A podcast of Chris Rusbridge’s "Curation of Scientific Data: Challenges for Institutions and their Repositories" presentation at The Adaptable Repository conference is now available. Rusbridge is Director of the Digital Curation Centre in the UK.

The PowerPoint for the presentation is also available.

CLIR Receives Mellon Grant to Study Mass Digitization

According to an O’Reilly Radar posting, the Council on Library and Information Resources has been awarded a grant from the Mellon Foundation to study mass digitization efforts.

Here’s an excerpt from the posting that describes the grant’s objectives:

Assess selected large scale digitization programs by exploring their efficacy and utility for conducting scholarship, in multiple fields or disciplines (humanities, sciences, etc.).

Write and issue a report with findings and recommendations for improving the design of mass digitization projects.

Create a Collegium that can serve in the long-term as an advisory group to mass digitization efforts, helping to assure and obtain the highest possible data quality and utility.

Convene a series of meetings amongst scholars, libraries, publishers, and digitizing organizations to discuss ways of achieving these quality and design improvements.

How Many Creative Commons Licenses Are in Use?

In his "Creative Commons Statistics from the CC-Monitor Project" iCommons Summit presentation, Giorgos Cheliotis of the School of Information Systems at Singapore Management University estimates that there must be more than 60,000,000 Creative Commons licenses in use.

Based on backlink search data from Google and Yahoo, he also provides the following license breakdown highlights:

70% of the licenses allow non-commercial use only (NC)

Share-Alike (SA) is also a very popular attribute, present in over 50% of CC-licensed items (though SA is anyhow self-propagating)

25% of the licenses include the ND [no derivative] restriction

Introducing the Networked Print Book

if:book reports that Manolis Kelaidis made a big splash at the O’Reilly Tools of Change for Publishing conference with his networked paper book.

Here’s an excerpt from the posting:

Manolis Kelaidis, a designer at the Royal College of Art in London, has found a way to make printed pages digitally interactive. His "blueBook" prototype is a paper book with circuits embedded in each page and with text printed with conductive ink. When you touch a "linked" word on the page, your finger completes a circuit, sending a signal to a processor in the back cover which communicates by Bluetooth with a nearby computer, bringing up information on the screen.

Here’s an excerpt from a jusTaText posting about the demo:

Yes, he had a printed and bound book which communicated with his laptop. He simply touched the page, and the laptop reacted. It brought up pictures of the Mona Lisa. It translated Chinese. It played a piece of music. Kelaidis suggested that a library of such books might cross-refer, i.e. touching a section in one book might change the colors of the spines of related books on your shelves. Imagine.