January 2008 – DigitalKoans

Harmonization of Metadata Standards

PROLEARN has released Harmonization of Metadata Standards.

Here's an excerpt from the "Introduction":

Today there is a plethora of metadata specifications (such as IEEE LOM, Dublin Core, METS, MODS, MPEG-7, etc.), many of which are useful in whole or part for activities related to teaching and learning. While each specification in itself is designed to increase system interoperability, we are increasingly seeing systems that need to work with more than one of these specifications. Adding support for an additional specification generally presents a significant amount of added complexity in implementation. The reason for this is a lack of harmonization between specifications. . . .

Existing solutions to the metadata harmonization issue are few—systems are either limited to a single specification, or implement ad-hoc solutions that only work in that particular environment. There are many examples of "mappings" between specifications that provide partial solutions to the problem, but generally fail due to low-fidelity translations and lack of generality (i.e. the mapping only works for limited parts of specifications). Another solution is to create a top-level data model that encompasses the common aspects of all the specifications. This has proven to be feasible in relatively well-constrained domains such as resource aggregation. . . In the field of general metadata, where there is no such common ground, such an approach is substantially less likely to be successful. . . .

The deliverable begins with a short introduction to metadata in Section 3. Section 4 discusses a set of metadata specifications that are highly relevant to learning and teaching. Section 5 forms the core of the deliverable and analyses the harmonization issues among a chosen set of specifications. Section 6 generalizes the analysis in Section 5 and makes a deeper analysis of the relationship between IEEE LOM and Dublin Core. Section 7, finally, points to possible ways to address the identified harmonization issues.

Broadband in the U.S.: Mission Accomplished?

The U.S. National Telecommunications and Information Administration will shortly release a report, Networked Nation: Broadband in America, that critics say presents too optimistic a picture of broadband access in the U.S. Read more about it at "Study: U.S. Broadband Goal Nearly Reached."

Meanwhile, EDUCAUSE has released A Blueprint for Big Broadband: An EDUCAUSE White Paper, which says that: "The United States is facing a crisis in broadband connectivity."

Here's an excerpt from the EDUCAUSE report's "Executive Summary":

While other nations are preparing for the future, the United States is not. Most developed nations are deploying "big broadband" networks (100 Mbps) that provide faster connections at cheaper prices than those available in the United States. Japan has already announced a national commitment to build fiber networks to every home and business, and countries that have smaller economies and more rural territory than the United States (e.g., Finland, Sweden, and Canada) have better broadband services available.

Why is the United States so far behind? The failure of the United States to keep pace is the direct result of our failure to adopt a national broadband policy. The United States has taken a deregulatory approach under the assumption that the market will build enough capacity to meet the demand. While these steps may have had some positive influence, they are not sufficient. . . .

For these reasons, this paper proposes the creation of a new federal Universal Broadband Fund (UBF) that, together with matching funds from the states and the private and/or public sector, should be used to build open, big broadband networks of at least 100 Mbps (scalable upwards to 1 Gbps) to every home and business by 2012. U.S. state governors and foreign heads of state have found the resources to subsidize broadband deployment; the U.S. federal government should as well.

E-Print Preservation: SHERPA DP: Final Report of the SHERPA DP Project

JISC has released SHERPA DP: Final Report of the SHERPA DP Project.

Here's an excerpt from the "Executive Summary":

The SHERPA DP project (2005–2007) investigated the preservation of digital resources stored by institutional repositories participating in the SHERPA project. An emphasis was placed on the preservation of e-prints—research papers stored in an electronic format, with some support for other types of content, such as electronic theses and dissertations.

The project began with an investigation of the method that institutional repositories, as Content Providers, may interact with Service Providers. The resulting model, framed around the OAIS, established a Co-operating archive relationship, in which data and metadata is transferred into a preservation repository subsequent to it being made available. . . .

The Arts & Humanities Data Service produced a demonstrator of a Preservation Service, to investigate the operation of the preservation service and accepted responsibility for the preservation of the digital objects for a three-year period (two years of project funding, plus one year).

The most notable development of the Preservation Service demonstrator was the creation of a reusable service framework that allows the integration of a disparate collection of software tools and standards. The project adopted Fedora as the basis for the preservation repository and built a technical infrastructure necessary to harvest metadata, transfer data, and perform relevant preservation activities. Appropriate software tools and standards were selected, including JHOVE and DROID as software tools to validate data objects; METS as a packaging standard; and PREMIS as a basis on which to create preservation metadata. . . .

A number of requirements were identified that were essential for establishing a disaggregated service for preservation, most notably some method of interoperating with partner institutions and the establishment of appropriate preservation policies. . . . In its role as a Preservation Service, the AHDS developed a repository-independent framework to support the EPrints and DSpace-based repositories, using OAI-PMH as common method of connecting to partner institutions and extracting digital objects.

EU Court Says EU Countries Do Not Have to Reveal the Identity of Internet Users in Civil Copyright Cases

The European Court of Justice has ruled that EU countries do not have to force ISPs to reveal the names of users associated with IP addresses in civil copyright cases. The court said: "Community law does not require the member states, in order to ensure the effective protection of copyright, to lay down an obligation to disclose personal data in the context of civil proceedings."

JISC Programme Synthesis Study: Supporting Digital Preservation and Asset Management in Institutions

JISC has published JISC Programme Synthesis Study: Supporting Digital Preservation and Asset Management in Institutions: A Review of the 4-04 Programme on Digital Preservation and Asset Management in Institutions for the JISC Information Environment: Part II: Programme Synthesis. The report covers a number of projects, including LIFE, MANDATE, PARADIGM, PRESERV, and SHERPA DP.

Here's an excerpt from UKOLN News:

Written by Maureen Pennock, DCC researcher at UKOLN, the study provides a comprehensive and categorised overview of the outputs from the entire programme. Categories include training, costs and business models, life cycles, repositories, case studies, and assessment and surveys. Each category includes detailed information on project outputs and references a number of re-usable project-generated tools that range from software services to checklists and guidance.

Columbia University and Microsoft Book Digitization Project

The Columbia University Libraries have announced that they will work with Microsoft to digitize a "large number of books" that are in the public domain.

Here's an excerpt from the press release:

Columbia University and Microsoft Corp. are collaborating on an initiative to digitize a large number of books from Columbia University Libraries and make them available to Internet users. With the support of the Open Content Alliance (OCA), publicly available print materials in Columbia Libraries will be scanned, digitized, and indexed to make them readily accessible through Live Search Books. . . .

Columbia University Libraries is playing a key role in book selection and in setting quality standards for the digitized materials. Microsoft will digitize selected portions of the Libraries’ great collections of American history, literature, and humanities works, with the specific areas to be decided mutually by Microsoft and Columbia during the early phase of the project.

Microsoft will give the Library high-quality digital images of all the materials, allowing the Library to provide worldwide access through its own digital library and to share the content with non-commercial academic initiatives and non-profit organizations.

Read more about it at "Columbia University Joins Microsoft Scan Plan."

How Big Should Statutory Damages Be for Copyright Violations?: Report on a Roundtable about Section 104 of the PRO IP Act

In "Roundtable on Copyright Damages: 'What Are We Doing Here?'," Sherwin Siy reports on an important roundtable discussion about Section 104 of the PRO IP Act.

Here's an excerpt:

My problem with the provision then was that no one present at the hearing was particularly keen on it—neither the Department of Justice nor the Chamber of Commerce were pushing it particularly hard. Nor was it really clear that this provision did much good to improve the state of copyright law. It has been fairly clear that this is something that the RIAA wants—it would allow them to recover a much larger sum in statutory damages. For instance, if a 10-song album were infringed, the statutory damages would not range from $750 to $150,000, as they do today, but could be as high as $7500 to $1.5 million.
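The arithmetic in the excerpt can be checked with a short sketch. It assumes, as the excerpt implies, that the provision would let each of the 10 songs be counted as a separate work, so the per-work range is multiplied by the number of tracks. The function name is hypothetical and the figures are the ones quoted above.

```python
from typing import Tuple

# Per-work statutory damages range cited in the excerpt (US dollars).
MIN_PER_WORK = 750
MAX_PER_WORK = 150_000


def statutory_range(works: int) -> Tuple[int, int]:
    """Return (minimum, maximum) damages if each work is counted separately."""
    return works * MIN_PER_WORK, works * MAX_PER_WORK


# Album treated as a single work (the range the excerpt says applies today).
print(statutory_range(1))
# Album treated as ten separate works (the range the excerpt warns about).
print(statutory_range(10))
```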

Three Strikes and You're Out: A Kinder, Gentler Internet Disconnection Policy Emerges in France

Last November, it was reported that France intended to cut off Internet access for illegal downloaders after one warning from their ISP and a second offense. Now, it appears that violators will receive two warnings from the government, with a service cut-off after the third offense. Action on the bill is expected this summer.

Stewardship of Digital Research Data: A Framework of Principles and Guidelines

The Research Information Network (RIN) has published Stewardship of Digital Research Data: A Framework of Principles and Guidelines: Responsibilities of Research Institutions and Funders, Data Managers, Learned Societies and Publishers.

Here's an excerpt from the Web page describing the document:

Research data are an increasingly important and expensive output of the scholarly research process, across all disciplines. . . . But we shall realise the value of data only if we move beyond research policies, practices and support systems developed in a different era. We need new approaches to managing and providing access to research data.

In order to address these issues, the RIN established a group to produce a framework of key principles and guidelines, and we consulted on a draft document in 2007. The framework is founded on the fundamental policy objective that ideas and knowledge, including data, derived from publicly-funded research should be made available for public use, interrogation, and scrutiny, as widely, rapidly and effectively as practicable. . . .

The framework is structured around five broad principles which provide a guide to the development of policy and practice for a range of key players: universities, research institutions, libraries and other information providers, publishers, and research funders as well as researchers themselves. Each of these principles serves as a basis for a series of questions which serve a practical purpose by pointing to how the various players might address the challenges of effective data stewardship.

Detailed Notes and PowerPoints from the ALCTS Electronic Resources Interest Group Midwinter Meeting

Jennifer W. Lang has posted very detailed notes about the 2008 Midwinter meeting of the ALCTS Electronic Resources Interest Group.

Meeting speakers included Nicole Pelsinsky of Serials Solutions ("Making E-Resources Management More Manageable"), Timothy Savage of OCLC ("Automated E-Resource Cataloging"), and Peter Fletcher of the UCLA Library Cataloging and Metadata Center ("Provider Neutral Record for Remote Access Electronic Integrating Resources").

How to Harvest OAI-PMH Records with the Freeware MarcEdit Program

Terry Reese has posted step-by-step instructions about how to harvest OAI-PMH records from the University of Michigan Libraries' MBooks digital books collection using his MarcEdit freeware program. The data can either be converted to the MARC format or stored as is. MarcEdit also has a Z39.50 client as well as crosswalks, such as MARC to Dublin Core and MARC to EAD.
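For readers curious about what an OAI-PMH harvest involves under the hood, here is a minimal Python sketch. It is not MarcEdit's implementation. It only shows the standard ListRecords request and how a harvester reads record identifiers and the resumption token from a response. The base URL, the sample response, and the function names are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # When resuming, the token is sent without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical, hand-written response used instead of a live request.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_list_records(SAMPLE_RESPONSE)
print(list_records_url("http://example.org/oai"))
print(ids, token)
```

A real harvester would fetch each URL (for example with `urllib.request.urlopen`) and keep requesting with the returned token until the response no longer carries one.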

Copyright Troubles for SeeqPod and The Pirate Bay Search Engines

It is anticipated that the Swedish government will soon charge The Pirate Bay, a torrent search engine, with copyright violations. The Pirate Bay has received over 4,000 pages of evidence related to possible violations from the government. It has been reported that The Pirate Bay serves as many as 10 million peer computers, providing access to about one million torrents.

This news comes hard on the heels of Warner Music Group's suit against SeeqPod, a digital music search engine. The SeeqPod case will likely be determined by the court's interpretation of the Digital Millennium Copyright Act's "safe harbor" provision, with SeeqPod claiming immunity and Warner claiming that it does not apply.

Ruby-on-Rails/Solr OPAC: Version .1 of Blacklight Released

Bess Sadler has released version .1 of Blacklight, an open source "next generation library catalog written in ruby, using solr as the underlying search engine."

Against Intellectual Monopoly Freely Available

The forthcoming book Against Intellectual Monopoly, which will be published by Cambridge University Press, is now freely available in digital form.

Here's an excerpt from the introduction:

Our reasoning proceeds along the following lines. Everyone wants a monopoly. No one wants to compete against his own customers, or against imitators. Currently patents and copyrights grant producers of certain ideas a monopoly. Certainly few people do something in exchange for nothing. Creators of new goods are not different from producers of old ones: they want to be compensated for their effort. However, it is a long and dangerous jump from the assertion that innovators deserve compensation for their efforts to the conclusion that patents and copyrights, that is monopoly, are the best or the only way of providing that reward. Statements such as "A patent is the way of rewarding somebody for coming up with a worthy commercial idea" abound in the business, legal and economic press. As we shall see there are many other ways in which innovators are rewarded, even substantially, and most of them are better for society than the monopoly power patents and copyright currently bestow. Since innovators may be rewarded even without patents and copyright, we should ask: is it true that intellectual property achieves the intended purpose of creating incentives for innovation and creation that offset their considerable harm?

This book examines both the evidence and the theory. Our conclusion is that creators’ property rights can be well protected in the absence of intellectual property, and that the latter does not increase either innovation or creation. They are an unnecessary evil.

REPOMAN-L (Institutional Repository Managers' Mailing List) Launched

Richard Griscom, University of Pennsylvania, and Leah Vanderjagt, University of Alberta, have launched REPOMAN-L (Institutional Repository Managers' Mailing List).

Here's an excerpt from the announcement:

We have created REPOMAN-L (Institutional Repository Managers' Mailing List) as an open forum for the discussion of issues, great and small, that confront repository managers. We hope that you will subscribe and participate enthusiastically, and use this list for problem-solving and sharing of advice; for example:

to poll the group on practices at their institutions

to ask about any aspect of development from policy to outreach initiatives to software evaluation

to share links to useful tools and references

to explore rationale around decisions you're making about your repository. . . .

The list is purposefully unaffiliated with any institution, initiative, repository software platform, or conceptual idea such as open access; the list would of course not exclude discussion of these areas, but we ask subscribers to consider initiating these discussions on lists set up specifically for the topics and then bring summaries of relevance to this list.

Are There 200,000 "Duplicate" Articles in Journals Indexed by Medline?

Based on a recent study published in Nature, it is possible that there may be as many as 200,000 duplicate articles (either articles that were published in multiple journals or plagiarized) in journals indexed by Medline. To conduct the study, Mounir Errami and Harold Garner utilized the eTBLAST software to analyze samples of Medline article abstracts in order to estimate the prevalence of duplicate articles.
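To give a rough sense of how abstracts can be screened for possible duplicates, here is a small Python sketch. It does not reproduce eTBLAST's method. It swaps in a simple word-overlap (Jaccard) score, and the 0.8 threshold, the function names, and the sample abstracts are all assumptions for illustration.

```python
import re
from itertools import combinations


def word_set(text):
    """Lowercase a text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two texts: shared words / all distinct words."""
    wa, wb = word_set(a), word_set(b)
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def candidate_duplicates(abstracts, threshold=0.8):
    """Return index pairs whose similarity meets the (assumed) threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(abstracts), 2)
        if jaccard(a, b) >= threshold
    ]


# Made-up abstracts; the first two differ only in punctuation.
sample = [
    "Gene expression in mouse liver after fasting.",
    "Gene expression in mouse liver after fasting",
    "A survey of broadband deployment in rural areas.",
]
print(candidate_duplicates(sample))
```

Pairs flagged this way are only candidates. Deciding whether a pair is a republished article, plagiarism, or a false match still takes human review.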

Duplicate detection is an issue of great concern to both publishers and scholars. The CrossCheck project is allowing eight publishers to test duplicate checking as part of the editorial process in a closed-access environment. The project's home page states:

Currently, existing PD [plagiarism detection] systems do not index the majority of scholarly/professional content because it is inaccessible to crawlers directed at the open web. The only scholarly literature that is currently indexed by PD systems is that which is available openly (e.g. OA, Archived or illegitimately posted copies) or that which has been made available via third-party aggregators (e.g. ProQuest). This, in turn, means that any publisher who is interested in employing PD systems in their editorial work-flow is unable to do so effectively. Even if a particular publisher doesn't have a problem with plagiarized manuscripts, they should have an interest in making sure that their own published content is not plagiarized or otherwise illegitimately copied.

In order for CrossRef members to use existing PD systems, there needs to be a mechanism through which PD system vendors can, under acceptable terms & conditions, create and use databases of relevant scholarly and professional content.

Open access advocates have pointed out that one advantage of OA is that it allows the unrestricted analysis and manipulation of the full text of freely available works. Open access makes it possible for all interested parties, including scholars and others who might not have access to closed duplicate verification databases, to conduct whatever analysis they wish and to make the results public without having to consider potential business impacts.

MPAA Now Says That College Students Account for 15%, Not 44%, of Illegal Movie Downloads

The Motion Picture Association of America has said that a 2005 study that claimed that college students accounted for 44% of illegal downloads of movies is incorrect: the correct number is 15%. The MPAA had used the higher figure to argue for measures that would address higher education downloading abuse.

Meanwhile, the EFF Deeplinks blog is reminding its readers ("Troubling 'Digital Theft Prevention' Requirements Remain in Higher Education Bill") that the College Opportunity and Affordability Act of 2007, which the House may take up in February, still contains wording asking institutions to "develop a plan for offering alternatives to illegal downloading or peer-to-peer distribution of intellectual property as well as a plan to explore technology-based deterrents to prevent such illegal activity."

Alpha Version of OAI-PMH Metadata Analysis Tool Released

The Greenstone Digital Library project has released an alpha version of an OAI-PMH metadata analysis tool that can be used to "generate statistics and visualisations of OAI repositories." Several sample reports are available, including one for the University of Illinois IDEALS repository.

Cultural Industries in Europe Committee Votes Down Copyright Filtering and Term Extension Amendments

The European Parliament's Cultural Industries in Europe Committee has voted against amendments to the Cultural Industries in the Context of the Lisbon Strategy report that would have filtered the Internet, removed or blocked infringing content, terminated the connectivity of infringers, and extended the term of copyright protection. The report will next be voted on in a European Parliament plenary meeting.

University of Minnesota Libraries Tutorial on Author Rights

The University of Minnesota Libraries have released a brief (about six minutes) Adobe Presenter overview of author rights issues aimed at faculty and other researchers.

International Study of Peer Review

The Publishing Research Consortium has released "Peer Review in Scholarly Journals: Perspective of the Scholarly Community—An International Study."

Here's an excerpt from the "Executive Summary":

The survey thus paints a picture of academics committed to peer review, with the vast majority believing that it helps scientific communication and in particular that it improves the quality of published papers. They are willing to play their part in carrying out review, though it is worrying that the most productive reviewers appear to be overloaded. Many of them are in fact willing to go further than at present and take on responsibility for reviewing authors’ data. Within this picture of overall satisfaction there are, however, some sizeable pockets of discontent. This discontent does not always translate into support for alternative methods of peer review; in fact some of those most positive about the benefits of peer review were also the most supportive of post-publication review. Overall, there was substantial minority support for post-publication review as a supplement to formal peer review, but much less support for open review as an alternative to blinded review.

Read more about it at "Peer Review Study."

Book to Be Published by MIT Press Undergoing Blog-Based Open Peer Review

Noah Wardrip-Fruin's draft of Expressive Processing: Digital Fictions, Computer Games, and Software Studies, which will be published by MIT Press, is undergoing an open peer-review process on the Grand Text Auto Weblog using a new plug-in version of CommentPress. The book is also undergoing a conventional peer-review process.

Copy Belgium: Canadian Recording Industry Association Asks for Copyright Filtering of the Internet

According to "Canadian Copyright Lobby Seeking Mandated ISP Filtering," the Canadian Recording Industry Association is asking the Canadian government to consider copyright filtering of the Internet.

Here's an excerpt:

[CRIA's] Henderson cites with approval several initiatives to move toward ISP filtering of content, pointing to a French report, comments from the UK that such legislation could be forthcoming, and the AT&T negotiations in the U.S. Later in the conversation, the group is asked what their dream legislation would look like. The first response? ISP liability, with the respondent pointing to Belgium as an example of an ideal model ("the file sharing issue will go away there as ISPs take down people"). Last summer, a Belgian court ordered an ISP to install filtering software to identify and block copyrighted content (the decision is currently being appealed).

If this reflects the current strategy—and there is reason to believe it does—it marks a dramatic change in the lobbying efforts. It suggests that not only are these groups seeking a Canadian DMCA, but they would like Industry Minister Jim Prentice to go even further by enacting constitutionally-dubious legislation requiring ISPs to identify and filter out content that is alleged to infringe copyright.

Presentations from eResearch Australasia 2007

Presentations from eResearch Australasia 2007 are now available.

Here are selected presentations:

Humanities Cyberinfrastructure: The TextGrid Project

The Humanities-oriented TextGrid Project is part of the larger German D-Grid initiative.

Here's an excerpt from the About TextGrid page:

TextGrid aims to create a community grid for the collaborative editing, annotation, analysis and publication of specialist texts. It thus forms a cornerstone in the emerging e-Humanities. . . .

Despite modern information technology and a clear thrust towards collaboration, text scientists still mostly work in local systems and project-oriented applications. Current initiatives lack integration with already existing text corpora, and they remain unconnected to resources such as dictionaries, lexica, secondary literature and tools. . . .

Integrated tools that satisfy the specific requirements of text sciences could transform the way scholars process, analyse, annotate, edit and publish text data. Working towards this vision, TextGrid aims at building a virtual workbench based on e-Science methods.

The installation of a grid-enabled architecture is obvious for two reasons. On the one hand, past and current initiatives for digitising and accessioning texts already accrued a considerable data volume, which exceeds multiple terabytes. Grids are capable of handling these data volumes. Also the dispersal of the community as well as the scattering of resources and tools call for establishing a Community Grid. This establishes a platform for connecting the experts and integrating the initiatives worldwide. The TextGrid community is equipped with a set of powerful software tools based on existing solutions and embracing the grid paradigm.