Are There 200,000 "Duplicate" Articles in Journals Indexed by Medline?

According to a recent study published in Nature, there may be as many as 200,000 duplicate articles (articles that were either published in multiple journals or plagiarized) in journals indexed by Medline. To conduct the study, Mounir Errami and Harold Garner used the eTBLAST software to analyze samples of Medline article abstracts and estimate the prevalence of duplicate articles.

Duplicate detection is an issue of great concern to both publishers and scholars. The CrossCheck project is allowing eight publishers to test duplicate checking as part of the editorial process in a closed-access environment. The project's home page states:

Currently, existing PD [plagiarism detection] systems do not index the majority of scholarly/professional content because it is inaccessible to crawlers directed at the open web. The only scholarly literature that is currently indexed by PD systems is that which is available openly (e.g. OA, Archived or illegitimately posted copies) or that which has been made available via third-party aggregators (e.g. ProQuest). This, in turn, means that any publisher who is interested in employing PD systems in their editorial work-flow is unable to do so effectively. Even if a particular publisher doesn't have a problem with plagiarized manuscripts, they should have an interest in making sure that their own published content is not plagiarized or otherwise illegitimately copied.

In order for CrossRef members to use existing PD systems, there needs to be a mechanism through which PD system vendors can, under acceptable terms & conditions, create and use databases of relevant scholarly and professional content.

Open access advocates have pointed out that one advantage of OA is that it allows the unrestricted analysis and manipulation of the full text of freely available works. Open access makes it possible for all interested parties, including scholars and others who might not have access to closed duplicate verification databases, to conduct whatever analysis they wish and to make the results public without having to consider potential business impacts.

Read more about it at: "Copycat Articles Seem Rife in Science Journals, a Digital Sleuth Finds" and "How Many Papers Are Just Duplicates?"