DSpace Can Support One Million Items

A paper by researchers from the National Library of Medicine ("Testing the Scalability of a DSpace-based Archive") finds that DSpace can support an archive of one million items. The tested system, NLM's SPER, "is built upon MIT's DSpace software (Version 1.4), with some modifications and enhancements to better facilitate batch based processing."

Here's an excerpt from the conclusion:

We conclude that the version of DSpace used in SPER (with MySQL database) shows acceptable ingest performance for a million-item archive. . . .

The experimental results shown here pertain to items with mostly one or two monochrome TIFF images, though a few items have up to 100 images. However, a number of inferences may be derived from these results.

  • No real problems were found in ingesting a million items to the archive, using a Sun X4500 server machine, in terms of either performance or reliability of the SPER/DSpace software architecture and implementation. . . .
  • With the increase in archive size, the average ingest time of an item increases in a smooth and predictable way.
  • With increasing number of TIFF images, the ingest time (per item) increases by three to four percent for each additional image.
  • If color TIFF images were used, the ingest times would increase slightly due to the overhead of copying additional data to the upload area, and to the archive's asset storage. However, other archival overheads should not change.