The National Library of New Zealand has released version 3.2 of its open-source Metadata Extraction Tool.
Written in Java and XML, the Metadata Extraction Tool has a Windows interface, and it runs under UNIX in command line mode. Batch processing is supported.
Here’s an excerpt from the project home page:
The Tool builds on the Library’s work on digital preservation, and its logical preservation metadata schema. It is designed to:
- automatically extract preservation-related metadata from digital files
- output that metadata in a standard format (XML) for use in preservation activities. . . .
The Metadata Extraction Tool includes a number of ‘adapters’ that extract metadata from specific file types. Extractors are currently provided for:
- Images: BMP, GIF, JPEG and TIFF.
- Office documents: MS Word (version 2, 6), WordPerfect, OpenOffice (version 1), MS Works, MS Excel, MS PowerPoint, and PDF.
- Audio and Video: WAV and MP3.
- Markup languages: HTML and XML.
If a file type is unknown, the tool applies a generic adapter, which extracts the data that the host system ‘knows’ about any given file (such as size, filename, and date created).
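To make the generic-adapter idea concrete, here is a minimal Java sketch of how file-system-level metadata could be read and written out as XML. It is an illustration only, not the tool's actual code: the class name `GenericAdapterSketch`, the method `describe`, and the XML element names are all assumptions made for this example, and the output does not follow the Library's preservation metadata schema.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical sketch: this is NOT the Metadata Extraction Tool's adapter API.
// It only shows the kind of data a "generic" fallback can obtain from the host system.
class GenericAdapterSketch {

    // Escape the characters that are special in XML text content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Read the attributes the file system already records and wrap them in
    // a small XML fragment (element names are invented for this example).
    static String describe(Path file) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        StringBuilder xml = new StringBuilder();
        xml.append("<file>\n");
        xml.append("  <filename>")
           .append(escapeXml(file.getFileName().toString()))
           .append("</filename>\n");
        xml.append("  <size>").append(attrs.size()).append("</size>\n");
        // On file systems that do not record a creation time, Java reports
        // another timestamp here (typically the last-modified time).
        xml.append("  <created>").append(attrs.creationTime()).append("</created>\n");
        xml.append("</file>\n");
        return xml.toString();
    }

    // Batch use: print one XML fragment per path given on the command line.
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.print(describe(Paths.get(arg)));
        }
    }
}
```

Run with one or more file paths as arguments, the sketch prints a `<file>` fragment for each. A format-specific adapter would add fields read from the file's own contents on top of these host-system values.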