JISC has released Enhanced Ingest to Digital E-Research Repositories: Final Report.
Here's an excerpt:
The project developed a demonstrator that implemented an enhanced deposit and ingest process to a digital repository based on Fedora. The process incorporates the SWORD API for deposit, and accepts deposits that contain multiple files (packaged as a zip file). The workflow performs preservation actions (e.g. capturing PREMIS metadata, format migration), extraction of resource discovery metadata (for text-based formats such as PDF, MS Word, HTML), and capture of publisher self-archiving policies (for post-prints). The resources are ingested into the repository following an atomistic model—individual files and directories correspond to individual digital objects, and relationships between them (i.e. the membership relation between files/directories) are represented as RDF statements. The workflow was constructed from a variety of components developed by other projects.
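To make the atomistic model in the excerpt more concrete, here is a minimal Python sketch of how a zip deposit might be broken into one object per file and directory, with membership expressed as RDF triples. It is not the project's code. The sequential `demo:` identifiers and the helper names (`membership_triples`, `to_ntriples`, `build_example_zip`) are hypothetical, and the use of Fedora's RELS-EXT `isMemberOf` predicate is an assumption, since the excerpt does not say which predicate the demonstrator used.

```python
import io
import posixpath
import zipfile

# Assumption: Fedora's RELS-EXT "isMemberOf" predicate. The excerpt does not
# name the predicate the project actually used.
IS_MEMBER_OF = "info:fedora/fedora-system:def/relations-external#isMemberOf"


def membership_triples(zip_bytes, namespace="demo"):
    """Return (pids, triples) for every file and directory in a zip package."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as package:
        names = package.namelist()

    # Collect every file path plus all of its ancestor directories.
    paths = set()
    for name in names:
        path = name.strip("/")
        while path:
            paths.add(path)
            path = posixpath.dirname(path)

    # One digital object per path; "" stands for the deposit as a whole.
    # Hypothetical identifier scheme: sequential numbers in a demo namespace.
    ordered = [""] + sorted(paths)
    pids = {p: f"info:fedora/{namespace}:{i}" for i, p in enumerate(ordered, start=1)}

    # Each file or directory is a member of its parent directory (or of the
    # deposit itself when it sits at the top level of the package).
    triples = []
    for path in sorted(paths):
        parent = posixpath.dirname(path)
        triples.append((pids[path], IS_MEMBER_OF, pids[parent]))
    return pids, triples


def to_ntriples(triples):
    """Serialise (subject, predicate, object) URI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def build_example_zip():
    """Build a small in-memory deposit package for illustration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("paper.pdf", b"placeholder")
        package.writestr("data/results.csv", b"placeholder")
    return buffer.getvalue()


if __name__ == "__main__":
    _, example_triples = membership_triples(build_example_zip())
    print(to_ntriples(example_triples))
```

In this sketch the example package yields four objects (the deposit, the `data` directory, and the two files) and three membership statements. A real workflow would mint identifiers through the repository and store the statements with each object rather than printing them.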