NIH has released "NIH Manuscript Collection Optimized for Text-Mining and More."
Here's an excerpt:
You can download the entire PMC collection of NIH-supported author manuscripts as a package in either XML or plain text formats. The collection will encompass all NIH manuscripts posted to PMC since July 2008. While the public can access the articles' full text and accompanying figures, tables, and multimedia on the PMC Web site, the newly available article packages include full text only, in a form that facilitates text-mining.
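As a rough idea of what the plain text packages make possible, here is a minimal sketch of a first text-mining pass: counting word frequencies across a folder of manuscripts. It assumes the package has already been downloaded and unpacked into a local directory of UTF-8 `.txt` files. The function name `term_counts`, the directory layout, and the simple tokenizing rule are illustrative assumptions, not anything described in the NIH announcement.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: a token is a run of ASCII letters, optionally joined by hyphens.
TOKEN_RE = re.compile(r"[a-z]+(?:-[a-z]+)*")


def term_counts(text_dir):
    """Count lowercase word tokens across every .txt file under text_dir."""
    counts = Counter()
    # rglob walks subdirectories too, since the unpacked layout is unknown here.
    for path in sorted(Path(text_dir).rglob("*.txt")):
        # errors="replace" keeps one badly encoded file from stopping the run.
        text = path.read_text(encoding="utf-8", errors="replace")
        counts.update(TOKEN_RE.findall(text.lower()))
    return counts
```

Calling `term_counts("manuscripts").most_common(20)` would list the twenty most frequent tokens, where `manuscripts` is a hypothetical name for the folder holding the unpacked files. Real work would likely add stop-word filtering and a proper tokenizer, but the point is that full text without figures or multimedia can be read in a few lines.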