Petr Knoth, Nancy Pontika and Lucas Anastasiou have published "Releasing 1.8 Million Open Access Publications from Publisher Systems for Text and Data Mining" in LSE Impact of Social Sciences.
Here's an excerpt:
Text and data mining offers an opportunity to improve the way we access and analyse the outputs of academic research. But the technical infrastructure of the current scholarly communication system is not yet ready to support TDM to its full potential, even for open access outputs. To address this problem, Petr Knoth, Nancy Pontika and Lucas Anastasiou have developed the CORE Publisher Connector, a toolkit service designed to assist text miners in accessing content through a single machine interface. The Connector aims to solve the heterogeneity among publisher APIs and assist text miners with data collection, provide a centralised point of access to all openly available scientific publications, and provide a high-performance, constantly updated access interface.