Supported by JISC funding, the Intute Repository Search project is developing increasingly sophisticated search capabilities for discovering documents in UK repositories. It has released two search engines for testing: a conceptual search and a text-mining-based search.
Here's an excerpt from the press release:
Search services harvest the metadata and full-text output from institutional repositories, making the aggregated content searchable and browsable via a single interface. Intute Repository Search currently searches over 95 UK institutional repositories that are taken from the Directory of Open Access Repositories, OpenDOAR.
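As a rough illustration of what "aggregated content searchable via a single interface" can mean, here is a minimal Python sketch of an inverted index built over records that have already been harvested from several repositories. The record fields (`repo`, `title`, `abstract`), the function names, and the sample data are all hypothetical; this is not how Intute Repository Search is implemented, only one simple way the idea could be realised.

```python
from collections import defaultdict

def build_index(records):
    """Build an inverted index mapping each word to the records containing it."""
    index = defaultdict(set)
    for i, rec in enumerate(records):
        text = f"{rec['title']} {rec['abstract']}".lower()
        for raw in text.split():
            word = raw.strip(".,;:()")
            if word:
                index[word].add(i)
    return index

def search(index, records, query):
    """Return the records that contain every word of the query (AND search)."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return [records[i] for i in sorted(hits)]

# Hypothetical records, as if harvested from three different repositories.
records = [
    {"repo": "Repository A", "title": "Text mining for repositories",
     "abstract": "Term extraction from full text."},
    {"repo": "Repository B", "title": "Open access policy",
     "abstract": "Mandates in UK institutions."},
    {"repo": "Repository C", "title": "Mining open access full text",
     "abstract": "Harvesting and indexing."},
]

index = build_index(records)
for rec in search(index, records, "mining text"):
    print(rec["repo"], "-", rec["title"])
```

Because every record lands in the same index regardless of its source, a single query covers all repositories at once. A real service would add ranking, stemming, and full-text extraction on top of this.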
The development path of this project involves simple metadata search, full-text indexing of documents, text-mining of full-text documents, automatic subject classification, term-based document classification, query expansion, clustering of results and browsing/visualisation of the search results. User group requirements have been integrated into the project's development iterations to ensure that the project adequately reflects what researchers want from a service such as Intute Repository Search.
Two complementary advanced search and browse services have been developed for user testing. One uses Autonomy IDOL (www.autonomy.com/content/Products/products-idol-server/index.en.html), and the other uses components developed by NaCTeM (www.nactem.ac.uk).
Autonomy IDOL provides the conceptual search feature of the service. It allows users to search for the documents that most closely match their query, read the overview and abstract of those documents, and view other documents related to the query's search results. The result is a richer contextual search facility for users who want to view documents ranked by how closely they relate to the query.
NaCTeM has developed the text mining component. This allows users to take advantage of the TerMine service (www.nactem.ac.uk/software/termine/), among others, to automatically discover term associations within texts that are harvested from UK HE institutional repositories. By extracting information that would otherwise have been difficult or impossible to identify in a large number of documents, users can view documents that are linked with each other via salient concepts in a way that may lead to answers to existing research questions or to the creation of new ones. This in turn provides a more meaningful and personalised search facility for users who are looking for specific patterns and connections between terms within the collective resource of Intute Repository Search.
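To make the idea of documents "linked with each other via salient concepts" more concrete, the following Python sketch links documents that share extracted terms. It assumes the terms for each document have already been extracted by some term-recognition step; the function name, document identifiers, and term lists are hypothetical, and the sketch does not reproduce TerMine or any NaCTeM component.

```python
from collections import defaultdict
from itertools import combinations

def link_documents_by_terms(doc_terms):
    """Map each pair of documents to the set of terms they have in common."""
    # First invert the mapping: term -> documents that mention it.
    term_to_docs = defaultdict(set)
    for doc_id, terms in doc_terms.items():
        for term in terms:
            term_to_docs[term.lower()].add(doc_id)

    # Every pair of documents sharing a term gets a link labelled with that term.
    links = defaultdict(set)
    for term, docs in term_to_docs.items():
        for a, b in combinations(sorted(docs), 2):
            links[(a, b)].add(term)
    return dict(links)

# Hypothetical output of a term-extraction step for three documents.
doc_terms = {
    "doc1": ["text mining", "institutional repository"],
    "doc2": ["Text Mining", "query expansion"],
    "doc3": ["query expansion", "institutional repository"],
}

links = link_documents_by_terms(doc_terms)
for (a, b), shared in sorted(links.items()):
    print(a, "<->", b, "via", sorted(shared))
```

Browsing these links lets a user move from one document to others that discuss the same concept, even when the documents come from different repositories and share no metadata fields.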