Wellesley College Research Guides

Text Analysis & Text Mining

Policies for Mining Licensed Content

If you wish to undertake a text or data mining project with content from the Library's licensed databases, please contact a librarian to investigate options, which may include negotiating with the vendor or purchasing access to the data. Many database licenses prohibit text and data mining and the use of software such as scripts, agents, or robots, but we are actively negotiating text mining rights with database vendors. Unauthorized text or data mining in violation of our licenses can result in loss of access for the entire Wellesley College community.

Please also see our Best Practice Tips for mining licensed databases.

Wellesley College Library Databases That Support Text Mining

Resource: Details
Adam Matthew: Primary source collections spanning the 15th to 21st centuries and containing millions of pages. Adam Matthew allows data mining/text analysis free of charge for fair use/academic research. Secure online access to the data via an API can be provided on submission of an information form. Librarians may contact Adam Matthew via info@amdigital.co.uk to discuss data extraction from the main collection website by automated software. See the Adam Matthew Text Mining/Data Mining Statement for more information.
Early English Books Online (EEBO): Digital facsimile page images of virtually every work printed in the English-speaking world from 1473 to 1700, as well as some items printed after 1700. 25,000+ selected texts from the EEBO corpus are available to download for text analysis through the Text Creation Partnership.
Eighteenth-Century Collections Online (ECCO): Digital facsimile page images of significant English-language and foreign-language titles printed in the United Kingdom during the 18th century, along with thousands of important works from the Americas; includes books, pamphlets, broadsides, and ephemera. All data is available for bulk download for text mining through the Text Creation Partnership.
JSTOR: JSTOR's Data for Research self-service site provides datasets for the journals, books, research reports, and pamphlets in the JSTOR digital library at no cost to researchers and libraries. Researchers may create a dataset of up to 25,000 documents (metadata and/or n-grams) using the self-service option. (See How to Create a Dataset.) Large and full-text datasets are provided by request and require an agreement about the use of the data.
Oxford English Dictionary (OED): Oxford University Press offers a free prototype API or a developer plan to access all data and functionality. See API FAQ. Read more about research partnerships that use OED datasets.
ScienceDirect: Researchers at subscribing academic institutions can text mine subscribed full-text ScienceDirect content via the Elsevier APIs for non-commercial purposes.
Women Writers Online: A full-text collection of early women’s writing in English, including full transcriptions of texts published between 1526 and 1850, focusing on materials that are rare or inaccessible.
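
Several of the resources above supply data as plain text or as n-grams (runs of n consecutive words). As a rough illustration of what an n-gram count looks like, here is a minimal Python sketch. It is not the method used by JSTOR or any other vendor; the function name `ngram_counts`, the tokenization rule, and the sample sentence are assumptions made for this example, and it should only be run on text you are authorized to analyze.

```python
from collections import Counter
import re


def ngram_counts(text: str, n: int = 2) -> Counter:
    """Count word n-grams in a text, ignoring case and punctuation.

    Hypothetical helper for illustration only.
    """
    # Assumption: a token is any run of letters or apostrophes, lowercased.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n tokens across the token list.
    windows = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(window) for window in windows)


# Made-up sample text standing in for a downloaded document.
sample = "The quality of mercy is not strained. The quality of mercy."
counts = ngram_counts(sample, n=2)
print(counts.most_common(3))
```

To apply this to a file you have downloaded under a license that permits it, read the file's contents into a string and pass that string to `ngram_counts` in place of `sample`.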

Best Practice Tips

  • We can help you contact database publishers. If you want access to information that you can’t easily access through a database, we can help you get in touch with the right people at the database publisher and identify what options you might have. Options may include choices of delivery methods and negotiations regarding authorized uses. We can also provide assistance throughout the process of working with the publisher.

  • This may take time. If you are considering a text or data mining project, you should contact us early in your process, as publishers may be slow to respond to requests for data and negotiations may take time, particularly since this is not a topic that all publishers have considered.

  • Publishers may charge for access to their data. Some publishers allow text and data mining only if the user pays an additional fee. Others sell their data separately for this purpose. We can help you identify the most economical and efficient way to get the access you need.

  • Open Access alternatives may exist. Depending on the nature of your research, there may be Open Access journals, databases or datasets that you can use. We can help you to identify whether such a source would work for your project. See the Free Sources for Text Mining page for some examples.

Acknowledgement

The policies and best practice tips on this page were adapted from the Text & Data Mining Guide created by Boston College Libraries. Licensed under a Creative Commons Attribution 4.0 International License.