Library Research Guides: Text Analysis & Text Mining: Overview

What is Text Analysis and Text Mining?

Text mining and analysis is a form of data mining performed on text-based data sets. It involves the computational analysis of large quantities of digital information.

Using specialized software, researchers can extract data, identify trends, look for patterns and better understand the relationships of terms within and between documents. Analysis might focus on word frequency, words that frequently appear near each other, contextual information for key words, common phrases and other patterns.
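As a rough illustration of the kinds of analysis listed above, here is a minimal sketch using only the Python standard library. It is not taken from any of the tools this guide links to. The function names (tokenize, word_frequencies, ngrams, cooccurrences, keyword_in_context), the simple tokenizing rule, and the sample sentence are all assumptions made for the example; real projects usually rely on dedicated text analysis software and more careful tokenization.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (illustrative rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_frequencies(tokens):
    """Word frequency: count how often each token occurs."""
    return Counter(tokens)


def ngrams(tokens, n=2):
    """Common phrases: count contiguous runs of n tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def cooccurrences(tokens, window=2):
    """Words near each other: count unordered pairs of distinct words
    that appear within `window` tokens of one another."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


def keyword_in_context(tokens, keyword, width=3):
    """Contextual information: show each hit with `width` tokens on either side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width) : i])
            right = " ".join(tokens[i + 1 : i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Hypothetical sample text, invented for this example.
sample = (
    "The whale surfaced. The white whale dived, "
    "and the crew watched the white whale."
)
tokens = tokenize(sample)

print(word_frequencies(tokens).most_common(3))
print(ngrams(tokens, 2).most_common(3))
print(cooccurrences(tokens, window=2).most_common(3))
for line in keyword_in_context(tokens, "white", width=2):
    print(line)
```

On a real corpus, the same functions would be run over the tokens of each document, and the resulting counts compared within and between documents.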

Materials to be analyzed range from websites (such as publicly available Facebook posts) to 16th-century manuscripts.

"Text Mining is the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources... The difference between regular data mining and text mining is that in text mining the patterns are extracted from natural language text rather than from structured databases of facts." (Marti Hearst, What is Text Mining?)

Introductions to Text Analysis

Illinois University Library and Penn State University Libraries have good brief overviews of text analysis methods.

Background on Text Analysis
An excellent, succinct introduction to the what and why of text analysis, from the University of Alberta Methodica: Digital Text Methods site.
Tooling Up for Digital Humanities: Text Analysis
A basic introduction to text analysis approaches, including stylometry, content-based analysis, and studies of metadata.
Data Mining and Text Analysis (Johanna Drucker)
Seven ways humanists are using computers to understand text (Ted Underwood)
Text Analytics 101 (John Laudun)

See the Terminology & Projects page for more information on text mining concepts and practices, as well as example projects.

See the Tools & Tutorials page for more hands-on guides to doing text analysis.

Policies for Mining Licensed Content

If you wish to undertake a text or data mining project with content from the Library's licensed databases, please contact a librarian to investigate options, which may include negotiating with the vendor or purchasing access to the data. Many database licenses prohibit text and data mining and the use of software such as scripts, agents, or robots, but we are actively negotiating text mining rights with database vendors. Unauthorized text or data mining in violation of our licenses can result in loss of access for the entire Wellesley College community.

Please also see our Best Practice Tips for mining licensed databases.

Acknowledgement

This overview was adapted from the original created by Boston College Libraries and is licensed under a Creative Commons Attribution 4.0 International License.
