Library Research Guides: Text Analysis & Text Mining: Terminology & Projects

A Sampling of Text Analysis Projects

The Viral Texts Project: Mapping Networks of Reprinting in 19th-Century Newspapers and Magazines
Ryan Cordell and David Smith of Northeastern University's NULab use computational linguistics tools to analyze large databases of historical newspapers and investigate how news stories, short fiction, and poetry went "viral" in the 19th-century United States.
EarlyPrint
A collaborative project of Northwestern University and Washington University in St. Louis, built around the approximately 25,000 texts from Early English Books Online that are currently available through the Text Creation Partnership (TCP). The TCP transcriptions have been partially corrected, and users are invited to contribute further corrections. The texts are also enriched with bibliographical, structural, and linguistic metadata. A Lab component offers a range of text analysis tools along with worked examples.
Robots Reading Vogue
A series of projects that mine data from 2,700 covers and 400,000 pages of Vogue magazine, published over more than a century. Analyses include comparing word usage across ads, articles, and other texts with an n-gram search; sorting ads by date, frequency, and industry; and finding themes based on how often certain words appear close to one another (topic modeling), among others.
Rescuing Lost History: Using Big Data to Recover Black Women’s Lived Experiences
This study "employs Latent Dirichlet allocation (LDA) algorithms and comparative text mining to search 800,000 periodicals in JSTOR (Journal Storage) and HathiTrust from 1746 to 2014 to identify the types of conversations that emerge about Black women’s shared experience over time."
"Everything on Paper Will Be Used Against Me:" Quantifying Kissinger
Text analysis, visualization, and historical interpretation of the National Security Archive's Kissinger correspondence.
Six Degrees of Francis Bacon
A digital reconstruction of England's early modern social network, created by text mining the Oxford Dictionary of National Biography.
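
Several of the projects above look for themes by tracking which words tend to appear near one another. The sketch below illustrates only that underlying intuition by counting word pairs that fall within a few tokens of each other. It is not the method used by any project listed here, and it is not a topic model: tools such as LDA fit a probabilistic model rather than tallying raw pairs. The function name `cooccurrences`, the `window` parameter, and the sample sentence are all made up for illustration.

```python
from collections import Counter


def cooccurrences(tokens, window=3):
    """Count unordered word pairs that appear within `window` tokens of each other."""
    counts = Counter()
    for i, word in enumerate(tokens):
        # Look ahead at the next `window` tokens only, so each pair is counted once.
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                # Sort the pair so ("silk", "dress") and ("dress", "silk") match.
                counts[tuple(sorted((word, other)))] += 1
    return counts


# Invented sample text, standing in for a real corpus.
sample = "silk dress and silk gloves with a velvet dress".split()
pairs = cooccurrences(sample, window=3)

# Show the pairs that occur together most often.
for pair, count in pairs.most_common(3):
    print(pair, count)
```

On a real corpus, one would usually lowercase the text, strip punctuation, and drop very common words before counting, since words like "and" otherwise dominate the results.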

Text Analysis Terminology

Text Analysis Glossary
An overview of key ideas (from Methodica: Digital Text Methods Commons at the University of Alberta)
Text and Data Mining Glossary
A one-page introduction to some key concepts (from Elsevier)
Glossary of Digital Humanities Terms
An introduction to commonly used terms in text analysis and other areas of the digital humanities (from the Folger Shakespeare Library)
What Are N-grams?
A brief introduction from a data scientist
Topic Modeling: A Basic Introduction
Explains the basic concepts of topic modeling, introduces some topic modeling tools, and points to additional resources.
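
As a companion to the n-gram resources above, here is a minimal sketch of what an n-gram count looks like in practice. An n-gram is a run of n consecutive words; counting them shows which phrases recur in a text. The function name `ngrams` and the sample sentence are illustrative assumptions, and splitting on whitespace is a simplification of real tokenization.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Invented sample text; whitespace splitting stands in for a real tokenizer.
text = "the quick brown fox jumps over the quick brown dog"
tokens = text.lower().split()

# Count bigrams (n = 2) and list the most frequent ones.
bigram_counts = Counter(ngrams(tokens, 2))
for gram, count in bigram_counts.most_common(2):
    print(" ".join(gram), count)
```

Changing the second argument to 3 yields trigrams, and so on. An n-gram search tool of the kind described in this guide typically runs counts like this across a dated corpus so that usage can be compared over time.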