Ryan Cordell and David Smith of Northeastern University's NULab use computational linguistics tools to analyze large databases of historical newspapers and investigate how news stories, short fiction, and poetry went "viral" in the 19th-century United States.
A collaborative project of Northwestern University and Washington University built around the approximately 25,000 texts from Early English Books Online that are currently available through the Text Creation Partnership (TCP). The TCP transcriptions have been partially corrected, and users are invited to contribute further corrections. The texts are also enriched with bibliographical, structural, and linguistic metadata. A Lab component offers a range of text analysis tools along with worked examples.
A series of projects based on data mining in 2,700 covers and 400,000 pages of Vogue Magazine, published over more than a century. Analyses include comparing word usage across ads, articles, and other texts with an n-gram search; sorting ads by date, frequency, and industry; and finding themes based on how often certain words appear close to one another (topic modeling), among others.
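As a rough illustration of what an n-gram comparison involves, the following minimal Python sketch counts word sequences in two small text samples and compares how often a phrase occurs in each. The function names, the simple tokenizer, and the sample sentences are hypothetical and are not taken from the Vogue project, whose actual tools are not described here.

```python
import re
from collections import Counter


def ngram_counts(text, n=2):
    """Count word n-grams in a text, using a simple lowercase tokenizer."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Slide a window of length n over the token list.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


def relative_frequency(phrase, text):
    """Share of all n-grams in the text that match the phrase."""
    n = len(phrase.split())
    counts = ngram_counts(text, n)
    total = sum(counts.values())
    return counts[phrase.lower()] / total if total else 0.0


# Hypothetical sample snippets standing in for two categories of text.
ads = "silk evening gown, silk gloves and a silk scarf"
articles = "the evening gown returns as the season opens"

for phrase in ("silk", "evening gown"):
    print(phrase, relative_frequency(phrase, ads), relative_frequency(phrase, articles))
```

Dividing by the total number of n-grams makes samples of different sizes comparable, which matters when a corpus spans more than a century of uneven output.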
This study "employs Latent Dirichlet allocation (LDA) algorithms and comparative text mining to search 800,000 periodicals in JSTOR (Journal Storage) and HathiTrust from 1746 to 2014 to identify the types of conversations that emerge about Black women’s shared experience over time."