Digital Scholarship Resource Guide: Text analysis (part 4 of 7) (Library of Congress)

Library of Congress: Digital Scholarship Resource Guide: Text analysis (part 4 of 7). “Clean OCR, good metadata, and richly encoded text open up the possibility for different kinds of computer-assisted text analysis. With instructions from humans (‘code’), computers can identify information and patterns across large sets of texts that human researchers would be hard-pressed to discover unaided. For example, computers can find out which words in a corpus are used most and least frequently, which words occur near each other often, what linguistic features are typical of a particular author or genre, or how the mood of a plot changes throughout a novel. Franco Moretti describes this kind of analysis as ‘distant reading’, a play on the traditional critical method ‘close reading’. Distant reading implies not the page-by-page study of a few texts, but the aggregation and analysis of large amounts of data.”
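
As a rough illustration of the first two tasks the guide mentions (word frequency and words that occur near each other), here is a minimal Python sketch. It is not taken from the Library of Congress guide: the two-sentence corpus, the function names, the simple regex tokenizer, and the two-word window are all assumptions made for the example, and a real project would likely use a much larger corpus and a proper tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(texts):
    """Count how often each word appears across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def cooccurrences(texts, window=2):
    """Count unordered pairs of distinct words that appear within
    `window` tokens of each other in the same text."""
    pairs = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, word in enumerate(tokens):
            # Look only ahead, so each pair of positions is counted once.
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    pairs[tuple(sorted((word, other)))] += 1
    return pairs


# Hypothetical toy corpus, invented for this example.
corpus = [
    "The whale rose and the sea fell silent.",
    "The sea was calm, and the whale was gone.",
]

freq = word_frequencies(corpus)
pairs = cooccurrences(corpus)

print(freq.most_common(5))    # most frequent words
print(freq.most_common()[-5:])  # least frequent words
print(pairs.most_common(5))   # word pairs that most often occur close together
```

Swapping the toy corpus for the text of a few thousand digitized pages is where this sort of counting starts to look like the aggregated "distant reading" the guide describes.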