Texts as networks: How many words are sufficient to identify an author? (Phys.org)

Phys.org: Texts as networks: How many words are sufficient to identify an author? “People are more original than they think—this is suggested by a literary text analysis method of stylometry proposed by scientists from the Institute of Nuclear Physics Polish Academy of Sciences. The author’s individuality can be seen in the connections between no more than a dozen words in an English text. It turns out that in Slavic languages, authorship identification requires even fewer words, and is more certain.”
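The release gives no algorithmic detail, so what follows is only a minimal sketch of the general idea of treating a text as a word network. It is not the Institute of Nuclear Physics method. It assumes crude regex tokenization, an undirected edge between any two different words that appear side by side, and the number of distinct neighbours (node degree) of the most frequent words as a stylistic fingerprint. The default of 12 words only echoes the "dozen words" in the quote. All function names are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Crude assumption: lowercase, keep runs of ASCII letters and apostrophes
    return re.findall(r"[a-z']+", text.lower())


def adjacency_network(words: list[str]) -> dict[str, set[str]]:
    # Undirected word-adjacency network: every word type is a node and an
    # edge joins two different words that appear next to each other
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in zip(words, words[1:]):
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def shared_vocabulary(texts: list[str], n: int = 12) -> list[str]:
    # The n most frequent words across all texts being compared
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(n)]


def degree_profile(text: str, vocabulary: list[str]) -> list[int]:
    # Number of distinct neighbours each vocabulary word has in this text
    graph = adjacency_network(tokenize(text))
    return [len(graph.get(word, ())) for word in vocabulary]


def profile_distance(p: list[int], q: list[int]) -> int:
    # Manhattan distance between two degree profiles; smaller means more alike
    return sum(abs(x - y) for x, y in zip(p, q))
```

For example, given known samples `known_a` and `known_b` and a `disputed` text, one could take `vocab = shared_vocabulary([known_a, known_b, disputed])` and attribute `disputed` to whichever known sample yields the smaller `profile_distance`. Because degree grows with text length, the samples should be of similar length (or the profiles normalized) before comparing.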

Knowledge@Wharton: Using a Company’s Own Words to Assess Its Risks

Knowledge@Wharton: Using a Company’s Own Words to Assess Its Risks. “When analysts or academics want to assess the risks that a company faces, they usually look at macroeconomic factors or internal firm metrics such as a declining sales trend to calculate those risks. But research from Wharton doctoral candidate Alejandro Lopez Lira takes a different approach. He asked this question: What if, instead of letting the outside world tell us what risks a company faces, we let the company tell us itself? After all, a company knows its business best. Lopez Lira used machine learning to read through the annual reports of all U.S. public companies to find out which risks they identified as the most serious ones they face. And the results can be surprising.”

University of Arizona College of Science: Lum.AI (Tucson)

Tucson: University of Arizona College of Science: Lum.AI. “Researchers worldwide publish 2.5 million journal articles each year, adding to the tens of millions of scholarly articles in circulation. For a researcher or clinician, developing a holistic understanding of a field — for example, the systematic matching of genomic alterations in a tumor with proper drug treatments — is an immense task. Now imagine that those researchers, faced with trying to understand the various mechanisms and cellular processes involved in a specific tumor type, had a new tool: an automated system that could review all that literature — analyzing each academic paper in seconds — and extract key information that could help them generate easily interpretable answers and conclusions.”

Washington Post: Step aside Edison, Tesla and Bell. New measurement shows when U.S. inventors were most influential.

Washington Post: Step aside Edison, Tesla and Bell. New measurement shows when U.S. inventors were most influential. “The U.S. patent office has stockpiled the text to more than 10 million patents. But that’s often all they have: an enormous amount of text. Many early patents lack any form of citation or industry specification, which researchers could use to understand the history of American invention. Now a team of economists has created a clever algorithm that processes that text — often the only consistent data we have for many of the country’s most famous inventions — to create a measure of the influential inventors and industries of the past 180 years.”

Virtual Victorians: Using 21st-century technology to evaluate 19th-century texts (Princeton University)

Princeton University: Virtual Victorians: Using 21st-century technology to evaluate 19th-century texts. “In the 19th century, printing technology changed the way readers experienced texts. Today, students and researchers are using digital technology to access historical literary texts in new ways and finding surprising echoes of the past in their own lives.”

Revisiting the Disputed Federalist Papers: Historical Forensics with the Chaos Game Representation and AI (Wolfram Blog)

Wolfram Blog: Revisiting the Disputed Federalist Papers: Historical Forensics with the Chaos Game Representation and AI. “In 1944 Douglass Adair published ‘The Authorship of the Disputed Federalist Papers,’ wherein he proposed that [James] Madison had been the author of all 12. It was not until 1963, however, that a statistical analysis was performed. In ‘Inference in an Authorship Problem,’ Frederick Mosteller and David Wallace concurred that Madison had indeed been the author of all of them. An excellent account of their work, written much later, is Mosteller’s ‘Who Wrote the Disputed Federalist Papers, Hamilton or Madison?.’ His work on this had its beginnings also in the 1940s, but it was not until the era of ‘modern’ computers that the statistical computations needed could realistically be carried out.”

Kaylin Walker: Tidy Text Mining Beer Reviews

Kaylin Walker: Tidy Text Mining Beer Reviews. “BeerAdvocate.com was scraped for a sample of beer reviews, resulting in a dataset of 31,550 beers and their brewery, beer style, ABV, total numerical ratings, number of text reviews, and a sample of review text. Review text was gathered only for beers with at least 5 text reviews. A minimum of 2000 characters of review text were collected for those beers, with total length ranging from 2000 to 5000 characters.”