Wolfram Blog: Automated Authorship Verification: Did We Really Write Those Blogs We Said We Wrote?. “Several Months Ago… I wrote a blog post about the disputed Federalist Papers. These were the 12 essays (out of a total of 85) with authorship claimed by both Alexander Hamilton and James Madison. Ever since the landmark statistical study by Mosteller and Wallace published in 1963, the consensus opinion has been that all 12 were written by Madison (the Adair article of 1944, which also takes this position, discusses the long history of competing authorship claims for these essays). The field of work that gave rise to the methods used often goes by the name of ‘stylometry,’ and it lies behind most methods for determining authorship from text alone (that is to say, in the absence of other information such as a physical typewritten or handwritten note). In the case of the disputed essays, the pool size, at just two, is as small as can be. Even so, these essays have been regarded as difficult for authorship attribution due to many statistical similarities in style shared by Hamilton and Madison.”
Engadget: Amazon’s Textract AI can read millions of pages in a few hours. “Amazon has launched a new offering called Textract for its Web Services customers, and it’s like optical character recognition on steroids. It more than just extracts text from documents like its name implies — Amazon says it can actually identify different document formats and their contents so it can process them properly.” Apparently not available all over the US yet.
Phys.org: New research helps visualise sentiment and stance in social media. “How can you find and make sense of opinions and emotions in the vast amount of texts in social media? Kostiantyn Kucher’s research helps visualise for instance public opinions on political issues in tweets over time. In the future, analysis and visualisation of sentiment and stance could contribute to such tasks as detection of hate speech and fake news.”
EdTech Magazine: Digital Library Opens Avenues for Data Analysis in Academic Research. “At the HathiTrust Digital Library, there are no carrels, no tables, no card catalog and no reference desk. There’s almost nothing physical at all. This collection of nearly 17 million digitized volumes from dozens of campus libraries exists entirely online. An estimated 95 percent of those volumes were originally scanned by Google when it partnered with universities to create its Google Books project starting in 2002, says Mike Furlough, executive director of HathiTrust at the University of Michigan.”
Phys.org: Texts as networks: How many words are sufficient to identify an author?. “People are more original than they think—this is suggested by a literary text analysis method of stylometry proposed by scientists from the Institute of Nuclear Physics Polish Academy of Sciences. The author’s individuality can be seen in the connections between no more than a dozen words in an English text. It turns out that in Slavic languages, authorship identification requires even fewer words, and is more certain.”
Knowledge@Wharton: Using a Company’s Own Words to Assess Its Risks. “When analysts or academics want to assess the risks that a company faces, they usually look at macroeconomic factors or internal firm metrics such as a declining sales trend to calculate those risks. But research from Wharton doctoral candidate Alejandro Lopez Lira takes a different approach. He asked this question: What if, instead of letting the outside world tell us what risks a company faces, we let the company tell us itself? After all, a company knows its business best. Lopez Lira used machine learning to read through the annual reports of all U.S. public companies to find out which risks they identified as the most serious ones they face. And the results can be surprising.”
Tucson: University of Arizona College of Science: Lum.AI. “Researchers worldwide publish 2.5 million journal articles each year, adding to the tens of millions of scholarly articles in circulation. For a researcher or clinician, developing a holistic understanding of a field — for example, the systematic matching of genomic alterations in a tumor with proper drug treatments — is an immense task. Now imagine that those researchers, faced with trying to understand the various mechanisms and cellular processes involved in a specific tumor type, had a new tool: an automated system that could review all that literature — analyzing each academic paper in seconds — and extract key information that could help them generate easily interpretable answers and conclusions.”