Dartmouth: Using Social Media Big Data to Combat Prescription Drug Crisis

Dartmouth: Using Social Media Big Data to Combat Prescription Drug Crisis. “Researchers at Dartmouth, Stanford University, and IBM Research conducted a critical review of existing literature to determine whether social media big data can be used to understand communication and behavioral patterns related to prescription drug abuse. Their study found that with proper research methods and attention to privacy and ethical issues, social media big data can reveal important information concerning drug abuse, such as user-reported side effects, drug cravings, emotional states, and risky behaviors.”

Poynter: For American fact-checkers working around gaps in government data, a lesson from Argentina

Poynter: For American fact-checkers working around gaps in government data, a lesson from Argentina. “Gaps in information frustrate the work of fact-checkers. But what about when a government agency creates them? ‘To know that the data has been tracked in the past and is maybe still tracked currently and is not being released — that just seems like a step backward,’ said Angie Holan, editor of PolitiFact (a project of the Poynter-owned Tampa Bay Times). Her concern stems from a recent change to the FBI’s 2016 crime report, which, FiveThirtyEight reports, now has close to 70 percent fewer tables than the 2015 version. Among the data tables missing in the report — the first to come out under the Trump administration’s FBI — is specific information about arrests, homicides and the only national estimate of gang-related murders.”

Linux Journal: Slicing Scientific Data

Linux Journal: Slicing Scientific Data. “I’ve covered scientific software in previous articles that either analyzes image information or actually generates image data for further analysis. In this article, I introduce a tool that you can use to analyze images generated as part of medical diagnostic work. In several diagnostic medical tests, complex three-dimensional images are generated that need to be visualized and analyzed. This is where 3D Slicer steps into the workflow. 3D Slicer is a very powerful tool for dissecting, analyzing and visualizing this type of complex 3D imaging data. It is fully open source, and it’s available not only on Linux, but also on Windows and Mac OS X.”
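
3D Slicer ships with a built-in Python console, so the workflow the article describes can be scripted. A minimal sketch, assuming a recent Slicer release (the file path below is hypothetical, and the `slicer` module only exists inside the application):

```python
# Runs inside 3D Slicer's built-in Python console, where the `slicer`
# module is available; the file path below is hypothetical.
import numpy as np
import slicer

# Load a volumetric dataset (e.g. an NRRD file exported from a CT or MRI study).
volumeNode = slicer.util.loadVolume('/path/to/scan.nrrd')

# Pull the voxel data into a NumPy array: shape is (slices, rows, columns).
voxels = slicer.util.arrayFromVolume(volumeNode)
print(voxels.shape, voxels.dtype)

# "Slicing" in the literal sense: grab the middle axial slice for analysis.
axial = voxels[voxels.shape[0] // 2]
print(axial.mean(), axial.max())
```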

Business Insider: How a nerdy Swedish database startup with $80m in funding cracked the Paradise Papers

Business Insider: How a nerdy Swedish database startup with $80m in funding cracked the Paradise Papers. “Emil Eifrem was driving home from his goddaughter’s fifth birthday party in Gothenburg, Sweden, when his phone started buzzing. A stream of notifications alerted him to the Paradise Papers, a massive leak which showed how the world’s richest people use offshore havens to shield their wealth. ‘I switched seats with my wife,’ he said. ‘We turned on the radio, and as I’m sitting in the car I’m pulling up my laptop, trying to hotspot. I knew what my Monday would be like.’ Over the next 24 hours, Eifrem knew he’d be fielding a bunch of interview requests about the leaks.”

Phys.org: Web-based system automatically evaluates proposals from far-flung data scientists

Phys.org: Web-based system automatically evaluates proposals from far-flung data scientists. “In the analysis of big data sets, the first step is usually the identification of ‘features’—data points with particular predictive power or analytic utility. Choosing features usually requires some human intuition. For instance, a sales database might contain revenues and date ranges, but it might take a human to recognize that average revenues—revenues divided by the sizes of the ranges—is the really useful metric. MIT researchers have developed a new collaboration tool, dubbed FeatureHub, intended to make feature identification more efficient and effective. With FeatureHub, data scientists and experts on particular topics could log on to a central site and spend an hour or two reviewing a problem and proposing features. Software then tests myriad combinations of features against target data to determine which are most useful for a given predictive task.”
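
The revenue example translates directly into code. A minimal pandas sketch of that kind of hand-crafted feature (the DataFrame and column names here are made up for illustration; this is not FeatureHub's own API):

```python
# Illustrates the article's example feature: revenue normalized by the
# length of the date range it covers. Data and column names are hypothetical.
import pandas as pd

sales = pd.DataFrame({
    'revenue': [1200.0, 3400.0, 560.0],
    'range_start': pd.to_datetime(['2017-01-01', '2017-02-01', '2017-03-01']),
    'range_end': pd.to_datetime(['2017-01-31', '2017-02-14', '2017-03-07']),
})

# The human insight: raw revenue is less predictive than revenue per day.
range_days = (sales['range_end'] - sales['range_start']).dt.days
sales['avg_daily_revenue'] = sales['revenue'] / range_days

print(sales[['revenue', 'avg_daily_revenue']])
```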

The Next Web: Companies are collecting a mountain of data. What should they do with it?

The Next Web: Companies are collecting a mountain of data. What should they do with it? “From our tweets and status updates to our Yelp reviews and Amazon product ratings, the internet-connected portion of the human race generates 2.5 quintillion bytes of computer data every single day. That’s 2.5 million one-terabyte hard drives filled every 24 hours. The takeaway is clear: in 2017, there’s more data than there’s ever been, and there’s only more on the way. So what are savvy companies doing to harness the data that their human users shed on a daily basis?”
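
The article's conversion is easy to verify with decimal units (1 TB = 10^12 bytes); a quick sanity check:

```python
# Sanity-check the article's arithmetic: 2.5 quintillion bytes per day,
# expressed as one-terabyte drives (decimal units, 1 TB = 10**12 bytes).
bytes_per_day = 2.5e18          # 2.5 quintillion bytes
terabyte = 10**12               # one decimal terabyte
drives_filled = bytes_per_day / terabyte
print(f"{drives_filled:,.0f} one-terabyte drives per day")  # 2,500,000
```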

Library of Congress: Announcing the Library of Congress Congressional Data Challenge

Library of Congress: Announcing the Library of Congress Congressional Data Challenge. “Today we launch a Congressional Data Challenge, a competition asking participants to leverage legislative data sets on congress.gov and other platforms to develop digital projects that analyze, interpret or share congressional data in user-friendly ways. ‘There is so much information now available online about our legislative process, and that is a great thing,’ said Librarian of Congress Carla Hayden. ‘But it can also be overwhelming and sometimes intimidating. We are asking citizen coders to explore ways to analyze, interpret or share this information in user-friendly ways. I hope this challenge will spark an interest in the legislative process and also a spirit of information sharing by the tech-savvy and digital humanities pioneers who answer the call. I can’t wait to see what you come up with.’”