Poynter: For American fact-checkers working around gaps in government data, a lesson from Argentina. “Gaps in information frustrate the work of fact-checkers. But what about when a government agency creates them? ‘To know that the data has been tracked in the past and is maybe still tracked currently and is not being released — that just seems like a step backward,’ said Angie Holan, editor of PolitiFact (a project of the Poynter-owned Tampa Bay Times). Her concern stems from a recent change to the FBI’s 2016 crime report, which FiveThirtyEight reports now has close to 70 percent fewer tables than the 2015 version. Among the data tables missing in the report — the first to come out under the Trump administration’s FBI — is specific information about arrests, homicides and the only national estimate of gang-related murders.”
Washington Post: FBI database for gun buyers missing millions of records. “The FBI’s background-check system is missing millions of records of criminal convictions, mental illness diagnoses and other flags that would keep guns out of potentially dangerous hands, a gap that contributed to the shooting deaths of 26 people in a Texas church this week. Experts who study the data say government agencies responsible for maintaining such records have long failed to forward them into federal databases used for gun background checks — systemic breakdowns that have lingered for decades as officials decided they were too costly and time-consuming to fix.”
NIH: NIH Clinical Center provides one of the largest publicly available chest x-ray datasets to scientific community. “The NIH Clinical Center recently released over 100,000 anonymized chest x-ray images and their corresponding data to the scientific community. The release will allow researchers across the country and around the world to freely access the datasets and increase their ability to teach computers how to detect and diagnose disease. Ultimately, this artificial intelligence mechanism can lead to clinicians making better diagnostic decisions for patients.”
Business Insider: How a nerdy Swedish database startup with $80m in funding cracked the Paradise Papers. “Emil Eifrem was driving home from his goddaughter’s fifth birthday party in Gothenburg, Sweden, when his phone started buzzing. A stream of notifications alerted him to the Paradise Papers, a massive leak which showed how the world’s richest people use offshore havens to shield their wealth. ‘I switched seats with my wife,’ he said. ‘We turned on the radio, and as I’m sitting in the car I’m pulling up my laptop, trying to hotspot. I knew what my Monday would be like.’ Over the next 24 hours, Eifrem knew he’d be fielding a bunch of interview requests about the leaks.”
Curbed Philadelphia: New Atlas tool has everything you need to know about Philly properties. “Searching for homes and vacant lots in Philly is about to get a whole lot easier with the launch of Atlas, a new online mapping tool that pools nearly every bit of information about one property into one place…. Atlas now compiles everything one needs to know about a single address into one comprehensive database, including deed information, permits, 311 data, crime statistics, zoning appeals, and the registered community organization (RCO) that’s associated with that property. It also includes historic imagery of the site, dating as far back as 1860.”
Phys.org: Web-based system automatically evaluates proposals from far-flung data scientists. “In the analysis of big data sets, the first step is usually the identification of ‘features’—data points with particular predictive power or analytic utility. Choosing features usually requires some human intuition. For instance, a sales database might contain revenues and date ranges, but it might take a human to recognize that average revenues—revenues divided by the sizes of the ranges—is the really useful metric. MIT researchers have developed a new collaboration tool, dubbed FeatureHub, intended to make feature identification more efficient and effective. With FeatureHub, data scientists and experts on particular topics could log on to a central site and spend an hour or two reviewing a problem and proposing features. Software then tests myriad combinations of features against target data, to determine which are most useful for a given predictive task.”
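The derived-feature idea in that excerpt—dividing revenue by the length of its date range, then testing the candidate against target data—can be sketched in a few lines. This is a minimal illustration, not FeatureHub's actual API; the toy sales records, the `renewed` target column, and the correlation-based scoring are all assumptions made up for the example.

```python
# Illustrative sketch of proposing a derived feature and testing it against
# a target, in the spirit of the FeatureHub description above. All data and
# names here are hypothetical, not from the actual tool.

# Toy sales records: total revenue, length of the reporting period in days,
# and a made-up prediction target (did the customer renew?).
records = [
    {"revenue": 900.0,  "days": 30,  "renewed": 1},
    {"revenue": 1200.0, "days": 90,  "renewed": 0},
    {"revenue": 3000.0, "days": 100, "renewed": 1},
    {"revenue": 500.0,  "days": 60,  "renewed": 0},
]

def raw_revenue(row):
    """Naive feature: total revenue, ignoring the date range."""
    return row["revenue"]

def avg_daily_revenue(row):
    """Proposed feature: revenue normalized by the date-range length."""
    return row["revenue"] / row["days"]

def score(feature, rows, target="renewed"):
    """Crude usefulness test: absolute Pearson correlation with the target."""
    xs = [feature(r) for r in rows]
    ys = [r[target] for r in rows]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return abs(cov / (sx * sy)) if sx and sy else 0.0

# On this toy data, the normalized feature tracks the target far more
# closely than raw revenue does -- the kind of comparison a tool like
# FeatureHub would automate across many proposed features.
print(f"raw revenue score:       {score(raw_revenue, records):.3f}")
print(f"avg daily revenue score: {score(avg_daily_revenue, records):.3f}")
```

A real system would score many such candidates (and combinations of them) against held-out data with a proper model rather than a single correlation, but the workflow is the same: humans propose features, software measures which ones actually help.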
There is a new subreddit for open source databases. Very small and not much here yet, but I subscribed in a blink.