EurekAlert: New software, HyperTools, transforms complex data into visualizable shapes. “Every dataset in the observable universe has a fundamental geometry or shape to it, but that structure can be highly complicated. To make it easier to visualize complicated datasets, a Dartmouth research team has created HyperTools, an open-source software package that leverages a suite of mathematical techniques to gain intuitions about high-dimensional datasets through the underlying geometric structures they reflect. The findings are published in the Journal of Machine Learning Research.”
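The core technique here, projecting high-dimensional data down to a low-dimensional space so its geometric structure becomes visible, can be sketched in a few lines. This is a minimal illustration of the idea using plain NumPy PCA, not the HyperTools API itself (the function name `reduce_to_3d` and the random example data are my own):

```python
import numpy as np

def reduce_to_3d(data: np.ndarray) -> np.ndarray:
    """Project an (n_samples, n_features) array onto its top 3 principal components."""
    centered = data - data.mean(axis=0)            # center each feature at zero
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:3].T                     # coordinates in the top-3 PC basis

# Example: 100 observations living in a 50-dimensional space
rng = np.random.default_rng(0)
points = rng.normal(size=(100, 50))
coords = reduce_to_3d(points)
print(coords.shape)  # (100, 3) -- ready to scatter-plot in 3D
```

HyperTools wraps this kind of reduction (plus the plotting) behind a single call, so the sketch above is just the conceptual skeleton of what a one-line visualization of a high-dimensional dataset has to do under the hood.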
Santa Fe Institute: New online class offers tools for tackling fundamental questions. “For more than a century, scientists have been using probability and statistics to measure the natural world. They want to make sense of data and find meaningful signals in the noise. But in the last few years, classical statistics have started to seem a little threadbare. Researchers now have access to large datasets, which are driving new insights in disciplines ranging from biology to ecology to economics…. The data have changed. Maybe it’s time our data analysis tools did, too. That’s one of the core ideas behind ‘Algorithmic Information Dynamics,’ a new online course offered through SFI’s online education portal, Complexity Explorer.” The course is not free, but it costs only $50, and that includes a textbook.
PR Newswire: Pulsar Launches “Google Trends” for Social Media (PRESS RELEASE.) “Audience intelligence platform Pulsar has launched a new product that lets users map real-time and historical trends instantly with access to 12 years’ worth of public data, from the very first Tweet back in March 2006 to today…. TRENDS is the first Pulsar product to be commercialised with a freemium model: users are able to access real-time TRENDS for free and only have to purchase a subscription to access historical TRENDS. Subscribers have access to unlimited data and unlimited queries for a flat 12-month fee, which makes the user experience extremely flexible.”
ScienceBlog: Computer Searches Telescope Data For Evidence Of Distant Planets. “As part of an effort to identify distant planets hospitable to life, NASA has established a crowdsourcing project in which volunteers search telescopic images for evidence of debris disks around stars, which are good indicators of exoplanets. Using the results of that project, researchers at MIT have now trained a machine-learning system to search for debris disks itself. The scale of the search demands automation: There are nearly 750 million possible light sources in the data accumulated through NASA’s Wide-Field Infrared Survey Explorer (WISE) mission alone.”
Linux Insider: SpaceChain, Arch Aim to Archive Human Knowledge in Space. “SpaceChain on Monday announced that it has entered a partnership with the Arch Mission Foundation to use open source technology to launch an ambitious project involving the storage of large data sets in spacecraft and on other planets. Arch Mission will load large quantities of data onto SpaceChain’s satellite vehicles with the eventual aim of storing data on other planets.” This is from a couple of weeks ago, but I had not seen it before.
Phys.org: Big data hype hasn’t led to tangible results in the social sciences, expert says. “Despite the great progress in basic research, such as speech recognition and image processing, success stories of existing big data applications in the social sciences are scarce. As early as 2014, big data plummeted from the ‘Peak of Inflated Expectations’ to the ‘Trough of Disillusionment’ phase in the Gartner Hype Cycle. In the basic sciences, the focus is on the technical prerequisites for efficiently recording and storing large quantities of data and automatically processing them. Artificial intelligence methods such as machine learning have great potential here. Only the social sciences have so far benefited little from this, and even seem to be losing ground to other disciplines. I notice that instead of drawing benefit from the flood of data for their empirical research, social scientists are often overwhelmed by the opportunities that arise.”
FreeCodeCamp: We just released 3 years of freeCodeCamp chat history as Open Data — all 5 million messages of it. “This dataset is a record of activity from freeCodeCamp’s most popular chatroom, the general chatroom, which the Gitter team has told me is the most active room on all of Gitter. The dataset contains posts from learners, bots, moderators, and contributors between December 31, 2014 and December 9, 2017.”