South China Morning Post: Baidu creates ‘world’s largest’ Chinese natural language processing database

South China Morning Post: Baidu creates ‘world’s largest’ Chinese natural language processing database. “Chinese search engine giant Baidu has launched what it says is the world’s largest Chinese natural language processing (NLP) database, among several other artificial intelligence (AI) products, as it seeks to diversify its revenue sources. NLP is a branch of AI involved in making computers understand the way humans naturally talk and type online, turning such information into structured data for further analysis.”
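To make the "turning such information into structured data" part concrete, here is a minimal illustration of the general idea (my own toy sketch, nothing to do with Baidu's system): free text goes in, typed fields come out, ready for querying or analysis.

```python
# Toy rule-based extractor: pull structured fields out of free text.
# Real NLP systems learn these patterns rather than hand-coding them.
import re

def extract_order_info(sentence: str) -> dict:
    order = re.search(r"order #(\d+)", sentence, re.IGNORECASE)
    city = re.search(r"shipped to ([A-Z][a-z]+)", sentence)
    date = re.search(r"on (\w+ \d{1,2})", sentence)
    return {
        "order_id": int(order.group(1)) if order else None,
        "city": city.group(1) if city else None,
        "date": date.group(1) if date else None,
    }

print(extract_order_info("Order #4821 shipped to Beijing on May 4"))
# {'order_id': 4821, 'city': 'Beijing', 'date': 'May 4'}
```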

Neowin: New wiki project – Abstract Wikipedia – will boost content across languages

Neowin: New wiki project – Abstract Wikipedia – will boost content across languages. “The project was first proposed in a 22-page paper by Denny Vrandečić, founder of Wikidata, earlier this year. He had floated a new idea that would allow contributors to create content using abstract notation which could then be translated to different natural languages, balancing out content more evenly, no matter the language you speak.” My head would absolutely not wrap around this until I saw a page of examples.
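For anyone whose head is in the same place mine was, here is a toy sketch of the idea (my own illustration, not the project's actual notation): a fact is stored once as an abstract "constructor," and each language gets a small renderer that turns it into a natural-language sentence.

```python
# Toy version of "abstract content + per-language renderers."
# Real renderers must handle grammar (word order, gender, case) per language.
abstract_fact = {
    "constructor": "holds_office",
    "person": "London Breed",
    "office": "mayor",
    "city": "San Francisco",
}

RENDERERS = {
    "en": lambda f: f"{f['person']} is the {f['office']} of {f['city']}.",
    "de": lambda f: f"{f['person']} ist Bürgermeisterin von {f['city']}.",
}

for lang, render in RENDERERS.items():
    print(lang, render(abstract_fact))
# en London Breed is the mayor of San Francisco.
# de London Breed ist Bürgermeisterin von San Francisco.
```

The point is that the abstract fact is written once; adding a new language means adding a renderer, not rewriting the content.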

IEEE Spectrum: Natural Language Processing Dates Back to Kabbalist Mystics

IEEE Spectrum: Natural Language Processing Dates Back to Kabbalist Mystics. “While specific technologies have changed over time, the basic idea of treating language as a material that can be artificially manipulated by rule-based systems has been pursued by many people in many cultures and for many different reasons. These historical experiments reveal the promise and perils of attempting to simulate human language in non-human ways—and they hold lessons for today’s practitioners of cutting-edge NLP techniques. The story begins in medieval Spain.”

CNET: Google search engine will better understand natural speech, not just keywords

CNET: Google search engine will better understand natural speech, not just keywords. “Google’s search engine will now better understand your confusing search queries, the company said Friday. Google said it’s updating the tool to improve analysis of natural language. The idea is to let people type in queries that reflect how they speak in real life, instead of entering a string of keywords they think the software is more likely to understand.” I’m a little nonplussed by this; natural language searching has been a thing for a long time. Remember Ask Jeeves? Remember Electric Monk?

Quartz: The emails that brought down Enron still shape our daily lives

Quartz: The emails that brought down Enron still shape our daily lives. “The Enron Corpus, as the collection is known, has been used in more than 100 projects since that research team presented it to the public in 2004. As the biggest public collection of natural written language in an organizational setting, it has been used to study everything from statistics to artificial intelligence to email attachment habits. An online art project by two Brooklyn artists will send every single one of the emails to your personal inbox, a process which (depending on the frequency of emails you request) will take anywhere from seven days to seven years.”

G Suite Updates: Visualize data instantly with machine learning in Google Sheets

G Suite Updates: Visualize data instantly with machine learning in Google Sheets. “Explore in Sheets, powered by machine intelligence, helps teams gain insights from data, instantly. Simply ask questions—in words, not formulas—to quickly analyze your data. For example, you can ask ‘what is the distribution of products sold?’ or ‘what are average sales on Sundays?’ and Explore will help you find the answers.”
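Roughly the kind of aggregation Explore is doing when you ask in plain words, sketched here in Python with made-up column names (this is the equivalent "formula," not Google's code):

```python
import pandas as pd

# Hypothetical sales table with date, product, and sales columns.
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-06-04", "2017-06-05", "2017-06-11"]),
    "product": ["widget", "gadget", "widget"],
    "sales": [120, 80, 95],
})

# "What are average sales on Sundays?"  (pandas: Monday=0, so Sunday is 6)
avg_sunday = df.loc[df["date"].dt.dayofweek == 6, "sales"].mean()

# "What is the distribution of products sold?"
product_counts = df["product"].value_counts()

print(avg_sunday)        # 107.5 (June 4 and June 11, 2017 are Sundays)
print(product_counts)
```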

Tool Translates Natural Language to SQL Queries

Geektime has a writeup on a tool that translates natural language questions into SQL queries. “Kueri’s system enables developers to implant a unique search box within apps. The search box knows how to take questions from end users in natural language … and translate them into SQL queries in real time. The app can run the queries through the database and display the results to the user. In addition, in order to make it even easier for the end user, it facilitates automatic completion during typing, with completions of words and smart suggestions according to the context of the search and database.”
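To show the shape of the problem, here is a toy natural-language-to-SQL translator written from scratch; Kueri's actual engine is far more sophisticated (schema-aware parsing, autocomplete, context-sensitive suggestions), and the table and column names below are invented for illustration.

```python
# Toy NL-to-SQL: match a couple of question patterns and emit SQL.
import re

def question_to_sql(question: str) -> str:
    q = question.lower().strip("?")
    m = re.match(r"how many (\w+) (?:were there )?in (\d{4})", q)
    if m:
        table, year = m.groups()
        return f"SELECT COUNT(*) FROM {table} WHERE YEAR(created_at) = {year};"
    m = re.match(r"average (\w+) by (\w+)", q)
    if m:
        column, group = m.groups()
        return f"SELECT {group}, AVG({column}) FROM sales GROUP BY {group};"
    raise ValueError("Question pattern not recognized")

print(question_to_sql("How many orders in 2015?"))
# SELECT COUNT(*) FROM orders WHERE YEAR(created_at) = 2015;
print(question_to_sql("Average revenue by region"))
# SELECT region, AVG(revenue) FROM sales GROUP BY region;
```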

Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages

Nifty! Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages. “Microblogging platforms such as Twitter provide active communication channels during mass convergence and emergency events such as earthquakes and typhoons. During the sudden onset of a crisis situation, affected people post useful information on Twitter that can be used for situational awareness and other humanitarian disaster response efforts, if processed in a timely and effective manner. Processing social media information poses multiple challenges, such as parsing noisy, brief, and informal messages, learning information categories from the incoming stream of messages, and classifying them into different classes, among others. One of the basic necessities of many of these tasks is the availability of data, in particular human-annotated data. In this paper, we present human-annotated Twitter corpora collected during 19 different crises that took place between 2013 and 2015. To demonstrate the utility of the annotations, we train machine learning classifiers. Moreover, we publish the first large-scale word2vec word embeddings trained on 52 million crisis-related tweets. To deal with the language issues peculiar to tweets, we present human-annotated normalized lexical resources for different lexical variations.”
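For a sense of what "train machine learning classifiers" looks like on data like this, here is a generic sketch (not the authors' code); the CSV path and column names are hypothetical placeholders for an export with one tweet and one annotated category per row.

```python
# Generic text classifier over human-annotated tweets: TF-IDF + logistic regression.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("crisis_tweets_annotated.csv")  # hypothetical: columns "text", "label"

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),  # word and bigram features
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```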

MIT Technology Review: How To Prevent a Plague of Dumb Chatbots

MIT Technology Review: How To Prevent a Plague of Dumb Chatbots. “You can now chat with all sorts of bots through a number of messaging services including Kik, WeChat, Telegram, and now, Facebook Messenger. Some are simply meant to entertain, but a growing number are designed to do something useful. You can now book a flight, peruse the latest tech headlines, and even buy a hamburger from Burger King by typing messages to a virtual helper. Startups are racing to offer tools for speeding the development, management, and ‘monetization’ of these virtual butlers.”

New Site: Finding Movies by Describing Scenes

TechHive has a story about a new tool that lets you find movies by describing scenes. “Valossa claims to page through a given movie on a scene-by-scene basis, identifying more than one thousand concepts (places, objects, and themes) from any video stream. The technology allows you to search using natural-language queries, and there’s even a beta version of the technology that allows voice searches using the Alexa digital assistant in Amazon’s Echo connected speaker.”

I gave it a quick whirl. It flunked “Michelle Yeoh on a motorcycle” (Silver Hawk, very silly movie, but Michelle Yeoh jumps the Great Wall of China on a motorcycle), but passed “Whoopi Goldberg is a nun” (Sister Act I and II, though the sequel was listed first for some reason) and “Barbara Stanwyck on a ship” (finding both The Lady Eve and Titanic).