Analytics India: What Happened When Google Threw All Voice Data To The Blender. Answer: SpeechStew. “Training large models is a massive challenge as it requires collecting and annotating vast amounts of data. It is particularly challenging in the case of speech recognition models. To overcome this challenge, a team from Google Research and Google Brain have introduced an AI model, SpeechStew. The model is trained on a combination of datasets to achieve state-of-the-art results on various speech recognition benchmarks.”

The Register: How Facebook uses public videos to train, deploy machine-learning models and harvest those eyeballs. “Facebook this week revealed an internal project to create machine-learning models that can understand visual, audio, and written content from videos publicly uploaded to its social network. One of the models, known as Generalized Data Transformations (GDT), is now used on Instagram. Users viewing short video recordings, or Reels, can quickly find other Reels they might like to watch, thanks to an AI-powered recommender system that picks similar clips that might be interesting.”

Engadget: Google wants you to train its AI by lip syncing ‘Dance Monkey’ by Tones and I. “Google is asking users to help teach its AI how to speak. A new ‘Experiments with Google’ called LipSync asks users to lip sync a small part of ‘Dance Monkey’ by Tones and I, Android Police reports. LipSync, which is built by YouTube for Chrome on desktop, will score your performance. It will then feed the video to Google’s AI — it doesn’t record any audio.”

Carnegie Mellon University: Live-Streamed Game Collects Sounds To Help Train Home-Based Artificial Intelligence. “From yawning to closing the fridge door, a lot of sounds occur within the home. Such sounds could be useful for home-based artificial intelligence applications, but training that AI requires a robust and diverse set of samples. A video game developed by Carnegie Mellon University researchers leverages live streaming to collect sound donations from players that will populate an open-source database.”

The Register: MIT apologizes, permanently pulls offline huge dataset that taught AI systems to use racist, misogynistic slurs. “The training set, built by the university, has been used to teach machine-learning models to automatically identify and list the people and objects depicted in still images. For example, if you show one of these systems a photo of a park, it might tell you about the children, adults, pets, picnic spreads, grass, and trees present in the snap. Thanks to MIT’s cavalier approach when assembling its training set, though, these systems may also label women as whores or bitches, and Black and Asian people with derogatory language. The database also contained close-up pictures of female genitalia labeled with the C-word.”

CNET: Your face mask selfies could be training the next facial recognition tool. “Your face mask selfies aren’t just getting seen by your friends and family — they’re also getting collected by researchers looking to use them to improve facial recognition algorithms. CNET found thousands of face-masked selfies up for grabs in public data sets, with pictures taken directly from Instagram.”

From last month, but I just learned about it today. Quantum Stat: 100s of datasets for machine learning developers (and counting). “With the advent of deep learning and the necessity for more and diverse data, researchers are constantly hunting for the most up-to-date datasets that can help train their ML model. Currently, NLP data seems to be scattered across several 3rd party libraries, Reddit, or in the research arms of big tech. And while these mediums are useful, there doesn’t seem to be a central hub for housing NLP data that can be easily reached and searched by the ML engineer. As a result, we’ve created the ‘Big Bad NLP Database,’ the world’s largest data library in natural language processing:”

The Next Web: This AI needs your help to identify child abusers by their hands. “The research team has appealed to the public for help with the project. They want anyone over the age of 18 to upload photos of their hands through a new smartphone app. The images will then be added to a database that’s used to develop the hand comparison algorithms. The researchers say they need more than 5,000 images to prove beyond a reasonable doubt whether our hands are truly unique. They promise that everyone who participates will remain anonymous. The images will never be shared with external agencies, and will be destroyed at the end of the project.”

The Next Web: Google’s new AI language model can comprehend entire books. “One of the prime challenges of a language-based AI model is to understand the context of the surrounding content. To solve this problem, Google has introduced a new model called Reformer, which understands the context of 1 million lines using just 16GB space. The company built this to solve problems of its old model Transformer — a neural network that compares words in a paragraph to each other to understand the relationship between them.”

The Horizons Tracker: The New Database To Help Robots Learn. “Databases such as ImageNet have long been the bedrock of the AI revolution we’re experiencing today. With 14 million or so images, they provide a vast repository of content with which to train algorithms. It’s a trick that roboticists are attempting to replicate with a new database, known as RoboNet.”

Behold, the GROSSEST THING I HAVE EVER PUT IN RESEARCHBUZZ, from CNN: How your poop can help train AI. “The next time you go to the bathroom, a couple startups are hoping you’ll snap a photo before you flush. For scientific reasons, of course. No, really. Two companies — Auggi, a gut-health startup that’s building an app for people to track gastrointestinal issues, and Seed Health, which works on applying microbes to human health and sells probiotics — are soliciting poop photos from anyone who wants to send them.”

The Register: Inside the 1TB ImageNet data set used to train the world’s AI: Nude kids, drunken frat parties, porno stars, and more. “ImageNet – a data set used to train AI systems around the world – contains photos of naked children, families on the beach, college parties, porn actresses, and more, scraped from the web to train computers without those individuals’ explicit consent. The library consists of 14 million images, each placed into categories that describe what’s pictured in each scene. This pairing of information – images and labels – is used to teach artificially intelligent applications to recognize things and people caught on camera.”