9 Best Articles in 2021
microsoft.com
Microsoft researchers achieve new conversational speech recognition milestone
microsoft.com
6 min read · 35 saves · From 2017 · By Xuedong Huang, Technical Fellow, Microsoft · Last year, Microsoft’s speech and dialog research group announced a milestone in reaching human parity on the Switchboard conversational speech…
Mozilla
Announcing the Initial Release of Mozilla’s Open Source Speech Recognition Model and Voice Dataset
Mozilla
5 min read · 66 saves · From 2017 · With the holiday, gift-giving season upon us, many people are about to experience the ease and power of new speech-enabled devices. Technical advancements have fueled the growth of speech interfaces…
In Language
PNASNews
Racial disparities in automated speech recognition
PNASNews
34 saves · 2020-03-24 · Automated speech recognition (ASR) systems are now used in a variety of applications to convert spoken language to text, from virtual assistants, to closed captioning, to hands-free computing. By analyzing a large corpus of sociolinguistic interviews with white and African American speakers, we demonstrate large racial disparities in the performance of five popular commercial ASR systems. Our results point to hurdles faced by African Americans in using increasingly widespread tools driven by speech recognition technology. More generally, our work illustrates the need to audit emerging machine-learning systems to ensure they are broadly inclusive.
Automated speech recognition (ASR) systems, which use sophisticated machine-learning algorithms to convert spoken language to text, have become increasingly widespread, powering popular virtual assistants, facilitating automated closed captioning, and enabling digital dictation platforms for health care. Over the last several years, the quality of these systems has dramatically improved, due both to advances in deep learning and to the collection of large-scale datasets used to train the systems. There is concern, however, that these tools do not work equally well for all subgroups of the population. Here, we examine the ability of five state-of-the-art ASR systems—developed by Amazon, Apple, Google, IBM, and Microsoft—to transcribe structured interviews conducted with 42 white speakers and 73 black speakers. In total, this corpus spans five US cities and consists of 19.8 h of audio matched on the age and gender of the speaker. We found that all five ASR systems exhibited substantial racial disparities, with an average word error rate (WER) of 0.35 for black speakers compared with 0.19 for white speakers. We trace these disparities to the underlying acoustic models used by the ASR systems as the race gap was equally large on a subset of identical phrases spoken by black and white individuals in our corpus. We conclude by proposing strategies—such as using more diverse training datasets that include African American Vernacular English—to reduce these performance differences and ensure speech recognition technology is inclusive.
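The word error rate figures quoted above (0.35 versus 0.19) can be made concrete with a small sketch. It assumes the standard definition of WER: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions; the study's own text normalization is not described in this excerpt.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are split on whitespace with no further
    normalization (case, punctuation); real evaluations usually
    normalize both transcripts first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a WER of 0.35 corresponds to roughly 35 word-level errors per 100 reference words, and insertions mean the value can exceed 1.0.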
github.com
GitHub - facebookresearch/wav2letter: Facebook AI Research Automatic Speech Recognition Toolkit
github.com
13 saves · From 2018 · wav2letter - Facebook AI Research Automatic Speech Recognition Toolkit
awni.github.io
Speech Recognition Is Not Solved
awni.github.io
5 min read · 26 saves · From 2017 · Ever since Deep Learning hit the scene in speech recognition, word error rates have fallen dramatically. But despite articles you may have read, we still don’t have human-level speech recognition. Speech recognizers have many failure modes. Acknowledging these and taking steps towards solving them is critical to progress. It’s the only way to go from ASR which works for some people, most of the time to ASR which works for all people, all of the time.
Google Cloud
Speech API - Speech Recognition — Google Cloud Platform
Google Cloud
2 min read · 10 saves · From 2016 · Cloud Speech API provides fast and accurate speech recognition, converting audio, either from a microphone or from a file, to text in over 80 languages and variants.
Business Insider
Microsoft's AI is getting crazily good at speech recognition
Business Insider
1 min read · 19 saves · From 2017 · Microsoft's speech recognition efforts have hit...
Real Python
The Ultimate Guide To Speech Recognition With Python
Real Python
20+ min read · 21 saves · From 2018 · An in-depth tutorial on speech recognition with Python. Learn which speech recognition library gives the best results and build a full-featured "Guess The Word" game with it.
Google AI
An All-Neural On-Device Speech Recognizer
Google AI
4 min read · 27 saves · 2019-03-12 · Posted by Johan Schalkwyk, Google Fellow, Speech Team · In 2012, speech recognition research showed significant accuracy improvements with ...
Trending
livecodestream.dev
How-to Control your React App with your Voice
livecodestream.dev
5 min read · 11 saves · 2020-11-22 · Learn how to build voice-activated interfaces using the Chrome Speech Recognition API.
VentureBeat
Microsoft’s new settings let users contribute recordings to improve its speech recognition systems
VentureBeat
2 min read · Jan 15th · Kyle Wiggers · Microsoft's new settings allow users to contribute voice clips that'll be used to improve the company's AI speech technologies.
The Wall Street Journal
WeChat Becomes a Powerful Surveillance Tool Everywhere in China
The Wall Street Journal
21 saves · 2020-12-22 · China’s do-everything app, WeChat, has become one of the most powerful tools in Beijing’s arsenal for monitoring the public, censoring speech and punishing people who voice discontent with the…
scitechdaily.com
Light-Based Processor Chips Advance Machine Learning
scitechdaily.com
Jan 10th · International team of researchers uses photonic networks for pattern recognition. In the digital age, data traffic is growing at an exponential rate. The demands on computing power for applications in artificial intelligence such as pattern and speech recognition in particular, or for self-drivin
More like this
VentureBeat
Google’s speech recognition technology now has a 4.9% word error rate
VentureBeat
2 min read · 22 saves · From 2017 · Google CEO Sundar Pichai today announced that the company’s speech recognition technology now has achieved a 4.9 percent word error rate. Put another way, Google transcribes every 20th word…
Guardian Tech
Speech recognition is tech's next giant leap, says Google
Guardian Tech
4 min read · 15 saves · From 2018 · Samuel Gibbs · Company says spoken word already essential in developing countries with low literacy rates
talater.com
annyang! Easily add speech recognition to your site
talater.com
1 min read · 24 saves · From 2015 · annyang is a JavaScript SpeechRecognition library that makes adding voice commands to your site super-easy. Let your users control your site with their voice.
blogs.microsoft.com
Microsoft researchers achieve speech recognition milestone
blogs.microsoft.com
5 min read · 10 saves · From 2016 · Microsoft researchers have reached a milestone in the quest for computers to understand speech as well as humans. Xuedong Huang, the company’s chief speech scientist, reports that in a recent …
Google Cloud
Cloud Speech-to-Text
Google Cloud
2 min read · 17 saves · From 2018 · Cloud Speech-to-Text provides fast and accurate speech recognition, converting audio, either from a microphone or from a file, to text in over 120 languages and variants.
code.fb.com
Wav2letter++, the fastest open source speech system, and flashlight
code.fb.com
1 min read · 43 saves · From 2018 · Wav2letter++ is the fastest state-of-the-art end-to-end speech recognition system available. We're also releasing flashlight, a fast, flexible ML library.
medium.com
AI and Speech Recognition: A Primer for Chatbots
medium.com
5 min read · 13 saves · From 2016 · Francesco Corea · An analysis of conversational interfaces and startups working on speech recognition and chatbots.
In Brain
The Guardian
Neuroscientists decode brain speech signals into actual sentences
The Guardian
3 min read · 37 saves · 2019-07-30 · Study funded by Facebook aims to improve communication with paralysed patients