Top 5 Speech Recognition AI Platforms

Written by Clive Reffell

Jan 21, 2021

Speech recognition technology is a form of AI. Automatic Speech Recognition (ASR) enables machines to register the words they hear in spoken language, match them to a text transcription, and then search multiple possible transcribed responses for the best fit in order to give a spoken reply and take appropriate action. As the technology becomes more commonplace, this article looks at five of the top platforms that provide speech and text ASR datasets for commercial use. Voice recognition, to be clear, is a separate matter: it concerns the technology for identifying a person based on their voice.

Smart technology is increasingly creating opportunities to integrate speech recognition tools that improve the customer and end-user experience. More and more smart gadgets and devices are coming to the marketplace with speech- and voice-enabled tools. A previous CSW article took a look at advances in the automotive, healthcare, domestic appliance and banking sectors. The combined speech and voice recognition market is forecast to grow at a compound annual growth rate of 17.2% from 2019 to reach $26.8 billion by 2025.
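The forecast above can be unpacked with the standard compound annual growth rate relation, final = base x (1 + rate)^years. The short Python sketch below backs out the 2019 market size implied by the quoted figures. It assumes six annual compounding periods (2019 to 2025) and that $26.8 billion is the 2025 value; the function name `implied_base` is illustrative, and the resulting base figure is derived here, not taken from the forecast itself.

```python
def implied_base(final_value: float, cagr: float, years: int) -> float:
    """Return the starting value implied by a final value and a CAGR.

    Rearranges final = base * (1 + cagr) ** years to solve for base.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return final_value / (1.0 + cagr) ** years


if __name__ == "__main__":
    # Figures quoted in the article; the six-year span is an assumption.
    final_2025_bn = 26.8
    growth_rate = 0.172
    periods = 6

    base_2019_bn = implied_base(final_2025_bn, growth_rate, periods)
    print(f"Implied 2019 market size: ${base_2019_bn:.1f} billion")
```

Under those assumptions the implied 2019 market size comes out at roughly $10.3 billion, so the forecast amounts to the market growing a little over two and a half times in six years.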

On one hand, the increasing use of voice control systems for appliances, vehicles, banking services and smartphones is making people more familiar with the technology and raising their expectations. Covid has also accelerated the trend to low-touch technology. On the other, the higher cost of smart devices, the still relatively low awareness of what speech-enabled devices can do, and their occasional failure to recognize regional accents and dialects can breed apathy among those who don’t yet use the technology and frustration among those who do.

Here are five top platforms that provide B2B services, built on a crowdsourcing model, to support the growing commercial use of ASR technology.


Amelia is the world’s largest privately held AI software company delivering cognitive, conversational ASR dataset solutions for business. Amelia streamlines IT operations, automates processes, increases workforce productivity and improves customer satisfaction by teaming humans with digital employees to unleash creativity and deliver business value at scale. The digital employees can examine far more data, far faster, than a human operative. Yet human ingenuity can still be required to make the less obvious decisions about what to look at, and to connect factors that appear unrelated.

Its head office is in New York City, with offices in 15 countries. Amelia aims to deliver improved bottom-line results for more than 500 of the world’s leading brands across IT services, financial services and banking, insurance, telecommunications, retail, manufacturing, healthcare and other sectors.


DefinedCrowd is a provider of high-quality data and an overarching infrastructure of solutions for training artificial intelligence, all focused on making AI smarter. Its head office is in Seattle, Washington State, US; other offices are in Lisbon and Porto in Portugal, and Tokyo in Japan.

The company sources, structures, and enriches Automatic Speech Recognition datasets that help its clients launch high-quality AI products faster. By combining machine learning and human intelligence, the company aims to create natural interaction between people and machines. Speech data is collected from one crowd and transcribed by a separate crowd, because this increases accuracy, and annotation of the text is then carried out by a further crowd.

By leveraging its proprietary Neevo crowd of over 300,000 global contributors, plus market-leading workflow automation, DefinedCrowd focuses on spoken ASR datasets, natural language processing (NLP), computer vision and translation to fuel world-class AI models. DefinedCrowd’s high-quality data is available in a variety of delivery options, including off-the-shelf data and customized collection, and in over 50 languages to help global AI initiatives drive business goals.


Appen provides high-quality training data to confidently deploy world-class AI. Remote work is changing how the world does business, and Appen is a sector pioneer. They help their clients enhance best-in-class speech-operated products and services around the world, including search engines, social media platforms, voice recognition systems, sentiment analysis, and eCommerce sites.

They do this from their base in Sydney, Australia, by tapping into a crowd of more than one million people whose international diversity and flexibility help clients meet the ever-changing needs of their customers. Annotators are readily available 24/7 for simple microtask annotations that don’t require a particular skill set, or custom crowds of skilled annotators can be recruited for task-specific ASR datasets. For work involving sensitive or confidential information, specially identified and certified annotators can be located at one of Appen’s secure facilities to focus on the task.


For more than 20 years, Lionbridge has helped some of the world’s largest technology companies connect with their global customers through improved Automatic Speech Recognition. Data annotation is the essential process of labeling data to make it usable for AI systems, and Lionbridge AI annotates datasets of text, images, video and audio in more than 300 languages and dialects.

Through their platform they orchestrate a crowd of over one million professional annotators, qualified linguists and in-country language speakers across six continents. At any time, between 30,000 and 50,000 members of this community are deployed across more than 5,000 cities, partnering with brands to create culturally rich customer experiences that use colloquial phrasing and local dialects.

In November 2020, the private equity owner of Lionbridge Technologies announced it was selling Lionbridge AI, the data annotation division, to Canadian IT and communications company TELUS for approximately CAD 1.2bn (USD 935m). TELUS brands itself as a “digital customer experience (CX) innovator,” and said TELUS International had acquired Lionbridge AI to “support important AI applications as demand for high-quality, multilingual data annotation continues to increase.” We are not yet aware of any rebranding proposals.

Headquartered in Waltham, Massachusetts, Lionbridge AI operates around the world, including in the US, Ireland, Finland, India, UK, Japan, Denmark, Costa Rica and South Korea.


Training data is a prerequisite for any business that wants to develop a speech recognition facility and add it to its customer service. Speechocean, founded in 2005 in Beijing, provides large speech and text ASR databases and data-related services in 110 languages and accents covering 70 countries.

They provide services for the design, collection, transcription, annotation and validation of data, covering requirements in technical fields of speech recognition, speech synthesis, computer vision, lexicon, image recognition, machine translation, web search and natural language processing.

In addition to ASR datasets, they also provide image and video data collection and annotation services for facial and expression images, optical character recognition (OCR) and handwriting, self-driving vehicles, and many more AI applications.

Speechocean’s clients are largely industrial enterprises and scientific research institutions. Its crowd of data providers is recruited through a membership scheme in which contributors earn points. Triple points are awarded for voice data resources involving emotional speech, rare dialects and ethnic minority languages.

We have a Crowd Session, a live virtual roundtable event, covering aspects of crowdsourcing AI and speech recognition, on February 4. More information is available and registration is open.

About Author

Clive Reffell

Clive has worked with Crowdsourcing Week on sourcing and creating content since May 2016. With knowledge and experience gained in a 30+ year marketing career based in London, UK, he operates as an independent crowdfunding advisor helping SMEs and startups to run successful crowdfunding projects, and with wider social media and content marketing issues.
