Microsoft (MSFT)’s DefinedCrowd is Connecting Native Speakers to Machine Learning

A building on the Microsoft Headquarters campus is pictured July 17, 2014 in Redmond, Washington. (Stephen Brashear/Getty Images)

DefinedCrowd, one of the startups in Microsoft Corporation (NASDAQ:MSFT)’s Accelerator program, has been filling a gap in the data and machine learning community. The startup provides real-time feeds of rich language data that is verified by trusted contributors around the world.

The need stems from a Catch-22 often associated with deep data analysis: you have to understand your data before you can analyze it, but you also have to analyze it in order to understand it.

Daniela Braga, DefinedCrowd’s co-founder and chief scientist, said that in the AI market, Siri and other voice assistants such as Cortana require huge amounts of voice recordings. Those recordings need transcriptions, as well as intent and empathy labels. Crowd input can provide an extra layer of refinement that no machine can deliver on its own.

Amy Du, co-founder and CEO, added that when you train a sentiment analysis model and want the machine to learn whether a user posting a particular set of tweets is happy or excited, the difference can be very subtle. That is where crowdsourcing comes into play.

From there, the grammar is standardized: slang such as ‘u’ is expanded to the full word, and any emojis are removed. Users are then asked to score phrases or sentences on a scale from 0 to 5. Several users may score the same phrase before the team moves on to the next one. How long this takes depends on the complexity of the data, but if the process is well planned and native speakers contribute regularly, the data can be refreshed quickly enough to keep up with updates.
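As a rough illustration of the two steps described above, the Python sketch below normalizes a text snippet and then combines several contributors’ 0 to 5 scores for the same phrase. It is not DefinedCrowd’s actual pipeline: the slang dictionary, the emoji filter based on the Unicode “So” (Symbol, other) category, and the `normalize` and `aggregate` helpers are all hypothetical, chosen only to show the idea.

```python
import re
import statistics
import unicodedata

# Hypothetical slang map for illustration only; a real pipeline would rely on a
# much larger, language-specific lexicon maintained by native speakers.
SLANG = {"u": "you", "r": "are", "gr8": "great"}


def strip_emoji(text: str) -> str:
    # Drop characters in the Unicode "So" (Symbol, other) category, which covers
    # most emoji, plus the variation selector and zero-width joiner used in
    # emoji sequences. This is an approximation, not a complete emoji filter.
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "So" and ch not in ("\ufe0f", "\u200d")
    )


def normalize(text: str) -> str:
    text = strip_emoji(text)

    # Replace whole-word slang tokens (matched case-insensitively) with full words.
    def expand(match: re.Match) -> str:
        word = match.group(0)
        return SLANG.get(word.lower(), word)

    text = re.sub(r"\b\w+\b", expand, text)
    # Collapse the extra whitespace left behind by removed emoji.
    return " ".join(text.split())


def aggregate(scores: list[int]) -> dict:
    # Combine several contributors' scores for one phrase on the 0-5 scale.
    if not scores:
        raise ValueError("need at least one score")
    if any(s < 0 or s > 5 for s in scores):
        raise ValueError("scores must be between 0 and 5")
    return {
        "mean": statistics.mean(scores),
        "spread": max(scores) - min(scores),  # large spread = raters disagree
        "n": len(scores),
    }


if __name__ == "__main__":
    print(normalize("u r gr8 😀"))  # prints "you are great"
    print(aggregate([4, 5, 3]))
```

Reporting the spread alongside the mean is one simple way to flag phrases where raters disagree and more native-speaker input may be needed; it is an illustrative choice, not a documented DefinedCrowd metric.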

Du said the company was partnering with international universities on the project. It usually begins with a university’s linguistics department and tries to build a relationship with a local language ambassador to establish trust. Once that base is in place, it can bring in other students from the region. Braga added that the company was now present in 30 countries and was hoping to reach 50 countries by July, with a steady pool of 50 to 100 people in each country.

Du has worked in tech consulting for many years, specializing in helping major companies with crowdsourcing. Braga, on the other hand, started out as a linguistics professor before partnering with Microsoft on NLP-related projects such as Cortana.