Yahoo! Inc. (YHOO) Makes its AI Software Open Source

Following the lead of other tech giants, Yahoo! Inc. (NASDAQ:YHOO) yesterday published the source code of its AI engine, CaffeOnSpark, allowing anyone with a computer to use or modify it.

CaffeOnSpark, like most other recent AI projects, is based on deep learning, a machine learning method that helps machines recognize and sort through different kinds of user-generated data. Yahoo’s main asset in AI development is its photo website Flickr, where users post billions of photographs they have taken.

Yahoo engineers wanted to make Flickr’s search function more reliable, so that instead of merely searching the descriptions and tags people had added to their uploads, the program would recognize common characteristics of the photographs themselves. It could achieve this by using deep learning to teach its AI software what, for example, certain animals look like.

While Yahoo no longer seems like a cutting-edge tech company to most, its release of the source code of its AI project follows the lead of many other companies. Recently, Google published its deep learning framework TensorFlow, and Microsoft soon followed by releasing its framework CNTK. Facebook published the designs for the computer servers that run its latest AI projects, while Chinese search giant Baidu released its deep learning training software and promised to follow it with more AI tools in the future.

There is a clear appetite in the industry for AI innovations, and companies are competing with each other by releasing more and more complex open source code. It might seem counterintuitive to publish AI projects on open source platforms, where they can be analyzed and even copied by competitors, but companies like Google and Facebook hope that by making their technology public they will attract more people to contribute to it and thus speed up development.

One of the major drawbacks of deep learning is that it requires moving huge amounts of data around. If you want to teach an AI search engine, for example, how to recognize photographs of dogs, you first have to feed it a huge number of different images before it learns what a dog looks like. This usually requires taking data from an existing database and transferring it to a new set of servers.

This is precisely what Yahoo engineers did not want to do with Flickr’s huge image database. CaffeOnSpark, as the name suggests, combines two existing programs: the deep learning framework Caffe and the data-crunching system Spark, which can run on top of the big data platform Hadoop (which is used by Facebook and Twitter, among others). What this effectively means is that Yahoo has developed a way to run deep learning processes on top of existing databases without having to transfer the data anywhere else.
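
The general idea of training where the data already lives can be sketched in a few lines of Python. This is a toy illustration only, not Yahoo's code: plain lists stand in for data partitions stored on different machines, a one-number model stands in for a deep network, and the names `local_gradient`, `train` and `partitions` are invented for the example. No Caffe, Spark or Hadoop calls are used.

```python
# Toy illustration only: plain Python lists stand in for data partitions
# stored on different cluster nodes. Nothing here is a Caffe, Spark, or
# Hadoop API; all names are hypothetical.

def local_gradient(weight, partition):
    """Compute the squared-error gradient for y ~ weight * x on one partition.

    Returns the partition's mean gradient and its record count, so the
    coordinator can combine partitions of different sizes.
    """
    total = 0.0
    for x, y in partition:
        total += 2.0 * (weight * x - y) * x
    return total / len(partition), len(partition)


def train(partitions, steps=200, learning_rate=0.01):
    """Fit a single weight by gradient descent.

    Each step, every partition computes a gradient from its own records.
    Only the (gradient, count) summaries are combined centrally; the raw
    records are never copied out of their partition.
    """
    weight = 0.0
    for _ in range(steps):
        results = [local_gradient(weight, p) for p in partitions]
        n = sum(count for _, count in results)
        grad = sum(g * count for g, count in results) / n
        weight -= learning_rate * grad
    return weight


# Hypothetical data following y = 3x, split unevenly across three "nodes".
partitions = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(4.0, 12.0), (5.0, 15.0), (6.0, 18.0)],
]

weight = train(partitions)
print(round(weight, 3))
```

In a real system the records would be images and the model far larger, but the pattern is the same: the work is sent to the data rather than the data to the work.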

Although few people see Yahoo as a serious competitor in the rush to develop better and better AI software, other companies will undoubtedly benefit from its innovation.