Yahoo! (now Oath) has a history of open sourcing major infrastructure components. Examples include Hadoop and Storm, each of which has spawned an entire industry. Oath has now open sourced Vespa, the platform it uses to serve requests for popular sites like Yahoo Sports and Yahoo Finance. The importance of this cannot be overstated.
Brief Overview of Vespa
The underpinnings of the system are built for high-uptime operations. There is an admin/config cluster that controls the other Vespa clusters. There is a stateless Java container cluster that processes incoming requests. To feed responses back to the user, Vespa also has a content management and query cluster in the back end. Think of going to Flickr and doing a keyword search: Vespa can take that request, process data across billions of pieces of content, then serve search results along with recommendations in tens of milliseconds.
This is an extremely powerful application, and something very few organizations have the internal teams to develop and maintain. Oath says Vespa, for example, helps serve over 3 billion native ad requests per day and has reached peaks of over 140,000 requests per second.
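To put those two figures on the same scale, here is a quick back-of-the-envelope calculation. It treats "over 3 billion" and "over 140,000" as exactly 3 billion and 140,000 and assumes traffic is averaged over a full 24-hour day, so the results are rough reference points rather than numbers reported by Oath.

```python
# Rough comparison of the quoted daily volume and the quoted peak rate.
# Assumption: the quoted figures are taken as exact values.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(requests_per_day: float) -> float:
    """Average requests per second if the daily volume were spread evenly."""
    return requests_per_day / SECONDS_PER_DAY


daily_requests = 3_000_000_000  # "over 3 billion native ad requests per day"
peak_rps = 140_000              # "peaks of over 140,000 requests per second"

avg = average_rps(daily_requests)
peak_to_average = peak_rps / avg

print(f"average: {avg:,.0f} requests/second")
print(f"peak is about {peak_to_average:.1f}x the average")
```

Under these assumptions the average works out to roughly 35,000 requests per second, so the quoted peak is about four times the average load.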
Here is the architectural diagram for Vespa:
The key here is that the system takes requests as input and then distributes the processing and response quickly and efficiently. The Vespa response framework can use hand-built or deep-learning-based algorithms to craft ranking-based responses.
Getting Started with Vespa
We checked out the Vespa documentation and it was surprisingly complete, even including a Docker-enabled quick-start. One can start with text or JSON data on a Hadoop HDFS cluster, for example, to load content and start performing searches. While this is not going to replace Elasticsearch overnight, there are likely some uses of ES for which Vespa is a much better fit.
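As a rough illustration of what loading content and searching might look like against a local quick-start container, the sketch below only builds the HTTP requests and does not send them. It assumes the container listens on localhost:8080 and exposes Vespa's /document/v1/ and /search/ HTTP endpoints. The namespace, document type, field name and helper functions are hypothetical, and this is the plain HTTP route rather than the HDFS route mentioned above. Check the Vespa documentation for the exact request shapes in your version.

```python
import json
from urllib.parse import quote, urlencode

# Assumption: a quick-start container reachable on this host and port.
BASE_URL = "http://localhost:8080"


def build_feed_request(namespace, doc_type, doc_id, fields):
    """Return (url, json_body) for putting one document.

    The path layout /document/v1/<namespace>/<doctype>/docid/<id> is assumed
    here; the namespace, document type and fields are hypothetical.
    """
    path = "/document/v1/{}/{}/docid/{}".format(
        quote(namespace, safe=""),
        quote(doc_type, safe=""),
        quote(doc_id, safe=""),
    )
    body = json.dumps({"fields": fields}, sort_keys=True)
    return BASE_URL + path, body


def build_search_url(query, hits=10):
    """Return a search URL using the simple 'query' and 'hits' parameters."""
    return BASE_URL + "/search/?" + urlencode({"query": query, "hits": hits})


feed_url, feed_body = build_feed_request(
    "mynamespace", "photo", "1", {"title": "Sunset over the bay"}
)
search_url = build_search_url("sunset", hits=5)

print("POST", feed_url)
print(feed_body)
print("GET ", search_url)
```

The request builders are kept separate from any HTTP client so the URLs and payloads can be inspected before anything is sent to a real cluster.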