Wednesday, 3.12.2014, 11:30
GraphLab started as research project at Carnegie Mellon University were our main goal was implementing machine learning methods in large scale, (and writing papers about it!). Despite thousands of users of our open source software (and many papers written), the experience of setting up the system and using a distributed system was quite frustrating, where our target audience was mainly double PhDs with machine learning and graph theory & distributed systems background.
When forming a company around GraphLab, we have decided to switch our focus and make machine learning much more accessible for the masses. In this talk I will summarize our experience in this direction. I will describe data structures we found useful for many commonly repeating computation patterns in machine learning, including support for dense data, graph (sparse) data, text records and even images. I will describe how we tackle the problem of wrapping up machine learning algorithms in a way which is user friendly to non experts, while algorithmic details can be tweaked layer. During the talk I will show many demos of cool applications that run on GraphLab.
About GraphLab: GraphLab is a popular open source big data analytics project initiated in 2009 at Carnegie Mellon University. GraphLab implements numerous machine learning algorithms using a highly efficient multicore machines and over clusters. A year and a half ago we have formed a company in Seattle to backup the open source project, and grown to a machine learning research group of around 12 PhDs. GraphLab is free for academic usages, and has a major open source component.
Danny is holding a PhD in distributed systems at the Hebrew University (advised by Prof. Danny Dolev and Prof. Dahlia Malkhi), and was a postdoc at the machine learning department at Carngie Mellon University (advised by Prof. Carlos Guestrin and Prof. Joe Hellerstein from Berkeley). Danny was involved in the GraphLab project from its early stages and is a co-founder of the GraphLab company. His research interests are distributed algorithms and large scale machine learning.