Industrial Project 234313

Serverless for Big Data Processing
in the Cloud


Introduction


Serverless architectures are application designs that incorporate third-party “Backend as a Service” (BaaS) services, and/or that include custom code run in managed, ephemeral containers on a “Function as a Service” (FaaS) platform.

PyWren is a Python package that lets users run non-optimized code on thousands of cores using AWS Lambda.

IBM PyWren is an advanced extension of PyWren built specifically for the IBM Cloud Functions service.




Goals


Demonstrate embarrassingly parallel algorithms with PyWren over IBM Cloud Functions.
The first candidate is Monte Carlo simulations.
Hyperparameter optimization: a proof of concept (PoC) of using IBM PyWren for hyperparameter optimization with Facebook's fastText learning algorithm.
Improve the IBM PyWren code and contribute the changes directly to its open source repository.



Methodology


Working with open source Git repositories.

Writing all project code in Python.

Gaining experience with IBM Cloud services, especially IBM Cloud Functions, using the IBM PyWren Python package.

Demonstrate embarrassingly parallel algorithms with PyWren over IBM Cloud Functions:

  • Demonstrating Monte Carlo simulations over IBM Cloud Functions using IBM PyWren.

  • Demonstrating how to use IBM PyWren for hyperparameter optimization of machine learning models.




Achievements


Creating two Monte Carlo simulations:

  • Classic Monte Carlo simulation: estimating the value of 𝜋.

  • Daily stock predictions using Monte Carlo simulation.

  • Writing an extensive report detailing all of our work, the usage of IBM Cloud Functions and IBM PyWren, an evaluation of performance, and our conclusions.

  • The various code segments used for this section are available on GitHub, with instructions and explanations.
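
The 𝜋 estimate follows a map-and-reduce shape: many independent sampling tasks, then one sum. The sketch below is a minimal illustration under stated assumptions, not the project's code. The names count_hits and estimate_pi are hypothetical, and a local thread pool stands in for the IBM PyWren executor, whose map-style call would instead run each task as a separate cloud function.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(task):
    """Count random points in the unit square that land inside the quarter circle."""
    seed, n_samples = task
    rng = random.Random(seed)  # independent, reproducible stream per task
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_tasks=8, samples_per_task=50_000):
    """Map the sampling tasks, then reduce the hit counts into one estimate."""
    tasks = [(seed, samples_per_task) for seed in range(n_tasks)]
    # Assumption: a local thread pool replaces the serverless executor here.
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        hits = list(pool.map(count_hits, tasks))
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * sum(hits) / (n_tasks * samples_per_task)


if __name__ == "__main__":
    print(estimate_pi())
```

Because the tasks share no state, adding more of them only increases the sample count; this is what makes the simulation embarrassingly parallel.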

Hyperparameter optimization:

  • Creating a generic Python class that can evaluate any learning model using lists of hyperparameter sets over IBM Cloud Functions.

  • Exploiting IBM PyWren properties to achieve “two-dimensional” parallelism.

  • Writing an extensive report detailing all of our work, the usage of IBM Cloud Functions and IBM PyWren, an evaluation of performance, and our conclusions.
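
One way to picture the evaluation described above is a flat list of tasks, one per (hyperparameter set, cross-validation fold) pair, all mapped in parallel. The sketch below assumes that this is what “two-dimensional” parallelism refers to; the page itself does not define the term. It is function-based rather than the project's class. All names (k_fold_splits, evaluate_grid, knn_accuracy) are hypothetical, the toy 1-D nearest-neighbour model only keeps the example self-contained, and a local thread pool stands in for the serverless map call.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def k_fold_splits(data, k):
    """Yield (train, test) pairs for k-fold cross-validation."""
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, folds[i]


def evaluate_grid(evaluate, param_sets, data, k=3):
    """Score every hyperparameter set, with one task per (set, fold) pair."""
    tasks = [
        (idx, params, train, test)
        for idx, params in enumerate(param_sets)
        for train, test in k_fold_splits(data, k)
    ]

    def run(task):
        idx, params, train, test = task
        return idx, evaluate(params, train, test)

    # Assumption: a local thread pool replaces the serverless executor here.
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        results = list(pool.map(run, tasks))

    # Group the fold scores by hyperparameter set and average them.
    scores = {idx: [] for idx in range(len(param_sets))}
    for idx, score in results:
        scores[idx].append(score)
    return [(param_sets[idx], mean(s)) for idx, s in scores.items()]


def knn_accuracy(params, train, test):
    """Toy evaluation function: accuracy of 1-D k-nearest-neighbours."""
    correct = 0
    for x, label in test:
        nearest = sorted(train, key=lambda row: abs(row[0] - x))[: params["k"]]
        votes = sum(lbl for _, lbl in nearest)
        predicted = 1 if 2 * votes > len(nearest) else 0
        correct += predicted == label
    return correct / len(test)


if __name__ == "__main__":
    data = [(x, int(x >= 10)) for x in range(20)]
    for params, score in evaluate_grid(knn_accuracy, [{"k": 1}, {"k": 3}], data, k=4):
        print(params, score)
```

The caller supplies only the evaluation function and the list of hyperparameter sets; any search strategy can then analyze the returned scores.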





Conclusions


Here are some of our main conclusions:

  • From a programmer's perspective, implementing a complex idea with the IBM PyWren Python package was very intuitive, thanks to its simple API.
  • As an operation becomes less trivial and requires more computation power, fewer instances of it are needed before IBM Cloud Functions with IBM PyWren outperforms a local machine.
  • Distributing hyperparameter evaluations over actions in IBM Cloud Functions makes the tuning process much faster and more effective for the programmer.
  • The execution time when evaluating hyperparameters over IBM Cloud Functions depends mostly on the longest single evaluation, not on the number of hyperparameter sets to evaluate or on the k value of the k-fold cross-validation.
  • Whatever algorithm the programmer chooses for hyperparameter tuning, it can use IBM Cloud Functions and the IBM PyWren Python package to run faster, more accurately, and without very expensive hardware: the programmer simply provides an evaluation function and lets their own algorithm analyze the returned results.
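
The conclusion about execution time can be illustrated with an idealized model. It assumes one concurrent worker per evaluation and a fixed invocation overhead. The durations below are hypothetical numbers chosen for illustration, not measurements from this project.

```python
def sequential_time(durations):
    """Evaluations run one after another: the total time is the sum."""
    return sum(durations)


def parallel_time(durations, overhead=0.0):
    """Evaluations run concurrently: the slowest one dominates."""
    return max(durations) + overhead


# Hypothetical evaluation times in seconds, for illustration only.
durations = [40, 55, 38, 120, 47]
print(sequential_time(durations))            # 300
print(parallel_time(durations, overhead=5))  # 125
```

Under this model, adding more hyperparameter sets or folds adds terms to the sequential sum but changes the parallel time only if one of the new evaluations is the slowest.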



Blog


You can read more about our work in this IBM Cloud Blog post.


Team


Industrial Supervisor

Gil Vernik, IBM

Computer Engineering Student

Ido Yehezkel

Academic Coordinator

Prof. Roy Friedman

Computer Engineering Student

Ohad Zohar

Academic Assistant

Liran Funaro