Technical Report MSC-2016-03

Title: Extracting Code from Programming Tutorial Videos
Authors: Shir Yadid
Supervisors: Eran Yahav
PDF: Currently accessible only within the Technion network
Abstract: The number of programming tutorial videos on the web grows daily. Video hosting sites such as YouTube host millions of video lectures, including many programming tutorials for various languages and platforms. Automatically understanding the content of such videos is desirable for many purposes, including search, ad targeting, and referrals to semantically related content. We present a novel approach for extracting code from videos. Our technique extracts and recognizes code directly from the video, and is based on the following ideas: (i) consolidating code across frames to improve the precision of extraction, and (ii) combining statistical language models to apply corrections at different levels, choosing the most likely token, the combination of tokens that forms a likely line structure, and the combination of lines that leads to a likely code fragment in the language.
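
The two ideas in the abstract can be illustrated with a minimal sketch. This is not ACE's implementation: the function names (consolidate_frames, correct_token), the per-line majority vote, and the frequency-times-similarity score are assumptions chosen for illustration, and a simple token-frequency table stands in for the statistical language models described in the report. Only the token-level correction is shown; line-level and fragment-level corrections are omitted.

```python
from collections import Counter
from difflib import SequenceMatcher


def consolidate_frames(frames):
    """Majority-vote each line position across several readings of the same code.

    `frames` is a list of readings, each a list of text lines (hypothetical
    OCR output for consecutive video frames showing the same code).
    """
    height = max(len(frame) for frame in frames)
    consolidated = []
    for i in range(height):
        candidates = [frame[i] for frame in frames if i < len(frame)]
        # Keep the reading that most frames agree on for this line.
        consolidated.append(Counter(candidates).most_common(1)[0][0])
    return consolidated


def correct_token(token, token_counts, min_similarity=0.6):
    """Replace a possibly misread token with the most likely known token.

    `token_counts` maps known tokens to their corpus frequency (an assumed
    stand-in for a token-level language model). Tokens already known, or
    with no sufficiently similar candidate, are returned unchanged.
    """
    if token in token_counts:
        return token
    total = sum(token_counts.values())
    best, best_score = token, 0.0
    for known, count in token_counts.items():
        similarity = SequenceMatcher(None, token, known).ratio()
        if similarity < min_similarity:
            continue
        # Score combines string similarity with relative frequency.
        score = similarity * (count / total)
        if score > best_score:
            best, best_score = known, score
    return best


if __name__ == "__main__":
    readings = [
        ["public class Main {", "int x = 0;", "}"],
        ["pub1ic class Main {", "int x = 0;", "}"],
        ["public class Main {", "int x = O;", "}"],
    ]
    print(consolidate_frames(readings))
    counts = {"public": 50, "private": 30, "class": 40, "void": 20}
    print(correct_token("pub1ic", counts))
```

In this sketch, consolidation removes errors that appear in only a minority of frames, while token correction handles errors that survive consolidation, such as a misread character that recurs in every frame.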

We have implemented our approach in a tool called ACE, and used it to extract code from 40 Android video tutorials on YouTube. Our evaluation shows that ACE extracts code with high precision, enabling deep indexing of video tutorials.

Copyright: The above paper is copyrighted by the Technion, the author(s), or others. Please contact the author(s) for more information.

Remark: Any link to this technical report should be to this page, rather than to the URLs of the PDF files directly. The latter URLs may change without notice.

Computer science department, Technion