Technical Report PHD-2013-02

TR#:PHD-2013-02
Class:PHD
Title: Learning to Predict the Future using Web Knowledge and Dynamics
Authors: Kira Radinsky
Supervisors: Shaul Markovitch, Nir Ailon
PDFPHD-2013-02.pdf
Abstract: Mark Twain famously said that "the past does not repeat itself, but it rhymes." In the spirit of this re ection, we present novel algorithms and methods for leveraging large-scale digital histories and human knowledge mined from the Web to make real-time predictions about the likelihoods of future human and natural events of interest. The Web is a dynamic being, with constantly updating content, which is entangled with sophisticated user behaviors and interactions. Some of these behaviors have the ability to convey current trends in the present, e.g., economical growth (predicting automobile sales based on query volume [Varian and Choi, 2009]), popular movies [Kalev, 2011], and political unrest [Asur and Huberman, 2010; Joshi et al., 2010; Mishne, 2006]. We mine the ever-changing Web content and user Web behavior. We show that, not only the dynamics itself can be predicted, but also that it can be used for future real-world event prediction. We mine decades of news reports (1851 - 2010) from the New York Times (NYT), and describe how we can learn to predict the future by generalizing sets of concrete transitions in sequences of reported news events. In addition to the news corpora, we leverage data from freely available Web resources, including Wikipedia, FreeBase, OpenCyc, and GeoNames, via the LinkedData platform [Bizer et al., 2009]. The goal is to build predictive models that generalize from specific sets of sequences of events to provide likelihoods of future outcomes, based on patterns of evidence observed in near-term Web activities. We propose the methods as a means of generating actionable forecasts in advance of the occurrence of target events in the world. This thesis is one of the rst works to demonstrate general, unrestricted artificial-intelligence prediction capacity. We present methods derived from heterogeneous Web sources to make knowledge-intensive reasoning about causality and future event prediction, using both automatic feature extraction and novel algorithms for generalizing over historical examples.
CopyrightThe above paper is copyright by the Technion, Author(s), or others. Please contact the author(s) for more information

Remark: Any link to this technical report should be to this page (http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-info.cgi/2013/PHD/PHD-2013-02), rather than to the URL of the PDF files directly. The latter URLs may change without notice.

To the list of the PHD technical reports of 2013
To the main CS technical reports page

Computer science department, Technion
admin