Technical Report PHD-2015-05

TR#: PHD-2015-05
Class: PHD
Title: Sample Complexity of Training Markov Chains
Authors: Roman Talyansky
Supervisors: Alon Itai
PDF: PHD-2015-05.pdf
Abstract: Markov chains have gained wide popularity in a diverse spectrum of applications where there is a need to model processes that change along the time axis. Markovian models appear extensively in thermodynamics and statistical mechanics, testing and telecommunication, and finance and economics. A Markov chain (MC) consists of a set of states, initial state probabilities, a set of edges, and transition probabilities that define the probability of traversing each edge. MCs are learned from training data: a sequence s of MC states. The training data are produced by an MC, called the concept MC M^(c), that cannot be observed directly. Let dist be a distance between MCs and let ε, δ, θ lie between 0 and 0.5. The goal of learning is to use s to construct a learned MC M^(ℓ) that is ε-close to M^(c) with confidence at least 1 − δ: Pr(dist(M^(c), M^(ℓ)) < ε) > 1 − δ. To develop sample complexity lower bounds we use methods of Information Theory, namely distinguishing between two random variables and strongly typical sequences; we also use matrix perturbation analysis. We develop three bounds on the sample complexity of learning MCs. The first bound is expressed in terms of the concept MC. This bound is theoretical, because in practice the concept MC is not known and thus the bound cannot be calculated. The second bound is expressed in terms of the MC that is learned from a sequence over the states of the concept MC and distributed according to the concept MC, but it involves heavy computations. Finally, the last bound is practical, since it is both expressed in terms of the learned MC and may be calculated efficiently.
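To make the learning setup concrete, the following is a minimal sketch (not taken from the thesis) of how a learned MC M^(ℓ) can be estimated from a state sequence s produced by a concept MC M^(c), and how the ε-closeness condition can be checked empirically by simulation. The transition matrices, the maximum-likelihood estimator, and the choice of dist (here, the maximum over states of the total-variation distance between transition rows) are illustrative assumptions; the thesis defines its own distance and derives its bounds analytically.

import numpy as np

rng = np.random.default_rng(0)

def sample_sequence(P, init, n):
    """Generate a length-n state sequence from transition matrix P."""
    s = [rng.choice(len(init), p=init)]
    for _ in range(n - 1):
        s.append(rng.choice(P.shape[1], p=P[s[-1]]))
    return s

def learn_mc(s, k):
    """Maximum-likelihood estimate of a k-state transition matrix from s."""
    counts = np.zeros((k, k))
    for a, b in zip(s, s[1:]):
        counts[a, b] += 1
    rows = counts.sum(axis=1, keepdims=True)
    rows[rows == 0] = 1  # avoid division by zero; unvisited states keep an all-zero row
    return counts / rows

def dist(P, Q):
    """Illustrative distance: max over states of total-variation distance between rows."""
    return 0.5 * np.abs(P - Q).sum(axis=1).max()

# Concept MC (unobservable in the real setting; fixed here only for simulation).
P_c = np.array([[0.9, 0.1],
                [0.2, 0.8]])
init = np.array([0.5, 0.5])

# Empirical check of the PAC-style goal Pr(dist(M^(c), M^(ℓ)) < eps) > 1 − delta
# for one training-sequence length n.
eps, n, trials = 0.05, 5000, 200
hits = sum(dist(P_c, learn_mc(sample_sequence(P_c, init, n), 2)) < eps
           for _ in range(trials))
print(f"empirical Pr(dist < {eps}) = {hits / trials:.2f}")

For long enough sequences the empirical frequency approaches 1, in line with the PAC-style guarantee above; the sample complexity bounds developed in the thesis quantify how large n must be to achieve a given ε and δ.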

Copyright: The above paper is copyrighted by the Technion, the author(s), or others. Please contact the author(s) for more information.

Remark: Any link to this technical report should be to this page (http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-info.cgi/2015/PHD/PHD-2015-05) rather than to the URLs of the PDF files directly. The latter URLs may change without notice.


Computer Science Department, Technion