Abstract:
An interesting class of problems in machine learning arises
when data points have internal structure. Such problems occur,
for example, when analysing biochemical data where each data
point represents a molecule and where one would like to be able
to predict whether unseen molecules are `active' (e.g. a useful
drug or not). In this case the atom-bond relations and other
structural properties of the molecule are clearly important. A
natural representation for such problems uses first-order Horn
logic, or Prolog programs, to describe both the data and the
predictors.
The talk will review recent work on characterizing the
complexity of such machine learning problems, developing
algorithms for tractable cases, and developing effective systems
based on these. Theoretical analysis is done in a model where,
in addition to looking at the data given to it, the learning
algorithm can ask questions, e.g. show a new molecule and ask
whether it is active. The analysis uses properties of the
underlying logic that lead to new algorithms as well as
non-trivial upper and lower bounds for the complexity of
learning. Implementing systems for logic learning also requires
efficient solutions for a matching problem known as subsumption.
Our system LogAnH (that can work with or without the additional
learner's questions) embeds the learning algorithm together
with fast subsumption tests. The system has been successfully
applied to several challenging datasets.
The talk will address a general computer science audience. No
special knowledge in machine learning or logic will be assumed.