Abstract:
I will describe the Biozon system (biozon.org), a knowledge
resource of heterogeneous biological data. Informally, Biozon
can be described as Amazon and Google combined and applied to
the diverse domain of biological knowledge.
This resource merges the holdings of more than 20 molecular
biology databases and contains more than 100 million biological
documents and 6.5 billion relations among them. The database
relies on a novel graph database infrastructure and a new
approach to data integration, with substantial implications for
knowledge discovery in biology through complex and fuzzy
searches, as well as for data propagation through the graph
structure and emergent data topologies. Biozon also integrates
a first-of-its-kind biological ranking system that resembles
the methods implemented in Google.
If time permits, I will also talk about other research projects
in my lab, such as pathway prediction, a domain-based protein
hierarchy, detection of semantically significant domain
architectures, and novel embedding techniques that we have
developed to construct a complete "road map" of the protein
universe.