Tal Shor, M.Sc. Thesis Seminar
Wednesday, 7.2.2018, 11:30
The rapid digitization of genealogical and medical records enables the assembly of extremely large pedigrees spanning millions of individuals. Such pedigrees provide the opportunity to answer genetic and epidemiological questions at scales much larger than previously possible. Linear mixed models (LMMs) are often used to analyze pedigree data. However, LMMs do not naturally scale to pedigrees spanning millions of individuals, owing to their steep computational and storage requirements. Here we propose a novel modeling framework called Sparse Cholesky factorIzation LMM (SciLMM), which alleviates these difficulties by exploiting the sparsity patterns found in large pedigree data. The proposed framework can construct a matrix of genetic relationships between trillions of pairs of individuals in several hours, and can fit the corresponding LMM in several days. We demonstrate the capabilities of SciLMM via simulation studies and by estimating the heritability of longevity in a very large pedigree spanning millions of individuals and over five centuries of human history. The SciLMM framework enables analyses of extremely large pedigrees that were not previously possible.
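
To give a feel for why sparsity helps, here is a minimal sketch of building a sparse relationship matrix from a toy pedigree. It is not taken from the SciLMM implementation. It uses the textbook decomposition A = L L^T with L = (I - P)^{-1} D^{1/2}, where P holds 0.5 for every child-parent link, and it assumes no inbreeding and that parents are listed before their children. The function name `relationship_matrix` and the toy pedigree are hypothetical.

```python
import numpy as np
from scipy import sparse


def relationship_matrix(parents):
    """Return (L, A) as sparse matrices, where A = L @ L.T.

    parents: list of (father, mother) index pairs, -1 for an unknown parent.
    Assumptions (illustrative only): parents appear before their children
    and nobody is inbred, so every diagonal entry of A equals 1.
    """
    n = len(parents)
    rows, cols, vals = [], [], []
    d = np.ones(n)  # Mendelian sampling variances (founders: 1)
    for child, pair in enumerate(parents):
        known = [p for p in pair if p >= 0]
        for p in known:
            rows.append(child)
            cols.append(p)
            vals.append(0.5)
        d[child] = 1.0 - 0.25 * len(known)  # 0.75 or 0.5 without inbreeding
    P = sparse.csr_matrix((vals, (rows, cols)), shape=(n, n), dtype=float)

    # P is strictly lower triangular, hence nilpotent, so
    # (I - P)^{-1} = I + P + P^2 + ... has finitely many non-zero terms.
    T = sparse.identity(n, format="csr")
    term = P.copy()
    for _ in range(n):
        if term.nnz == 0:
            break
        T = T + term
        term = term @ P

    L = (T @ sparse.diags(np.sqrt(d))).tocsr()  # lower-triangular factor
    A = (L @ L.T).tocsr()
    return L, A


# Toy pedigree: 0 and 1 are founders, 2 and 3 are their children,
# 4 is an unrelated founder, 5 is the child of 2 and 4.
pedigree = [(-1, -1), (-1, -1), (0, 1), (0, 1), (-1, -1), (2, 4)]
L, A = relationship_matrix(pedigree)
print(A.toarray())
```

Unrelated pairs, such as individuals 0 and 4 here, never get a stored entry, which is the property that keeps the matrix manageable as the pedigree grows.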