Using the structure of genome data in the design of deep neural networks for predicting amyotrophic lateral sclerosis from genotype

Yin, Bojian; Balvert, Marleen; van der Spek, Rick; Dutilh, Bas; Bohte, Sander; Veldink, Jan; Schönhuth, Alexander

doi:10.1101/533679

B. Yin (Bojian), M. Balvert (Marleen), R.A.A. van der Spek (Rick), B.E. Dutilh (Bas), S.M. Bohte (Sander), J. Veldink (Jan) and A. Schönhuth (Alexander)

2019-01-29

Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease caused by aberrations in the genome. While several disease-causing variants have been identified, a major part of the heritability remains unexplained. ALS is believed to have a complex genetic basis in which non-additive combinations of variants give rise to the disease; such combinations cannot be picked up by the linear models employed in classical genotype-phenotype association studies. Deep learning, on the other hand, is highly promising for identifying such complex relations. We therefore developed a deep-learning-based approach for classifying ALS patients versus healthy individuals from the Dutch cohort of the ProjectMinE dataset. Based on the recent insight that regulatory regions of the genome play a major role in ALS, we employ a two-step approach: first, promoter regions that are likely associated with ALS are identified; second, individuals are classified based on their genotype in the selected genomic regions. Both steps employ a deep convolutional neural network. The network architecture accounts for the structure of genome data by applying convolution only to those parts of the data where this makes sense from a genomics perspective. Our approach identifies potential ALS-associated genetic variants and generally outperforms other classification methods. Test results support the hypothesis that ALS is caused by non-additive combinations of variants. Our method can be applied to large-scale whole-genome data. We consider this a first step towards genotype-phenotype association with deep learning that is tailored to genomics and can deal with genome-sized data.
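As a rough illustration of what "applying convolution only to parts of the data" could look like, the sketch below convolves a genotype vector separately inside each genomic region, so that no filter window spans two regions. It is not the network described in the paper: the 0/1/2 allele-count encoding, the region coordinates, the fixed kernel, and the ReLU plus max-pooling step are all assumptions made for this example, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def conv1d_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Slide `kernel` over `signal` without padding (cross-correlation, as in CNNs)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def region_features(
    genotype: Sequence[float],
    regions: Sequence[Tuple[int, int]],
    kernel: Sequence[float],
) -> List[float]:
    """Convolve inside each region separately and reduce to one value per region."""
    features = []
    for start, end in regions:
        window = genotype[start:end]  # variants of this region only
        activations = conv1d_valid(window, kernel)
        # ReLU followed by global max pooling; a region shorter than the
        # kernel has no activations and therefore yields 0.0.
        features.append(max([0.0] + activations))
    return features


# Assumed encoding: number of minor alleles (0, 1 or 2) per variant.
genotype = [0, 1, 2, 0, 2, 1, 1, 0, 0, 0]
# Hypothetical regions as half-open index ranges; index 4 belongs to no region.
regions = [(0, 4), (5, 9)]
kernel = [1.0, 1.0]

print(region_features(genotype, regions, kernel))
```

In a trained network the kernel weights would be learned rather than fixed, and the per-region features would feed further layers that classify the individual. The point of the sketch is only that the variant at index 4, which lies outside both regions, never contributes to any filter response.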

Additional Metadata
Keywords	Genetics
Persistent URL	doi.org/10.1101/533679
Organisation	Machine Learning
Citation	Yin, B., Balvert, M., van der Spek, R., Dutilh, B., Bohte, S., Veldink, J., & Schönhuth, A. (2019). Using the structure of genome data in the design of deep neural networks for predicting amyotrophic lateral sclerosis from genotype. doi:10.1101/533679
