Genome sequence analysis with MonetDB: a case study on Ebola virus diversity
Presented at the Datenbanksysteme in Business, Technologie und Web (March 2015), Hamburg
Next-generation sequencing (NGS) technology has led the life sciences into the big data era. Today, sequencing a genome takes little time and money, but produces terabytes of data that must be stored and analysed. Biologists often face excessively time-consuming and error-prone data management and analysis hurdles. In this paper, we propose a database management system (DBMS) based approach to accelerate and substantially simplify genome sequence analysis. We have extended MonetDB, an open-source column-based DBMS, with a BAM module, which enables easy, flexible, and rapid management and analysis of sequence alignment data stored as Sequence Alignment/Map (SAM/BAM) files. We describe the main features of MonetDB/BAM using a case study on Ebola virus genomes.
|Keywords||MonetDB, DBMS, genome sequence, Ebola|
|Theme||Life Sciences (theme 5), Information (theme 2)|
|Project||The SciLens-II Infrastructure, Big Data at work|
|Conference||Datenbanksysteme in Business, Technologie und Web|
|Grant||This work was funded by The Netherlands Organisation for Scientific Research (NWO); grant id nwo/621.016.201 - The SciLens-II Infrastructure, Big Data at work|
Cijvat, C.P, Manegold, S, Kersten, M.L, Klau, G.W, Schönhuth, A, Marschall, T, & Zhang, Y. (2015). Genome sequence analysis with MonetDB: a case study on Ebola virus diversity. In Proceedings of Datenbanksysteme in Business, Technologie und Web 2015 (BTW 2015) in: Lecture Notes in Informatics (LNI), Proceedings - Series of the Gesellschaft für Informatik (GI) (pp. 143–150).