2016-08-01
Multi-Hypothesis Parsing of Tabular Data in Comma-Separated Values (CSV) Files
Publication
Tabular data on the web comes in various formats and shapes. Preparing such data for
analysis and integration requires manual steps that go beyond simply parsing it.
These steps include configuring the parser correctly, removing meaningless rows,
casting data types, and reshaping the table structure. The
goal of this thesis is to develop a robust, modular system that
automatically transforms messy CSV data sources into a tidy tabular data structure.
The highly diverse corpus of CSV files from the UK open data hub will serve as the basis
for evaluating the system.
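The core idea of multi-hypothesis parsing can be illustrated with a minimal sketch: treat each candidate parser configuration as one hypothesis, parse the file under every hypothesis, and keep the one whose result looks most like a table. Everything here is an assumption for illustration, not the method developed in the thesis: the candidate delimiters and quote characters, the width-consistency score, the naive type cast, and the helper names `parse`, `score`, `best_hypothesis` and `cast` are all hypothetical. Only Python's standard `csv` module is used.

```python
import csv
import io
from itertools import product

# Hypothetical candidate sets; a real system would consider many more options.
DELIMITERS = [",", ";", "\t", "|"]
QUOTECHARS = ['"', "'"]


def parse(text, delimiter, quotechar):
    """Parse text under one dialect hypothesis, dropping rows with no content."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    return [row for row in reader if any(cell.strip() for cell in row)]


def score(rows):
    """Assumed heuristic: reward multi-column tables whose rows agree on width."""
    if not rows:
        return 0.0
    widths = [len(row) for row in rows]
    modal = max(set(widths), key=widths.count)  # most common row width
    if modal < 2:
        return 0.0  # a single column suggests the wrong delimiter
    consistency = widths.count(modal) / len(widths)
    return consistency * modal


def best_hypothesis(text):
    """Return (score, delimiter, quotechar, rows) for the top-scoring hypothesis."""
    candidates = []
    for delimiter, quotechar in product(DELIMITERS, QUOTECHARS):
        rows = parse(text, delimiter, quotechar)
        candidates.append((score(rows), delimiter, quotechar, rows))
    # max() keeps the first candidate among equal scores.
    return max(candidates, key=lambda c: c[0])


def cast(cell):
    """Deliberately naive type cast: try int, then float, else keep the string."""
    for typ in (int, float):
        try:
            return typ(cell)
        except ValueError:
            pass
    return cell


if __name__ == "__main__":
    sample = "name;qty\napple;3\n\npear;5\n"
    _, delimiter, _, rows = best_hypothesis(sample)
    # Assumes the first surviving row is the header.
    header, body = rows[0], [[cast(c) for c in row] for row in rows[1:]]
    print(repr(delimiter), header, body)
```

In the sample, the comma hypothesis yields single-column rows and scores zero, while the semicolon hypothesis yields three consistent two-column rows, so it wins; the blank line is dropped as a meaningless row. A scoring function this simple is easily fooled (for example, by preamble lines or decimal commas), which is exactly why a fuller system needs more hypotheses and richer evidence.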
Additional Metadata | |
---|---|
P.A. Boncz (Peter), H.F. Mühleisen (Hannes), R. Hoekstra (Rik) | |
Organisation | Database Architectures |

Doehmen, T. (2016, August). Multi-Hypothesis Parsing of Tabular Data in Comma-Separated Values (CSV) Files.