Published August 24, 2020 | Version 1
Dataset Open

Tab2Know evaluation data

  • VU Amsterdam

Description

Evaluation data for the paper "Tab2Know: Building a Knowledge Base from Tables in Scientific Papers", published at ISWC 2020.

For code, see https://github.com/karmaresearch/tab2know .

This resource contains the following files:

- `venues.txt`: The venues that were used to select PDFs published in the last 5 years from the [Semantic Scholar Open Research Corpus](http://s2-public-api-prod.us-west-2.elasticbeanstalk.com/corpus/).

- `extracted-tables.tar.gz`: All tables that we extracted using [Tabula](https://github.com/tabulapdf/tabula) from these PDFs.

- `sample-400.tar.gz`: A sample of these tables which we used for annotation.

- `ontology.ttl`: The annotation ontology in Turtle format.

- `all_metadata.jsonl`: Annotations for this sample in the JSON format described below.

- `labelqueries.csv`: The label queries used for weak annotation, created using the annotation interface. This CSV file contains 6 columns: a numeric ID, the label query template name (`template`), the template slots (`slots`), the label type (`label`), the annotation value (`value`), and a toggle for the interface (`enabled`).

- `labelqueries-sparql-templates.zip`: The label query templates. These are SPARQL queries with slots of the form `{{slot}}`. The templates in `labelqueries.csv` refer to these files.

- `rules.txt`: Datalog rules that we used for entity resolution.

- `tab2know-graph.nt.gz`: The final RDF graph that contains all extracted table structures, predicted table and column classes, and resolved entity links.
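The `{{slot}}` placeholders in the label query templates can be filled by plain text substitution. The following is a minimal Python sketch, not the Tab2Know implementation: the function `fill_template`, the example query text, the `http://example.org/hasText` predicate, and the slot name `keyword` are all hypothetical. How the `slots` column of `labelqueries.csv` encodes its values is not specified here, so the function simply takes a dictionary.

```python
import re

# Matches placeholders of the form {{slot}}; surrounding whitespace is tolerated.
SLOT_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_template(template_text, slots):
    """Replace every {{name}} in template_text with slots[name].

    Raises KeyError if the template uses a slot that has no value.
    """
    def substitute(match):
        name = match.group(1)
        if name not in slots:
            raise KeyError(f"no value for slot {name!r}")
        return slots[name]

    return SLOT_PATTERN.sub(substitute, template_text)


# Hypothetical template text; the real templates live in
# labelqueries-sparql-templates.zip and may look quite different.
example_template = (
    'SELECT ?cell WHERE { ?cell <http://example.org/hasText> "{{keyword}}" . }'
)

query = fill_template(example_template, {"keyword": "accuracy"})
print(query)
```

This sketch does no escaping, so slot values containing quotes or backslashes would need to be sanitized before the resulting query is sent to a SPARQL endpoint.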

Files

Files (7.6 GB)

| Checksum | Size |
|---|---|
| md5:7111ce13f51f048390024884ed0a1dff | 389.0 kB |
| md5:0577aeefde307462b09ad3ab62efe01d | 7.1 GB |
| md5:b691c5e80793c552ce8284d6ce16c971 | 9.6 kB |
| md5:23cac8f13fc601893885aa55a966aa26 | 8.8 kB |
| md5:c878e0ea0c625093a4850e13f11dcf0b | 17.2 kB |
| md5:9e49b0272524b1fc7b077d871197ccf9 | 7.2 kB |
| md5:36864076ab5fae44101beba90656b9a1 | 3.5 kB |
| md5:5e9cca142ef7217bf627ec50a5f4559b | 13.4 MB |
| md5:5937454b8bb5df1ccf636fc10d17e245 | 469.3 MB |
| md5:fe8348c0633f2299a1948a8b485dfe4f | 145 Bytes |
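
The MD5 checksums listed above can be used to check that a download completed without corruption. The following is a minimal Python sketch using only the standard library; the helper names `md5_of_file` and `matches_checksum` and the local path in the usage comment are assumptions, and which checksum belongs to which file has to be taken from the record itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that multi-gigabyte archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Usage (hypothetical local path; pair it with the checksum listed
# for that file in the record):
# ok = matches_checksum("path/to/downloaded-file",
#                       "md5:0577aeefde307462b09ad3ab62efe01d")
```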