Extracting N-ary Facts from Wikipedia Table Clusters

Kruit, Benno; Boncz, Peter; Urbani, Jacopo

doi:10.1145/3340531.3412027

B.B. Kruit (Benno), P.A. Boncz (Peter) and J. Urbani (Jacopo)

2020-10-19

Presented at the 29th ACM International Conference on Information and Knowledge Management (October 2020), Virtual Event, Ireland

Tables in Wikipedia articles contain a wealth of knowledge that would be useful for many applications if it were structured in a more coherent, queryable form. An important problem is that many such tables contain the same type of knowledge but have different layouts and/or schemata. Moreover, some tables refer to entities that we can link to Knowledge Bases (KBs), while others do not. Finally, some tables express entity-attribute relations, while others contain more complex n-ary relations. We propose a novel knowledge extraction technique that tackles these problems. Our method first transforms and clusters similar tables into fewer unified ones to overcome the problem of table diversity. Then, the unified tables are linked to the KB so that knowledge about popular entities propagates to the unpopular ones. Finally, our method applies a technique that relies on functional dependencies to judiciously interpret the table and extract n-ary relations. Our experiments over 1.5M Wikipedia tables show that our clustering can group many semantically similar tables. This leads to the extraction of many novel n-ary relations.
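To make the role of functional dependencies more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a table is a list of dictionaries and uses two hypothetical helpers, `holds` and `minimal_keys`, together with invented example rows. The idea it illustrates: if no single column determines the others but a pair of columns does, each row is better read as one n-ary fact than as a set of entity-attribute pairs.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if the columns in lhs functionally determine column rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        # setdefault stores the first value seen for this key and returns it;
        # a later, different value means the dependency is violated.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def minimal_keys(rows, columns):
    """Return the smallest column sets that determine every other column."""
    for size in range(1, len(columns) + 1):
        keys = [
            lhs
            for lhs in combinations(columns, size)
            if all(holds(rows, lhs, c) for c in columns if c not in lhs)
        ]
        if keys:
            return keys
    return []


# Invented example rows (not taken from the paper's data).
rows = [
    {"season": "2018", "club": "Ajax", "goals": 12},
    {"season": "2019", "club": "Ajax", "goals": 9},
    {"season": "2018", "club": "PSV", "goals": 12},
    {"season": "2020", "club": "Ajax", "goals": 12},
]
columns = ["season", "club", "goals"]

keys = minimal_keys(rows, columns)
print(keys)
```

In this toy table only the pair (season, club) determines the remaining column, so each row would be treated as a single ternary fact linking season, club and goals. Dependencies found this way on small tables can be accidental, which is one reason why merging many similar tables before the check is attractive.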

Additional Metadata
Keywords	data integration, knowledge extraction, n-ary relation, wikipedia tables
Persistent URL	doi.org/10.1145/3340531.3412027
Conference	The 29th ACM International Conference on Information and Knowledge Management
Organisation	Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands
Citation	Kruit, B., Boncz, P., & Urbani, J. (2020). Extracting N-ary Facts from Wikipedia Table Clusters. In International Conference on Information and Knowledge Management, Proceedings (pp. 655–664). doi:10.1145/3340531.3412027

See Also
software takco B.B. Kruit (Benno), P.A. Boncz (Peter) and J. Urbani (Jacopo)
