Tables in Wikipedia articles contain a wealth of knowledge that would be useful for many applications if it were structured in a more coherent, queryable form. An important problem is that many such tables contain the same type of knowledge but have different layouts and/or schemata. Moreover, some tables refer to entities that we can link to Knowledge Bases (KBs), while others do not. Finally, some tables express entity-attribute relations, while others contain more complex n-ary relations. We propose a novel knowledge extraction technique that tackles these problems. Our method first transforms and clusters similar tables into fewer unified ones to overcome the problem of table diversity. Then, the unified tables are linked to the KB so that knowledge about popular entities propagates to the unpopular ones. Finally, our method applies a technique that relies on functional dependencies to interpret each table and extract n-ary relations. Our experiments over 1.5M Wikipedia tables show that our clustering can group many semantically similar tables, which leads to the extraction of many novel n-ary relations.
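To illustrate the role functional dependencies can play when interpreting a table, the following minimal sketch checks which column sets determine a target column. It is not the method of the paper: the helper names `holds` and `minimal_determinants`, the `max_size` parameter, and the toy rows are all assumptions made for illustration. When no single column determines the target but a pair of columns does, the table is more plausibly read as an n-ary relation than as a set of entity-attribute pairs.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if the columns in lhs functionally determine column rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        # A dependency is violated when the same key maps to two values.
        if key in seen and seen[key] != row[rhs]:
            return False
        seen[key] = row[rhs]
    return True


def minimal_determinants(rows, columns, rhs, max_size=2):
    """List minimal column sets (up to max_size columns) that determine rhs."""
    found = []
    others = [c for c in columns if c != rhs]
    for size in range(1, max_size + 1):
        for lhs in combinations(others, size):
            # Skip supersets of a determinant that was already found.
            if any(set(f) <= set(lhs) for f in found):
                continue
            if holds(rows, lhs, rhs):
                found.append(lhs)
    return found


# Hypothetical toy table; the values are invented for illustration.
rows = [
    {"athlete": "A", "year": 2000, "medal": "gold"},
    {"athlete": "A", "year": 2004, "medal": "silver"},
    {"athlete": "B", "year": 2000, "medal": "silver"},
]

determinants = minimal_determinants(rows, ["athlete", "year", "medal"], "medal")
print(determinants)
```

In this toy table neither `athlete` nor `year` alone determines `medal`, but the pair does, so each row would be read as a single ternary fact rather than as two separate binary ones.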

doi.org/10.1145/3340531.3412027
The 29th ACM International Conference on Information and Knowledge Management
Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands

Kruit, B., Boncz, P., & Urbani, J. (2020). Extracting N-ary Facts from Wikipedia Table Clusters. In International Conference on Information and Knowledge Management, Proceedings (pp. 655–664). doi:10.1145/3340531.3412027