Recent advances in Large Language Models (LLMs) have enabled powerful systems that perform tasks by reasoning over tabular data [9, 10, 13, 7, 4]. While these systems typically assume the relevant data is provided with a query, real-world use cases are mostly open-domain: a query arrives without any context about the underlying tables. Retrieving relevant tables is typically done over dense embeddings of serialized tables [5]. Yet, there is limited understanding of how effective different inputs and serialization methods are when using such off-the-shelf text-embedding models for table retrieval. In this work, we show that different serialization strategies lead to significant variations in retrieval performance. Additionally, we surface shortcomings in commonly used benchmarks applied in open-domain settings, motivating further study and refinement.
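To make the setup concrete, the following is a minimal sketch of dense table retrieval with an off-the-shelf text-embedding model, comparing two illustrative serialization strategies (metadata only vs. metadata plus linearized rows). The model choice, serialization formats, and example table are assumptions for illustration, not the exact configuration studied in the paper.

```python
# Minimal sketch (illustrative, not the paper's exact setup): embed a query and
# two serializations of the same table, then compare cosine similarities.
import numpy as np
from sentence_transformers import SentenceTransformer

table = {
    "title": "City populations",
    "columns": ["city", "country", "population"],
    "rows": [["Amsterdam", "NL", 921402], ["Berlin", "DE", 3645000]],
}

def serialize_metadata_only(t):
    # Use only table metadata: title and column headers.
    return f"{t['title']}: " + ", ".join(t["columns"])

def serialize_with_rows(t):
    # Append linearized row values to the metadata.
    rows = " | ".join(", ".join(map(str, r)) for r in t["rows"])
    return serialize_metadata_only(t) + " | " + rows

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model
query = "Which European city has the largest population?"

for serialize in (serialize_metadata_only, serialize_with_rows):
    q_emb, t_emb = model.encode([query, serialize(table)], normalize_embeddings=True)
    # With normalized embeddings, the dot product is the cosine similarity.
    print(serialize.__name__, float(np.dot(q_emb, t_emb)))
```

In an open-domain setting, the same idea scales to a corpus: every table is serialized, embedded once, and indexed, and the highest-scoring tables for a query embedding are returned as candidates.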

ELLIS workshop on Representation Learning and Generative Models for Structured Data

Gomm, D., & Hulsebos, M. (2025). Metadata matters in dense table retrieval.