2025-01-21
Metadata matters in dense table retrieval
Publication
Recent advances in Large Language Models (LLMs) have enabled powerful systems that perform tasks by reasoning over tabular data [9, 10, 13, 7, 4]. While these systems typically assume relevant data is provided with a query, real-world use cases are mostly open-domain, meaning they receive a query without context regarding the underlying tables. Retrieving relevant tables is typically done over dense embeddings of serialized tables [5]. Yet, there is limited understanding of the effectiveness of different inputs and serialization methods when using such off-the-shelf text-embedding models for table retrieval. In this work, we show that different serialization strategies result in significant variations in retrieval performance. Additionally, we surface shortcomings in commonly used benchmarks applied in open-domain settings, motivating further study and refinement.
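To make the idea of serialization strategies concrete, here is a minimal sketch (not the paper's exact method) of two ways a table might be turned into text before being embedded with an off-the-shelf text-embedding model: values only, versus values prefixed with table-level metadata. The table structure and field names are illustrative assumptions.

```python
def serialize_values_only(table):
    """Flatten the header and rows into one string, ignoring metadata."""
    lines = [" | ".join(table["columns"])]
    lines += [" | ".join(map(str, row)) for row in table["rows"]]
    return "\n".join(lines)


def serialize_with_metadata(table):
    """Prepend table-level metadata (title, summary) to the serialized values."""
    header = f"title: {table['title']}\nsummary: {table['summary']}"
    return header + "\n" + serialize_values_only(table)


# Hypothetical example table; a retrieval system would embed the resulting
# string with a text-embedding model and index it for dense retrieval.
table = {
    "title": "City populations",
    "summary": "Population of major European cities in 2024.",
    "columns": ["city", "population"],
    "rows": [["Amsterdam", 930000], ["Utrecht", 375000]],
}

print(serialize_with_metadata(table))
```

Because both variants feed the same embedding model, any difference in retrieval quality is attributable to the serialization choice alone, which is the kind of comparison the abstract describes.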
| Additional Metadata | |
|---|---|
| Venue | ELLIS workshop on Representation Learning and Generative Models for Structured Data |
| Organisation | Database Architectures |
| Citation | Gomm, D., & Hulsebos, M. (2025). Metadata matters in dense table retrieval. |