2025-11-20
SQaLe: A text-to-SQL dataset generation pipeline grounded in real schemas
Publication
A composable pipeline for curating large-scale text-to-SQL corpora by extending database schemas, synthesising natural-language questions, and validating SQL programs with LLMs. The dataset is available on Hugging Face Datasets under trl-lab/SQaLe-text-to-SQL-dataset/.
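
The record describes the validation stage only at a high level. As a rough illustration of what checking a SQL program against a schema can look like, here is a minimal sketch that asks SQLite to plan a query against an empty in-memory database. It is an assumption for illustration, not the authors' LLM-based method: the helper `is_valid_sql`, the example schema, and both queries are hypothetical and do not come from the dataset.

```python
import sqlite3


def is_valid_sql(schema_ddl: str, query: str) -> bool:
    """Return True if `query` parses and plans against `schema_ddl` in SQLite.

    Hypothetical helper: it checks syntax and table/column references only,
    not whether the query answers the natural-language question.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Create the (empty) tables described by the schema.
        conn.executescript(schema_ddl)
        # EXPLAIN QUERY PLAN compiles the statement without executing it.
        conn.execute(f"EXPLAIN QUERY PLAN {query}")
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


# Illustrative schema and queries (not taken from the dataset).
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);
"""

GOOD_QUERY = (
    "SELECT c.name, SUM(o.total) FROM customers c "
    "JOIN orders o ON o.customer_id = c.id GROUP BY c.name"
)
BAD_QUERY = "SELECT c.name FROM customers c JOIN invoices i ON i.customer_id = c.id"

print(is_valid_sql(SCHEMA, GOOD_QUERY))
print(is_valid_sql(SCHEMA, BAD_QUERY))
```

A check like this only catches queries that fail to compile, such as the reference to the non-existent `invoices` table. Whether a query matches the intent of its question needs a separate judgement, which is where the record says LLMs are used.
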
| Additional Metadata | |
|---|---|
| Democratizing Insight Retrieval from (Semi-)Structured Data | |
| License | www.gnu.org/licenses/gpl-3.0.en.html |
| Organisation | Database Architectures |

Wolff, C., Gomm, D., & Hulsebos, M. (2025). SQaLe: A text-to-SQL dataset generation pipeline grounded in real schemas.