2025-11-01
SQaLe: A large-scale semi-synthetic dataset
Publication
SQaLe is a large-scale, semi-synthetic Text-to-SQL dataset grounded in real-world database schemas. It was designed to advance natural-language-to-SQL generation by combining realistic schema diversity, complex query structures, and linguistically varied natural language questions. The code for the dataset's generation pipeline is available on GitHub.
| Additional Metadata | |
|---|---|
| Project | Democratizing Insight Retrieval from (Semi-)Structured Data |
| License | MIT (opensource.org/license/MIT) |
| Organisation | Database Architectures |

Wolff, C., Gomm, D., & Hulsebos, M. (2025). SQaLe: A large-scale semi-synthetic dataset.