
Nov 21, 2025

RAG for Text-to-SQL: From Synthetic Queries to Agentic Reasoning

Author: Gaia Oganesian — AI Engineer

Audience: Technical

Introduction

The task of translating natural language into executable SQL queries over complex databases has evolved rapidly with the rise of large language models (LLMs). However, even state-of-the-art models face major limitations: lack of schema awareness, difficulty handling long contexts, and inefficiencies when reasoning over large or complex databases. Retrieval-Augmented Generation (RAG) has emerged as a compelling strategy to address these issues, enabling models to condition generation on retrieved context such as schema subgraphs, examples, or compressed database metadata.

I have analysed three recent and influential approaches that push the boundaries of RAG for Text-to-SQL: ICRL, ReFoRCE, and TAG. Each tackles the central challenge of grounding LLMs in structured database knowledge while maximizing query generation accuracy, efficiency, and generality.

1. In-Context Reinforcement Learning-Based Retrieval-Augmented Generation for Text-to-SQL (ICRL)

🔗 Paper: In-Context Reinforcement Learning-Based Retrieval-Augmented Generation for Text-to-SQL

The In-Context Reinforcement Learning (ICRL) framework is a RAG approach for Text-to-SQL that constructs database schema graphs and uses iterative feedback to generate complex synthetic queries, enabling efficient retrieval of relevant database schemas from large-scale industrial databases. The model employs a reward function to encourage complexity in synthetic query generation and leverages LLM-aided schema pooling to achieve superior performance in both schema retrieval and SQL generation tasks.

ICRL works in four main steps:

(A) Graph Construction:

  • A schema graph is built: Root → Databases → Tables → Foreign key links.

  • Random walks through this graph define related schema paths (called traversals).
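
As a rough illustration of step (A), the sketch below builds a toy schema graph and samples random foreign-key traversals from it. The databases, tables, and function names are invented for illustration and are not taken from the paper.

```python
import random

# Hypothetical schema graph: Root -> databases -> tables, plus foreign-key
# edges between tables. All names below are illustrative.
SCHEMA_GRAPH = {
    "sales_db": {
        "orders":    {"fk": ["customers", "products"]},
        "customers": {"fk": []},
        "products":  {"fk": []},
    },
    "hr_db": {
        "employees":   {"fk": ["departments"]},
        "departments": {"fk": []},
    },
}

def random_traversal(graph: dict, max_hops: int = 3, seed: int | None = None) -> list[str]:
    """Random walk: pick a database, a starting table, then follow foreign-key
    links for up to `max_hops` steps. The visited tables form one schema path."""
    rng = random.Random(seed)
    db = rng.choice(list(graph))
    table = rng.choice(list(graph[db]))
    path = [f"{db}.{table}"]
    for _ in range(max_hops):
        neighbours = [t for t in graph[db][table]["fk"] if f"{db}.{t}" not in path]
        if not neighbours:
            break
        table = rng.choice(neighbours)
        path.append(f"{db}.{table}")
    return path

if __name__ == "__main__":
    for i in range(3):
        print(random_traversal(SCHEMA_GRAPH, seed=i))
```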

(B) Synthetic Query Generation: For each schema traversal:

  • A base LLM generates a synthetic NL question and corresponding SQL.

  • These are stored in a Knowledge Base (KB) as (question, schema, SQL) triplets.
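
Step (B) can be pictured with the following minimal sketch. It assumes a generic llm() helper as a stand-in for whatever base chat model is used (an assumption, not the paper's API) and stores each generated pair as a (question, schema, SQL) triplet.

```python
from dataclasses import dataclass

@dataclass
class KBEntry:
    question: str
    schema_path: list[str]
    sql: str

def llm(prompt: str) -> str:
    """Placeholder for a call to whatever base chat model is used (assumption)."""
    raise NotImplementedError

def generate_synthetic_pair(schema_path: list[str]) -> KBEntry:
    """Ask the base LLM for one (question, SQL) pair grounded in a traversal."""
    prompt = ("Using only these related tables: " + ", ".join(schema_path) + "\n"
              "Write one natural-language question a user might ask, then the SQL "
              "that answers it. Reply as:\nQUESTION: ...\nSQL: ...")
    reply = llm(prompt)
    question = reply.split("QUESTION:")[1].split("SQL:")[0].strip()
    sql = reply.split("SQL:")[1].strip()
    return KBEntry(question, schema_path, sql)

# knowledge_base = [generate_synthetic_pair(path) for path in traversals]
```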

(C) In-Context RL Loop:

  • Reward function scores SQL complexity (joins, filters, aggregation).

  • A Feedback LLM gives instructions to make the NL question more complex.

  • This feedback is added to the prompt → the base LLM regenerates a better query.

  • Loop continues to improve query quality.
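
A hedged sketch of the reward and feedback loop in step (C) follows. The keyword buckets, weights, and prompts are illustrative guesses rather than the paper's actual reward design, and llm() is again a stand-in for the base and feedback models.

```python
import re

def llm(prompt: str) -> str:
    """Placeholder chat-model call (assumption, not the paper's API)."""
    raise NotImplementedError

# Illustrative keyword buckets and weights; the paper uses its own curated
# buckets and complexity scores.
BUCKETS = {
    "joins":       (r"\bJOIN\b", 2.0),
    "filters":     (r"\bWHERE\b|\bHAVING\b", 1.0),
    "aggregation": (r"\bGROUP BY\b|\bSUM\(|\bCOUNT\(|\bAVG\(", 1.5),
    "nesting":     (r"\(\s*SELECT\b", 2.5),
}

def complexity_reward(sql: str) -> float:
    """Reward a candidate SQL query for each complexity feature it uses."""
    return sum(weight for pattern, weight in BUCKETS.values()
               if re.search(pattern, sql, flags=re.IGNORECASE))

def icrl_loop(question: str, sql: str, tables: list[str],
              rounds: int = 3, target: float = 4.0) -> tuple[str, str]:
    """While the reward is below target, ask a feedback LLM how to harden the
    question, append that feedback to the prompt, and regenerate the pair."""
    for _ in range(rounds):
        if complexity_reward(sql) >= target:
            break
        feedback = llm("Suggest how to make this question require a more complex "
                       f"SQL query (more joins, filters, aggregation): {question}")
        reply = llm("Tables: " + ", ".join(tables) +
                    f"\nPrevious question: {question}\nFeedback: {feedback}\n"
                    "Write a harder question and its SQL as:\nQUESTION: ...\nSQL: ...")
        question = reply.split("QUESTION:")[1].split("SQL:")[0].strip()
        sql = reply.split("SQL:")[1].strip()
    return question, sql
```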

(D) Retrieval & Inference: Given a user query:

  • Retrieve top-k schema paths from KB (via embeddings).

  • An LLM selects the most relevant schema from the candidates.

  • Final LLM uses this schema + KB examples to generate the final SQL query.
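
The retrieval-and-inference step (D) could look roughly like this, assuming a hypothetical embed() sentence-embedding helper, the KB entries produced by the earlier sketch, and the same llm() placeholder.

```python
import numpy as np

def llm(prompt: str) -> str:
    """Placeholder chat-model call (assumption)."""
    raise NotImplementedError

def embed(text: str) -> np.ndarray:
    """Placeholder for any sentence-embedding model (assumption)."""
    raise NotImplementedError

def retrieve_top_k(user_query: str, knowledge_base: list, k: int = 5) -> list:
    """Rank KB entries by cosine similarity between the user query and each
    stored synthetic question, returning the k closest entries."""
    q = embed(user_query)
    scored = []
    for entry in knowledge_base:          # KBEntry objects from the sketch above
        e = embed(entry.question)
        sim = float(q @ e / (np.linalg.norm(q) * np.linalg.norm(e)))
        scored.append((sim, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:k]]

def answer(user_query: str, knowledge_base: list) -> str:
    """Select the most relevant schema, then generate SQL with KB examples."""
    candidates = retrieve_top_k(user_query, knowledge_base)
    schema = llm("Pick the schema most relevant to this question.\n"
                 f"Question: {user_query}\nCandidates: "
                 + "; ".join(str(c.schema_path) for c in candidates))
    examples = "\n".join(f"Q: {c.question}\nSQL: {c.sql}" for c in candidates)
    return llm(f"Schema: {schema}\nExamples:\n{examples}\n"
               f"Write SQL for: {user_query}")
```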

Key Contributions and Benefits:
  • Complex Query Generation Capability: Generates sophisticated synthetic queries with diverse SQL operators including joins, aggregations, and conditional logic, better representing real-world human queries compared to simple prompting approaches.

  • Cost-Effective RAG Implementation: Provides substantial cost reductions by intelligently retrieving relevant schemas instead of passing entire database contexts to LLMs.

  • Scalable and Generalisable Architecture: Built on prompting-based modules without requiring specialised fine-tuning, making it adaptable to other problems requiring iterative refinement on top of LLMs.

  • Minimal LLM Dependency at Inference: Once the knowledge base is built offline, answering a query needs only a small number of LLM calls (schema selection and final SQL generation), since embedding-based retrieval does the heavy lifting.

Challenges:
  • High LLM Dependency and Limited Fine-tuning Context: Since the approach doesn’t use fine-tuned LLMs for SQL generation, models may lack information or understanding about specific databases in context, potentially affecting performance on domain-specific queries.

  • Dependency on Reward Function Design: The effectiveness of the ICRL framework heavily relies on the carefully curated keyword buckets and complexity scores, which may need adjustment for different use cases or domains.

2. Robust Agent for Text-to-SQL with Self-Refinement (ReFoRCE)

🔗 Paper: ReFoRCE: A Text-to-SQL Agent with Self-Refinement, Consensus Enforcement, and Column Exploration

ReFoRCE (Refine, Format Restrict, Compress, Explore) is a modular, multi-stage Text-to-SQL agent designed to generate executable, high-accuracy SQL for complex, real-world databases (it tops the Spider 2.0 leaderboard). Developed by UCSD and Snowflake AI Research, it enhances SQL generation through a modular architecture that integrates schema compression, self-refinement, and execution-guided voting.

ReFoRCE works in four main steps:

  1. Database Information Compression: It compresses database information by grouping related tables, linking schemas with an LLM, and keeping only the most useful column details and sample rows.

  2. Candidate Generation with Self-Refinement: It generates SQL candidates and then refines them, fixing errors or removing queries that return empty results.

  3. Majority Voting and Result Selection: It uses majority voting to pick the best SQL output from multiple high-confidence candidates.

  4. Iterative Column Exploration: If the model isn't confident, it triggers column-by-column exploration to better handle tricky questions involving complex databases or nested structures.
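
To make steps 2 and 3 more tangible, here is a minimal sketch of execution-guided self-refinement followed by majority voting. run_sql() and llm() are hypothetical placeholders, not ReFoRCE's actual interfaces, and the loop structure is a simplification of the agent's behaviour.

```python
from collections import Counter

def llm(prompt: str) -> str:
    """Placeholder for whichever LLM the agent is configured with (assumption)."""
    raise NotImplementedError

def run_sql(sql: str) -> list:
    """Placeholder: execute SQL against the target warehouse and return rows,
    raising on error (an assumption, not ReFoRCE's actual executor)."""
    raise NotImplementedError

def refine_candidate(question: str, sql: str, max_rounds: int = 3):
    """Self-refinement sketch: execute the query and, on an error or an empty
    result, feed the observation back to the model for a corrected query."""
    for _ in range(max_rounds):
        try:
            rows = run_sql(sql)
            if rows:                       # non-empty result: accept the query
                return sql, rows
            feedback = "The query ran but returned no rows."
        except Exception as err:           # execution error: describe it
            feedback = f"The query failed with: {err}"
        sql = llm(f"Question: {question}\nPrevious SQL: {sql}\n"
                  f"Feedback: {feedback}\nReturn a corrected SQL query only.")
    return None                             # candidate discarded

def majority_vote(question: str, n_candidates: int = 5):
    """Generate candidates independently, refine each, then keep the SQL whose
    execution result appears most often (execution-guided consensus)."""
    results = []
    for _ in range(n_candidates):
        candidate = llm(f"Write a SQL query answering: {question}")
        refined = refine_candidate(question, candidate)
        if refined is not None:
            sql, rows = refined
            results.append((str(rows), sql))
    if not results:
        return None
    winner, _ = Counter(result for result, _ in results).most_common(1)[0]
    return next(sql for result, sql in results if result == winner)
```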

Key Contributions and Benefits:
  • Structured agent design: Modular architecture allows flexible debugging, inspection, and targeted improvements.

  • Schema-aware reasoning: Handles complex/nested DBs through exploration and compression.

  • Model agnosticism: Any LLM (including Arctic, GPT-4, or o3) can be plugged in, allowing easy upgrades or efficiency trade-offs.

  • Self-correcting: Built-in feedback loop improves resilience to LLM generation errors.

  • Voting for Robustness: Increases reliability by selecting from multiple independently generated candidates.

Challenges:
  • High LLM Dependency: ReFoRCE relies entirely on external LLM models for SQL generation and reasoning. It does not involve any model training or fine-tuning of its own, which may limit portability, adaptability, and control over performance.

  • Latency: Multiple rounds of generation, refinement, and execution increase runtime.

  • Limited Generalisation to Simpler Tasks: While ReFoRCE excels on complex, long-context datasets like Spider 2.0, it underperforms on simpler benchmarks such as BIRD, which require precise schema-level reasoning rather than multi-hop logic.

3. Table-Augmented Generation (TAG)

🔗 Paper: Text2SQL is Not Enough: Unifying AI and Databases with TAG

TAG is a unified, general-purpose paradigm for answering natural-language questions over databases, generalising existing approaches such as Text2SQL and RAG to support a wider range of queries that require semantic reasoning, world knowledge, and structured execution.

A representative TAG implementation answers a user's natural-language question over a table about movies. The pipeline proceeds in three stages: query synthesis, query execution, and answer generation.
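
A minimal sketch of those three stages is shown below, using an illustrative movies table in SQLite and a placeholder llm() call; it is a toy rendering of the pipeline under those assumptions, not the paper's implementation.

```python
import sqlite3

def llm(prompt: str) -> str:
    """Placeholder for the language model used at each TAG stage (assumption)."""
    raise NotImplementedError

def tag_pipeline(question: str, conn: sqlite3.Connection) -> str:
    """Minimal TAG sketch over an illustrative movies(title, year, rating, review) table."""
    # 1. Query synthesis: translate the question into SQL over the known schema.
    sql = llm("Table movies(title, year, rating, review). Write a SQL query that "
              f"gathers the rows needed to answer: {question}")
    # 2. Query execution: run the synthesised query against the database.
    rows = conn.execute(sql).fetchall()
    # 3. Answer generation: the LM reasons over the rows, adding semantic
    #    judgement or world knowledge that SQL alone cannot express.
    return llm(f"Question: {question}\nRetrieved rows: {rows}\n"
               "Answer the question, using your own knowledge where the rows "
               "alone are not enough.")
```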

Key Contributions and Benefits:
  • TAG unifies Text2SQL and RAG: both are just subsets of the full TAG pipeline.

  • Enables richer query types: including those needing world knowledge and semantic reasoning (using LMs).

  • Supports advanced logic: through iterative or recursive LM use during generation.

  • Demonstrates practical implementation with LOTUS: hand-written pipelines combine relational and semantic operators to achieve much higher accuracy (a simplified sketch of this operator mix follows the list below).

  • Lays foundation for Agentic Data Assistants: TAG structure supports multi-hop agent workflows for interactive data reasoning.
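
As a rough illustration of mixing relational and semantic operators, the sketch below filters a DataFrame first with ordinary pandas and then with an LM-judged condition. The sem_filter() helper is hypothetical and only mimics the idea behind LOTUS-style semantic operators; it is not the library's actual API.

```python
import pandas as pd

def llm(prompt: str) -> str:
    """Placeholder language-model call (assumption)."""
    raise NotImplementedError

def sem_filter(df: pd.DataFrame, condition: str, column: str) -> pd.DataFrame:
    """Hypothetical semantic operator: keep rows where the LM judges that the
    text in `column` satisfies the natural-language `condition`."""
    keep = [llm(f"Does this text satisfy '{condition}'? Answer yes or no.\n{text}")
            .strip().lower().startswith("y") for text in df[column]]
    return df[keep]

# Relational step (plain pandas) followed by a semantic step (LM judgement):
# movies = pd.read_csv("movies.csv")                   # illustrative data
# recent = movies[movies["year"] >= 2020]              # relational filter
# uplifting = sem_filter(recent, "has an uplifting tone", "review")
```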

Challenges:
  • High LLM Dependency: every step, including filtering, ranking, and aggregating, relies on LMs, making the system sensitive to model changes and quality while also introducing challenges in latency, cost, and overall stability.

Conclusion

Retrieval-Augmented Generation is reshaping how LLMs interact with structured data, particularly in the high-stakes task of Text-to-SQL translation. The systems examined (ICRL, ReFoRCE and TAG) reveal a clear shift from generic prompting to structured, retrieval-aware, and self-improving pipelines. Each brings unique strengths: ICRL emphasises scalable schema retrieval via synthetic data and reinforcement learning; ReFoRCE pushes modular design and refinement for robust execution; and TAG broadens the scope entirely, treating Text-to-SQL as part of a larger agentic reasoning framework.

Several key trends emerge from this landscape:

  1. Rise of Self-Improving Loops: Whether through reinforcement learning (ICRL) or self-refinement (ReFoRCE), modern systems increasingly include feedback mechanisms to improve generation quality over time.

  2. Schema Compression and Targeted Retrieval: Avoiding full database context is now standard; efficient retrieval of schema subsets via graphs or embeddings is essential for scalability and cost control.

  3. Execution-Guided Reasoning: SQL generation is no longer just a decoding task; execution signals are used to refine or select outputs, boosting correctness.

  4. Agentic Architectures: TAG and similar approaches are laying the groundwork for multi-step, modular agents that can reason, retrieve, execute, and adapt dynamically across tasks.

  5. LLM-First but Not LLM-Only: While LLMs remain central, there’s a growing move toward combining them with symbolic reasoning, voting mechanisms, and domain-specific heuristics to improve reliability and interpretability.

Despite their progress, all three approaches face a common bottleneck: LLM dependency. Their performance, latency, and cost are tightly coupled to the underlying language models, which limits portability and introduces engineering overhead.

Going forward, the most promising directions lie at the intersection of these trends: hybrid architectures that combine retrieval, reasoning, and execution in modular ways, while minimising LLM calls and incorporating domain knowledge more natively.

References

[1] Toteja, R., Sarkar, A. and Comar, P.M., 2025, January. In-context reinforcement learning with retrieval-augmented generation for Text-to-SQL. In Proceedings of the 31st International Conference on Computational Linguistics (pp. 10390–10397).

[2] Deng, M., Ramachandran, A., Xu, C., Hu, L., Yao, Z., Datta, A. and Zhang, H., 2025. ReFoRCE: A Text-to-SQL Agent with Self-Refinement, Consensus Enforcement, and Column Exploration. arXiv preprint arXiv:2502.00675.

[3] Biswal, A., Patel, L., Jha, S., Kamsetty, A., Liu, S., Gonzalez, J.E., Guestrin, C. and Zaharia, M., 2024. Text2SQL is Not Enough: Unifying AI and Databases with TAG. arXiv preprint arXiv:2408.14717.

Turn Complexity into Clarity & Action

Book a call with our team to explore how Engine AI can transform your data into actionable insights that drive decisions — in weeks, not months.

