Bilkent University
Department of Computer Engineering
CS 590/690 SEMINAR

 

E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL

 

Hasan Alp Caferoğlu
Master Student
(Supervisor: Prof. Dr. Özgür Ulusoy)
Computer Engineering Department
Bilkent University

Abstract: Translating Natural Language Queries into Structured Query Language (Text-to-SQL or NLQ-to-SQL) is a critical task extensively studied by both the natural language processing and database communities, aimed at providing a natural language interface to databases (NLIDB) and lowering the barrier for non-experts. Despite recent advancements made through the use of Large Language Models (LLMs), significant challenges remain. These include handling complex database schemas, resolving ambiguity in user queries, and generating SQL queries with intricate structures that accurately reflect the user's intent. In this work, we introduce E-SQL, a novel pipeline specifically designed to address these challenges through direct schema linking and candidate predicate augmentation. E-SQL enhances the natural language query by incorporating relevant database items (i.e., tables, columns, and values) and conditions directly into the question, bridging the gap between the query and the database structure. The pipeline leverages candidate predicate augmentation to mitigate erroneous or incomplete predicates in generated SQLs. We further investigate the impact of schema filtering, a technique widely explored in previous work, and demonstrate its diminishing returns when applied alongside advanced large language models. Comprehensive evaluations on the BIRD benchmark illustrate that E-SQL achieves competitive performance, particularly excelling in complex queries with a 66.29% execution accuracy on the test set. All code required to reproduce the reported results is publicly available on our GitHub repository.

 

DATE: November 18, Monday @ 16:30 Place: EA 502