Innovating SQL Automation: Evaluating Open-Source Large Language Models with a Dual-Stage Approach for Corporate Data Solutions
Conference proceeding

Erhan Arslan and Eugen Harinda
2024 9th International Conference on Computer Science and Engineering (UBMK), Antalya, Turkiye, 26-28 October 2024, pp. 68-73
11/12/2024

Abstract

Accuracy; Automated Reporting; Computational modeling; Corporate Data Management; Filtering; Industries; Iterative methods; Large language models; Natural languages; Query Automation; Robustness; Structured Query Language; Text-to-SQL; Transforms
As Large Language Models (LLMs) advance in their ability to process natural language, their potential to transform industries and reshape human-computer interaction is increasingly evident. This study evaluates the use of LLMs to automate SQL query generation from natural language inputs in enterprise environments. We investigated the feasibility of using open-source LLMs, including Mistral, CodeLlama, Phi-3, and DeepSeek Coder, by fine-tuning them on a custom dataset reflecting company-specific data tables. This dataset was iteratively constructed using a baseline LLM that allows fine-tuning to address unique enterprise data structures and use cases. A two-stage filtering and refinement mechanism was implemented to improve query accuracy: the first stage identifies the relevant tables, and the second stage adds an iterative error-correction step to the SQL query generation process. The resulting system significantly reduced query errors and increased accuracy by 84%, 88%, 81%, and 90% for Mistral, CodeLlama, Phi-3, and DeepSeek Coder, respectively. However, challenges remain in resource consumption and logical error handling. Future work will focus on improving contextual understanding and integrating advanced AI techniques to further enhance robustness and applicability.
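The two-stage mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a keyword-overlap heuristic for table selection (where the paper's own method is not specified in this record), an in-memory SQLite database as the execution target, and a hypothetical `fake_model` function standing in for a fine-tuned LLM. All function names here are illustrative.

```python
import sqlite3
from typing import Callable, Dict, List, Optional


def select_tables(question: str, schemas: Dict[str, List[str]]) -> List[str]:
    """Stage 1 (assumed heuristic): keep tables whose name or columns occur in the question."""
    words = {w.strip(".,?").lower() for w in question.split()}
    return [
        table
        for table, columns in schemas.items()
        if table.lower() in words or any(c.lower() in words for c in columns)
    ]


def generate_with_repair(
    question: str,
    tables: List[str],
    generate: Callable[[str, List[str], Optional[str]], str],
    conn: sqlite3.Connection,
    max_attempts: int = 3,
) -> Optional[str]:
    """Stage 2: request SQL, execute it, and feed any database error back to the generator."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        sql = generate(question, tables, error)
        try:
            conn.execute(sql).fetchall()
            return sql  # the query executed without a database error
        except sqlite3.Error as exc:
            error = str(exc)  # passed to the generator on the next attempt
    return None  # no executable query within the attempt budget


def fake_model(question: str, tables: List[str], error: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM call: wrong column first, corrected after feedback."""
    if error is None:
        return "SELECT totl FROM orders"
    return "SELECT total FROM orders"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
schemas = {"orders": ["id", "total"], "employees": ["id", "name"]}

question = "What is the total of all orders?"
tables = select_tables(question, schemas)
sql = generate_with_repair(question, tables, fake_model, conn)
print(tables, sql)
```

A loop of this kind only catches errors that the database itself raises, such as an unknown column. A query that runs but answers the wrong question passes through unchanged, which is consistent with the abstract's remark that logical error handling remains a challenge.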
Published (Version of record). Publisher may require payment for access.
