Innovating SQL Automation: Evaluating Open-Source Large Language Models with a Dual-Stage Approach for Corporate Data Solutions

Erhan Arslan; Eugen Harinda

doi:10.1109/UBMK63289.2024.10773417

Conference proceeding

2024 9th International Conference on Computer Science and Engineering (UBMK). 26-28 October 2024, Turkiye, pp.68-73

2024 9th International Conference on Computer Science and Engineering (UBMK) (Antalya, Turkiye, 26/10/2024–28/10/2024)

11/12/2024

DOI: https://doi.org/10.1109/UBMK63289.2024.10773417

Keywords: Accuracy; Automated Reporting; Computational modeling; Corporate Data Management; Filtering; Industries; Iterative methods; Large language models; Natural languages; Query Automation; Robustness; Structured Query Language; Text-to-SQL; Transforms

Abstract

As Large Language Models (LLMs) continue to advance in their ability to process natural language, their potential to transform industries and reshape the future of human-computer interaction becomes increasingly evident. This study evaluates the use of LLMs to automate SQL query generation from natural language inputs in enterprise environments. We investigated the feasibility of using open-source LLMs, including Mistral, CodeLlama, Phi-3, and DeepSeek Coder, by fine-tuning them with a custom dataset reflecting company-specific data tables. This dataset was iteratively constructed using a baseline LLM that allows fine-tuning to address unique enterprise data structures and use cases. A two-stage filtering and refinement mechanism was implemented to improve query accuracy. The first stage identifies the relevant tables, and the second stage adds an iterative error correction step to the SQL query generation process. The resulting system significantly reduced query errors and increased accuracy by 84%, 88%, 81%, and 90% for Mistral, CodeLlama, Phi-3, and DeepSeek Coder, respectively. However, challenges remain in resource consumption and logical error handling. Future work will focus on improving contextual understanding and integrating advanced AI techniques to further enhance robustness and applicability.
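The two-stage mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword-overlap table filter, the function names `select_relevant_tables` and `generate_with_repair`, the `fake_llm` stand-in for a fine-tuned model, and the use of SQLite are all assumptions made for illustration only.

```python
import sqlite3

def select_relevant_tables(question, schema):
    """Stage 1 (assumed heuristic): keep tables whose name or columns occur in the question."""
    words = set(question.lower().replace("?", " ").split())
    picked = {
        table: cols
        for table, cols in schema.items()
        if table.lower() in words or words & {c.lower() for c in cols}
    }
    return picked or dict(schema)  # fall back to the full schema if nothing matches

def generate_with_repair(question, schema, conn, generate, max_attempts=3):
    """Stage 2: request SQL, validate it, and feed any database error back to the generator."""
    error = None
    for _ in range(max_attempts):
        sql = generate(question, schema, error)
        try:
            # EXPLAIN QUERY PLAN compiles the statement without running it
            conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
            return sql
        except sqlite3.Error as exc:
            error = str(exc)
    raise RuntimeError(f"no valid query after {max_attempts} attempts: {error}")

def fake_llm(question, schema, error):
    """Hypothetical stand-in for a model: the first draft uses a wrong column name."""
    if error is None:
        return "SELECT customer_name FROM orders"
    return "SELECT customer FROM orders"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
schema = {"orders": ["id", "customer", "total"], "employees": ["id", "name"]}

question = "Which customer placed orders?"
tables = select_relevant_tables(question, schema)
sql = generate_with_repair(question, tables, conn, fake_llm)
print(tables, sql)
```

A loop of this kind only catches errors the database itself reports, such as unknown columns or syntax errors. A query that runs but answers the wrong question passes unchanged, which is consistent with the abstract's note that logical error handling remains a challenge.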

Published (Version of record). Publisher may require payment for access.

Details

Title: Innovating SQL Automation: Evaluating Open-Source Large Language Models with a Dual-Stage Approach for Corporate Data Solutions
Creators: Erhan Arslan (University of Greater Manchester); Eugen Harinda (University of Greater Manchester, School of Arts and Creative Technologies)
Publication Details: 2024 9th International Conference on Computer Science and Engineering (UBMK). 26-28 October 2024, Turkiye, pp.68-73
Conference: 2024 9th International Conference on Computer Science and Engineering (UBMK) (Antalya, Turkiye, 26/10/2024–28/10/2024)
Publisher: IEEE
Grant note: University of Bolton (10.13039/100010042)
Identifiers: 9927506808841; 2768-0592; 2521-1641; 9798350365894
Academic Unit: School of Arts and Creative Technologies
Language: English
Resource Type: Conference proceeding
