International Journal of Science and Research (IJSR)

Call for Papers | Fully Refereed | Open Access | Double Blind Peer Reviewed

ISSN: 2319-7064



Research Paper | Information Technology | United States of America | Volume 13 Issue 10, October 2024


     

Knowledge Discovery in Databases Utilizing Large Language Models

Satyam Chauhan


Abstract: Text-to-SQL parsing, the conversion of natural language questions into executable SQL commands, has attracted growing interest. Advanced models such as GPT-4 and Claude-2 have demonstrated significant potential in this area. However, existing benchmarks such as Spider and WikiSQL focus primarily on simple database schemas with limited data, which leaves a gap between academic research and practical applications. To bridge this gap, we introduce BIRD, a comprehensive benchmark for large-scale database text-to-SQL tasks. BIRD includes 12,751 text-to-SQL pairs across 95 databases, totaling 33.4 GB and covering 37 professional domains. Its focus on real-world database values raises new challenges: handling noisy or incomplete data, aligning natural language questions with external knowledge in the database, and improving SQL efficiency on large datasets. Addressing these issues requires text-to-SQL models to go beyond traditional semantic parsing and understand database content. Experimental findings emphasize the critical role of database values in generating accurate SQL queries for large databases. Even state-of-the-art models like GPT-4 achieve only 54.89% execution accuracy, far below the 92.96% human benchmark, which underscores the challenges that remain. Additionally, our analysis of query efficiency provides insights into writing optimized SQL queries for industrial use cases. We believe BIRD will play a crucial role in advancing real-world text-to-SQL applications. The leaderboard and source code are available through the BIRD Benchmark site.
As data complexity increases and the demand for rapid data retrieval grows, integrating AI models, especially Large Language Models (LLMs), to help users generate SQL queries from natural language is becoming increasingly important. This research outlines a system in which LLMs are combined with metadata-driven approaches (such as mapping connections, segment definitions, and business logic) to enable intuitive SQL query generation. The system's setup, benefits, and foundational patterns are demonstrated through test datasets and a Power BI presentation.
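
The metadata-driven approach described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the metadata layout, the table, column, and segment names, and the helper functions `build_prompt` and `validate_sql` are all hypothetical. The sketch assumes that the schema, join conditions, segment definitions, and business rules are serialized into the prompt sent to an LLM, and that a candidate query is checked by compiling it with SQLite's `EXPLAIN` against an empty copy of the schema. A hand-written query stands in for the LLM output.

```python
import sqlite3

# Hypothetical metadata: every table, column, and segment name is invented for illustration.
METADATA = {
    "tables": {
        "customers": ["customer_id", "region", "signup_date"],
        "orders": ["order_id", "customer_id", "amount", "order_date"],
    },
    "joins": ["orders.customer_id = customers.customer_id"],
    "segments": {"high_value_order": "orders.amount >= 1000"},
    "business_rules": ["Revenue is the sum of orders.amount."],
}


def build_prompt(question: str, metadata: dict) -> str:
    """Assemble a prompt that exposes schema, joins, segments, and rules to an LLM."""
    lines = ["Translate the question into one SQLite SELECT statement.", "", "Tables:"]
    for table, columns in metadata["tables"].items():
        lines.append(f"- {table}({', '.join(columns)})")
    lines.append("Join conditions:")
    lines.extend(f"- {join}" for join in metadata["joins"])
    lines.append("Segment definitions:")
    lines.extend(f"- {name}: {cond}" for name, cond in metadata["segments"].items())
    lines.append("Business rules:")
    lines.extend(f"- {rule}" for rule in metadata["business_rules"])
    lines.extend(["", f"Question: {question}", "SQL:"])
    return "\n".join(lines)


def validate_sql(sql: str, metadata: dict) -> bool:
    """Return True if the query compiles against an empty copy of the schema."""
    conn = sqlite3.connect(":memory:")
    try:
        for table, columns in metadata["tables"].items():
            conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        # EXPLAIN compiles the statement without needing any rows in the tables.
        conn.execute(f"EXPLAIN {sql}")
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


prompt = build_prompt("What is the revenue from high-value orders per region?", METADATA)

# Hand-written stand-in for the query an LLM might return for the prompt above.
candidate = (
    "SELECT c.region, SUM(o.amount) AS revenue "
    "FROM orders AS o JOIN customers AS c ON o.customer_id = c.customer_id "
    "WHERE o.amount >= 1000 GROUP BY c.region"
)

print(prompt)
print(validate_sql(candidate, METADATA))
```

A compile-only check of this kind catches references to tables or columns that are not in the metadata, but it says nothing about whether the query answers the question. Measuring that would require comparing execution results against a reference query, as execution accuracy does.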


Keywords: Large Language Models (LLMs), Metadata-Driven Methods, SQL Query Generation, Natural Language Processing (NLP), Information Retrieval, System Architecture, Data Management, Power BI, Artificial Intelligence (AI), Business Logic Integration, Data Visualization, Complex Data Sets, Query Validation, Machine Learning


Edition: Volume 13 Issue 10, October 2024


Pages: 1886 - 1894


DOI: https://www.doi.org/10.21275/MS241026170018







Satyam Chauhan, "Knowledge Discovery in Databases Utilizing Large Language Models", International Journal of Science and Research (IJSR), Volume 13 Issue 10, October 2024, pp. 1886-1894, https://www.ijsr.net/getabstract.php?paperid=MS241026170018, DOI: https://www.doi.org/10.21275/MS241026170018



Similar Articles


Research Paper, Information Technology, India, Volume 13 Issue 1, January 2024

Pages: 661 - 664

Revolutionizing Public Health: A Blockchain-Based System for Secure Genetic and Medical Data Management

Kunal Dhanda, Sweta Sehrawat



Analysis Study Research Paper, Information Technology, United States of America, Volume 13 Issue 10, October 2024

Pages: 1425 - 1428

Enhancing Patient Care through Improved Provider Data Quality: An AI-Driven Data Solution

Jerry John Thayil



Research Paper, Information Technology, United States of America, Volume 13 Issue 11, November 2024

Pages: 616 - 619

Building a Global Cross-Regional Data Platform to Centralize Data for a Global Enterprise

Shreesha Hegde Kukkuhalli



Research Paper, Information Technology, India, Volume 11 Issue 3, March 2022

Pages: 1642 - 1649

Data Integration Strategies in Hybrid Cloud Environments

Sai Kumar Reddy Thumburu



Research Paper, Information Technology, India, Volume 12 Issue 11, November 2023

Pages: 2234 - 2241

Data Ethics in CRM: Privacy and Transparency Issues

Venkat Raviteja Boppana



