Research Paper | Information Technology | United States of America | Volume 13 Issue 10, October 2024
Knowledge Discovery in Databases Utilizing Large Language Models
Satyam Chauhan
Abstract: Text-to-SQL parsing, the conversion of natural language questions into executable SQL commands, has attracted growing interest recently. Advanced models such as GPT-4 and Claude-2 have demonstrated significant potential in this area. However, existing benchmarks such as Spider and WikiSQL focus primarily on simple database schemas with limited data, which leaves a gap between academic research and practical applications. To bridge this gap, we introduce BIRD, a comprehensive benchmark for large-scale database text-to-SQL tasks. BIRD includes 12,751 text-to-SQL pairs across 95 databases, totaling 33.4 GB and covering 37 professional domains. Our focus on real-world database values raises new challenges: handling noisy or incomplete data, aligning natural language questions with external knowledge in the database, and improving SQL efficiency on large datasets. Addressing these issues requires text-to-SQL models to go beyond traditional semantic parsing and understand database content. Experimental findings show that database values play a critical role in generating accurate SQL queries over extensive data. Even state-of-the-art models such as GPT-4 achieve only 54.89% execution accuracy, far below the 92.96% human benchmark, which underscores the ongoing challenges in the field. Additionally, our analysis of query efficiency provides insights into crafting optimized SQL queries for industrial use cases. We believe BIRD will play a crucial role in advancing real-world text-to-SQL applications. The leaderboard and source code can be accessed at BIRD Benchmark.
As data complexity increases and the demand for rapid data retrieval grows, integrating AI models, especially Large Language Models (LLMs), to help users generate SQL queries from natural language is becoming increasingly important.
This research outlines a system in which LLMs are combined with metadata-driven approaches (such as mapping connections, segment definitions, and business logic) to enable intuitive SQL query generation. The system's setup, benefits, and foundational patterns are demonstrated through test datasets and a Power BI presentation.
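As an illustration of how metadata might be combined with a natural language question before it reaches an LLM, the following is a minimal sketch and not the paper's implementation. The `Metadata` class, the `build_prompt` function, the prompt layout, and the example tables and rules are all hypothetical; they only show the three metadata kinds named above (mapping connections, segment definitions, business logic) being placed in a prompt.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Hypothetical container for the three metadata kinds named in the abstract."""
    joins: list[str] = field(default_factory=list)          # mapping connections
    segments: dict[str, str] = field(default_factory=dict)  # segment name -> SQL predicate
    rules: list[str] = field(default_factory=list)          # business logic in plain language


def build_prompt(question: str, schema: str, meta: Metadata) -> str:
    """Assemble a text prompt that pairs the user's question with the metadata."""
    lines = ["You translate questions into SQL.", "", "Schema:", schema, "", "Join paths:"]
    lines += [f"- {j}" for j in meta.joins]
    lines += ["", "Segment definitions:"]
    lines += [f"- {name}: {pred}" for name, pred in meta.segments.items()]
    lines += ["", "Business rules:"]
    lines += [f"- {r}" for r in meta.rules]
    lines += ["", f"Question: {question}", "SQL:"]
    return "\n".join(lines)


# Illustrative (made-up) schema and metadata.
meta = Metadata(
    joins=["orders.customer_id = customers.id"],
    segments={"high_value": "customers.lifetime_spend > 10000"},
    rules=["Revenue excludes orders with status = 'cancelled'."],
)
prompt = build_prompt(
    "What is the total revenue from high-value customers?",
    "customers(id, lifetime_spend); orders(id, customer_id, amount, status)",
    meta,
)
print(prompt)
```

The resulting string would then be sent to whichever LLM the system uses. That call is omitted here because the abstract does not specify a model or API.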
Keywords: Large Language Models (LLMs), Metadata-Driven Methods, SQL Query Generation, Natural Language Processing (NLP), Information Retrieval, System Architecture, Data Management, Power BI, Artificial Intelligence (AI), Business Logic Integration, Data Visualization, Complex Data Sets, Query Validation, Machine Learning
Edition: Volume 13 Issue 10, October 2024
Pages: 1886 - 1894
DOI: https://www.doi.org/10.21275/MS241026170018