Knowledge Discovery in Databases Utilizing Large Language Models
International Journal of Science and Research (IJSR)


ISSN: 2319-7064



Research Paper | Information Technology | United States of America | Volume 13 Issue 10, October 2024


     


Satyam Chauhan


Abstract: Converting natural language questions into executable SQL commands, known as text-to-SQL parsing, has attracted a surge of interest recently. Advanced models such as GPT-4 and Claude-2 have demonstrated significant potential in this area. However, existing benchmarks such as Spider and WikiSQL focus primarily on simple database schemas with limited data, which leaves a gap between academic research and practical applications. To bridge this gap, we introduce BIRD, a comprehensive benchmark for large-scale database text-to-SQL tasks. BIRD includes 12,751 text-to-SQL pairs across 95 databases, totaling 33.4 GB and covering 37 diverse professional domains. Our focus on real-world database values brings forth new challenges, such as dealing with noisy or incomplete data, aligning natural language questions with external knowledge in the database, and improving SQL efficiency for large datasets. Addressing these issues requires text-to-SQL models to go beyond traditional semantic parsing and better understand database content. Experimental findings emphasize the critical role of database values in generating accurate SQL queries for extensive data. Even state-of-the-art models such as GPT-4 achieve only 54.89% execution accuracy, far from the 92.96% human benchmark, underscoring ongoing challenges in the field. Additionally, our analysis of query efficiency provides insights into crafting optimized SQL queries for industrial use cases. We believe BIRD will play a crucial role in advancing real-world text-to-SQL applications. The leaderboard and source code can be accessed at the BIRD Benchmark site.
As data complexity increases and the demand for rapid data retrieval grows, integrating AI models, especially Large Language Models (LLMs), to assist users in generating SQL queries from natural language is becoming increasingly important. This research outlines a system in which LLMs are combined with metadata-driven approaches, such as mapping connections, segment definitions, and business logic, to enable intuitive SQL query generation. The system's setup, benefits, and foundational patterns are demonstrated through test datasets and a Power BI presentation.
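The execution accuracy figures quoted in the abstract can be illustrated with a minimal sketch. It assumes the common definition of the metric: a predicted query counts as correct when its result rows match those of the reference ("gold") query on the same database, regardless of row order. The `orders` table, the sample queries, and the helper names `run_query` and `execution_accuracy` are hypothetical and are not taken from the paper or from the BIRD evaluation code, which may differ in detail (for example, in how duplicate rows are treated).

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Run a query and return its rows as an order-insensitive multiset.

    Returns None when the query fails to parse or execute.
    """
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match."""
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold_rows = run_query(conn, gold_sql)
        predicted_rows = run_query(conn, predicted_sql)
        # A failing predicted query (None) never matches a valid gold result.
        if gold_rows is not None and predicted_rows == gold_rows:
            correct += 1
    return correct / len(pairs)


# Hypothetical toy database standing in for a real benchmark database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER, region TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'EU', 10.0), (2, 'US', 25.0), (3, 'EU', 5.0);
    """
)

pairs = [
    # Same rows, different ordering clause: counted as correct.
    ("SELECT region, SUM(amount) FROM orders GROUP BY region",
     "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"),
    # Valid SQL that filters on the wrong value: counted as incorrect.
    ("SELECT COUNT(*) FROM orders WHERE region = 'EU'",
     "SELECT COUNT(*) FROM orders WHERE region = 'US'"),
    # Predicted query does not execute: counted as incorrect.
    ("SELECT amount FROM order",
     "SELECT amount FROM orders"),
]

score = execution_accuracy(conn, pairs)
print(f"execution accuracy: {score:.2%}")
```

Comparing result sets rather than SQL strings means that two differently written queries are treated as equivalent when they return the same data, which is why this style of metric depends on the actual database values and not only on the schema.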


Keywords: Large Language Models (LLMs), Metadata-Driven Methods, SQL Query Generation, Natural Language Processing (NLP), Information Retrieval, System Architecture, Data Management, Power BI, Artificial Intelligence (AI), Business Logic Integration, Data Visualization, Complex Data Sets, Query Validation, Machine Learning


Edition: Volume 13 Issue 10, October 2024


Pages: 1886 - 1894


DOI: https://www.doi.org/10.21275/MS241026170018





Satyam Chauhan, "Knowledge Discovery in Databases Utilizing Large Language Models", International Journal of Science and Research (IJSR), Volume 13 Issue 10, October 2024, pp. 1886-1894, https://www.ijsr.net/getabstract.php?paperid=MS241026170018, DOI: https://www.doi.org/10.21275/MS241026170018
