International Journal of Science and Research (IJSR)


ISSN: 2319-7064



Research Paper | Computer Technology | India | Volume 12 Issue 5, May 2023


     

Streamlining Enterprise Data Pipelines with an Automated DAG Factory for Airflow Orchestration in Cloud Environments using YAML Templates and JSON-Serialized Variables

Ramamurthy Valavandan, Balakrishnan Gothandapani, Savitha Ramamurthy


Abstract: Airflow is an open-source platform for creating, scheduling, and monitoring data pipelines. A Directed Acyclic Graph (DAG) factory provides a mechanism for creating and managing Airflow DAGs programmatically. However, the current approach to building a DAG factory in Airflow requires writing Python code, which can be time-consuming and error-prone. In this research paper, we propose a YAML-based DAG factory automation framework for Airflow that provides a simple and intuitive way to define DAGs in YAML format. We describe the design and implementation of the framework and provide examples of how it can be used to automate the creation and management of DAGs in a cloud environment. We also evaluate the performance and scalability of the framework using real-world datasets and compare it with the existing Python-based DAG factory approach in Airflow. Our results show that the YAML-based DAG factory automation framework provides a more efficient and flexible way to create and manage DAGs in Airflow, especially in large-scale data processing scenarios.
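The abstract does not show the framework's YAML schema, so the following is only a minimal sketch under stated assumptions: the field names (`dag_id`, `schedule`, `tasks`, `operator`, `dependencies`) and the helper `plan_dag` are hypothetical and are not the authors' schema or API. It illustrates the core step any YAML-driven DAG factory has to perform before handing work to Airflow: take the dictionary produced by parsing the YAML file (for example with PyYAML's `yaml.safe_load`), validate it, and confirm that the declared task dependencies actually form an acyclic graph. It uses only the Python standard library (`graphlib`, Python 3.9+).

```python
"""Hypothetical sketch of the validation step of a YAML-driven DAG factory.

Assumes the YAML file has already been parsed (e.g. with yaml.safe_load)
into a plain dictionary like EXAMPLE_SPEC. The schema is illustrative only.
"""
from graphlib import CycleError, TopologicalSorter

# Hypothetical YAML equivalent of EXAMPLE_SPEC (not the paper's schema):
#   dag_id: sales_daily
#   schedule: "@daily"
#   tasks:
#     extract:   {operator: BashOperator, bash_command: "echo extract"}
#     transform: {operator: BashOperator, bash_command: "echo transform",
#                 dependencies: [extract]}
#     load:      {operator: BashOperator, bash_command: "echo load",
#                 dependencies: [transform]}
EXAMPLE_SPEC = {
    "dag_id": "sales_daily",
    "schedule": "@daily",
    "tasks": {
        "extract": {"operator": "BashOperator", "bash_command": "echo extract"},
        "transform": {
            "operator": "BashOperator",
            "bash_command": "echo transform",
            "dependencies": ["extract"],
        },
        "load": {
            "operator": "BashOperator",
            "bash_command": "echo load",
            "dependencies": ["transform"],
        },
    },
}

# Top-level keys this hypothetical schema requires.
REQUIRED_KEYS = ("dag_id", "schedule", "tasks")


def plan_dag(spec: dict) -> list[str]:
    """Validate a parsed DAG spec and return task ids in a valid execution order."""
    missing = [key for key in REQUIRED_KEYS if key not in spec]
    if missing:
        raise ValueError(f"spec missing keys: {missing}")

    tasks = spec["tasks"]
    # Map each task to the set of tasks it depends on (its predecessors).
    graph = {}
    for task_id, cfg in tasks.items():
        deps = cfg.get("dependencies", [])
        unknown = [dep for dep in deps if dep not in tasks]
        if unknown:
            raise ValueError(f"task {task_id!r} depends on unknown tasks {unknown}")
        graph[task_id] = set(deps)

    # static_order() raises CycleError if the dependencies are not acyclic.
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"spec for {spec['dag_id']!r} is not acyclic") from exc
```

Calling `plan_dag(EXAMPLE_SPEC)` returns `['extract', 'transform', 'load']`. In an actual factory, the validated spec would then be used to create an `airflow.DAG` object with one operator per task and to wire the edges with Airflow's `>>` operator. That step is omitted here so the sketch runs without Airflow installed.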


Keywords: Airflow, Directed Acyclic Graph, DAG factory, YAML, automation, Python, CLI tool, schema file, GCP, Composer, JSON, dictionary, task status, DAG tasks, template generation, variable


Edition: Volume 12 Issue 5, May 2023


Pages: 656-673


DOI: https://www.doi.org/10.21275/SR23508230454







Ramamurthy Valavandan, Balakrishnan Gothandapani, Savitha Ramamurthy, "Streamlining Enterprise Data Pipelines with an Automated DAG Factory for Airflow Orchestration in Cloud Environments using YAML Templates and JSON-Serialized Variables", International Journal of Science and Research (IJSR), Volume 12 Issue 5, May 2023, pp. 656-673, https://www.ijsr.net/getabstract.php?paperid=SR23508230454, DOI: https://www.doi.org/10.21275/SR23508230454