"A Scalable Framework for Heterogeneous Environmental Data Management U" by Pratik Poudel, Boyuan Guan et al.
 

Date of this Version

7-2025

Document Type

Article

Abstract

Environmental data originates from diverse sources, posing challenges in management, processing, and visualization. This paper introduces a scalable, AI-driven data pipeline framework for environmental data management and discovery. The framework integrates workflow orchestration, automated data ingestion and processing, federated storage, and seamless geospatial visualization. It employs a Ceph-based storage system to handle large, heterogeneous datasets, leveraging its fault-tolerant, distributed architecture for high-performance storage across object, block, and file interfaces. To enhance data discoverability and interoperability, the framework incorporates Generative AI (GenAI) for automated metadata generation, reducing manual annotation overhead while improving real-time processing and cross-platform integration. Additionally, the system enables interdisciplinary collaboration through standardized metadata structures and scalable data federation. A case study using buoy data validates the framework’s capabilities, including data processing, cleaning, and visualization. By addressing critical data integration and accessibility challenges, the system fosters a scalable, efficient, and intelligent research data-sharing ecosystem for environmental science studies.
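The abstract names cleaning as one stage of the buoy case study but does not say how it is done. The following is an illustrative Python sketch built on several assumptions. Buoy readings are taken to arrive as plain lists of floats. Missing values are assumed to use sentinel codes such as 99.0 or 999.0, a convention in some buoy feeds that the paper does not confirm for its data. Each variable is assumed to have a known plausible range. `clean_series` and `summarize` are hypothetical helpers. `summarize` computes simple deterministic statistics of the kind a metadata record might carry. It is not the GenAI-based metadata generation the framework describes.

```python
import math
from statistics import mean

# Assumption: sentinel codes that some buoy feeds use for "no data".
# The paper does not specify its data's missing-value convention.
MISSING_SENTINELS = {99.0, 999.0, 9999.0}


def clean_series(values, lo, hi):
    """Replace sentinels, NaNs, and out-of-range readings with None (hypothetical helper)."""
    cleaned = []
    for v in values:
        if v is None or math.isnan(v) or v in MISSING_SENTINELS or not (lo <= v <= hi):
            cleaned.append(None)
        else:
            cleaned.append(v)
    return cleaned


def summarize(name, units, values):
    """Build a simple summary record for one cleaned variable (hypothetical helper)."""
    valid = [v for v in values if v is not None]
    return {
        "variable": name,
        "units": units,
        "count": len(values),
        "valid_count": len(valid),
        "min": min(valid) if valid else None,
        "max": max(valid) if valid else None,
        "mean": round(mean(valid), 3) if valid else None,
    }


if __name__ == "__main__":
    # Made-up sample readings for illustration, not data from the paper.
    raw = [24.1, 24.3, 999.0, 24.6, float("nan"), 60.0]
    cleaned = clean_series(raw, lo=-5.0, hi=40.0)
    print(cleaned)
    print(summarize("water_temperature", "degC", cleaned))
```

The example keeps each reading's position and marks bad values as `None` instead of dropping them. This preserves alignment with timestamps, which a real time-series pipeline would need.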

Creative Commons License

Creative Commons Attribution 4.0 License
This work is licensed under a Creative Commons Attribution 4.0 License.

Comments

This paper was presented at PEARC '25 and is published under ACM copyright. This project is supported by NSF grants OAC-2322308 and IIS-2331908.


Rights Statement

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).