Date of this Version

12-5-2025

Document Type

Presentation

Abstract

Research data management increasingly faces challenges of scale, heterogeneity, and labor-intensive curation, especially in environments that must meet FAIR requirements while supporting diverse scientific domains. This work presents EnviStor, a production-grade, multi-agent AI workflow designed to automate and enhance data processing, metadata generation, and platform operations across FIU’s 2.7-petabyte NSF-funded research data infrastructure. Building on the Engineering Knowledge Architecture, a three-track model combining behavior rules, domain knowledge, and procedural skills, we demonstrate how specialized agents (Envita for data preparation, Stori for developer tasks, and DIVA for Dataverse administration) collaborate via the Model Context Protocol (MCP) and JSON-RPC to form a reliable, auditable, security-bounded system.

The framework directly addresses key failure modes observed in traditional agentic AI systems: inconsistency, lack of memory, lost-in-the-middle context, and unsafe autonomous behavior. By structuring agent cognition around hierarchical knowledge and reusable skills, the system enables fast discovery (under 10 seconds), consistent execution, continuous learning, and automated documentation. Deployed in FIU’s production environment, EnviStor achieved significant real-world impact: 98.27% accuracy in matching datasets to EML metadata (911 of 927 datasets), identification of 15,844 additional metadata files, and reduction of multi-week manual workflows to hours. The developer agent (Stori) autonomously generated ~49k lines of code and 100+ documentation files, demonstrating sustained self-improvement.

This work illustrates how agentic AI can operate not merely as a chatbot but as a dependable member of a human-AI team, capable of coordinating pipelines, enforcing security boundaries, performing domain-specific reasoning, and evolving its capabilities over time. We conclude with an assessment of remaining challenges (validation, error recovery, risk assessment, and multi-step workflow orchestration) and outline a roadmap toward cross-domain generalization and high-reliability, self-improving agent ecosystems.
