Human Intuition and Algorithmic Efficiency Must Be Balanced to Enhance Data Mesh Resilience
Andrew Strelzoff, Benjamin D. Trump, Christopher L. Cummings, Madison Smith, Stephanie E. Galaitsi, Kelsey Stoddard, Jeff Keisler, Moshe Vardi, Nathaniel Bastian, Alexander Kott, Igor Linkov
DOI: https://doi.org/10.1145/3632290
IF: 22.7
2024-05-02
Communications of the ACM
Abstract: Entities across all sectors that handle extensive and complex data environments increasingly adopt the data mesh paradigm. Data mesh is an architectural and organizational governance approach that treats data as a product, promoting domain-specific ownership and self-serve infrastructure [11]. It encourages domain teams to manage their own data, with standardized metadata, governance, and a services layer for accessibility, which reduces centralization bottlenecks and improves data scalability and usability across complex organizations. In the commercial sector, multinational technology companies value data mesh for its domain-oriented governance and decentralized structure. Because they often manage large, heterogeneous data, they benefit from enhanced scalability and domain-specific data management. In the public sector, government agencies adopt data mesh for resilient, flexible data-product delivery [10]. Defense organizations such as the U.S. Army use data mesh to provide reliable data products for informed decision making. Healthcare institutions and research organizations use data mesh to safeguard sensitive information, enhancing data security and authorized database access. Challenges arise in building data governance for these complex structures, particularly with private, sensitive, or classified data [14]. Central to this challenge is the effective implementation of data governance in emerging paradigms like data mesh architecture. Data mesh revolutionizes how organizations manage their vast data landscapes. Unlike traditional monolithic approaches where data is centralized, data mesh architecture decentralizes data ownership, echoing efficiencies of scale achieved by microservice architecture. By distributing ownership, teams closest to the data can govern and leverage it effectively, making it more accessible, reliable, and usable.
To manage such distributed and complex operations, autonomous decision models are necessary to evaluate user access to the data mesh, check the correctness of user activity, correct data safety and security issues, and perform passive quality control by searching for unauthorized data mesh access and quarantining any damage to the architecture. An effective data governance framework can enhance system security and resilience (preventing, avoiding, and mitigating misuse), as well as recovery capacity [9]. Automated decision models can handle much of the former, providing efficiency and robustness. However, human judgment remains necessary to address errors from AI managing large data systems [1]. This column explores a "human-in-the-loop" approach for AI data mesh management, especially for the U.S. Army, known as Cyber Expert LLM Safety Assistant (CELSA). CELSA combines AI efficiency and expert judgment to aid data mesh resilience by promptly addressing well-understood threats automatically and elevating the handling of novel threats. This work aligns with the strategic plan launched by the recent U.S. Presidential Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence [13].
Optimizing Data Mesh Governance
Understanding the potential for errors is critical, both when the automated decision model incorrectly identifies a benign data activity as a threat or an anomaly (Type I error), causing users to be blocked from otherwise legitimate access, and when the decision model grants access to unapproved users or for invalid purposes (Type II error). Each error type imposes its own costs: Type I causes unnecessary action to be taken, wasting people's time or resources, but the quantity of that waste is bounded by the action itself.
Type II represents a failure to provide the promised service of the system being used; the harms of this failure, when realized, are likely to be high consequence, but it is uncertain when they will be realized, or whether they will be realized at all [5]. Novel resilience measures must balance human intuition and automated efficiency in data governance, prompting questions about incorporating human-in-the-loop capacity in the data mesh architecture. To enhance system resilience against anomalies, we contend that human judgment is crucial to identify errors after an initial AI model evaluation (for example, an adaptive large language model (LLM) for frequent testing and updates). Data mesh governance necessitates transparent interaction between the proposed LLM and a human expert panel. This interaction starts when a user requests access or suggests data mesh changes, which may indicate potential disruptions by malicious or invalid users to the database architecture or entries. Rapid evaluation of such events is crucial, preferably near instantaneous, as access delays reduce data mesh product operational viability. Human judgment might suit smal -Abstract Truncated-
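The abstract describes a flow in which an automated model handles well-understood cases immediately and elevates novel ones to human experts. The Python sketch below is one possible illustration of that routing idea, not the CELSA design. The names `Assessment`, `Decision`, and `triage`, the two scores, and all threshold values are hypothetical and do not come from the article.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"        # grant the request automatically
    BLOCK = "block"        # deny the request automatically
    ESCALATE = "escalate"  # hand the request to the human expert panel


@dataclass(frozen=True)
class Assessment:
    # Both fields are hypothetical model outputs in [0, 1].
    threat_score: float  # higher means more likely malicious or invalid
    novelty: float       # higher means less like previously seen cases


def triage(
    a: Assessment,
    allow_below: float = 0.2,    # assumed threshold, chosen for illustration
    block_above: float = 0.9,    # assumed threshold, chosen for illustration
    novelty_limit: float = 0.5,  # assumed threshold, chosen for illustration
) -> Decision:
    """Route a request: automate clear, familiar cases; escalate the rest."""
    # Novel cases go to humans regardless of score, since the model's
    # score is least trustworthy on inputs unlike what it has seen.
    if a.novelty > novelty_limit:
        return Decision.ESCALATE
    # Familiar and clearly malicious: block without waiting for a human.
    if a.threat_score >= block_above:
        return Decision.BLOCK
    # Familiar and clearly benign: allow immediately to avoid access delays.
    if a.threat_score <= allow_below:
        return Decision.ALLOW
    # Familiar but ambiguous: a wrong automatic call here would be a
    # Type I error (blocking a benign user) or a Type II error
    # (admitting an invalid one), so a human decides.
    return Decision.ESCALATE
```

In this sketch, raising `allow_below` trades fewer escalations for a higher risk of Type II errors, while lowering `block_above` trades fewer escalations for more Type I errors. The appropriate settings would depend on the relative costs the column discusses.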
computer science, theory & methods, software engineering, hardware & architecture