Markus Stumptner, Wolfgang Mayer, Georg Grossmann, Jixue Liu, Wenhao Li, Pompeu Casanovas, Louis De Koker, Danuta Mendelson, David Watts, Bridget Bainbridge
Abstract: Traditionally, the integration of data from multiple sources is done on an ad hoc basis for each analysis scenario and application. This solution is inflexible, incurs high costs, leads to "silos" that prevent sharing data across different agencies or tasks, and is unable to cope with the modern environment, where workflows, tasks, and priorities frequently change. Operating within the Data to Decision Cooperative Research Centre (D2D CRC), the authors are currently involved in the Integrated Law Enforcement Project, which has the goal of developing a federated data platform that will enable the execution of integrated analytics on data accessed from different external and internal sources, thereby providing effective support to an investigator or analyst working to evaluate evidence and manage lines of inquiry in the investigation. Technical solutions should also operate ethically, in compliance with the law, and subject to good governance principles.
What problem does this paper attempt to address?
This paper aims to address the challenges faced by law enforcement agencies when processing and analyzing data from multiple different sources. Specifically, the paper attempts to solve the following main problems:
1. **Flexibility and Cost of Data Integration**: Traditionally, data integration is performed ad hoc for each analysis scenario and application. This approach is inflexible and costly, and it produces "data silos" that hinder data sharing between different agencies or tasks. It also cannot cope with the frequent changes in workflows, tasks, and priorities in the modern environment.
2. **Large-scale Data Analysis**: As the volume of intelligence and other data grows rapidly, extracting the maximum value from it has become a major challenge. Usually, only a small fraction of the available data can be analyzed.
3. **Legal Compliance and Governance**: Technical solutions need to be not only efficient but also compliant with legal requirements and follow good governance principles. This means that the acquisition, sharing, and analysis of information must be carried out within the framework of the rule of law.
4. **Cross-agency Data Access and Sharing**: Federal, state, and territory agencies in Australia each hold large amounts of data, and cross-agency access to and sharing of this data is usually restricted by multiple laws and by complex rules and agreements. Accessing and linking the data effectively while complying with these regulations is an important issue.
5. **Evolution of Data Models and APIs**: Over time, the underlying data structures and access methods will change. How to maintain the validity and consistency of data links in such a dynamic environment is a challenge.
6. **Metadata Management**: Metadata management in commercial databases, intelligence tools, and big-data platforms is usually proprietary, so there is no federated metadata mechanism that spans tools from multiple vendors. Such a mechanism is needed to capture and manage the metadata relevant to policing and intelligence contexts.
7. **Workflow Orchestration**: Existing commercial tools mainly rely on proprietary analysis tool chains and workflow implementations. Although some standards, such as UIMA, support the exchange of analysis processes, the tools themselves are usually limited to a single vendor's tool chain. An open architecture is required to support cross-platform analysis processes while providing flexible workflow orchestration.
By developing a federated data platform, the paper aims to enable integrated analysis of data from different internal and external sources, thereby providing effective support for investigators or analysts who evaluate evidence and manage lines of inquiry. The platform also adopts semantic techniques based on meta-modeling to align and convert between different APIs, services, process models, and metadata representation schemes.
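As a rough illustration of what aligning and converting between different representation schemes can mean in practice, the sketch below maps records from two sources onto one shared model by means of declarative field mappings. It is a minimal example under stated assumptions, not the platform's actual design: the `Person` model, the source names `agency_a` and `agency_b`, and all field names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


# Hypothetical shared model; the paper does not specify its schema.
@dataclass(frozen=True)
class Person:
    full_name: str
    date_of_birth: str  # ISO 8601 date string (YYYY-MM-DD)


# For each field of the shared model, a function that derives it
# from a raw source record.
FieldMapping = Dict[str, Callable[[Dict[str, Any]], Any]]

# One mapping per (hypothetical) source. Supporting a new source, or a
# changed source schema, means editing a mapping, not the analytics code.
MAPPINGS: Dict[str, FieldMapping] = {
    "agency_a": {
        "full_name": lambda r: f"{r['given']} {r['surname']}",
        "date_of_birth": lambda r: r["dob"],
    },
    "agency_b": {
        "full_name": lambda r: r["name"],
        # Assumed source format DD/MM/YYYY, converted to YYYY-MM-DD.
        "date_of_birth": lambda r: "-".join(reversed(r["birth_date"].split("/"))),
    },
}


def to_shared_model(source: str, record: Dict[str, Any]) -> Person:
    """Convert a raw source record into the shared Person model."""
    mapping = MAPPINGS[source]
    return Person(**{name: extract(record) for name, extract in mapping.items()})


a = to_shared_model("agency_a", {"given": "Jane", "surname": "Doe", "dob": "1980-02-01"})
b = to_shared_model("agency_b", {"name": "Jane Doe", "birth_date": "01/02/1980"})
print(a == b)  # both records resolve to the same shared representation
```

Keeping the mappings as data separate from the conversion logic is one simple way to absorb changes in source schemas; a meta-model-based approach as described in the paper would presumably express such mappings at a higher level of abstraction.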