Blended Integrated Open Data: dados abertos públicos integrados

Fabiola Santore, Lucas F. Oliveira, Rafael de Paulo Dias, Henrique V. Ehrenfried, Alessandro Elias, Diego Pasqualin, Luis C. E. de Bona, Marcos Didonet Del Fabro, Marcos Sunye
DOI: https://doi.org/10.48550/arXiv.1909.00743
2019-09-04
Abstract: While several public institutions provide their data openly, the effort required to access, integrate, and query this data is too high, which reduces the number of potential dataset users. The Blended Integrated Open Data (BIOD) project aims to ease access to public Open Data. It integrates and makes available more than 300 GB of data, containing billions of records from different Open Data sets, and allows queries across them, so that related information can be retrieved from originally disconnected data sets. This paper presents the set of open data available, how to access it, and how to produce new compatible data to improve the existing data set.
Databases, Software Engineering
What problem does this paper attempt to address?
The main problem this paper addresses is the difficulty of accessing, integrating, and querying public open data, which limits how widely these data are used. Specifically, the paper introduces the Blended Integrated Open Data (BIOD) project, which integrates and provides more than 300 GB of data (billions of records) so that users can more conveniently access and query related information from different open data sets. This simplifies data acquisition and makes it possible to retrieve and combine information across data sets that were originally disconnected.

### Main Objectives of the Paper:

1. **Lower the Threshold for Data Access**: By integrating multiple open data sources, reduce the technical skill and time users need to access and use public open data.
2. **Promote the Widespread Use of Data**: By providing a unified, easy-to-query data platform, increase the frequency and scope of data use.
3. **Support Complex Queries**: Allow users to run complex queries across data sets and thereby discover more valuable information.

### Specific Methods:

- **Data Integration**: Extract data from multiple government and public institutions and standardize it so that it can be queried and analyzed on the same platform.
- **Data Storage**: Use MonetDB for data storage. MonetDB is a column-oriented database management system that is especially well suited to OLAP (Online Analytical Processing) queries.
- **Data Access**: Provide a RESTful API through the BlenDB tool, so that users can access and query data directly through a simple query language without downloading the entire data set.
- **Data Pre-aggregation**: Provide pre-aggregated data to reduce data volume and improve query efficiency, although this may sacrifice some precision.
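
The pre-aggregation method described above can be illustrated with a minimal Python sketch. This is not BIOD's or BlenDB's actual implementation: the record fields (`year`, `state`, `school_id`, `enrollments`), the sample values, and the `pre_aggregate` helper are all hypothetical and chosen only to show the trade-off between data volume and precision.

```python
from collections import defaultdict

# Hypothetical raw records, one row per school per year.
# Field names and values are illustrative, not taken from BIOD.
raw_records = [
    {"year": 2017, "state": "PR", "school_id": 1, "enrollments": 120},
    {"year": 2017, "state": "PR", "school_id": 2, "enrollments": 80},
    {"year": 2017, "state": "SP", "school_id": 3, "enrollments": 200},
    {"year": 2018, "state": "PR", "school_id": 1, "enrollments": 130},
    {"year": 2018, "state": "PR", "school_id": 2, "enrollments": 90},
]


def pre_aggregate(records, dimensions, metric):
    """Sum `metric` over every distinct combination of `dimensions`.

    Hypothetical helper: it keeps only the grouping columns and the
    summed metric, so any column not listed in `dimensions` is dropped.
    """
    totals = defaultdict(int)
    for row in records:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[metric]
    return dict(totals)


# Group by year and state; the per-school detail is discarded.
aggregated = pre_aggregate(raw_records, ("year", "state"), "enrollments")

for (year, state), total in sorted(aggregated.items()):
    print(year, state, total)
```

In this sketch the five raw rows collapse into three aggregated rows, which is the reduction in data volume. The cost is that the per-school breakdown can no longer be recovered from the aggregated table, which is one way the loss of precision can show up.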
### Expected Results:

- **Improve Data Utilization**: By making the data easier to access and use, attract more users to these data for research and analysis.
- **Promote Cross-domain Data Analysis**: By integrating data from different sources, support comprehensive cross-domain analysis and help uncover new associations and trends.
- **Promote the Open Data Ecosystem**: Encourage more data sets to join the platform, forming a growing open data ecosystem.

In short, the paper shows how the BIOD project uses technical means to address the problems of accessing and using public open data, thereby promoting data-driven research and decision-making.