Leveraging Oil and Gas Data Lakes to Enable Data Science Factories

Daniel Antonio, Andreas Sadlier, Joseph Winston
DOI: https://doi.org/10.2118/196452-ms
2020-10-25
Abstract: As oil and gas companies undergo their digital transformations, they typically first focus their efforts on analytic and machine learning solutions. They expect immediate advancements in automated operations and in predicting failures. The solutions generally rely on small-scale proof of concept exercises to demonstrate their worth. Far too often, this approach relies on manually collected datasets. Ironically, collecting these datasets consumes the majority of the project's time and resources; consequently, the projects fall short of their promise to yield significant financial returns. Organizations must centralize available information into a corporate data lake to enable data scientists to access all available data. New challenges also arise from information governance and data management because this data originates from different business units with their own goals and concerns. Rather than focusing on the analogy of a data lake as a storage methodology for information, it is useful to view a data lake model as a manufacturing facility that produces analytical insights and enhanced capabilities. Just as a manufacturing facility is organized around specific processes to deliver finished goods, a data lake should provide all capabilities necessary to transform raw data into valuable assets for oil and gas organizations. The data lake must therefore feature several analogous capabilities and properties. These include a receiving dock, quality assurance/quality control (QA/QC) stations, warehousing, and tooling and engineering, as well as flexible, lean assembly lines to build new products and shipping capacity to deliver the finished goods to customers. By applying successful manufacturing techniques to the data lake design, oil and gas companies can efficiently develop and maintain assembly lines for manufacturing analytical insights. This paper explains the similarities between delivering analytics and manufacturing processes. It also describes the data lake functionality. Each part of the process provides a critical component for generating analytical results and can be managed like its manufacturing counterpart to deliver lean processes that enable more efficient data science results.