Abstract

Background: Inexpensive techniques for measurement and data storage now enable medical researchers to acquire far more data than can conveniently be analyzed by traditional methods. The expression "big data" refers to quantities on the order of magnitude of a terabyte (10^12 bytes); special techniques must be used to evaluate such huge quantities of data in a scientifically meaningful way. Whether data sets of this size are useful and important is an open question that currently confronts medical science.

Methods: In this article, we give illustrative examples of the use of analytical techniques for big data and discuss them in the light of a selective literature review. We point out some critical aspects that should be considered to avoid errors when large amounts of data are analyzed.

Results: Machine learning techniques enable the recognition of potentially relevant patterns. When such techniques are used, certain additional steps should be taken that are unnecessary in more traditional analyses; for example, patient characteristics should be differentially weighted. If this is not done as a preliminary step before similarity detection, which is a component of many data analysis operations, characteristics such as age or sex will be weighted no higher than any one out of 10 000 gene expression values. Experience from the analysis of conventional observational data sets can be called upon to draw conclusions about potential causal effects from big data sets.

Conclusion: Big data techniques can be used, for example, to evaluate observational data derived from the routine care of entire populations, with clustering methods used to analyze therapeutically relevant patient subgroups. Such analyses can provide complementary information to clinical trials of the classic type. As big data analyses become more popular, various statistical techniques for causality analysis in observational data are becoming more widely available. This is likely to be of benefit to medical science, but specific adaptations will have to be made according to the requirements of the applications.
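
The weighting issue raised in the Results can be illustrated with a small sketch. It is not the authors' method: the two-block weighting scheme, the function names (`weighted_distance`, `block_weights`, `clinical_fraction`), the 50% clinical share, and the randomly generated patient vectors are all assumptions made for illustration. The sketch compares how much of a squared Euclidean distance between two patients comes from age and sex when every feature is weighted equally and when the clinical features are weighted as a block.

```python
import math
import random


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def block_weights(n_clinical, n_genes, clinical_share=0.5):
    """Per-feature weights so that the clinical block carries `clinical_share`
    of the total weight (hypothetical scheme, chosen only for illustration)."""
    w_clin = clinical_share / n_clinical
    w_gene = (1.0 - clinical_share) / n_genes
    return [w_clin] * n_clinical + [w_gene] * n_genes


def clinical_fraction(a, b, weights, n_clinical):
    """Fraction of the squared weighted distance due to the clinical features."""
    terms = [w * (x - y) ** 2 for x, y, w in zip(a, b, weights)]
    total = sum(terms)
    return sum(terms[:n_clinical]) / total if total else 0.0


random.seed(0)
N_CLINICAL, N_GENES = 2, 10_000  # age and sex, plus 10 000 expression values


def make_patient(age_scaled, sex):
    """Synthetic patient: standardized age, sex coded 0/1, random expression values."""
    genes = [random.gauss(0.0, 1.0) for _ in range(N_GENES)]
    return [age_scaled, sex] + genes


# Two synthetic patients who differ clearly in age and sex.
p1 = make_patient(age_scaled=-1.0, sex=0.0)
p2 = make_patient(age_scaled=1.0, sex=1.0)

uniform = [1.0] * (N_CLINICAL + N_GENES)
weighted = block_weights(N_CLINICAL, N_GENES)

frac_uniform = clinical_fraction(p1, p2, uniform, N_CLINICAL)
frac_weighted = clinical_fraction(p1, p2, weighted, N_CLINICAL)

print(f"clinical share of squared distance, uniform weights: {frac_uniform:.5f}")
print(f"clinical share of squared distance, block weights:   {frac_weighted:.5f}")
```

With uniform weights, age and sex account for only a tiny fraction of the distance, so a similarity search is driven almost entirely by the expression values. With block weights they contribute a substantial share. Which share is appropriate depends on the clinical question and is not something the sketch can decide.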

How big is Big Data?

Big Data for Better Science

Why big data and compute are not necessarily the path to big materials science

A Survey of Big Data Research

Challenges of Big Data Analysis

Machine Learning and Big Scientific Data

The Bearable Lightness of Big Data: Towards Massive Public Datasets in Scientific Machine Learning

Big Data, Big Challenges

Rethinking Abstractions for Big Data: Why, Where, How, and What

A Survey of Machine Learning for Big Data Processing

When we talk about Big Data, What do we really mean? Toward a more precise definition of Big Data

Medical big data: promise and challenges

What is big data? A consensual definition and a review of key research topics

The anatomy of big data computing

"Big Data" and its Origins

Big Data access and infrastructure for modern biology: case studies in data repository utility

Gaining insight from large data volumes with ease

Big data in medical science--a biostatistical view

Big Data Systems Meet Machine Learning Challenges: Towards Big Data Science as a Service

Data learning from big data

What makes Big Data, Big Data? Exploring the ontological characteristics of 26 datasets