Abstract: While the shared consensus genetic sequence of our species contains a great deal of information about our common biology, there is also much to be learned from the subtle genetic variations across our species. These variations are believed to have, for the most part, little or no direct functional significance and to predominantly reflect the chance accumulation of small genetic changes since our emergence as a species. They therefore carry little useful information when observed in a single individual. When tallied across a whole population, though, these chance mutations can teach us a great deal about our evolutionary history and about the patterns of inheritance in specific individuals. In particular, frequently observed patterns of single nucleotide polymorphisms (SNPs) in a population can identify segments of chromosome that have been passed down largely intact through long stretches of our evolution. Finding these frequently conserved chromosomal segments, or haplotypes, and developing methods to identify haplotype patterns in individuals will in turn help us to identify the segments that carry genetic factors influencing risk for many common human diseases. To make the best use of these data, we will need to develop new models for the encoding of information in genome variations (the "language of genetic variation") and new algorithms for fitting datasets to those models. This article surveys past work by the author and colleagues on this problem, using computational methods for locating frequent patterns in haploid sequence data and for "parsing" sequences so as to optimally explain them given knowledge of the general population structure. The author's recent work in this area has been compiled into a set of computational tools available at http://www-2.cs.cmu.edu/~russells/software/hapmotif.html.
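
The idea of "parsing" a sequence against a set of frequent population patterns can be illustrated with a small dynamic program. The sketch below is not the author's method or the interface of the tools linked above; it is a minimal illustration under stated assumptions. The motif dictionary format, the function name `parse_haplotype`, the `break_penalty` parameter, and the cost function (negative log frequency plus a fixed penalty per segment) are all hypothetical choices made for this example, and motifs are required to match the sequence exactly.

```python
import math
from collections import defaultdict


def parse_haplotype(seq, motifs, break_penalty=1.0):
    """Split `seq` into consecutive motif matches with minimum total cost.

    `motifs` maps (start, alleles) -> population frequency. This format is
    an assumption made for the sketch, not one taken from the article.
    Each segment costs -log(frequency) + break_penalty.
    Returns (cost, segments), or (math.inf, []) if no full parse exists.
    """
    n = len(seq)

    # Index motifs by the SNP position at which they begin.
    by_start = defaultdict(list)
    for (start, alleles), freq in motifs.items():
        by_start[start].append((alleles, freq))

    best = [math.inf] * (n + 1)  # best[i]: minimum cost to explain seq[:i]
    back = [None] * (n + 1)      # back[j]: (start, alleles) of the segment ending at j
    best[0] = 0.0

    for i in range(n):
        if best[i] == math.inf:
            continue  # no parse reaches position i
        for alleles, freq in by_start[i]:
            j = i + len(alleles)
            # Exact matching only; a motif running past the end never matches.
            if alleles and seq[i:j] == alleles:
                cost = best[i] - math.log(freq) + break_penalty
                if cost < best[j]:
                    best[j] = cost
                    back[j] = (i, alleles)

    if best[n] == math.inf:
        return math.inf, []

    # Walk the back-pointers from the end to recover the segments in order.
    segments = []
    j = n
    while j > 0:
        i, alleles = back[j]
        segments.append((i, alleles))
        j = i
    segments.reverse()
    return best[n], segments


# Toy data: alleles coded as "0"/"1"; the frequencies are made up.
motifs = {
    (0, "011"): 0.5,
    (0, "0110"): 0.25,
    (3, "0100"): 0.5,
    (4, "100"): 0.5,
    (0, "0110100"): 0.01,
}
cost, segments = parse_haplotype("0110100", motifs)
print(cost, segments)
```

With these toy frequencies, the parse into two common motifs is preferred both over the rarer two-motif split and over the single very rare full-length motif. Raising `break_penalty` shifts the balance toward fewer, longer segments.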

Dynamic programming algorithms for haplotype block partitioning: applications to human chromosome 21 haplotype data

A Dynamic Programming Algorithm for Haplotype Block Partitioning and Its Application in Association Studies.

Haplotype Block Partition with Limited Resources and Applications to Human Chromosome 21 Haplotype Data.

Dynamic Programming Algorithms for Haplotype Block Partitioning and Tag SNP Selection Using Haplotype Data or Genotype Data

HapBlock: Haplotype Block Partitioning and Tag SNP Selection Software Using a Set of Dynamic Programming Algorithms.

Haplotype Block Partitioning and Tag SNP Selection Using Genotype Data and Their Applications to Association Studies

The Effect of Haplotype-Block Definitions on Inference of Haplotype-Block Structure and htSNPs Selection

HapBlock – A Suite of Dynamic Programming Algorithms for Haplotype Block Partitioning and Tag SNP Selection Based on Haplotype and Genotype Data

Inference of missing SNPs and information quantity measurements for haplotype blocks.

HtSNPer1.0: Software for Haplotype Block Partition and htSNPs Selection.

[Analysis and Application of SNP and Haplotype in the Human Genome].

HaploBlockFinder: Haplotype Block Analyses

Integer programming framework for pangenome-based genome inference

An Overview of the Haplotype Problems and Algorithms.

Distribution of Recombination Crossovers and the Origin of Haplotype Blocks: the Interplay of Population History, Recombination, and Mutation

Long-range Polony Haplotyping of Individual Human Chromosome Molecules

Genome-wide Compatible SNP Intervals and Their Properties

Linear Algebraic Tag SNP Selection and Haplotype Reconstruction

Fast and accurate haplotype inference with hidden Markov model

Accurate Haplotype Inference for Multiple Linked Single-Nucleotide Polymorphisms Using Sibship Data

Haplotype parsing: methods for extracting information from human genetic variations