DNA hybridization to mismatched templates: a chip study

Felix Naef, Daniel A. Lim, Nila Patil, Marcelo Magnasco
DOI: https://doi.org/10.1103/PhysRevE.65.040902
2001-11-28
Abstract: High-density oligonucleotide arrays are among the most rapidly expanding technologies in biology today. In the GeneChip system, the reconstruction of the target concentration depends upon the differential signal generated from hybridizing the target RNA to two nearly identical templates: a perfect match (PM) and a single mismatch (MM) probe. It has been observed that a large fraction of MM probes repeatably bind targets better than the PMs, against the usual expectation from sequence-specific hybridization; this is difficult to interpret in terms of the underlying physics. We examine this problem via a statistical analysis of a large set of microarray experiments. We classify the probes according to their signal to noise ($S/N$) ratio, defined as the eccentricity of a (PM, MM) pair's 'trajectory' across many experiments. Of those probes having large $S/N$ ($>3$) only a fraction behave consistently with the commonly assumed hybridization model. Our results imply that the physics of DNA hybridization in microarrays is more complex than expected, and they suggest new ways of constructing estimators for the target RNA concentration.
Biological Physics, Data Analysis, Statistics and Probability, Quantitative Biology
What problem does this paper attempt to address?
The problem this paper addresses is that, in DNA microarray experiments, the fluorescence signal of a single-mismatch (MM) probe is sometimes higher than that of its perfect-match (PM) partner. Under the standard hybridization model, the MM signal should be lower than the PM signal, because the mismatched base should reduce specific binding. In actual experiments, however, a large number of MM probes give higher signals than their PM probes, contrary to the model's expectation. Specifically, the paper explores the following questions:

1. **Why do a large number of MM probes show higher signals than PM probes?**
   - This phenomenon is difficult to explain with existing hybridization models, which indicates that the DNA hybridization process may be more complex than expected.
2. **What physical mechanisms lie behind this anomalous behavior?**
   - Through a statistical analysis of a large set of microarray experiments, the paper finds that most (PM, MM) probe pairs behave inconsistently with the standard model, implying that existing models are insufficient to describe what actually happens on the chip.
3. **How can existing RNA concentration estimates be improved?**
   - Because of the limitations of the standard model, existing methods for estimating gene expression levels may not be accurate enough. New estimators therefore need to be explored to improve the signal-to-noise ratio and accuracy.

### Standard Hybridization Model

The standard hybridization model assumes that:

- The signal of PM probes is \( PM = I_S + I_{NS} + B \)
- The signal of MM probes is \( MM = (1 - \alpha) I_S + I_{NS} + B \)

where:

- \( I_S \) is the contribution of specific binding
- \( I_{NS} \) is the contribution of non-specific binding
- \( B \) is the background noise
- \( \alpha \) is the fraction by which a single mismatch reduces specific binding

Under this model, \( PM - MM = \alpha I_S \), so for \( 0 < \alpha \le 1 \) and \( I_S > 0 \) the PM signal should always exceed the MM signal, that is, \( PM > MM \). This is not what is observed experimentally: many probe pairs show \( MM > PM \), especially in the high-intensity region.

### Experimental Observation and Statistical Analysis

By analyzing the joint probability distribution \( P(\log PM, \log MM) \) of multiple data sets, the paper finds the following characteristics:

- In the high-intensity region, the distribution splits into two branches, one of which lies entirely below the diagonal, in the region \( MM < PM \).
- Approximately 30% of probe pairs show \( MM > PM \) under various conditions.
- This anomalous behavior is widespread across different chip types and is not caused by a few probes.

### Conclusion

The paper points out that existing hybridization models cannot fully explain these phenomena, indicating that the physical mechanisms of DNA hybridization need to be re-examined, especially the interactions between short sequences and single-base-mismatch templates. The result also matters for designing more effective gene chip analysis tools, in particular for improving the noise suppression achieved by different methods. In summary, the paper reveals some poorly understood complexities in the DNA hybridization process and suggests new directions for future research.
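
As a rough numerical illustration of the standard hybridization model described above, the following Python sketch evaluates \( PM \) and \( MM \) for synthetic probe pairs and measures the fraction with \( MM > PM \). The function names (`standard_model`, `fraction_mm_above_pm`) and all parameter ranges are assumptions chosen for illustration; they do not come from the paper, and the data are synthetic rather than measured intensities.

```python
import random


def standard_model(i_s, i_ns, b, alpha):
    """Return (PM, MM) under the standard hybridization model.

    PM = I_S + I_NS + B
    MM = (1 - alpha) * I_S + I_NS + B
    """
    pm = i_s + i_ns + b
    mm = (1.0 - alpha) * i_s + i_ns + b
    return pm, mm


def fraction_mm_above_pm(pairs):
    """Return the fraction of (PM, MM) pairs in which MM exceeds PM."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one probe pair")
    return sum(1 for pm, mm in pairs if mm > pm) / len(pairs)


# Synthetic, noise-free probe pairs. The ranges below are arbitrary
# illustrative assumptions, not values taken from the paper.
rng = random.Random(0)
ideal_pairs = [
    standard_model(
        i_s=rng.uniform(0.0, 1000.0),   # specific binding
        i_ns=rng.uniform(0.0, 200.0),   # non-specific binding
        b=50.0,                         # background
        alpha=rng.uniform(0.1, 0.9),    # mismatch penalty
    )
    for _ in range(1000)
]

# With I_S >= 0 and 0 < alpha < 1, the model never yields MM > PM.
ideal_fraction = fraction_mm_above_pm(ideal_pairs)
print(f"fraction with MM > PM under the model: {ideal_fraction}")
```

Applied to measured (PM, MM) intensities instead of synthetic ones, the same helper would give the empirical fraction, which the summary above reports as roughly 30%. The noise-free model cannot produce such a value for any admissible \( \alpha \), which is the discrepancy the paper examines.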