A study of accuracy and precision in oligonucleotide arrays: extracting more signal at large concentrations

Felix Naef, Nicholas D. Socci, Marcelo Magnasco
DOI: https://doi.org/10.48550/arXiv.physics/0205031
2002-08-28
Abstract: Despite the success and popularity of oligonucleotide arrays as a high-throughput technique for measuring mRNA expression levels, quantitative calibration studies have until now been limited. The main reason is that suitable data was not available. However, calibration data recently produced by Affymetrix now permits detailed studies of the intensity-dependent sensitivity. Given a certain transcript concentration, it is of particular interest to know whether current analysis methods are capable of detecting differential expression ratios of 2 or higher. Using the calibration data, we demonstrate that while current techniques are capable of detecting changes in the low to mid concentration range, the situation is noticeably worse for high concentrations. In this regime, expression changes as large as 4-fold are severely biased, and changes of 2-fold are often undetectable. Such effects are mainly the consequence of the sequence-specific binding properties of probes, and not the result of optical saturation in the fluorescence measurements. GeneChips are manufactured such that each transcript is probed by a set of sequences with a wide affinity range. We show that this property can be used to design a method capable of reducing the high-intensity bias. The idea behind our method is to transfer the weight of a measurement to a subset of probes with optimal linear response at a given concentration, which can be achieved using local embedding techniques.
Biological Physics, Quantitative Biology
What problem does this paper attempt to address?
The problem this paper addresses is the quantitative bias that arises when oligonucleotide arrays (such as Affymetrix GeneChips) measure mRNA expression levels for transcripts at high concentration. Specifically, existing analysis methods detect differentially expressed genes well in the low-to-medium concentration range, but in the high-concentration range even a 4-fold change is severely compressed, and a 2-fold change is often undetectable.

### Main problem summary:

1. **Quantitative deviation at high concentrations**:
   - When the transcript concentration is high, current analysis methods (such as the MAS 5.0 algorithm) are much less able to detect differential expression.
   - Even a large expression change (such as 4-fold) is reported as far smaller than its true value, and a 2-fold change may not be detected at all.
2. **Causes of deviation**:
   - The deviation is mainly caused by the sequence-specific binding characteristics of the probes, not by optical saturation in the fluorescence measurement.
   - Probe binding is likely to saturate at high concentrations, which compresses the signal.
3. **Need for improvement methods**:
   - The paper proposes a new analysis method that aims to reduce the deviation at high concentrations through locally linear embeddings.
   - The core idea is to transfer the measurement weight to the subset of probes with the best linear response at a given concentration, thereby improving accuracy in the high-concentration region.
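
The compression described above can be illustrated with a toy model. The sketch below assumes a Langmuir-type saturating response for each probe; this functional form, the function names `probe_intensity` and `observed_fold_change`, and all parameter values are illustrative assumptions, not the paper's model or data. It shows how a true 4-fold change is reported nearly intact at low concentration and strongly compressed at high concentration, and how a lower-affinity probe retains more of the change.

```python
import numpy as np

def probe_intensity(concentration, affinity, i_max=1.0):
    """Hypothetical saturating probe response (Langmuir-type assumption).

    Roughly linear when affinity * concentration << 1,
    and approaching i_max when affinity * concentration >> 1.
    """
    x = affinity * concentration
    return i_max * x / (1.0 + x)

def observed_fold_change(concentration, true_fold, affinity):
    """Ratio of intensities when the concentration is multiplied by true_fold."""
    low = probe_intensity(concentration, affinity)
    high = probe_intensity(concentration * true_fold, affinity)
    return high / low

# Three hypothetical probes spanning a wide affinity range.
affinities = np.array([0.01, 0.1, 1.0])

for c in (0.1, 10.0, 1000.0):
    ratios = observed_fold_change(c, 4.0, affinities)
    print(f"concentration={c:g}: observed fold changes {np.round(ratios, 3)}")
```

Under this assumed model, the probe with the lowest affinity stays closest to its linear regime as concentration grows, which is the intuition behind shifting weight toward such probes at high concentrations.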

### Formula representation:

- **Matrix representation**:

  \[
  A_j^i = \begin{cases}
  PM_j^i & \text{if } 1 \leq j \leq N_p \\
  MM_{(j - N_p)}^i & \text{if } N_p < j \leq 2N_p
  \end{cases}
  \]

  where \( PM_j^i \) and \( MM_j^i \) are the raw, background-subtracted, and normalized intensities of the \( j \)-th perfect-match probe and its single-base-mismatch probe, respectively, in the \( i \)-th experiment. Here \( N_p \) is the number of probe pairs and \( N_e \) is the number of experiments.

- **Weight definition**:

  \[ \sum_{i = 1}^{N_e} w_i = 1 \]

  \[ m_j = \sum_{i = 1}^{N_e} w_i \log(A_j^i) \]

- **Principal component analysis**:

  \[ \sqrt{w_i}\,(\log(A_j^i) - m_j) = \sum_{k = 1}^{N_p} U_{ik} D_k V_j^k \]

  where \( U \), \( D \), and \( V \) are the factors of the singular value decomposition (SVD).

- **Signal calculation**:

  \[ s_i = v_{\max} \sum_{j = 1}^{N_p} \log(A_j^i) V_j^1 \]

  where \( v_{\max} = \max_j |V_j^1| \).

### Conclusion:

The paper demonstrates that existing methods compress expression changes at high concentrations, and it proposes a new method that exploits the wide affinity range of the probes in each probe set to improve accuracy in the high-concentration region. However, the method may sacrifice some precision, because concentrating the weight on a subset of probes reduces the noise suppression that averaging over all probes provides.
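
The weighted-SVD signal defined in the formula section can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pca_signal` is hypothetical, the weights default to uniform, the sum runs over every column of the matrix passed in (the caller decides which probes to include), all intensities are assumed strictly positive so the logarithm is defined, and the sign of the first right singular vector is fixed by a convention that is not part of the source.

```python
import numpy as np

def pca_signal(A, w=None):
    """Signal s_i from the first principal direction of log-intensities.

    A : array of shape (N_e, n_probes), strictly positive intensities
        (rows are experiments i, columns are probes j).
    w : optional experiment weights; normalized here to sum to 1.
        Uniform weights are an assumption of this sketch.
    """
    A = np.asarray(A, dtype=float)
    n_e = A.shape[0]
    if w is None:
        w = np.full(n_e, 1.0 / n_e)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                      # enforce sum_i w_i = 1

    log_a = np.log(A)
    m = w @ log_a                        # m_j = sum_i w_i log(A_j^i)

    # sqrt(w_i) * (log(A_j^i) - m_j) = sum_k U_ik D_k V_j^k
    centered = np.sqrt(w)[:, None] * (log_a - m)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)

    v1 = vt[0]                           # first right singular vector V^1
    if v1.sum() < 0:                     # sign convention (assumption)
        v1 = -v1

    v_max = np.max(np.abs(v1))           # v_max = max_j |V_j^1|
    return v_max * (log_a @ v1)          # s_i = v_max * sum_j log(A_j^i) V_j^1

# Noise-free toy data: intensity = concentration * probe affinity.
concentrations = np.array([1.0, 2.0, 4.0, 8.0])
affinities = np.array([0.5, 1.0, 2.0, 3.0])
signal = pca_signal(np.outer(concentrations, affinities))
print(np.diff(signal))
```

On this idealized input every probe responds linearly, so the first principal direction weights all probes equally and successive signal differences equal the log of the concentration ratio. With real data, probes that saturate contribute less variance and therefore receive less weight in \( V^1 \).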