Noisy group testing via spatial coupling

Amin Coja-Oghlan, Max Hahn-Klimroth, Lukas Hintze, Dominik Kaaser, Lena Krieg, Maurice Rolvien, Olga Scheftelowitsch
2024-02-05
Abstract: We study the problem of identifying a small set of $k\sim n^\theta$ infected individuals, $0<\theta<1$, within a large population of size $n$ by testing groups of individuals simultaneously. All tests are conducted concurrently. The goal is to minimise the total number of tests required. In this paper we make the (realistic) assumption that tests are noisy, i.e. that a group that contains an infected individual may return a negative test result, or one that does not contain an infected individual may return a positive test result, with a certain probability. The noise need not be symmetric. We develop an algorithm called SPARC that correctly identifies the set of infected individuals up to $o(k)$ errors with high probability with the asymptotically minimum number of tests. Additionally, we develop an algorithm called SPEX that exactly identifies the set of infected individuals w.h.p. with a number of tests that matches the information-theoretic lower bound for the constant column design, a powerful and well-studied test design.
Discrete Mathematics, Information Theory, Combinatorics
What problem does this paper attempt to address?
This paper addresses the problem of identifying a small number of infected individuals in a large population when test results are noisy. Specifically, it asks how to minimize the total number of tests when groups of individuals are tested together, while still identifying all or almost all infected individuals correctly.

### Main contributions of the paper

1. **Approximate Recovery Algorithm in Noise Model (SPARC)**:
   - The paper proposes an algorithm named SPARC which, with high probability, identifies the set of infected individuals up to \(o(k)\) errors using the asymptotically minimum number of tests.
   - The algorithm uses a randomized test design and succeeds with high probability when the number of tests exceeds \((1 +\epsilon)m_{\text{SPARC}}\).
   - Here, \(m_{\text{SPARC}}=c_{\text{Sh}}k\ln(n / k)\), where \(c_{\text{Sh}}\) is a constant determined by the channel parameters.
2. **Exact Recovery Algorithm in Noise Model (SPEX)**:
   - The paper also proposes an algorithm named SPEX which, with high probability, identifies all infected individuals exactly, using a number of tests that matches the information-theoretic lower bound for the constant column design.
   - The algorithm also uses a randomized test design and succeeds with high probability when the number of tests exceeds \((1 +\epsilon)m_{\text{SPEX}}\).
   - Here, \(m_{\text{SPEX}}=c_{\text{ex}}(\theta)k\ln(n / k)\), where \(c_{\text{ex}}(\theta)\) is a constant determined by the channel parameters and the infection density \(\theta\).
3. **Theoretical lower bounds**:
   - The paper proves that when the number of tests is less than \((1-\epsilon)m_{\text{SPARC}}\), no test design allows approximate recovery of the set of infected individuals with high probability.
   - Similarly, the paper proves that when the number of tests is less than \((1-\epsilon)m_{\text{SPEX}}\), the constant column design does not allow exact recovery of the set of infected individuals with high probability.

### Key techniques and methods

- **Spatial Coupling**:
  - The test design uses the spatial coupling technique from coding theory, which combines randomization with a topological structure to make the test design more effective.
  - According to the summary, spatial coupling performs particularly well in noisy settings and can significantly improve the efficiency and accuracy of the tests.
- **Kullback-Leibler Divergence (KL Divergence)**:
  - The paper uses the KL divergence, a standard tool in information theory, repeatedly to measure how far one distribution is from another.
  - For example, \(D_{\text{KL}}(y\parallel z)=y\ln(y / z)+(1 - y)\ln((1 - y)/(1 - z))\) quantifies the divergence between two Bernoulli distributions with success probabilities \(y\) and \(z\).
- **Optimization problems**:
  - Many results in the paper rely on optimization problems that determine the optimal number of tests and the parameters of the test design.
  - For example, computing \(c_{\text{ex}}(\theta)\) involves several optimization steps that account for the channel parameters and the infection density.

### Experimental results and comparisons

- **Binary Symmetric Channel (Symmetric Noise Model)**:
  - Under the symmetric noise model, the paper reports the performance of the SPARC and SPEX algorithms and compares them with the previous best algorithms (such as the DD algorithm).
  - The results show that even with very small noise (such as 1%), the new algorithms significantly reduce the required number of tests.
- **Z-Channel**:
  - Under the Z-channel model, the paper likewise reports the performance of the SPARC and SPEX algorithms.