Shaving Logs via Large Sieve Inequality: Faster Algorithms for Sparse Convolution and More

Ce Jin,Yinzhan Xu
2024-03-30
Abstract: In sparse convolution-type problems, a common technique is to hash the input integers modulo a random prime $p\in [Q/2,Q]$ for some parameter $Q$, which reduces the range of the input integers while preserving their additive structure. However, this hash family suffers from two drawbacks, which led to bottlenecks in many state-of-the-art algorithms: (1) The collision probability of two elements from $[N]$ is $O(\frac{\log N}{Q})$ rather than $O(\frac{1}{Q})$; (2) It is difficult to derandomize the choice of $p$; known derandomization techniques lead to super-logarithmic overhead [Chan, Lewenstein STOC'15].
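The hashing step described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the helper names (`is_prime`, `random_prime`, `hashed_sumset`), the choice $Q = 1000$, and the sample inputs are assumptions made for illustration only. The sketch checks the one property the abstract relies on, namely that reducing modulo $p$ commutes with addition.

```python
import random

def is_prime(n: int) -> bool:
    """Trial division; adequate for the small moduli used in this sketch."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def random_prime(Q: int, rng: random.Random) -> int:
    """Sample a prime uniformly from [Q/2, Q] by rejection sampling."""
    while True:
        p = rng.randint(Q // 2, Q)
        if is_prime(p):
            return p

def hashed_sumset(A, B, p):
    """Sumset of the hashed inputs, computed entirely modulo p."""
    return {(a % p + b % p) % p for a in A for b in B}

rng = random.Random(0)              # fixed seed so the run is reproducible
A = [3, 1_000_003, 77_777_777]      # hypothetical sparse input
B = [10, 5_000_000, 123_456_789]    # hypothetical sparse input
p = random_prime(1000, rng)         # Q = 1000 is an arbitrary choice

# Additive structure is preserved: hashing the true sums gives the same set
# as adding the hashed inputs modulo p.
true_sums_hashed = {(a + b) % p for a in A for b in B}
assert hashed_sumset(A, B, p) == true_sums_hashed
```

The inputs range up to about $10^8$ while every hashed value lies in $[0, p)$, which is the range reduction the abstract refers to. Distinct sums may still collide modulo $p$, which is drawback (1).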
Data Structures and Algorithms
What problem does this paper attempt to address?
This paper addresses several algorithmic problems related to sparse convolution:

1. **Sparse Nonnegative Convolution**:
   - The goal is to compute the convolution \(A \star B\) of two nonnegative integer vectors \(A\) and \(B\) in \(O(t \log t)\) time with probability at least \(1 - 1/t\), where \(t = \|A \star B\|_0\) is the sparsity of the output.
   - The previous best algorithm runs in \(O(t \log t+\text{polylog}(N\Delta))\) time, where \(\Delta=\max\{\|A\|_\infty,\|B\|_\infty\}\), and succeeds with probability \(1 - 2^{-\sqrt{\log t}}\). This paper removes the \(\text{polylog}(N\Delta)\) term and raises the success probability to \(1 - 1/t\).
2. **Text-to-Pattern Hamming Distances**:
   - Given a pattern \(P\) of length \(m\) and a text \(T\) of length \(n\), the goal is to compute the Hamming distance between \(P\) and every length-\(m\) substring of \(T\).
   - This paper provides a deterministic \(O(n\sqrt{m\log\log m})\)-time algorithm, improving on the previous best deterministic bound of \(O(n\sqrt{m}(\log m\log\log m)^{1/4})\) and coming close to the running time of the best known randomized algorithm.
3. **Sparse General Convolution**:
   - For input vectors that may contain negative entries, the goal is to compute the convolution in \(O(t \log t)\) time, where \(t\) is the maximum of the input and output sparsities.
   - The previous algorithm runs in \(O(t\log^{2}t)\) time. In the restricted case where the length \(N\) of the input vectors is at most \(t^{1.99}\) and \(\Delta\leqslant2^{2\log t / \text{polyloglog}t}\), this paper provides an \(O(t \log t)\)-time Monte Carlo randomized algorithm.

### Main Technical Contributions

1. **Modulo Prime Hashing and the Large Sieve Inequality**:
   - This paper uses the large sieve inequality to analyze modulo prime hash functions, which yields a tighter bound on hash collisions. Specifically, for a set \(A\subseteq[N]\), the bound on the expected number of colliding pairs under modulo prime hashing improves from \(O(|A|^{2}\log N/Q + |A|)\) to \(O(|A|(|A|/Q+\log\log N))\).
   - This improvement makes hashing more effective for sparse convolution problems, especially on highly sparse inputs.
2. **Application of the Prony Method**:
   - Each hash bucket is regarded as a sparse polynomial, and the Prony method is used to recover the elements in the bucket, which leads to a faster algorithm. The Prony method runs in \(O(\text{poly}(s))\) time, where \(s = O(\log\log N)\).
3. **Sparse Testing**:
   - In order to...
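To make the collision count in the modulo prime hashing item concrete, the following Python sketch computes the quantity empirically on small inputs. It is an illustration under stated assumptions, not the paper's analysis: the function names are hypothetical, a pair is counted as an unordered pair of distinct elements (a convention chosen here, which affects only constants and the additive \(|A|\) term), and the average is taken over all primes in \([Q/2, Q]\).

```python
import random
from collections import Counter

def primes_in_range(lo, hi):
    """All primes in [lo, hi], found with a sieve of Eratosthenes."""
    sieve = bytearray([1]) * (hi + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(hi ** 0.5) + 1):
        if sieve[i]:
            # Cross out the multiples of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, hi + 1, i)))
    return [q for q in range(max(lo, 2), hi + 1) if sieve[q]]

def collision_pairs(A, p):
    """Unordered pairs of distinct elements of A that are congruent mod p."""
    buckets = Counter(a % p for a in A)
    return sum(c * (c - 1) // 2 for c in buckets.values())

def average_collision_pairs(A, Q):
    """Collision pairs averaged over a uniformly random prime p in [Q/2, Q]."""
    primes = primes_in_range(Q // 2, Q)
    return sum(collision_pairs(A, p) for p in primes) / len(primes)

# Hypothetical input: 200 distinct integers drawn from [N] with N = 10**9.
rng = random.Random(1)
A = rng.sample(range(10**9), 200)
for Q in (500, 2000, 8000):
    print(Q, average_collision_pairs(A, Q))
```

Running the loop for growing \(Q\) shows how the average number of colliding pairs shrinks as the hash range widens. A small experiment like this cannot distinguish the \(\log N\) factor from the \(\log\log N\) term, so it serves only to pin down what is being counted.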