Abstract: Keyphrase ranking plays a crucial role in information retrieval and summarization by indexing and retrieving relevant information efficiently. Advances in natural language processing, especially large language models (LLMs), have improved keyphrase extraction and ranking. However, traditional methods often overlook diversity, resulting in redundant keyphrases. We propose a novel approach using Submodular Function Optimization (SFO) to balance relevance and diversity in keyphrase ranking. By framing the task as submodular maximization, our method selects diverse and representative keyphrases. Experiments on benchmark datasets show that our approach outperforms existing methods in both relevance and diversity metrics, achieving state-of-the-art (SOTA) performance in execution time. Our code is available online.
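To make the "submodular maximization" framing concrete: a set function f is submodular when it has diminishing returns, i.e. f(A ∪ {x}) − f(A) ≥ f(B ∪ {x}) − f(B) for all A ⊆ B and x ∉ B. The minimal sketch below brute-forces that check, assuming a hypothetical relevance-minus-redundancy set function over three candidate phrases. The helper names `make_objective` and `is_submodular`, the phrase labels, and all scores are illustrative assumptions, not taken from the paper. It enumerates every subset, so it only scales to tiny ground sets.

```python
from itertools import combinations


def make_objective(rel, sim, alpha):
    """Hypothetical f(S) = sum of relevance - alpha * pairwise similarity.

    The penalty sums over ordered pairs of distinct items in S, so each
    unordered pair is counted twice.
    """
    def f(S):
        relevance = sum(rel[x] for x in S)
        redundancy = sum(sim[frozenset((x, y))] for x in S for y in S if x != y)
        return relevance - alpha * redundancy
    return f


def is_submodular(f, ground):
    """Brute-force diminishing-returns check on a small ground set.

    For every A ⊆ B ⊆ ground and x not in B, verify
    f(A ∪ {x}) - f(A) >= f(B ∪ {x}) - f(B), with a small float tolerance.
    """
    items = list(ground)
    subsets = [frozenset(c) for r in range(len(items) + 1)
               for c in combinations(items, r)]
    for B in subsets:
        for A in subsets:
            if not A <= B:
                continue
            for x in items:
                if x in B:
                    continue
                gain_a = f(A | {x}) - f(A)
                gain_b = f(B | {x}) - f(B)
                if gain_a < gain_b - 1e-12:
                    return False
    return True


if __name__ == "__main__":
    # Hypothetical relevance scores and pairwise similarities.
    rel = {"a": 0.9, "b": 0.85, "c": 0.6}
    sim = {frozenset("ab"): 0.95, frozenset("ac"): 0.3, frozenset("bc"): 0.35}
    print(is_submodular(make_objective(rel, sim, alpha=0.5), rel))  # True
    # A single negative similarity breaks diminishing returns.
    sim_neg = {**sim, frozenset("bc"): -0.5}
    print(is_submodular(make_objective(rel, sim_neg, alpha=0.5), rel))  # False
```

The second call shows a caveat of this kind of objective: subtracting pairwise similarity keeps f submodular only while the similarities are nonnegative. With a negative similarity, adding "c" after "b" gains more than adding it to the empty set.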

What problem does this paper attempt to address?

The key problem this paper addresses is that existing keyphrase extraction methods fail to balance relevance and diversity. Traditional methods tend to focus on relevance alone, so the extracted phrases are often redundant and do not capture the full range of topics in a document. Although advances in natural language processing (NLP), especially large language models (LLMs), have substantially improved keyphrase extraction, optimizing relevance and diversity at the same time remains a challenge. To address this, the authors propose a method based on Submodular Function Optimization (SFO) that optimizes relevance and diversity jointly in keyphrase ranking. SFO optimizes submodular functions, which exhibit diminishing returns: adding an element to a smaller set increases the function value at least as much as adding it to a larger set. This makes SFO well suited to tasks that require diverse selection, such as document summarization and data subset selection. By modeling keyphrase selection as a submodular maximization problem, the method is designed so that the selected keyphrases are both representative and diverse.

### Formula Explanation

1. **Objective Function**:
   \[ f(S)=\sum_{k_{p} \in S} R(k_{p})-\alpha \sum_{k_{p_{i}}, k_{p_{j}} \in S,\; k_{p_{i}} \neq k_{p_{j}}} Sim(k_{p_{i}}, k_{p_{j}}) \]
   where:
   - \(S\) is the set of selected keyphrases.
   - \(R(k_{p})\) is the relevance score of the candidate keyphrase \(k_{p}\).
   - \(Sim(k_{p_{i}}, k_{p_{j}})\) is the similarity between two keyphrases \(k_{p_{i}}\) and \(k_{p_{j}}\).
   - \(\alpha \geq 0\) is a hyperparameter that controls the trade-off between relevance and diversity.
2. **Relevance Score**:
   \[ R(k_{p})=\cos(e_{k_{p}}, e_{D})=\frac{e_{k_{p}}^{T} e_{D}}{\left\|e_{k_{p}}\right\|\left\|e_{D}\right\|} \]
   where:
   - \(e_{k_{p}}\) is the embedding vector of the keyphrase.
   - \(e_{D}\) is the embedding vector of the document.
3. **Similarity Calculation**:
   \[ Sim(k_{p_{i}}, k_{p_{j}})=\cos(e_{k_{p_{i}}}, e_{k_{p_{j}}})=\frac{e_{k_{p_{i}}}^{T} e_{k_{p_{j}}}}{\left\|e_{k_{p_{i}}}\right\|\left\|e_{k_{p_{j}}}\right\|} \]

With these formulas, the method reduces redundancy and increases diversity while keeping the selected keyphrases highly relevant to the document content, which yields a more comprehensive representation of the document. Experimental results show that the method outperforms existing methods on multiple benchmark datasets, performs well on both relevance and diversity metrics, and has a significant advantage in execution time.
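The objective, relevance score, and similarity defined in the formulas can be sketched in Python with NumPy. This is a minimal illustration, not the authors' released code. It assumes the embeddings are already computed (the toy 2-D vectors are hypothetical), and it assumes the ranking is built with the standard greedy algorithm for submodular maximization, which adds at each step the candidate with the largest marginal gain in f(S). The helper names `cosine`, `objective`, and `greedy_select` are illustrative.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity u^T v / (||u|| ||v||)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def objective(selected, rel, sim, alpha):
    """f(S) = sum_{p in S} R(p) - alpha * sum_{i != j in S} Sim(i, j).

    The penalty sums over ordered pairs of distinct selected phrases,
    matching the formula as written (each unordered pair counts twice).
    """
    relevance = sum(rel[i] for i in selected)
    redundancy = sum(sim[i, j] for i in selected for j in selected if i != j)
    return relevance - alpha * redundancy


def greedy_select(phrase_emb, doc_emb, k, alpha=0.5):
    """Greedily rank up to k phrase indices by largest marginal gain in f.

    Assumption: always returns k phrases (a ranking), even if a gain turns
    negative, since f is not monotone once alpha > 0.
    """
    n = len(phrase_emb)
    rel = np.array([cosine(e, doc_emb) for e in phrase_emb])  # R(k_p)
    sim = np.array([[cosine(a, b) for b in phrase_emb] for a in phrase_emb])
    selected = []
    for _ in range(min(k, n)):
        base = objective(selected, rel, sim, alpha)
        best, best_gain = None, -np.inf
        for c in range(n):
            if c in selected:
                continue
            gain = objective(selected + [c], rel, sim, alpha) - base
            if gain > best_gain:
                best, best_gain = c, gain
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Hypothetical embeddings: phrases 0 and 1 are near-duplicates.
    phrases = np.array([[1.0, 0.9], [1.0, 0.95], [0.2, 1.0]])
    doc = np.array([1.0, 1.0])
    print(greedy_select(phrases, doc, k=2, alpha=0.0))  # [1, 0]
    print(greedy_select(phrases, doc, k=2, alpha=0.5))  # [1, 2]
```

With `alpha=0.0` the selection reduces to pure relevance ranking and returns the near-duplicate phrases 1 and 0. With `alpha=0.5` the redundancy penalty makes the second greedy step prefer phrase 2. The sketch recomputes f(S) from scratch for readability; since the penalty counts ordered pairs, the marginal gain of adding \(k_p\) has the closed form \(R(k_p) - 2\alpha \sum_{k_q \in S} Sim(k_p, k_q)\), which avoids that cost.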

Optimizing Keyphrase Ranking for Relevance and Diversity Using Submodular Function Optimization (SFO)

Optimizing top-k retrieval: submodularity analysis and search strategies

Diversifying Relevant Phrases

Balancing Relevance and Diversity in Online Bipartite Matching via Submodularity

Balancing Utility and Fairness in Submodular Maximization (Technical Report)

Fast Semidifferential-based Submodular Function Optimization

A Subtopic Taxonomy-Aware Framework for Diversity Evaluation

Multi-objective optimization for sponsored search

The Power of Second Chance: Personalized Submodular Maximization with Two Candidates

Conditional Sequential Slate Optimization

Quality and Diversity Optimization: A Unifying Modular Framework

Global Optimization for Advertisement Selection in Sponsored Search

Achieving Long-term Fairness in Submodular Maximization through Randomization

Fairness in Monotone $k$-submodular Maximization: Algorithms and Applications

Subtraction-Average-Based Optimizer: A New Swarm-Inspired Metaheuristic Algorithm for Solving Optimization Problems

Directly Optimize Diversity Evaluation Measures: A New Approach to Search Result Diversification

Achieving Diversity in Objective Space for Sample-efficient Search of Multiobjective Optimization Problems

A Local Optimization Framework for Multi-Objective Ergodic Search

Localized Distributional Robustness in Submodular Multi-Task Subset Selection

Submodular Optimization for Keyframe Selection & Usage in SLAM

Beyond Greedy Search: Pruned Exhaustive Search for Diversified Result Ranking