Majorizing Stress Formula Two

Jan de Leeuw
2024-07-26
Abstract: Modifications of the smacof algorithm for multidimensional scaling are proposed that provide a convergent majorization algorithm for Kruskal's stress formula two.
Numerical Analysis, Computation, Machine Learning
### What problem does this paper attempt to solve?

The main objective of this paper is to modify the SMACOF (Scaling by MAjorizing a COmplicated Function) algorithm so that it provides a convergent majorization algorithm for Kruskal's stress formula two. Specifically:

1. **Optimization problems in multidimensional scaling**:
   - The paper focuses on minimizing Kruskal's stress formula two (σ2) in multidimensional scaling (MDS). MDS represents objects as points in a low-dimensional space so that the distances between the points approximate the given dissimilarities as closely as possible.

2. **Selection of stress formulas**:
   - Kruskal proposed two stress formulas: stress formula one (σ1) and stress formula two (σ2). Stress formula two is the newer loss function and is considered superior in some cases.
   - Stress formula two is defined as:
     \[
     \sigma_2(X) := \frac{\sum_{i<j} w_{ij} (\delta_{ij} - d_{ij}(X))^2}{\sum_{i<j} w_{ij} (d_{ij}(X) - d(X))^2}
     \]
     where \( d(X) = \sum_{i<j} w_{ij} d_{ij}(X) \), which is the weighted mean of the distances when the weights sum to one.

3. **Improvement of the algorithm**:
   - The existing R implementations of non-metric and nonlinear SMACOF minimize Kruskal's normalized stress formula one (σ1). This paper extends the SMACOF theory and algorithm to stress formula two (σ2).
   - An iterative algorithm based on majorization is proposed for minimizing stress formula two. At each iteration, majorization replaces the loss with an upper-bound function that is easier to minimize, which ensures that the algorithm converges.

4. **Practical applications and comparisons**:
   - The article demonstrates the new algorithm on specific examples (such as the Ekman color data and the De Gruijter data on similarities between Dutch political parties) and compares its results with those of traditional methods.
   - The results show that although the configurations minimizing the two stress formulas are very similar, stress formula two may perform better or yield different solutions on certain datasets.

In summary, this paper aims to improve multidimensional scaling for specific types of data by modifying the SMACOF algorithm so that it can effectively minimize stress formula two.
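
To make the definition of stress formula two concrete, the following is a minimal Python sketch that evaluates the ratio exactly as written above for a given configuration. It is not the paper's majorization algorithm. The function name `stress_two`, the use of NumPy, and the rescaling of the weights to sum to one (so that d(X) is the weighted mean distance) are assumptions made for illustration only.

```python
import numpy as np

def stress_two(X, delta, w=None):
    """Evaluate the ratio form of stress formula two (illustrative helper).

    X     : (n, p) array, one row per point in the configuration
    delta : (n, n) symmetric array of dissimilarities
    w     : optional (n, n) symmetric array of nonnegative weights
    """
    X = np.asarray(X, dtype=float)
    delta = np.asarray(delta, dtype=float)
    n = X.shape[0]
    i, j = np.triu_indices(n, k=1)            # all pairs with i < j
    d = np.linalg.norm(X[i] - X[j], axis=1)   # Euclidean distances d_ij(X)
    if w is None:
        w_pairs = np.ones_like(d)
    else:
        w_pairs = np.asarray(w, dtype=float)[i, j]
    # Assumption: weights are rescaled to sum to one, so that
    # d(X) = sum w_ij d_ij(X) is the weighted mean distance.
    w_pairs = w_pairs / w_pairs.sum()
    d_mean = np.sum(w_pairs * d)
    numerator = np.sum(w_pairs * (delta[i, j] - d) ** 2)
    denominator = np.sum(w_pairs * (d - d_mean) ** 2)
    # The denominator is zero when all distances are equal; the ratio
    # is undefined in that case and is not handled here.
    return numerator / denominator

# Three points on a line, with pairwise distances 1, 3, and 2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
delta_exact = np.array([[0.0, 1.0, 3.0],
                        [1.0, 0.0, 2.0],
                        [3.0, 2.0, 0.0]])
print(stress_two(X, delta_exact))  # perfect fit, so the numerator vanishes
```

Because the denominator measures the spread of the distances around their mean, a configuration with nearly equal distances makes the ratio large even when the numerator is small.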