How to use the Kohonen algorithm to simultaneously analyse individuals in a survey

Marie Cottrell, Patrick Letrémy
DOI: https://doi.org/10.1016/j.neucom.2004.04.011
2006-10-19
Abstract: The Kohonen algorithm (SOM; Kohonen, 1984, 1995) is a very powerful tool for data analysis. It was originally designed to model organized connections between some biological neural networks. It was also immediately considered a very good algorithm for vector quantization and, at the same time, for pertinent classification, with nice properties for visualization. If the individuals are described by quantitative variables (ratios, frequencies, measurements, amounts, etc.), the straightforward application of the original algorithm builds code vectors and associates with each of them the class of all the individuals that are more similar to this code vector than to the others. But when individuals are described by categorical (qualitative) variables having a finite number of modalities (as in a survey), a specific algorithm must be defined. In this paper, we present a new algorithm inspired by the SOM algorithm, which provides a simultaneous classification of the individuals and of their modalities.
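For quantitative variables, the classical procedure described in the abstract (learn code vectors, then give each one the class of the individuals closest to it) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The following are assumptions made for the example: a 1-D string of units instead of the usual 2-D grid, a Gaussian neighborhood, linear decay schedules, and the hypothetical names `train_som`, `best_unit`, and `som_classes`.

```python
import math
import random


def best_unit(codes, x):
    """Index of the code vector closest to x (squared Euclidean distance)."""
    return min(range(len(codes)),
               key=lambda u: sum((c - xi) ** 2 for c, xi in zip(codes[u], x)))


def train_som(data, n_units, n_iter=2000, eps0=0.5, seed=0):
    """Online SOM on a 1-D string of units (simplifying assumption)."""
    rng = random.Random(seed)
    # Initialize each code vector on a randomly drawn individual.
    codes = [list(rng.choice(data)) for _ in range(n_units)]
    for t in range(n_iter):
        frac = t / n_iter
        eps = eps0 * (1 - frac)                      # decreasing learning rate
        radius = max(n_units / 2 * (1 - frac), 0.5)  # shrinking neighborhood
        x = rng.choice(data)
        win = best_unit(codes, x)
        for u in range(n_units):
            # Gaussian neighborhood measured along the string of units.
            h = math.exp(-((u - win) ** 2) / (2 * radius ** 2))
            codes[u] = [c + eps * h * (xi - c) for c, xi in zip(codes[u], x)]
    return codes


def som_classes(data, codes):
    """Class of unit u: all individuals closer to code vector u than to any other."""
    classes = {u: [] for u in range(len(codes))}
    for i, x in enumerate(data):
        classes[best_unit(codes, x)].append(i)
    return classes


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of individuals.
    data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (10.0, 10.0), (9.8, 10.1), (10.2, 9.9)]
    print(som_classes(data, train_som(data, n_units=2)))
```

Every individual is assigned only through the nearest code vector. This distance-based step is what breaks down for categorical answers, where no natural distance between modalities exists.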
Statistics Theory
What problem does this paper attempt to address?
The main problem this paper addresses is how to use the Kohonen algorithm (Self-Organizing Map, SOM) to analyze simultaneously the individuals in a survey and the modalities of its categorical variables. Specifically, the article deals with databases built from survey data, in which individuals are described mainly by categorical (qualitative) variables. The classical SOM algorithm is designed for numerical data. For data sets of categorical variables, a new algorithm is needed to classify and visualize individuals and their modalities at the same time.

### Specific problems of the paper:

1. **Limitations of traditional methods**: For categorical variables (such as gender, occupational category, or income bracket), applying the classical SOM algorithm directly is inappropriate, because these variables cannot simply be represented by numerical values. Each categorical variable has a finite number of modalities, and there is no natural order or distance between them.
2. **Deficiencies of Multiple Correspondence Analysis (MCA)**: MCA can handle categorical variables, but its projections can distort distances, which makes the relationships between individuals and modalities difficult to interpret accurately.
3. **Requirement for simultaneous classification**: The goal is a method that classifies individuals and modalities simultaneously and displays them intuitively on a single map, so that "neighboring" individuals or modalities fall into the same or adjacent classes.

### Solution:

The paper proposes a new algorithm, KDISJ (Kohonen Disjunctive Analysis), which applies the ideas of the Kohonen algorithm to the Corrected Disjunctive Table (Dc). This yields a simultaneous classification and visualization of individuals and modalities, overcoming the limitations of the traditional methods.

### Main steps:

1. **Construct the corrected disjunctive table**: Define a new table Dc by weighting the entries of the original complete disjunctive table (one row per individual, one 0/1 column per modality).
2. **Define the Kohonen network**: Associate with each unit u a code vector C_u whose components live both in the space of individuals and in the space of modalities.
3. **Dual learning process**: Alternately draw an individual and a modality, and update the code vectors of the winning unit and its neighbors so that they move closer to the drawn individual or modality.
4. **Classification and visualization**: The final Kohonen map classifies not only the individuals but also the modalities, and preserves the associations between them.

### Experimental verification:

The paper illustrates the KDISJ algorithm on two practical cases (part-time employees and unemployed workers). Compared with classical MCA and other classification methods, KDISJ is reported to perform better in classification accuracy, bias control, and visualization.

### Summary:

This paper shows how a modified Kohonen algorithm can classify and visualize individuals and modalities simultaneously when survey data are described by categorical variables. This extends the range of application of SOM techniques and provides a new tool for analyzing complex socio-economic survey data.
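The four main steps listed above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation, and several details are assumptions rather than statements from the summary:

- the correction d_c,ij = d_ij / sqrt(d_i. d_.j), the chi-square-style weighting used in MCA;
- code vectors with M + N components, where M counts modalities and N counts individuals;
- the pairing rule: a drawn individual is augmented with the column of a modality with its largest Dc entry (its rarest modality), and a drawn modality with the row of an individual having its largest Dc entry, with ties broken at random;
- a 1-D string of units instead of a 2-D grid.

The names `disjunctive_table`, `corrected_table`, `kdisj`, `classify`, and `_winner` are hypothetical.

```python
import math
import random


def disjunctive_table(answers, n_modalities):
    """Complete disjunctive table D (N x M, 0/1 entries).

    answers[i][k]: index of the modality chosen by individual i for question k.
    n_modalities[k]: number of modalities of question k.
    """
    offsets = [sum(n_modalities[:k]) for k in range(len(n_modalities))]
    table = []
    for row in answers:
        d = [0] * sum(n_modalities)
        for k, a in enumerate(row):
            d[offsets[k] + a] = 1
        table.append(d)
    return table


def corrected_table(d):
    """Dc[i][j] = d_ij / sqrt(d_i. * d_.j) (assumed MCA-style weighting)."""
    row_sums = [sum(r) for r in d]
    col_sums = [sum(c) for c in zip(*d)]
    return [[d[i][j] / math.sqrt(row_sums[i] * col_sums[j]) if d[i][j] else 0.0
             for j in range(len(col_sums))] for i in range(len(d))]


def _winner(codes, vec, offset):
    """Nearest unit, comparing vec with code components offset .. offset + len(vec) - 1."""
    return min(range(len(codes)),
               key=lambda u: sum((codes[u][offset + k] - v) ** 2 for k, v in enumerate(vec)))


def kdisj(dc, n_units, n_iter=4000, eps0=0.3, seed=0):
    """KDISJ-style alternating training on a 1-D string of units (assumption)."""
    rng = random.Random(seed)
    n, m = len(dc), len(dc[0])
    cols = [list(c) for c in zip(*dc)]  # column j of Dc, one entry per individual
    # Each code vector: M components in the individual space, N in the modality space.
    codes = [[rng.random() * 0.1 for _ in range(m + n)] for _ in range(n_units)]
    for t in range(n_iter):
        frac = t / n_iter
        eps = eps0 * (1 - frac)                      # decreasing learning rate
        radius = max(n_units / 2 * (1 - frac), 0.5)  # shrinking neighborhood
        if t % 2 == 0:
            # Draw an individual i; pair it with a modality having the largest
            # Dc entry in row i, i.e. its rarest modality (assumed rule).
            i = rng.randrange(n)
            best = max(dc[i])
            j = rng.choice([k for k in range(m) if dc[i][k] == best])
            win = _winner(codes, dc[i], 0)  # compete on the individual part
        else:
            # Draw a modality j; pair it with an individual having the largest
            # Dc entry in column j (assumed rule; ties broken at random).
            j = rng.randrange(m)
            best = max(cols[j])
            i = rng.choice([k for k in range(n) if cols[j][k] == best])
            win = _winner(codes, cols[j], m)  # compete on the modality part
        x = dc[i] + cols[j]  # augmented vector of length M + N
        for u in range(n_units):
            h = math.exp(-((u - win) ** 2) / (2 * radius ** 2))
            codes[u] = [c + eps * h * (xc - c) for c, xc in zip(codes[u], x)]
    return codes


def classify(dc, codes):
    """Assign every individual (row of Dc) and every modality (column of Dc) to a unit."""
    m = len(dc[0])
    cols = [list(c) for c in zip(*dc)]
    individuals = [_winner(codes, row, 0) for row in dc]
    modalities = [_winner(codes, col, m) for col in cols]
    return individuals, modalities


if __name__ == "__main__":
    # Hypothetical toy survey: 6 individuals; question 1 has 2 modalities, question 2 has 3.
    answers = [[0, 0], [0, 0], [0, 1], [1, 2], [1, 2], [1, 1]]
    dc = corrected_table(disjunctive_table(answers, [2, 3]))
    print(classify(dc, kdisj(dc, n_units=3)))
```

Individuals and modalities end up on the same map because both kinds of draws update the full (M + N)-component code vectors. Only the part of the vector used to pick the winner differs.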