Abstract: The outputs of a non-linear feed-forward neural network are positive, so they can be treated as probabilities once they are normalized to sum to one. Under the Entropy-Based Principle, the outputs for each sample can then be read as that sample's distribution over the clusters. The Entropy-Based Principle is a principle for estimating an unknown distribution under limited conditions. This paper defines two processes in a feed-forward neural network: the limited condition is the set of abstracted features of the samples, computed in the abstraction process, and the final outputs are the probability distribution over the clusters, computed in the clustering process. Bringing the Entropy-Based Principle into the feed-forward neural network yields a clustering method. We conducted experiments on six open UCI datasets, comparing against several baselines and using purity as the measure. The results show that our method outperforms all the baselines, which are among the most popular clustering methods.
What problem does this paper attempt to address?
The main problem this paper attempts to solve is how to apply the Feed-Forward Neural Network (FFNN) to the clustering task in unsupervised learning. Traditionally, FFNNs are used mainly for supervised learning tasks, such as regression and classification, which require a large amount of labeled data for training. In unsupervised learning, and in clustering tasks in particular, the data are usually unlabeled, so a traditional FFNN is difficult to apply directly.
To solve this problem, the author proposes the Max-Entropy Feed-Forward Clustering Neural Network, based on the Max-Entropy Principle. This method enables an FFNN to cluster data under unsupervised conditions by introducing the maximum entropy and minimum entropy principles. Specifically:
1. **Feature Extraction and Probability Distribution**: In FFNN, the output of each layer can be regarded as both abstract data features and the probability distribution of samples belonging to different clusters.
2. **Maximum Entropy and Minimum Entropy Principles**:
- In the abstract layer (hidden layer), minimize the entropy to improve the feature extraction ability, making the sample closer to a specific linear regression learner.
- In the clustering layer (output layer), maximize the entropy to estimate the unknown probability distribution on the data manifold, thereby determining the cluster to which the sample belongs.
3. **Optimization Objective Function**: By constructing an optimization problem and combining the above two principles, a new clustering algorithm is designed. The optimization objective function is as follows:
\[
\text{Max } J = -\sum_{i = 1}^{N}\left(1-\frac{O_i}{\sum_{j = 1}^{N}(1 - O_j)}\right)\log\left(1-\frac{O_i}{\sum_{j = 1}^{N}(1 - O_j)}\right)+\lambda\sum_{l = 1}^{L}\sum_{i = 1}^{L_l}\left(1-\frac{O_{i,l}}{\sum_{j = 1}^{L_l}(1 - O_{j,l})}\right)\log\left(1-\frac{O_{i,l}}{\sum_{j = 1}^{L_l}(1 - O_{j,l})}\right)
\]
where:
- \( N \) is the number of neurons in the output layer.
- \( L_l \) is the number of neurons in the \( l \)-th layer.
- \( L \) is the number of layers in the network.
- \( O_i \) is the output of the \( i \)-th neuron in the output layer.
- \( O_{i,l} \) is the output of the \( i \)-th neuron in the \( l \)-th layer.
- \( \lambda \) is a hyperparameter that controls the relative weight of the two terms.
4. **Experimental Verification**: The author conducted experiments on six publicly available UCI datasets and compared the method against baselines such as K-Means, density-based methods, hierarchical clustering methods, and Expectation-Maximization (EM) clustering. The experimental results show that the proposed method outperforms the baseline models on most datasets.
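As a rough illustration of the objective in item 3, the sketch below computes an entropy-based score for one sample: the entropy of the output layer minus \( \lambda \) times the summed entropy of the hidden layers. It is not the author's implementation. It assumes each layer's positive activations are normalized by their plain sum, as the abstract describes, which is simpler than the normalization printed in the formula. The function names `normalize`, `entropy`, `objective`, and `assign_cluster`, and the default value of `lam`, are hypothetical.

```python
import math


def normalize(activations):
    """Scale positive activations so they sum to one (assumed normalization)."""
    total = sum(activations)
    return [a / total for a in activations]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def objective(hidden_layers, output, lam=0.1):
    """Output-layer entropy minus lam times the summed hidden-layer entropy.

    Maximizing this score favors high entropy at the output (clustering)
    layer and low entropy at the hidden (abstraction) layers.
    """
    output_entropy = entropy(normalize(output))
    hidden_entropy = sum(entropy(normalize(h)) for h in hidden_layers)
    return output_entropy - lam * hidden_entropy


def assign_cluster(output):
    """Pick the index of the largest output as the sample's cluster."""
    return max(range(len(output)), key=lambda i: output[i])
```

In a full training loop this score would be summed over samples and maximized with respect to the network weights; the gradient step is omitted here because the summary does not describe it.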
Through this method, the paper successfully applies FFNN to the unsupervised clustering task and shows its potential in dealing with complex data structures.
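The purity measure named in the abstract can be computed with a few lines of code. The sketch below follows the standard definition: each cluster is credited with its most frequent ground-truth label, and the credited counts are divided by the number of samples. The function name `purity` and its argument names are illustrative and do not come from the paper.

```python
from collections import Counter, defaultdict


def purity(true_labels, cluster_ids):
    """Fraction of samples whose true label is the majority label of their cluster."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have the same length")
    # Group the ground-truth labels by the cluster each sample was assigned to.
    members = defaultdict(list)
    for label, cluster in zip(true_labels, cluster_ids):
        members[cluster].append(label)
    # Credit each cluster with the size of its most common true label.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(true_labels)
```

Purity ranges from just above 0 to 1, and it rises as the number of clusters grows, so comparisons are most meaningful when every method uses the same number of clusters.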