J. G. Wolff
Abstract: This paper describes a novel approach to unsupervised learning that has been developed within a framework of "information compression by multiple alignment, unification and search" (ICMAUS), designed to integrate learning with other AI functions such as parsing and production of language, fuzzy pattern recognition, probabilistic and exact forms of reasoning, and others.
What problem does this paper attempt to address?
The main problem this paper addresses is how to develop a new unsupervised learning method that integrates multiple functions within a single framework: language parsing and generation, fuzzy pattern recognition, best-match information retrieval, class hierarchies with inheritance of properties, and both probabilistic and exact reasoning. Specifically, the core objectives of the paper include:
1. **Powerful knowledge representation system**: The system should be able to represent both syntactic and semantic structures in natural language and integrate them seamlessly.
2. **Unsupervised learning**: Without external error correction, negative samples, or examples graded from simple to complex, the system should learn the segmental structures in natural language (such as words, phrases and sentences) as well as distributionally equivalent categories (such as nouns, verbs and adjectives). In addition, the system should handle discontinuous grammatical dependencies and semantic structures, and should learn a "correct" grammar even when the data is contaminated.
3. **Avoid over-generalization**: The system should be able to distinguish between "correct" generalization and over-generalization without the need for an external error-correction mechanism.
4. **Optimization process**: Learning is treated as an optimization process rather than as the recovery of a specific target grammar. Using heuristic methods (such as hill-climbing), the system searches the abstract space of possible grammars for a solution that is good enough.
5. **Minimum length encoding principle**: The system is based on the Minimum Length Encoding (MLE) principle, which aims to derive the most concise representation by compressing information. This principle is used not only for grammar inference but also for other types of information processing.
6. **Information compression by multiple alignment, unification and search (ICMAUS)**: The paper introduces the concept of "multiple alignment", a more general form of pattern matching that supports encoding new information at different levels. By searching the store of old information for matching patterns and unifying them, the system can compress and represent new information effectively.
7. **Integration with other AI functions**: An important objective of the research is to integrate learning with other AI functions (such as pattern recognition, reasoning, planning and problem-solving) within a relatively simple framework.
8. **Implementation with neural mechanisms**: Because the model may eventually be applied to understanding the brain and nervous system, the research also explores how the ICMAUS framework could be implemented with neural mechanisms.
In summary, this paper aims to develop a powerful unsupervised learning framework, which can not only handle complex tasks in natural language learning, but also be widely applied in other fields, such as pattern recognition, reasoning and computational modelling.
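To make objectives 4 and 5 more concrete, the following is a minimal sketch of minimum length encoding driven by hill-climbing. It is not the ICMAUS model and does not perform multiple alignment; it swaps in a much simpler technique (greedy replacement of the most frequent adjacent pair of symbols) purely to illustrate the idea of accepting a change to the "grammar" only when the total description length falls. The function names, the `<n>` code symbols, and the cost measure (one unit per symbol) are all assumptions made for this illustration.

```python
from collections import Counter


def description_length(grammar, encoded):
    """Total size in symbols: stored patterns (body plus one code symbol each)
    plus the length of the data once encoded with those patterns."""
    return sum(len(body) + 1 for body in grammar.values()) + len(encoded)


def replace_pair(seq, pair, code):
    """Replace non-overlapping, left-to-right occurrences of `pair` with `code`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(code)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def hill_climb(data):
    """Greedy hill-climbing: keep adding the most frequent adjacent pair as a
    new pattern while doing so strictly reduces the description length."""
    grammar = {}              # hypothetical code symbol -> pair it stands for
    encoded = list(data)      # data as single-character symbols
    best = description_length(grammar, encoded)
    while True:
        pairs = Counter(zip(encoded, encoded[1:]))
        if not pairs:
            break
        pair, _ = pairs.most_common(1)[0]
        # Codes are multi-character strings, so they cannot collide with
        # the single-character symbols of the raw data.
        code = f"<{len(grammar)}>"
        candidate_grammar = {**grammar, code: pair}
        candidate_encoded = replace_pair(encoded, pair, code)
        cost = description_length(candidate_grammar, candidate_encoded)
        if cost >= best:      # no improvement: stop at this local optimum
            break
        grammar, encoded, best = candidate_grammar, candidate_encoded, cost
    return grammar, encoded, best


def expand(grammar, encoded):
    """Decode by recursively expanding code symbols back into raw symbols."""
    out = []
    for sym in encoded:
        if sym in grammar:
            out.extend(expand(grammar, grammar[sym]))
        else:
            out.append(sym)
    return out


if __name__ == "__main__":
    text = "thecatsatonthemat" * 4
    g, enc, size = hill_climb(text)
    print(len(text), "->", size)
    print("lossless:", "".join(expand(g, enc)) == text)
```

Because the search stops at the first step that fails to shorten the description, it finds a local optimum rather than a guaranteed best grammar, which matches the "good enough" framing of objective 4. Data with no repeated pairs is left unencoded, since storing a pattern would cost more than it saves.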