Cognitive Components of Speech at Different Time Scales
Ling Feng,Lars Kai Hansen
2007-01-01
Ling Feng (lf@imm.dtu.dk), Informatics and Mathematical Modelling, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
Lars Kai Hansen (lkh@imm.dtu.dk), Informatics and Mathematical Modelling, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark

Abstract
Cognitive component analysis (COCA) is defined as unsupervised grouping of data leading to a group structure well-aligned with that resulting from human cognitive activity. We focus here on speech at different time scales, looking for possible hidden ‘cognitive structure’. Statistical regularities have earlier been revealed at multiple time scales corresponding to phoneme, gender, height and speaker identity. We here show that the same simple unsupervised learning algorithm can detect these cues. Our basic features are 25-dimensional short-time Mel-frequency weighted cepstral coefficients, assumed to model the basic representation of the human auditory system. The basic features are aggregated in time to obtain features at longer time scales. Simple energy-based filtering is used to achieve a sparse representation. Our hypothesis is basically ecological: we hypothesize that features that are essentially independent in a reasonable ensemble can be efficiently coded using a sparse independent component representation. The representations are indeed shown to be very similar between supervised learning (invoking cognitive activity) and unsupervised learning (statistical regularities), hence lending additional support to our cognitive component hypothesis.

Keywords: Cognitive component analysis; time scales; energy-based sparsification; statistical regularity; unsupervised learning; supervised learning.

Introduction
The evolution of human cognition is an ongoing interplay between statistical properties of the ecology, the process of natural selection, and learning. Robust statistical regularities will be exploited by an evolutionarily optimized brain (Barlow, 1989). Statistical independence may be one such regularity, which would allow the system to take advantage of factorial codes of much lower complexity than those pertinent to the full joint distribution. In (Wagensberg, 2000), the success of given ‘life forms’ is linked to their ability to recognize independence between predictable and unpredictable processes in a given niche. This represents a refinement of the classical Darwinian paradigm, arguing that natural selection simply favors innovations which increase the independence between the agent and unpredictable processes. The agent can be an individual or a group. The resulting human cognitive system can model complex multi-agent scenes, and use a broad spectrum of cues for analyzing perceptual input and for identifying individual signal-producing processes.

The optimized representations for low-level perception are indeed based on independence in relevant natural ensemble statistics. This has been demonstrated by a variety of independent component analysis (ICA) algorithms, whose representations closely resemble those found in natural perceptual systems. Examples are visual features (Bell & Sejnowski, 1997; Hoyer & Hyvärinen, 2000) and sound features (Lewicki, 2002). In an attempt to generalize these findings to higher cognitive functions, we proposed and tested the independent cognitive component hypothesis, which basically asks: do humans also use information-theoretically optimal ICA methods in more generic and abstract data analysis? Cognitive component analysis (COCA) is thus simply defined as the process of unsupervised grouping of abstract data such that the ensuing group structure is well-aligned with that resulting from human cognitive activity (Hansen, Ahrendt, & Larsen, 2005). In this preliminary research on COCA, human cognitive activity is restricted to the human labels used in supervised learning methods. This interpretation is not comprehensive; however, it is capable of representing some intrinsic mechanisms of human cognition. Furthermore, COCA is not limited to one specific technique, but is rather a conglomerate of different techniques. We envision that efficient representations of high-level processes are based on sparse distributed codes and approximate independence, similar to what has been found for more basic perceptual processes. As mentioned, independence can dramatically reduce the complexity of perception-to-action mappings by using factorial codes rather than complex codes based on the full joint distribution. Hence, it is a natural starting point to look for high-level statistically independent features when aiming at high-level representations.

In this paper we focus on cognitive processes in digital speech signals. The paper is organized as follows: first we discuss the specifics of the cognitive component hypothesis in relation to speech, then we describe our specific methods, present results obtained for the TIMIT database, and finally we conclude and draw some perspectives.

Cognitive Component Analysis
In sensory coding, it has been proposed that the visual system is near-optimal in representing natural scenes by invoking ‘sparse distributed’ coding (Field, 1994). A sparse signal consists of relatively few large-magnitude samples in a background of many small ones. When mixing such indepen-