From Shapes to Sounds: A perceptual mapping ∗
V. Raykar
Abstract: In this report we present a perceptually inspired mapping that converts a two-dimensional image consisting of simple geometrical shapes into a one-dimensional audio waveform consisting of simple harmonic complexes. More specifically, we map objects to harmonic complexes in which the pitch, timbre and location of the complex correspond to the size, shape and position of the object, respectively.

1 Motivation

At the outset, audition and vision appear to be two completely different sensory modalities. While visual perception has a two-dimensional input (the image on the retina), the input to the auditory system is a one-dimensional pressure waveform incident on the eardrum. Each modality has its own percepts. Spatial location, depth, motion, size, color, symmetry and texture contribute to a rich set of visual percepts. We make sense of the world we see in terms of the different objects in it and the percepts associated with them. In a similar way, we make sense of the auditory scene in terms of auditory percepts such as source direction, range, timbre and pitch. Even though the two modalities look different, a computational framework exists that can explain both of these seemingly different kinds of perception in a unified way.¹

There is an interesting medical condition called synesthesia in which these two senses are confused, and people reportedly hear shapes.² This is possibly due to cross-wiring between the two areas in the brain. There is also an interesting theory that relates the evolution of language to the shapes of objects. Sounds can be metaphors for images; for example, sounds can be described as bright or dull.
The sounds and shapes of objects have characteristics in common that can be abstracted, for instance the sharp, cutting quality of a word and of the shape it describes. This is called the bouba/kiki effect, after an experiment in which people were shown two shapes and asked to relate the nonsense words bouba and kiki to them.³

∗ This report was written as the course project for ENEE632: Speech and Audio Processing, offered in Spring 2004 by Prof. Shihab Shamma.
¹ Shamma S., "On the role of space and time in auditory processing," Trends Cogn Sci, 2001 Aug 1;5(8):340-348.
² See Richard Cytowic's book The Man Who Tasted Shapes for a more detailed account.
³ See the article "Hearing Colors, Tasting Shapes: The Puzzle of Language" by Vilayanur Ramachandran in Scientific American.

2 Goal of the project

In this project we are concerned with the following concrete problem. Given a two-dimensional visual input, we would like to sonify the image into a one-dimensional auditory waveform, in such a way that there is a convincing perceptual map between the visual and auditory percepts. Consider the image shown in Figure 1, which has a square and a circle next to each other. We recognize the image in terms of its objects: there are two objects of different sizes and shapes at different locations. We would like to map these visual percepts to suitable auditory percepts. The potential candidates are pitch, timbre and location. The task involves the following three steps:

• Given a 2D image, extract the visual percepts that we would like to map. This involves identifying how many objects there are in the image, and their positions, sizes and shapes.
• Decide which auditory percept to map to which visual percept.
• Generate an auditory waveform corresponding to these percepts.

Each of these steps is discussed in detail in the next three sections.
Figure 1: We recognize this image as consisting of a square and a circle of different sizes placed next to each other.

3 Symmetry as a tool to extract the visual percepts

Given an image, our task is to extract the following visual percepts:

• The number of distinct objects in the image.
• Their spatial locations in the image.
• The size of each object.
• A convenient description that encodes the shape of each object.

We use Figure 1 as our running example. We use the concept of symmetry to localize the objects in a scene. Symmetry is an important mechanism by which we identify the structure of objects. Most natural objects (animals and plants) and also man-made objects show a high degree of symmetry. An object is considered symmetric if it remains invariant under some transformation. Two familiar kinds of symmetry are bilateral and radial (rotational) symmetry. An object is bilaterally symmetric about an axis if it is invariant under reflection about that axis. An object is rotationally symmetric if it is invariant under a rotation. For example, a square has four axes of bilateral symmetry, while a circle has infinitely many. Most mammals are bilaterally symmetric. Clearly these are not the only two kinds of symmetry. Consider the leaf shown in Figure 2, which is symmetric about its stalk. The stalk may not be exactly vertical. Note that when defining symmetry we did not specify the kind of transformation. Symmetry is also exhibited at various scales; certain kinds of fractals have symmetry at all possible scales. We therefore need a multi-scale, multi-directional quantitative measure of symmetry. To this end we use the even and odd Gabor wavelets to define a quantitative measure of symmetry.

Figure 2: A leaf which is symmetrical about its stalk.

3.1 Gabor wavelets

Gabor wavelets are plane waves restricted by a Gaussian envelope.
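A minimal sketch, in Python, of the even and odd Gabor pair in the parameterization this section uses (wave vector k1 = r cos θ, k2 = r sin θ, Gaussian width set by σ; see Equations (1), (2), (5) and (6)), together with the even-minus-odd response magnitude that Section 3.2 takes as its symmetry measure. The function names `gabor_pair`, `response_at` and `symmetry_at`, the `half_size` truncation of the kernel, and the zero padding at the image border are illustrative assumptions, not details taken from the report.

```python
import math


def gabor_pair(r, theta, sigma, half_size):
    """Return (even, odd) Gabor kernels as square lists of lists.

    r is the scale (magnitude of the wave vector), theta the orientation
    in radians, sigma the envelope parameter. half_size is an assumed
    truncation radius in pixels; the kernel is (2*half_size + 1) wide.
    """
    k1 = r * math.cos(theta)
    k2 = r * math.sin(theta)
    k_sq = k1 * k1 + k2 * k2
    coords = range(-half_size, half_size + 1)
    even, odd = [], []
    for y in coords:
        even_row, odd_row = [], []
        for x in coords:
            # Gaussian envelope with the (k1^2 + k2^2) / sigma^2 prefactor.
            envelope = (k_sq / sigma ** 2) * math.exp(
                -k_sq * (x * x + y * y) / (2.0 * sigma ** 2)
            )
            phase = k1 * x + k2 * y
            even_row.append(envelope * math.cos(phase))  # even (cosine) part
            odd_row.append(envelope * math.sin(phase))   # odd (sine) part
        even.append(even_row)
        odd.append(odd_row)
    return even, odd


def response_at(image, kernel, cx, cy):
    """Convolution of image with kernel, evaluated at pixel (cx, cy).

    Pixels outside the image are treated as zero (an assumption).
    """
    half = len(kernel) // 2
    total = 0.0
    for j, row in enumerate(kernel):
        for i, weight in enumerate(row):
            # Convolution pairs kernel offset (dx, dy) with image (cx-dx, cy-dy).
            x = cx - (i - half)
            y = cy - (j - half)
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                total += weight * image[y][x]
    return total


def symmetry_at(image, cx, cy, r, theta, sigma, half_size):
    """Even-response magnitude minus odd-response magnitude at (cx, cy)."""
    even, odd = gabor_pair(r, theta, sigma, half_size)
    return abs(response_at(image, even, cx, cy)) - abs(
        response_at(image, odd, cx, cy)
    )
```

As a usage example under these assumptions, take a 21 x 21 image, r = 0.5, θ = 0, σ = π and half_size = 10. Probing the center of a 5-pixel-wide vertical bar with `symmetry_at` gives a positive value, because the odd response cancels across the bar, while probing a vertical step edge gives a negative value, because the odd response dominates there.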
The Gabor filter consists of an even symmetric part $\Phi_e$ and an odd symmetric part $\Phi_o$, which are defined as follows:

$$\Phi_e(x, y) = \frac{k_1^2 + k_2^2}{\sigma^2} \exp\left[-\frac{(k_1^2 + k_2^2)(x^2 + y^2)}{2\sigma^2}\right] \cos(k_1 x + k_2 y) \tag{1}$$

$$\Phi_o(x, y) = \frac{k_1^2 + k_2^2}{\sigma^2} \exp\left[-\frac{(k_1^2 + k_2^2)(x^2 + y^2)}{2\sigma^2}\right] \sin(k_1 x + k_2 y) \tag{2}$$

Compactly, we can write the pair as a single complex filter:

$$\Phi(x, y) = \frac{k_1^2 + k_2^2}{\sigma^2} \exp\left[-\frac{(k_1^2 + k_2^2)(x^2 + y^2)}{2\sigma^2}\right] \exp[i(k_1 x + k_2 y)] \tag{3}$$

Sometimes a DC correction is also added to the filter. This ensures that the integral over the filter is zero, so that the output becomes independent of the mean gray level of the image under consideration:

$$\Phi(x, y) = \frac{k_1^2 + k_2^2}{\sigma^2} \exp\left[-\frac{(k_1^2 + k_2^2)(x^2 + y^2)}{2\sigma^2}\right] \left\{ \exp[i(k_1 x + k_2 y)] - \exp\left[-\frac{\sigma^2}{2}\right] \right\} \tag{4}$$

$k_1$ and $k_2$ can be written as

$$k_1 = r \cos(\theta) \tag{5}$$

$$k_2 = r \sin(\theta) \tag{6}$$

Here $r$ controls the scale of the filter, $\theta$ controls its orientation, and $\sigma$ controls the number of excitatory and inhibitory lobes. There are a number of ways to parameterize Gabor wavelets, and this is one of them. Figure 3 shows an example of the even and the odd Gabor filters for a particular scale and an orientation of zero degrees.

Gabor wavelets are useful models for simple cell receptive fields in the visual cortex.⁴ Gabor showed that functions of this form achieve the theoretical limit for the joint representation of information in the signal and Fourier domains; Daugman's generalization covers the 2D spatial and spatial-frequency domains. Pollen and Ronner showed that simple cells exist in quadrature-phase pairs, as in the even and the odd symmetric parts. We can use a series of Gabor filters corresponding to different scales and orientations to build a multi-scale, multi-orientation representation of the image. Figure 4 shows the Gabor filters for different orientations and scales.

⁴ J. G. Daugman, "Uncertainty Relation for Resolution in Space, Spatial Frequency, and Orientation Optimized by Two-Dimensional Visual Cortical Filters," J. Optical Soc. Amer., vol. 2, no. 7, pp.
1160-1169, 1985.

(a) Even Gabor Wavelet (c) Odd Gabor Wavelet
Figure 3: A sample Gabor wavelet showing the even part and the odd part.

3.2 Measure of Symmetry

Figure 5(a) shows the original image. Figures 5(b) and 5(c) show the even and odd Gabor filters for a particular scale and an orientation of zero degrees. Figures 5(e) and 5(f) show the outputs when the given image is filtered with these two Gabor wavelets. For the even filter the response is high at points where the image is symmetric, and for the odd filter the response is high where the image is antisymmetric. So if the image is symmetric at a given point in that orientation, the even filter gives a high response and the odd filter a low response. We can therefore define a measure of symmetry as the difference between the magnitudes of the even and the odd responses. For a given image $I(x, y)$ and a Gabor wavelet $\Phi(x, y, r, \theta)$ with even part $\Phi_e$ and odd part $\Phi_o$, corresponding to a particular scale $r$ and orientation $\theta$, we define the symmetry $\mathrm{Sym}(x, y, r, \theta)$ as

$$\mathrm{Sym}(x, y, r, \theta) = |I(x, y) * \Phi_e(x, y, r, \theta)| - |I(x, y) * \Phi_o(x, y, r, \theta)| \tag{7}$$

where $*$ is the convolution operation. Figure 5(d) shows the symmetry. As can be seen, the output is high at points of symmetry. Figure 6 shows the same results for a different orientation, $\theta = 45$ degrees. Symmetry can occur at different orientations and scales. This can be clearly seen for the test image in Figure 7, which shows the symmetry response at different scales and orientations. Summing the symmetry response over the different scales and orientations gives a complete measure of symmetry. Local maxima in this representation correspond to points of very high symmetry.

TotalSymmetry(x, y) = ∑