Michael J. O'Donnell, Ilia Bisnovatyi
Abstract: Computing practice today depends on visual output to drive almost all user interaction. Other senses, such as audition, may be totally neglected, or used tangentially, or used in highly restricted specialized ways. We have excellent audio rendering through D-A conversion, but we lack rich general facilities for modeling and manipulating sound comparable in quality and flexibility to graphics. We need co-ordinated research in several disciplines to improve the use of sound as an interactive information channel.
Incremental and separate improvements in synthesis, analysis, speech processing, audiology, acoustics, music, etc. will not alone produce the radical progress that we seek in sonic practice. We also need to create a new central topic of study in digital audio research. The new topic will assimilate the contributions of different disciplines on a common foundation. The key central concept that we lack is sound as a general-purpose information channel. We must investigate the structure of this information channel, which is driven by the co-operative development of auditory perception and physical sound production. Particular audible encodings, such as speech and music, illuminate sonic information by example, but they are no more sufficient for a characterization than typography is sufficient for a characterization of visual information.
What problem does this paper attempt to address?
The core problem this paper addresses is the lack of research, within computer science, on sound as a general-purpose information channel. The authors point out that current sound research concentrates on isolated areas such as sound generation, propagation, and perception, and lacks a unified theoretical framework that integrates them. To close this gap, the paper makes the following points:
1. **Interdisciplinary Collaboration**: Coordinated research is needed across multiple disciplines, including computer science, numerical mathematics, psychology, linguistics, bioacoustics, and music, to understand different aspects of sound structure.
2. **Information Structure of Sound**: Explore the essential characteristics of sound as a universal information channel and reveal its intrinsic structure through different applications (such as speech and music).
3. **Time Scales**: Study the perceptual characteristics of sound at different time scales, including pitch perception, transient effects, and event sequences.
4. **Descriptive Models**: Propose various sound descriptive models, such as waveform models, Fourier models, time-frequency models, stochastic models, excitation/filter models, and physical models, to better understand and manipulate sound.
5. **Synthesis and Analysis Techniques**: Develop new sound synthesis methods and address challenging issues in sound analysis, particularly how to choose the most appropriate description from infinitely many possibilities.
6. **Auditory User Interface**: Develop auditory user interfaces (AUIs), that is, sound-based human-computer interfaces, to exploit the full potential of sound for conveying information.
7. **Linear and Nonlinear Theories**: Explore sound theories applicable to nonlinear phenomena to better understand complex sound generation mechanisms.
8. **Auditory Scene Analysis**: Build technologies capable of separating and identifying different sound sources, similar to object recognition in computer vision.
9. **Collaboration between Art and Science**: Emphasize the application value of artistic principles in scientific and engineering fields, especially in enhancing the effectiveness of information presentation.
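To make point 4 concrete, the following minimal sketch describes one and the same sound under two of the listed models: a waveform model (a sequence of amplitude samples) and a Fourier model (a set of frequencies with amplitudes). It is an illustration only, not code from the paper. The sample rate, duration, the two partials, and the helper names `waveform`, `fourier_magnitudes`, and `strongest_partials` are all assumptions chosen for the example, and the naive discrete Fourier transform stands in for whatever analysis method a real system would use.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed for illustration)
N = 800             # 0.1 s of sound, so DFT bins are spaced 10 Hz apart


def waveform(partials, n=N, rate=SAMPLE_RATE):
    """Waveform model: amplitude samples over time for a sum of sinusoids."""
    return [
        sum(amp * math.sin(2 * math.pi * freq * t / rate) for freq, amp in partials)
        for t in range(n)
    ]


def fourier_magnitudes(samples):
    """Fourier model: amplitude of each frequency bin below Nyquist (naive DFT)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples))) * 2 / n
        for k in range(n // 2)
    ]


def strongest_partials(samples, count, rate=SAMPLE_RATE):
    """Return the `count` strongest (frequency in Hz, amplitude) pairs."""
    mags = fourier_magnitudes(samples)
    top = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:count]
    return sorted((k * rate / len(samples), round(mags[k], 3)) for k in top)


# Hypothetical test sound: a 440 Hz partial plus a quieter 880 Hz partial.
partials = [(440.0, 1.0), (880.0, 0.5)]
samples = waveform(partials)          # 800 numbers: the waveform description
print(strongest_partials(samples, 2))  # 2 pairs: the Fourier description
```

Here the waveform description needs 800 numbers while the Fourier description needs two (frequency, amplitude) pairs, because both partials fall exactly on DFT bins. For sounds that are not steady sums of sinusoids, the balance shifts, which is one reason the list above names several alternative models rather than a single preferred one.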
In summary, the paper aims to promote a deeper understanding of sound as an information medium through the establishment of an interdisciplinary research community and to advance its practical applications in multiple fields.