Abstract
When developing advanced location-based systems augmented with audio ambiances, it would be cost-effective to use a few representative samples from typical environments to describe a larger number of similar locations. The aim of this experiment was to study the human ability to discriminate audio ambiances recorded in similar locations of the same urban environment. A listening experiment comprising material from three different environments and nine different locations was carried out with nineteen subjects to study the credibility of audio representations of certain environments, which would diminish the need for collecting huge audio databases. The first goal was to study to what degree humans are able to recognize whether a recording was made in the indicated location or in another, similar location when presented with the name of the place, its location on a map, and the associated audio ambiance. The second goal was to study whether the ability to discriminate audio ambiances from different locations is affected by a visual cue, presented as additional information in the form of a photograph of the suggested location. The results indicate that audio ambiances from similar urban areas of the same city differ enough that it is not acceptable to use a single recording as the ambiance representing different yet similar locations. Including an image was found to increase the perceived credibility of all the audio samples in representing a certain location. The results suggest that developers of audio-augmented location-based systems should aim to use audio samples recorded on-site at each location in order to create a credible impression.
Index Terms
- On the human ability to discriminate audio ambiances from similar locations of an urban environment