MPEG-7 Compatible Representation of Audios

The first step in constructing an MPEG-7 compatible audio management system is to decide what kind of queries will be supported and then to design an MPEG-7 profile accordingly. The representation of audio is crucial since it directly affects the system's performance. There is a trade-off between the accuracy of representation and the speed of access: more detailed representation will enable more detailed queries but will also result in longer response time during retrieval. Keeping these factors in mind, we decided to use the MPEG-7 profile shown in Figure 1, below. Our profile corresponds to the audio representation portion of the detailed audiovisual profile, with our own interpretation of what to represent with Audio Segments so that our system can support the wide range of queries it is designed for. First, audio and visual data are separated using Media Source Decomposition. Then, audio content is hierarchically decomposed into smaller structural and semantic units.


Figure 1: MPEG-7 Profile used in BilAudio-7.

Temporal Decomposition of Audio into Segments Manually. Audio is partitioned into non-overlapping audio pieces called segments, each having a temporal location (start time and duration), semantic annotation to describe the properties of segments with free text, keyword and structured annotation and audio descriptor (e.g., basic spectral, spectral timbral descriptors). This process is done manually, by specifying the start and end points of each segment and annotating these segments by Feature Extraction and Annotation Tool.

Auto-Segmentation of Whole Audio File. Whole audio file is seperated into segments of 2 kinds; non-silence and silence segments. Auto-Segmentation is done by specifying silence threshold amplitudes and threshold lengths. This process is very important in xml database construction. Instead of manually eliminating silence segments, process is automatized and much more big audio databases are constructed. Non-silence segments are processed normally to extract MPEG-7 audio descriptors whereas silence segments are not processed, instead they are labeled with "Silence" keyword and MPEG-7 silence descriptor.


Home - Contents
Last update: 23.12.2009
For questions and/or comments please contact Muhammet Baştan (bastan*at*cs*bilkent*edu*tr)