MPEG-7 Compatible Audio Feature Extraction and Annotation Tool

The MPEG-7 representations of audio files are obtained using the MPEG-7 compatible audio feature extraction and annotation tool shown below. Although the tool has an auto-segmentation option, it is currently operated manually to obtain MPEG-7 representations that conform to this MPEG-7 profile. Audio files are loaded together with their segment information and then processed segment by segment. Users manually select segments and annotate them with free text, keyword and structured annotations.

The tool computes the MPEG-7 audio descriptors of the audio segments (basic descriptors, basic spectral descriptors, basic signal parameters, temporal timbral descriptors, spectral timbral descriptors and spectral basis representations) using an MPEG-7 feature extraction library adapted from the MPEG-7 XM Reference Software. The user can select the set of audio descriptors used to describe each type of audio segment, i.e., any subset of the descriptor groups listed above. The semantic content is described by text annotations (free text, keyword and structured annotation), which strike a good balance between simplicity (in terms of manual annotation effort and processing during querying) and expressiveness. The output is saved as an MPEG-7 compatible XML file to be stored in the XML database.

The tool can currently handle 8 audio descriptors and is still being improved to handle all descriptors. The goal is a full-fledged MPEG-7 compatible multimedia feature extraction and annotation tool with as much automatic processing as possible, so that manual processing time, human subjectivity and error-proneness are reduced.
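The segment-by-segment workflow described above can be illustrated with a minimal Python sketch. It is not the tool's implementation: the `Segment` class and the `mean_power` and `describe` functions are hypothetical names introduced here, the mean-square value is only a simplified stand-in for a basic descriptor (the tool itself relies on a library adapted from the MPEG-7 XM Reference Software), and the XML is loosely modelled on MPEG-7 element names without being validated against the schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

MPEG7_NS = "urn:mpeg:mpeg7:schema:2001"


@dataclass
class Segment:
    """One manually selected audio segment (hypothetical helper type)."""
    seg_id: str
    start: int                      # index of the first sample
    length: int                     # number of samples in the segment
    free_text: str = ""             # free text annotation, if any
    keywords: list = field(default_factory=list)


def mean_power(samples):
    """Mean of the squared samples: a simplified stand-in for a basic
    descriptor, not the normative MPEG-7 extraction procedure."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def _q(tag):
    """Qualify a tag name with the MPEG-7 namespace."""
    return f"{{{MPEG7_NS}}}{tag}"


def describe(samples, segments):
    """Build a simplified, MPEG-7-style XML tree, one node per segment."""
    root = ET.Element(_q("Mpeg7"))
    audio = ET.SubElement(root, _q("Audio"))
    for seg in segments:
        node = ET.SubElement(audio, _q("AudioSegment"), {"id": seg.seg_id})

        # Text annotations are optional; emit them only when present.
        if seg.free_text or seg.keywords:
            text = ET.SubElement(node, _q("TextAnnotation"))
            if seg.free_text:
                ET.SubElement(text, _q("FreeTextAnnotation")).text = seg.free_text
            if seg.keywords:
                kw_node = ET.SubElement(text, _q("KeywordAnnotation"))
                for kw in seg.keywords:
                    ET.SubElement(kw_node, _q("Keyword")).text = kw

        # Descriptor computed on this segment's samples only.
        # The "name" attribute is illustrative, not part of the schema.
        chunk = samples[seg.start:seg.start + seg.length]
        desc = ET.SubElement(node, _q("AudioDescriptor"), {"name": "MeanPower"})
        desc.text = str(mean_power(chunk))
    return root


if __name__ == "__main__":
    ET.register_namespace("", MPEG7_NS)
    demo = describe(
        [0.0, 1.0, -1.0, 0.5],
        [Segment("s1", 0, 2, "a short click", ["click"]), Segment("s2", 2, 2)],
    )
    print(ET.tostring(demo, encoding="unicode"))
```

Keeping the annotations and the descriptor values under the same segment node mirrors the idea that each segment carries both its semantic description and its low-level features in one XML document.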


Figure 1: MPEG-7 compatible audio feature extraction and annotation tool according to this MPEG-7 profile.
In the graphical user interface, the current audio frame is shown at the top left.
Selected and automatically generated audio segments are shown on the right in a hierarchical tree view
reflecting the structure of the audio.


Last update: 23.12.2009
For questions and/or comments please contact Muhammet Baştan (bastan*at*cs*bilkent*edu*tr)