Early prototype multimedia database management systems used the query-by-example (QBE) paradigm to respond to user queries. Users had to formulate their queries by providing examples or sketches. The query-by-keyword (QBK) paradigm, on the other hand, emerged from the desire to search multimedia content in terms of semantic concepts, using keywords or sentences rather than low-level features. Some queries are much easier to formulate with keywords, which is also how text retrieval systems work. However, other queries are still easier to formulate by examples or sketches (e.g., the trajectory of a moving object). Moreover, there is the so-called "semantic gap" problem, the disparity between the low-level representation and high-level semantics, which makes it very difficult to build multimedia systems that support keyword-based semantic queries effectively for an acceptable number of semantic concepts. Consequently, both query paradigms need to be supported seamlessly in an integrated way.
Another important issue to be considered in today's multimedia management systems is interoperability. This is especially crucial for distributed architectures in which the system is to be used by multiple heterogeneous clients. The MPEG-7 standard, as the multimedia content description interface, can be employed to address this issue.
The design of a retrieval system is directly affected by the types of queries to be supported. The types of descriptors and the granularity of the representation determine the system's performance in terms of speed and accuracy. As the level of detail in the representation increases, the system can answer more detailed queries; however, both the database size and the system response time also increase. Therefore, the system should be designed according to the types of queries to be supported, and the representation granularity should be selected accordingly. Below, we give some example audio query types that might be attractive for most users, but which are not supported all together by existing systems in an MPEG-7 compatible framework.
We developed BilAudio-7 as a powerful MPEG-7 compatible, segment-based audio database system to support such multimodal queries in an integrated way. We designed an MPEG-7 profile for audio representation that enables detailed queries on audio, and used our MPEG-7 compatible audio feature extraction and annotation tool to obtain the audio representations according to this profile. The Visual Query Interface of BilAudio-7 is an easy-to-use yet powerful interface for formulating complex multimodal queries, with support for a comprehensive set of MPEG-7 descriptors. Queries are processed on the multi-threaded Query Processing Server, which has a multimodal query processing and subquery result fusion architecture that is also suitable for parallelization.
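To make the idea of multi-threaded subquery processing with result fusion more concrete, the following is a minimal sketch rather than the actual BilAudio-7 implementation. It assumes that each subquery (for example, one keyword-based and one descriptor-based) returns a similarity score in [0, 1] per audio segment, and that scores are fused by a weighted average in which a segment missing from a subquery result contributes 0. The function names, segment identifiers, scores, and weights are all hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subqueries(subqueries):
    """Run each (callable, weight) subquery in its own worker thread.

    Each callable is assumed to return {segment_id: similarity in [0, 1]}.
    """
    with ThreadPoolExecutor(max_workers=len(subqueries)) as pool:
        futures = [(weight, pool.submit(fn)) for fn, weight in subqueries]
        return [(weight, future.result()) for weight, future in futures]


def fuse(weighted_results):
    """Fuse subquery results by weighted average (an assumed fusion rule).

    A segment absent from a subquery result contributes a score of 0.
    Returns (segment_id, fused_score) pairs, best match first.
    """
    total_weight = sum(weight for weight, _ in weighted_results)
    fused = {}
    for weight, scores in weighted_results:
        for segment, score in scores.items():
            fused[segment] = fused.get(segment, 0.0) + weight * score / total_weight
    # Sort by descending score; break ties by segment id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical subqueries standing in for real database lookups.
def keyword_subquery():
    return {"seg1": 1.0, "seg2": 0.5}


def descriptor_subquery():
    return {"seg2": 1.0, "seg3": 0.5}


ranked = fuse(run_subqueries([(keyword_subquery, 1.0), (descriptor_subquery, 1.0)]))
print(ranked)  # [('seg2', 0.75), ('seg1', 0.5), ('seg3', 0.25)]
```

Because the subqueries are independent of each other until the fusion step, they can run concurrently, which is one reason such an architecture lends itself to parallelization.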