MPEG-7

MPEG-7 is an ISO/IEC standard developed by MPEG (Moving Picture Experts Group), the committee that also developed MPEG-1, MPEG-2 and MPEG-4. Unlike those earlier standards, MPEG-7 is designed to describe multimedia content rather than to encode it. Its formal name is "Multimedia Content Description Interface". It was announced in 2001.

MPEG-7 offers a comprehensive set of audiovisual description tools in the form of Descriptors (D) and Description Schemes (DS). These tools describe multimedia data, form a common basis for applications, and enable efficient and effective access to the data. The Description Definition Language (DDL) is based on W3C XML Schema with some MPEG-7-specific extensions, such as vector and matrix datatypes. MPEG-7 documents are therefore XML documents that conform to particular MPEG-7 schemas for describing multimedia content. Descriptors describe features, attributes or groups of attributes of multimedia content. Description Schemes describe entities or relationships pertaining to multimedia content; they specify the structure and semantics of their components, which may be Description Schemes, Descriptors or datatypes.
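
Because MPEG-7 documents are XML documents, they can be read with any general-purpose XML parser. The following Python sketch illustrates the idea using only the standard library. The namespace URI is the one commonly used for the 2001 MPEG-7 schema; the element layout (AudioDescriptor with a name attribute and a Vector child) and the function name read_vectors are simplified assumptions made for illustration and have not been validated against the actual MPEG-7 schemas.

```python
import xml.etree.ElementTree as ET

# Namespace commonly used for the 2001 MPEG-7 schema.
MPEG7_NS = "urn:mpeg:mpeg7:schema:2001"

# Illustrative fragment only: the element layout is a simplified assumption,
# not a schema-valid MPEG-7 description.
SAMPLE = """<Mpeg7 xmlns="urn:mpeg:mpeg7:schema:2001">
  <Description>
    <AudioDescriptor name="ExampleFeature">
      <Vector>0.12 0.34 0.56 0.78</Vector>
    </AudioDescriptor>
  </Description>
</Mpeg7>
"""


def read_vectors(xml_text):
    """Return {descriptor name: list of floats} for each vector-valued descriptor."""
    root = ET.fromstring(xml_text)
    ns = {"mpeg7": MPEG7_NS}
    vectors = {}
    for descriptor in root.iterfind(".//mpeg7:AudioDescriptor", ns):
        vector = descriptor.find("mpeg7:Vector", ns)
        if vector is None or not vector.text:
            continue  # skip descriptors that carry no vector payload
        # A vector is serialized as whitespace-separated numbers.
        vectors[descriptor.get("name")] = [float(t) for t in vector.text.split()]
    return vectors


if __name__ == "__main__":
    print(read_vectors(SAMPLE))
```

A real application would validate the document against the MPEG-7 schemas before reading values from it; that step is omitted here.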

The MPEG-7 eXperimentation Model (XM) Reference Software is the framework for all the reference code of the MPEG-7 standard. It implements the normative components of MPEG-7. MPEG-7 standardizes multimedia content description, but it does not specify how a description is produced. Developers of MPEG-7 compatible applications are free to choose how descriptors are extracted from the multimedia, provided that the output conforms to the standard.

MPEG-7 Audio Description Tools consist of basic structures and Descriptors that cover basic audio features. The MPEG-7 low-level descriptors (LLDs) form the foundation layer of the standard. They are a collection of simple, low-complexity audio features that can be used to characterize any type of sound. The LLDs give the standard flexibility, allowing new applications to be built in addition to those that can be designed with the MPEG-7 high-level tools. The foundation layer comprises a series of 18 generic LLDs, each consisting of a normative part (the syntax and semantics of the descriptor) and an optional, non-normative part that recommends possible extraction and/or similarity matching methods. The temporal and spectral LLDs can be classified into the following groups: