Multimedia IR-Alan F. Smeaton ~ Blog on IR, NLP and Related Areas by Shailesh Pandey

Tuesday, September 4, 2007

Multimedia IR-Alan F. Smeaton

Posted on 9:14 AM by shailesh

Multimedia Information Retrieval

Alan F. Smeaton

Centre for Digital Video Processing & Adaptive Information Cluster

Dublin City University

Technological developments have made it easy to create, store, transmit, render and archive multimedia. Text IR methods are often augmented with media-specific retrieval facilities.

IR on Audio-Speech

Complexities

Speaker variability, e.g. speed of delivery, stress, volume, background noise, etc.

Acoustic ambiguity, e.g. homophones (to, two and too) and small acoustic distinctions (bee and pea).

Context dependency, e.g. a phone can be produced in a number of ways depending on its context.

Computational cost of recognizing a large vocabulary of words.

IR on Audio-Music

MIDI is relatively easy because the notes are directly available, so the indexing terms can be n-grams of notes. Other formats such as MP3 are harder because the notes are not directly available.
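As a rough illustration of using n-grams of notes as indexing terms, here is a minimal sketch. It assumes each tune has already been reduced to a list of MIDI pitch numbers; the toy collection, the function names and the choice of n = 3 are all hypothetical and not taken from the talk.

```python
from collections import defaultdict


def note_ngrams(notes, n=3):
    """Return the overlapping n-grams of a note sequence as tuples."""
    return [tuple(notes[i:i + n]) for i in range(len(notes) - n + 1)]


def build_index(tunes, n=3):
    """Map each note n-gram to the set of tune ids that contain it."""
    index = defaultdict(set)
    for tune_id, notes in tunes.items():
        for gram in note_ngrams(notes, n):
            index[gram].add(tune_id)
    return index


def search(index, query_notes, n=3):
    """Rank tunes by the number of distinct query n-grams they share."""
    scores = defaultdict(int)
    for gram in set(note_ngrams(query_notes, n)):
        for tune_id in index.get(gram, ()):
            scores[tune_id] += 1
    # Highest score first; ties broken alphabetically by tune id.
    return sorted(scores, key=lambda t: (-scores[t], t))


# Hypothetical toy collection of MIDI pitch numbers (60 = middle C).
tunes = {
    "ode_to_joy": [64, 64, 65, 67, 67, 65, 64, 62],
    "frere_jacques": [60, 62, 64, 60, 60, 62, 64, 60],
}
index = build_index(tunes)
results = search(index, [64, 65, 67, 67])
```

Indexing raw pitches like this only matches tunes played in the same key; indexing the intervals between successive notes instead would be one way to relax that, at the cost of slightly less specific terms.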

Image Retrieval

Retrieval from photos, technical drawings, legal documents etc.

Generally there are two types:

Text-based Image Search

Based on manual annotation or automatic annotation (e.g. by Google).

Content-based Image Search

Low-level features such as colour and texture, along with semantic objects, are extracted from each image in the collection. The same process is applied to any query image(s), and the distance between the query and each indexed image is calculated. Sorting the images by this distance produces the ranked list.
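The extract, compare and rank pipeline can be sketched as follows. This is only an illustration under simplifying assumptions: an image is a plain list of (R, G, B) tuples, the only feature is a coarse colour histogram, and the distance is Euclidean. Texture features and semantic objects are not covered, and the image names and pixel values are made up.

```python
import math


def colour_histogram(pixels, bins=4):
    """Quantise each RGB channel into `bins` levels; return a normalised histogram."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1
    total = len(pixels) or 1
    return [h / total for h in hist]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank(query_pixels, indexed):
    """Return image names sorted by increasing distance to the query image."""
    q = colour_histogram(query_pixels)
    return sorted(indexed, key=lambda name: euclidean(q, indexed[name]))


# Hypothetical toy collection: each "image" is just a short list of pixels.
collection = {
    "sunset": [(250, 120, 30)] * 8 + [(40, 20, 60)] * 2,
    "forest": [(20, 140, 40)] * 9 + [(90, 60, 30)] * 1,
    "ocean": [(10, 60, 200)] * 10,
}

# Indexing step: extract features once for every image in the collection.
indexed = {name: colour_histogram(px) for name, px in collection.items()}

# Query step: extract the same features from the query image and rank.
query = [(240, 110, 20)] * 10
ranking = rank(query, indexed)
```

Here the mostly orange query image ranks "sunset" first, because the two share the same dominant histogram bin.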

Query

User specifies the required colours, textures, features and/or enters keywords.

User can also draw in the requirements such as spatial arrangements.

User can also query by providing image(s) with similar composition.

Video

Use metadata and browse keyframes

Metadata includes title, date, actor(s), producer(s), etc., coupled with keyframe/storyboard previews.

Use text from speech (Automatic Speech Recognition, ASR), captions and video OCR

Match keyframes vs query images

Keyframes extracted as shot representatives can be used for retrieval
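One way to picture this is the sketch below. It assumes shot boundaries are already known and that every frame has been reduced to a small feature vector (for example a colour histogram). Choosing the frame closest to the shot's mean vector is just one simple heuristic for picking a representative, not necessarily the method described in the talk; the shot ids and feature values are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pick_keyframe(shot_frames):
    """Index (within the shot) of the frame closest to the shot's mean vector."""
    centre = mean_vector(shot_frames)
    return min(range(len(shot_frames)), key=lambda i: distance(shot_frames[i], centre))


def best_shot(query_vec, keyframes):
    """Id of the shot whose keyframe is closest to the query image's features."""
    return min(keyframes, key=lambda shot_id: distance(query_vec, keyframes[shot_id]))


# Hypothetical per-frame feature vectors, grouped by shot.
shots = {
    "shot_1": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.7, 0.2, 0.1]],
    "shot_2": [[0.1, 0.1, 0.8], [0.0, 0.2, 0.8], [0.2, 0.0, 0.8]],
}

# Keep one representative frame per shot, then match the query against those only.
keyframes = {sid: frames[pick_keyframe(frames)] for sid, frames in shots.items()}
match = best_shot([0.75, 0.25, 0.0], keyframes)
```

Matching against one keyframe per shot keeps the comparison cheap, since the query is compared with far fewer frames than the full video contains.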

Use semantic video features

Involves pre-processing of video or keyframes to detect features.

Use video/image objects as queries

Categories: ESSIR-2007


Blog primarily created to jot down things I have observed, discussed or learned that are relevant to my research. Comments are welcomed.
