Multimedia Information Retrieval
Alan F. Smeaton
Centre for Digital Video Processing & Adaptive Information Cluster
Technological developments have made it easy to create, store, transmit, render and archive multimedia. Text IR methods are often augmented with media-specific retrieval facilities.
IR on Audio-Speech
Complexities
Speaker variability, e.g. speed of delivery, stress, volume, background noise etc.
Acoustic ambiguity, e.g. homophones (to, two and too) and small acoustic distinctions (bee and pea).
Context dependency, e.g. a phone can be produced in a number of ways depending on its context.
Computational cost of recognizing a large vocabulary of words.
IR on Audio-Music
Image Retrieval
Retrieval from photos, technical drawings, legal documents etc.
Generally there are two types:
Text-based Image Search
Based on manual annotation or automatic annotation (as done by Google).
Content-based Image Search
Extract low-level features such as colour and texture, and extract semantic objects. This is done for each image in the collection. A similar process is carried out for any query image(s), and the distance between the query and each indexed image is calculated. Sorting by this distance produces the ranked list.
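The index-then-rank process described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the feature is a coarse RGB colour histogram, the distance is L1 (sum of absolute differences), and the image names and pixel values are made up for the example. Real systems would also use texture and semantic features, which are not shown here.

```python
from collections import Counter

def colour_histogram(pixels, bins_per_channel=4):
    """Quantise each RGB pixel into a coarse bin; return a normalised histogram."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_id: n / total for bin_id, n in counts.items()}

def l1_distance(h1, h2):
    """Sum of absolute differences over every bin present in either histogram."""
    return sum(abs(h1.get(k, 0.0) - h2.get(k, 0.0)) for k in set(h1) | set(h2))

def rank(query_pixels, index):
    """Return (image_id, distance) pairs, most similar (smallest distance) first."""
    q = colour_histogram(query_pixels)
    scored = ((image_id, l1_distance(q, h)) for image_id, h in index.items())
    return sorted(scored, key=lambda pair: pair[1])

# Hypothetical toy collection: each "image" is just a short list of RGB pixels.
collection = {
    "sunset": [(250, 120, 30)] * 8 + [(40, 20, 60)] * 2,
    "forest": [(20, 140, 40)] * 9 + [(90, 60, 30)] * 1,
    "beach": [(240, 200, 140)] * 5 + [(30, 120, 220)] * 5,
}

# Indexing step: extract features once for every image in the collection.
index = {image_id: colour_histogram(px) for image_id, px in collection.items()}

# Query step: extract the same features from the query image, then rank.
query = [(245, 110, 20)] * 7 + [(50, 30, 60)] * 3
ranking = rank(query, index)
print(ranking)
```

With this toy data the mostly orange query lands closest to "sunset", because the two share the same dominant colour bins. The number of bins per channel is a tunable assumption: fewer bins tolerate more colour variation, more bins discriminate more finely.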
Query
User specifies the required colours, textures, features and/or enters keywords.
User can also sketch requirements such as spatial arrangements.
User can also query by providing image(s) with similar composition.
Video
- Use metadata and browse keyframes
Metadata includes title, date, actor(s), producer(s) etc. coupled with keyframe/storyboard previews.
- Use text from speech (Automatic Speech Recognition, ASR), captions and video OCR
- Match keyframes vs query images
Keyframes extracted as shot representatives can be used for retrieval
- Use semantic video features
Involves pre-processing of video or keyframes to detect features.
Use video/image objects as queries
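As one illustration of the text-based option above, text obtained from ASR, captions or video OCR can be attached to each shot and searched with ordinary text IR. The sketch below is a hedged example, not a description of any particular system: the shot identifiers and transcripts are invented, and TF-IDF scoring over an inverted index is an assumed choice of ranking method.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(shots):
    """Build an inverted index: term -> {shot_id: term frequency}."""
    index = defaultdict(dict)
    for shot_id, text in shots.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][shot_id] = tf
    return index

def search(query, index, n_shots):
    """Score each shot by summed TF-IDF of the query terms; best shot first."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term never recognised in any shot
        idf = math.log(n_shots / len(postings))
        for shot_id, tf in postings.items():
            scores[shot_id] += tf * idf
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical per-shot text, standing in for ASR, caption or OCR output.
shots = {
    "shot_01": "the minister announced a new budget today",
    "shot_02": "heavy rain and flooding in the west",
    "shot_03": "budget cuts will affect flooding defences",
}

index = build_index(shots)
results = search("budget flooding", index, len(shots))
print(results)
```

Here "shot_03" ranks first because it is the only shot containing both query terms. In practice the retrieved shot would be presented through its keyframe, and recognition errors in the ASR output would lower recall for misrecognised words.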

