Tuesday, September 4, 2007

Introduction to IR - Keith van Rijsbergen

Introduction to Information Retrieval

Keith van Rijsbergen

Computing Science

Glasgow University

IR: the retrieval of unstructured data (text documents, images, videos, etc.)

Some general terms used in IR:

Term Frequency

The frequency with which a word occurs in a document is a useful measure of that word's significance within the document.
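As a rough illustration, term frequency can be computed by tokenising a document and counting each word. This is a minimal sketch that assumes a simple lowercase split on letters; the function name `term_frequencies` is hypothetical, and real systems usually also remove stop words and apply stemming.

```python
import re
from collections import Counter

def term_frequencies(document: str) -> Counter:
    """Count how often each word occurs in a document (hypothetical helper).

    Tokenisation is a plain lowercase split on runs of letters; no stop-word
    removal or stemming is applied in this sketch.
    """
    tokens = re.findall(r"[a-z]+", document.lower())
    return Counter(tokens)

doc = "Retrieval of text: text retrieval ranks text documents."
tf = term_frequencies(doc)
print(tf.most_common(2))  # [('text', 3), ('retrieval', 2)]
```

Here "text" is the most frequent word, so by this measure it is the most significant word in the sample document.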

Inverse document frequency

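The value of a keyword decreases as the number of documents in which it occurs grows. It is commonly computed as log(N/n), where N is the number of documents in the collection and n is the number of documents containing the keyword.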
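A small sketch of the log(N/n) weighting, assuming each document is represented as a set of words and using the natural logarithm (the base is only a scaling convention). The helper name and the toy collection are hypothetical; returning 0.0 for a term that occurs nowhere is an arbitrary choice made here to avoid dividing by zero.

```python
import math

def inverse_document_frequency(term, documents):
    """idf(t) = log(N / n_t): N = collection size, n_t = documents containing t."""
    n_t = sum(1 for doc in documents if term in doc)
    if n_t == 0:
        return 0.0  # arbitrary choice for unseen terms in this sketch
    return math.log(len(documents) / n_t)

docs = [
    {"information", "retrieval"},
    {"retrieval", "models"},
    {"boolean", "retrieval"},
    {"language", "models"},
]
for term in ("retrieval", "models", "boolean"):
    print(term, round(inverse_document_frequency(term, docs), 3))
```

"retrieval" appears in three of the four documents and gets the lowest weight, while "boolean" appears in only one and gets the highest.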

IR Model

Explains the structure and processes of IR systems

Clarifies the general characteristics of IR systems

Various models exist:

Boolean, vector space, probabilistic, language models, cognitive, etc.
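To make the difference between two of these models concrete, here is a toy sketch contrasting Boolean retrieval (a document either matches every query term or is excluded) with the vector space model (documents are ranked by the cosine similarity of tf-idf vectors). The four-document collection and all function names are hypothetical, and the tf-idf weighting (raw count times log(N/n)) is just one common choice among several.

```python
import math
from collections import Counter

# Hypothetical toy collection.
docs = {
    "d1": "information retrieval evaluation",
    "d2": "boolean retrieval model",
    "d3": "probabilistic model of retrieval",
    "d4": "language model",
}
tokens = {d: text.split() for d, text in docs.items()}
N = len(docs)
df = Counter(t for toks in tokens.values() for t in set(toks))  # document frequency

def boolean_and(query_terms):
    """Boolean model: a document matches only if it contains every query term."""
    return sorted(d for d, toks in tokens.items() if all(t in toks for t in query_terms))

def tfidf_vector(words):
    """Weight each known term by raw count * log(N / df)."""
    tf = Counter(words)
    return {t: c * math.log(N / df[t]) for t, c in tf.items() if t in df}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def vector_space_rank(query_terms):
    """Vector space model: rank every document by cosine similarity to the query."""
    q = tfidf_vector(query_terms)
    scores = {d: cosine(q, tfidf_vector(toks)) for d, toks in tokens.items()}
    return sorted(scores, key=scores.get, reverse=True)

print(boolean_and(["retrieval", "model"]))       # ['d2', 'd3']
print(vector_space_rank(["boolean", "model"]))
```

The Boolean query returns an unordered set of exact matches. The vector space ranking orders all documents, placing "d2" first because it shares the rare term "boolean" with the query.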

Cranfield Paradigm

• Document collection

• Relevance judgements made in advance

• Run strategies A and B

• Evaluate A and B in terms of Precision & Recall

• Compare A with B statistically

• State whether A is comparable to B, better than B, or worse than B
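The evaluation steps above can be sketched as follows. This assumes per-query relevance judgements and per-query result lists for two strategies, all of which are hypothetical data. The notes do not say which statistical test to use, so this sketch swaps in an exact two-sided sign test on per-query precision (ties dropped) as one simple option.

```python
import math

def precision_recall(retrieved, relevant):
    """Precision = relevant retrieved / retrieved; recall = relevant retrieved / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def sign_test_p(wins_a, wins_b):
    """Exact two-sided sign test; tied queries must be excluded beforehand."""
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical relevance judgements made in advance, and results of runs A and B.
judgements = {"q1": {"d1", "d2", "d3", "d4"}, "q2": {"d1", "d5"},
              "q3": {"d2", "d6", "d7"}, "q4": {"d3", "d8"}, "q5": {"d4", "d9", "d10"}}
run_a = {"q1": ["d1", "d2", "d3", "d9"], "q2": ["d1", "d5"], "q3": ["d2", "d6", "d1"],
         "q4": ["d3", "d8", "d1"], "q5": ["d4", "d9", "d1"]}
run_b = {"q1": ["d1", "d5", "d6", "d7"], "q2": ["d1", "d2"], "q3": ["d2", "d1", "d3"],
         "q4": ["d3", "d8", "d2"], "q5": ["d4", "d1", "d2"]}

pa = {q: precision_recall(run_a[q], rel)[0] for q, rel in judgements.items()}
pb = {q: precision_recall(run_b[q], rel)[0] for q, rel in judgements.items()}
wins_a = sum(pa[q] > pb[q] for q in judgements)
wins_b = sum(pb[q] > pa[q] for q in judgements)
print(wins_a, wins_b, sign_test_p(wins_a, wins_b))  # 4 0 0.125
```

In this made-up example A has higher precision on four queries and ties on one, yet p = 0.125. At the usual 0.05 level one would only call A and B comparable, which shows why the statistical step matters with small query sets.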
