Friday, September 7, 2007

Machine Learning from Text Classification Applications - David D. Lewis

Dave Lewis

David D. Lewis Consulting, LLC

www.DavidDLewis.com


Text Classification

Assigning text to one or more of several predefined groups (classes)


Uses

As a component in NLP systems

Improving information access


How to improve Information Access

Relevance Feedback (Interactive Search)

Organize search results into classes

Awareness control (filtering porn, highlighting relevant text, etc.)


Categories of Classification

Binary (each item belongs to one of two classes)

Multiclass (each item belongs to exactly one of several classes)

Multilabel (may belong to more than one class)

Ordinal (Multiclass w/ ordered classes)

Hierarchical (classes organized as a tree)

Fuzzy (ordinal with numeric membership)
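
One way to picture these categories is by the shape of the label each one attaches to a document. The sketch below is only an illustration: the class names ("sports", "good", "optics", and so on) and the helper functions are hypothetical and were not part of the talk.

```python
# Hypothetical labels for a single document under each scheme.
binary_label = True                        # binary: in the class or not
multiclass_label = "sports"                # multiclass: exactly one class
multilabel_label = {"sports", "finance"}   # multilabel: any subset of classes

# Ordinal: the classes carry an order, so labels can be compared.
RATING_ORDER = ["poor", "fair", "good", "excellent"]

def ordinal_rank(label):
    """Position of a label in the (assumed) class ordering."""
    return RATING_ORDER.index(label)

# Hierarchical: each class points to its parent; None marks the root.
PARENT = {"science": None, "physics": "science", "optics": "physics"}

def ancestors(label):
    """Walk up the tree and collect every class above the given one."""
    path = []
    while PARENT[label] is not None:
        label = PARENT[label]
        path.append(label)
    return path

# Fuzzy: a numeric degree of membership for each of the ordered classes.
fuzzy_label = {"poor": 0.0, "fair": 0.2, "good": 0.7, "excellent": 0.1}
```

In the hierarchical case, assigning "optics" also implies membership in every class returned by `ancestors("optics")`.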


Types of Classifiers

Rule-Based

Numeric (weighted combination of attribute values)

Instance Based (Direct comparison of text with known labeled examples)

Automata/Pattern Matching (decisions driven by patterns matched in the text)

Hybrid (weighted or voted combinations, numeric classifiers with attributes defined by patterns, etc.)
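
To make the first three classifier types concrete, here is a minimal sketch of each on a toy spam/ham task. The keyword list, the weights, and the two labeled examples are made up for illustration and do not come from the talk; real systems learn or curate these from data.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

# Rule-based: a hand-written rule over keywords (hypothetical list).
def rule_based(text):
    return "spam" if {"free", "winner"} & set(tokenize(text)) else "ham"

# Numeric: a weighted combination of attribute values (here, word counts)
# compared against a threshold. The weights are assumed, not learned.
WEIGHTS = {"free": 1.5, "winner": 2.0, "meeting": -1.0, "report": -1.2}

def numeric(text, threshold=0.0):
    score = sum(WEIGHTS.get(tok, 0.0) for tok in tokenize(text))
    return "spam" if score > threshold else "ham"

# Instance-based: compare the text directly with known labeled examples
# and return the label of the most similar one (1-nearest neighbour).
LABELED = [("free prize winner", "spam"), ("quarterly report meeting", "ham")]

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)  # a Counter returns 0 for missing keys
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def nearest_neighbour(text):
    query = Counter(tokenize(text))
    best = max(LABELED, key=lambda ex: cosine(query, Counter(tokenize(ex[0]))))
    return best[1]
```

A hybrid in the sense of the list above could, for example, take a majority vote over the three functions.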


Why good classifiers go bad

Input data changes (format changes, word meanings shift, etc.)

Classes added, deleted, redefined

Organizational policies change

Cost of errors changes


How to deal with this

Build a robust classifier

Monitor classifier effectiveness

Periodically retrain

Monitor for novel inputs
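
The monitoring steps above can be sketched roughly as follows. This is one possible approach, not the method from the talk: it assumes a small stream of human-audited labels is available to estimate accuracy, and it uses the share of out-of-vocabulary words as a crude novelty signal. The class name `ClassifierMonitor` and all thresholds are hypothetical.

```python
from collections import deque

class ClassifierMonitor:
    def __init__(self, vocabulary, window=100, min_accuracy=0.8, max_oov_rate=0.5):
        self.vocabulary = set(vocabulary)      # words seen at training time
        self.outcomes = deque(maxlen=window)   # True where prediction was correct
        self.min_accuracy = min_accuracy       # assumed retraining trigger
        self.max_oov_rate = max_oov_rate       # assumed novelty trigger

    def record(self, predicted, actual):
        """Log one audited prediction; old entries fall out of the window."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Accuracy over the recent window, or None before any audits."""
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def needs_retraining(self):
        acc = self.accuracy()
        return acc is not None and acc < self.min_accuracy

    def oov_rate(self, text):
        """Fraction of tokens never seen in the training vocabulary."""
        tokens = text.lower().split()
        if not tokens:
            return 0.0
        return sum(1 for tok in tokens if tok not in self.vocabulary) / len(tokens)

    def is_novel(self, text):
        return self.oov_rate(text) > self.max_oov_rate
```

Inputs flagged by `is_novel` would typically be routed to a person for review, and a `needs_retraining` signal would trigger the periodic retraining step earlier than scheduled.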
