Machine Learning from Text Classification Applications
Dave Lewis
David D. Lewis Consulting, LLC
Text Classification
Assigning text to one or more predefined classes
Uses
As a component in NLP systems
Improving information access
How text classification improves information access
Relevance Feedback (Interactive Search)
Organize search results into classes
Awareness control (filtering pornography, highlighting relevant text, etc.)
Categories of Classification Problems
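Relevance feedback can be treated as a small text classification problem: the documents a user marks as relevant or nonrelevant become training examples for a revised query. The notes don't name an algorithm, so the sketch below uses the classic Rocchio update as one common choice. It assumes plain bag-of-words term counts and the weights alpha=1.0, beta=0.75 and gamma=0.15, which are illustrative defaults rather than values from the talk.

```python
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts; a deliberately simple document representation."""
    return Counter(text.lower().split())

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Revised query = alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant).

    Terms whose final weight is zero or negative are dropped, a common convention.
    """
    new_q = Counter()
    for term, w in vectorize(query).items():
        new_q[term] += alpha * w
    # Add the relevant centroid and subtract the nonrelevant centroid.
    # An empty list contributes nothing, so there is no division by zero.
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, w in vectorize(doc).items():
                new_q[term] += coef * w / len(docs)
    return {t: w for t, w in new_q.items() if w > 0}

q = rocchio("jaguar speed", ["jaguar cat speed"], ["jaguar car dealer"])
```

In this example `car` and `dealer` drop out of the revised query, while `cat` enters with a small positive weight learned from the document the user marked relevant.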
Binary (exactly one of two classes)
Multiclass (exactly one of several classes)
Multilabel (may belong to more than one class)
Ordinal (multiclass with ordered classes)
Hierarchical (classes organized as a tree)
Fuzzy (ordinal with numeric degrees of membership)
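These categories mainly differ in how a classifier's per-class scores become a decision. The sketch below assumes that some classifier has already produced a score in [0, 1] for each class. The 0.5 threshold, the median rule for ordinal classes and the normalization used for fuzzy membership are illustrative assumptions, not methods taken from the talk.

```python
def binary_decision(score, threshold=0.5):
    """Binary: the document is in the class or it is not."""
    return score >= threshold

def multiclass_decision(scores):
    """Multiclass: exactly one class, the highest-scoring one."""
    return max(scores, key=scores.get)

def multilabel_decision(scores, threshold=0.5):
    """Multilabel: every class that clears the threshold (possibly none)."""
    return {c for c, s in scores.items() if s >= threshold}

def ordinal_decision(scores, order):
    """Ordinal: classes are ordered, so pick the median class, i.e. the first
    class (in order) where the cumulative normalized score reaches 0.5."""
    total = sum(scores[c] for c in order)
    cumulative = 0.0
    for c in order:
        cumulative += scores[c] / total
        if cumulative >= 0.5:
            return c
    return order[-1]  # guard against floating-point shortfall

def hierarchical_decision(scores, parent, threshold=0.5):
    """Hierarchical: accept a class only if it and all its ancestors clear the threshold."""
    accepted = set()
    for c, s in scores.items():
        node, ok = c, s >= threshold
        while ok and node in parent:  # walk up the tree toward the root
            node = parent[node]
            ok = scores.get(node, 0.0) >= threshold
        if ok:
            accepted.add(c)
    return accepted

def fuzzy_memberships(scores):
    """Fuzzy: keep numeric degrees of membership (here rescaled to sum to 1)."""
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}
```

For scores `{"low": 0.4, "medium": 0.25, "high": 0.35}`, `multiclass_decision` picks `"low"`. `ordinal_decision` with order `["low", "medium", "high"]` picks `"medium"` instead, because it takes the class ordering into account.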
Types of Classifiers
Rule-Based
Numeric (weighted combination of attribute values)
Instance-Based (direct comparison of a text with known labeled examples)
Automata/Pattern Matching (matching text against finite-state patterns such as regular expressions)
Hybrid (weighted or voted combinations of classifiers, numeric classifiers whose attributes are defined by patterns, etc.)
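The classifier types above can be illustrated as toy functions that decide between "sports" and "other". This is a minimal sketch: every rule, weight, example and pattern in it is hypothetical, and a real system would learn weights and examples from labeled data. The hybrid shown is the "numeric with attributes defined by patterns" variant, plus a simple majority vote.

```python
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z0-9']+", text.lower())

# Rule-based: a hypothetical hand-written if-then rule.
def rule_classifier(text):
    return "sports" if {"goal", "match"} & set(tokens(text)) else "other"

# Numeric: weighted sum of attribute (word) values compared with zero.
WEIGHTS = {"goal": 2.0, "match": 1.5, "election": -2.0}  # hypothetical weights
def linear_classifier(text, bias=-1.0):
    score = bias + sum(WEIGHTS.get(w, 0.0) for w in tokens(text))
    return "sports" if score > 0 else "other"

# Instance-based: label of the most similar labeled example (1-nearest neighbour, cosine).
EXAMPLES = [("late goal wins the match", "sports"),
            ("polls close in the election", "other")]  # hypothetical examples
def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0
def knn_classifier(text):
    v = Counter(tokens(text))
    return max(EXAMPLES, key=lambda ex: cosine(v, Counter(tokens(ex[0]))))[1]

# Pattern matching: a regular expression (compiled to a finite automaton) for scores like "2-1".
SCORE_PATTERN = re.compile(r"\b\d+\s*-\s*\d+\b")
def pattern_classifier(text):
    return "sports" if SCORE_PATTERN.search(text) else "other"

# Hybrid 1: numeric classifier with an extra attribute defined by a pattern.
def hybrid_classifier(text, bias=-1.0):
    score = bias + sum(WEIGHTS.get(w, 0.0) for w in tokens(text))
    if SCORE_PATTERN.search(text):
        score += 2.0  # hypothetical weight for the pattern attribute
    return "sports" if score > 0 else "other"

# Hybrid 2: majority vote of three classifiers (odd count, two labels, so no ties).
def vote_classifier(text):
    votes = Counter(f(text) for f in (rule_classifier, linear_classifier, pattern_classifier))
    return votes.most_common(1)[0][0]
```

The pattern attribute lets `hybrid_classifier` label "Final score 3-0" as sports even though none of the words carries a weight, which is the kind of gain hybrids aim for.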
Why good classifiers go bad
Input data changes (new formats, shifts in word meaning, etc.)
Classes are added, deleted, or redefined
Organizational policies change
Costs of errors change
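When only the cost of errors changes, the decision threshold can sometimes be moved without retraining. This holds if the classifier's score p is a well-calibrated probability of class membership, which is an assumption. Predicting positive has expected cost (1 - p) * cost_fp and predicting negative has expected cost p * cost_fn, so positive is the cheaper choice when p > cost_fp / (cost_fp + cost_fn). The names `cost_fp` and `cost_fn` are illustrative.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability above which predicting 'positive' has lower expected cost.

    Expected cost of predicting positive: (1 - p) * cost_fp
    Expected cost of predicting negative: p * cost_fn
    Positive is cheaper when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

def decide(p, cost_fp, cost_fn):
    """Cost-sensitive binary decision for a calibrated probability p."""
    return p > cost_threshold(cost_fp, cost_fn)
```

With equal costs the threshold is 0.5. If missing a relevant document becomes nine times as costly as a false alarm, the threshold drops to 0.1, so a document with p = 0.3 switches from rejected to accepted.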
How to deal with this
Build robust classifiers
Monitor classifier effectiveness
Periodically retrain
Monitor for novel inputs
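Monitoring effectiveness requires some fresh labeled data, for example a sample of predictions checked by people, and novel inputs can be flagged with a simple heuristic. The class below is a hypothetical sketch. It tracks accuracy over a sliding window of checked predictions, signals retraining only once the window is full and accuracy falls below a chosen minimum, and treats a text as novel when most of its words were never seen in training. The window size, accuracy floor and out-of-vocabulary cutoff are arbitrary assumptions.

```python
from collections import deque

class ClassifierMonitor:
    """Hypothetical monitor for a deployed text classifier."""

    def __init__(self, training_vocab, window=100, min_accuracy=0.9, max_oov_rate=0.5):
        self.vocab = set(training_vocab)
        self.recent = deque(maxlen=window)  # 1 = correct, 0 = wrong; oldest entries fall off
        self.min_accuracy = min_accuracy
        self.max_oov_rate = max_oov_rate

    def record(self, predicted, actual):
        """Log one human-checked prediction."""
        self.recent.append(1 if predicted == actual else 0)

    def accuracy(self):
        return sum(self.recent) / len(self.recent) if self.recent else None

    def needs_retraining(self):
        """True only when the window is full and accuracy is below the floor."""
        full = len(self.recent) == self.recent.maxlen
        return full and self.accuracy() < self.min_accuracy

    def oov_rate(self, text):
        """Fraction of words in text that never appeared in the training vocabulary."""
        words = text.lower().split()
        if not words:
            return 0.0
        return sum(w not in self.vocab for w in words) / len(words)

    def is_novel(self, text):
        return self.oov_rate(text) > self.max_oov_rate
```

Waiting for a full window means the monitor does not react to a handful of errors. The trade-off is that a real drop in effectiveness is reported later.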