2000
Natural language text classification and filtering with trigrams and evolutionary nearest neighbour classifiers
Publication
N-grams offer fast, language-independent multi-class text categorization. Text is reduced in a single pass to n-gram vectors. These are assigned to one of several classes by a) nearest neighbour (KNN) and b) a genetic algorithm operating on weights in a nearest neighbour classifier. 91% accuracy is found on binary classification of short multi-author technical English documents. This falls if more categories are used, but 69% is obtained with 8 classes. Zipf's law is found not to apply to trigrams.
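The pipeline described in the abstract can be illustrated with a minimal sketch. It assumes character trigrams, cosine similarity as the nearest-neighbour measure, and an optional per-trigram weight table standing in for the weights a genetic algorithm would tune; none of these details are stated in the abstract. The function names and the toy training texts are hypothetical and are not taken from the report.

```python
from collections import Counter
import math


def trigram_vector(text):
    """Reduce text to a sparse vector of character-trigram counts in one pass."""
    s = " ".join(text.lower().split())  # normalise case and whitespace
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def similarity(a, b, weights=None):
    """Weighted cosine similarity between two trigram vectors.

    `weights` (assumption) maps a trigram to a non-negative weight;
    trigrams without an entry get weight 1.0.
    """
    w = weights or {}
    dot = sum(w.get(g, 1.0) * c * b[g] for g, c in a.items() if g in b)
    norm_a = math.sqrt(sum(w.get(g, 1.0) * c * c for g, c in a.items()))
    norm_b = math.sqrt(sum(w.get(g, 1.0) * c * c for g, c in b.items()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(text, training, k=1, weights=None):
    """Label `text` by majority vote among its k most similar training vectors."""
    query = trigram_vector(text)
    scored = sorted(
        ((similarity(query, vec, weights), label) for vec, label in training),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy training set (invented for illustration only).
training = [
    (trigram_vector("the compiler parses the source code into a syntax tree"), "software"),
    (trigram_vector("the protein folds into a stable helix structure"), "biology"),
]
label = knn_classify("parsing source code with a compiler", training)
```

In this sketch an evolutionary search would only need to propose candidate `weights` tables and score each by classification accuracy on held-out documents; the classifier itself stays unchanged.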
| Additional Metadata | |
|---|---|
| , , , | |
| , | |
| CWI | |
| Software Engineering [SEN] | |

Langdon, W. B. (2000). Natural language text classification and filtering with trigrams and evolutionary nearest neighbour classifiers. Software Engineering [SEN]. CWI.