2000
Natural language text classification and filtering with trigrams and evolutionary nearest neighbour classifiers
Publication
N-grams offer fast, language-independent multi-class text categorization. Text is reduced in a single pass to n-gram vectors. These are assigned to one of several classes by (a) a nearest neighbour (KNN) classifier and (b) a genetic algorithm operating on weights in a nearest neighbour classifier. 91% accuracy is found for binary classification of short multi-author technical English documents. This falls if more categories are used, but 69% is obtained with 8 classes. Zipf's law is found not to apply to trigrams.
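As an illustration of the pipeline summarized in the abstract, the following is a minimal sketch of reducing text to trigram count vectors and classifying with a single nearest neighbour. It is not the report's implementation: the use of character trigrams, whitespace normalization, lowercasing, and cosine similarity are all assumptions made here for illustration, and the function names are hypothetical. The genetic algorithm variant, which evolves weights inside the nearest neighbour classifier, is not shown.

```python
from collections import Counter
import math


def trigram_vector(text):
    """Reduce text to a sparse vector of character-trigram counts.

    Assumption: lowercasing and collapsing whitespace; the report may
    preprocess text differently.
    """
    text = " ".join(text.lower().split())
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbour(text, labelled_examples):
    """Return the label of the training example most similar to text.

    labelled_examples is a list of (example_text, label) pairs.
    """
    query = trigram_vector(text)
    best_label, _ = max(
        ((label, cosine_similarity(query, trigram_vector(example)))
         for example, label in labelled_examples),
        key=lambda pair: pair[1],
    )
    return best_label


# Hypothetical toy training set, for illustration only.
examples = [
    ("the genetic algorithm evolves weights", "ga"),
    ("the compiler parses source code", "compilers"),
]
label = nearest_neighbour("genetic algorithms evolve populations", examples)
```

Because the vectors are built from overlapping character windows rather than dictionary words, the same code applies unchanged to text in other languages, which is the language-independence property the abstract refers to.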
Additional Metadata | |
---|---|
CWI | |
Software Engineering [SEN] | |
Langdon, W. B. (2000). Natural language text classification and filtering with trigrams and evolutionary nearest neighbour classifiers. Software Engineering [SEN]. CWI. |