N-grams offer fast, language-independent multi-class text categorization. Text is reduced in a single pass to n-gram vectors. These are assigned to one of several classes by (a) a nearest neighbour classifier (KNN) and (b) a genetic algorithm operating on the weights in a nearest neighbour classifier. An accuracy of 91% is found for binary classification of short multi-author technical English documents. Accuracy falls as more categories are used, but 69% is obtained with 8 classes. Zipf's law is found not to apply to trigrams.
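
As a rough illustration of the pipeline summarized above, the following Python sketch reduces text to trigram count vectors in a single pass and assigns a label by nearest neighbour. It is a minimal sketch under stated assumptions, not the report's implementation: character-level trigrams, a squared Euclidean distance, a single neighbour, and the names `trigram_vector`, `distance` and `classify` are all hypothetical choices made here. The optional `weights` argument only marks where per-trigram weights, such as those a genetic algorithm might tune, could enter; the genetic algorithm itself is not shown.

```python
from collections import Counter


def trigram_vector(text):
    """Count overlapping character trigrams in one pass over the text.

    Assumption: trigrams are taken over characters; the abstract does not
    say which unit is used.
    """
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def distance(a, b, weights=None):
    """Weighted squared Euclidean distance between two sparse count vectors.

    Assumption: this metric is illustrative only. Trigrams missing from
    `weights` get weight 1.0.
    """
    weights = weights or {}
    keys = set(a) | set(b)
    return sum(weights.get(k, 1.0) * (a[k] - b[k]) ** 2 for k in keys)


def classify(text, training, weights=None):
    """Return the label of the single nearest training vector (1-NN).

    `training` is a list of (label, trigram_vector) pairs.
    """
    query = trigram_vector(text)
    label, _ = min(
        ((lab, distance(query, vec, weights)) for lab, vec in training),
        key=lambda pair: pair[1],
    )
    return label
```

A call such as `classify(document, [(label, trigram_vector(example)) for label, example in labelled_examples])` would then return the label of the closest example, where `labelled_examples` stands for whatever labelled training texts are available.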

Optimization (acm G.1.6), Combinatorics (acm G.2.1), Learning (acm I.2.6), Problem Solving, Control Methods, and Search (acm I.2.8)
Learning and adaptive systems (msc 68T05), Problem solving (heuristics, search strategies, etc.) (msc 68T20)
Software Engineering [SEN]

Langdon, W.B. (2000). Natural language text classification and filtering with trigrams and evolutionary nearest neighbour classifiers. Software Engineering [SEN]. CWI.