Next-word: order by order

The next word for the same context, as predicted by unigram, bigram, and trigram models (built-in cat/dog corpus).

Unigram: ignores context
Bigram: uses the last 1 word
Trigram: uses the last 2 words
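
The comparison above can be sketched as a count-based model in a few lines of Python. This is a minimal illustration, not the page's own implementation: the corpus below is a hypothetical stand-in for the built-in cat/dog corpus, and the names `train` and `predict` are assumptions made for this example. No smoothing is applied, so an unseen context simply returns no predictions.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in corpus; the page's built-in cat/dog corpus is not reproduced here.
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
    "the dog chased the cat",
]


def train(sentences, order):
    """Count which word follows each context of (order - 1) preceding words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(order - 1, len(tokens)):
            # For order 1 the slice is empty, so every word shares the context ().
            context = tuple(tokens[i - (order - 1):i])
            counts[context][tokens[i]] += 1
    return counts


def predict(counts, context_words, order):
    """Return next-word probabilities using only the last (order - 1) words."""
    context = tuple(context_words[-(order - 1):]) if order > 1 else ()
    seen = counts.get(context, Counter())
    total = sum(seen.values())
    # Unseen contexts yield an empty dict (no smoothing in this sketch).
    return {word: n / total for word, n in seen.most_common()}


context = "the cat sat on the".split()
for name, order in [("unigram", 1), ("bigram", 2), ("trigram", 3)]:
    print(name, predict(train(CORPUS, order), context, order))
```

On this toy corpus the unigram model favors "the" whatever the context, the bigram model splits evenly between "cat" and "dog" after "the", and the trigram model narrows the choice to "mat" and "rug" after "on the".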

The five models at a glance

How each model represents context, what it captures, and where it falls short.

| Model | Represents context as | Word order? | Context length | Output | Key limitation |
| --- | --- | --- | --- | --- | --- |
| Bag-of-Words | Count vector over vocabulary | No | Whole document (unordered) | Sparse vector | Loses all order & meaning |
| N-grams | Counts of word sequences | Local | Previous n−1 words | Next-word probabilities | Sparse; tiny context |
| Word2Vec | Dense learned vectors | No* | Training window only | One vector per word | Context-free word meaning |
| LSTM | Recurrent hidden + cell state | Yes | Whole sequence (decays) | Contextual states / next word | Sequential; long-range fade |
| Transformer | Attention over all tokens | Yes | Whole sequence, direct | Contextual vectors / next token | Compute grows with length² |

* Word2Vec learns from which words appear near each other within its training window, but it produces a single vector per word that is the same in every context and position.
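
Two of the limitations in the table can be made concrete with a short sketch: a bag-of-words vector is identical for sentences that differ only in word order, and self-attention computes one score per pair of tokens, so the work grows with the square of the sequence length. The function names `bag_of_words` and `self_attention` are assumptions for this example, and the attention shown is a simplified form that uses the input vectors directly as queries, keys, and values, without the learned projections a real Transformer would apply.

```python
import math
from collections import Counter


def bag_of_words(text, vocab):
    """Count how often each vocabulary word occurs; word positions are discarded."""
    counts = Counter(text.split())
    return [counts[word] for word in vocab]


def self_attention(vectors):
    """Simplified scaled dot-product self-attention (queries = keys = values).

    Returns the contextual output vectors and the number of pairwise scores computed.
    """
    d = len(vectors[0])
    outputs, n_scores = [], 0
    for q in vectors:
        # One score per (query, key) pair: n scores for each of the n tokens.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        n_scores += len(scores)
        # Numerically stable softmax over the scores.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Each output is a weighted average of all value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)]
        )
    return outputs, n_scores


a = "the dog chased the cat"
b = "the cat chased the dog"
vocab = sorted(set(a.split()) | set(b.split()))
print(bag_of_words(a, vocab) == bag_of_words(b, vocab))  # same counts, different meaning

for n in (4, 8, 16):
    _, n_scores = self_attention([[1.0, 0.0]] * n)
    print(n, n_scores)  # doubling n quadruples the number of scores
```

The first print shows that who chased whom is invisible to the count vector; the loop shows the pairwise score count rising from 16 to 64 to 256 as the sequence length doubles.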