Next-word prediction: order by order

The same context, completed by a unigram, a bigram, and a trigram model, each trained on the built-in cat/dog corpus.

Unigram: ignores context

Bigram: uses the last 1 word

Trigram: uses the last 2 words
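
The comparison above can be sketched in a few lines of count-based Python. This is a minimal illustration, not the page's actual implementation: the four-sentence corpus is a stand-in (the built-in cat/dog corpus is not shown here), `train` and `predict` are hypothetical helper names, and the sketch uses raw counts with no smoothing and no sentence-boundary padding.

```python
from collections import Counter, defaultdict

# Toy stand-in corpus (an assumption; the page's built-in corpus is not shown).
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
    "the dog chased the cat",
]

def train(sentences, n):
    """Count which word follows each (n-1)-word context."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(n - 1, len(tokens)):
            context = tuple(tokens[i - n + 1:i])  # empty tuple when n == 1
            counts[context][tokens[i]] += 1
    return counts

def predict(counts, n, context_words):
    """Return next-word probabilities given the last n-1 context words."""
    context = tuple(context_words[-(n - 1):]) if n > 1 else ()
    followers = counts.get(context, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # unseen context: raw counts give no prediction
    return {word: c / total for word, c in followers.most_common()}

context = "the cat sat on the".split()
for name, n in [("unigram", 1), ("bigram", 2), ("trigram", 3)]:
    model = train(CORPUS, n)
    print(name, predict(model, n, context))
```

On this toy corpus the unigram model ranks "the" first whatever the context, the bigram model splits most of its probability between "cat" and "dog" (the words that most often follow "the"), and the trigram model narrows to "mat" and "rug", the only words seen after "on the". The empty result for an unseen context is the sparsity problem in miniature.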

The five models at a glance

How each model represents context, what it captures, and where it falls short.

Model	Represents context as	Word order?	Context length	Output	Key limitation
Bag-of-Words	Count vector over vocab	No	Whole doc (unordered)	Sparse vector	Loses all word order; no notion of word similarity
N-grams	Counts of word sequences	Local	Previous n−1 words	Next-word probabilities	Sparse; tiny context
Word2Vec	Dense learned vectors	No*	Training window only	One vector per word	Context-free word meaning
LSTM	Recurrent hidden + cell state	Yes	Whole sequence (decays)	Contextual states / next word	Sequential; long-range fade
Transformer	Attention over all tokens	Yes	Whole sequence, direct	Contextual vectors / next token	Compute grows with length²

* Word2Vec learns from words that co-occur within its training window, largely ignoring their order inside it, and produces a single fixed vector per word regardless of the sentence it appears in.
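
Two rows of the table can be checked with a short sketch: the "Word order?" entries for Bag-of-Words and N-grams, and the length² limitation of the Transformer. The helper names (`bag_of_words`, `bigram_counts`, `attention_score_count`) are made up for illustration, and the attention count assumes plain full self-attention, where every token scores every token.

```python
from collections import Counter

def bag_of_words(text):
    """Unordered word counts: the whole document, with no positions."""
    return Counter(text.split())

def bigram_counts(text):
    """Counts of adjacent word pairs: local order is preserved."""
    tokens = text.split()
    return Counter(zip(tokens, tokens[1:]))

def attention_score_count(length):
    """Pairwise scores in one full self-attention pass (assumed: no masking tricks)."""
    return length * length

a = "the dog chased the cat"
b = "the cat chased the dog"

# Same words, opposite meaning: bag-of-words cannot tell them apart.
print(bag_of_words(a) == bag_of_words(b))
# Bigram counts differ, because "the dog chased" and "the cat chased" differ.
print(bigram_counts(a) == bigram_counts(b))
# Doubling the sequence length quadruples the number of attention scores.
print(attention_score_count(512), attention_score_count(1024))
```

The two sentences share an identical bag-of-words vector but different bigram counts, which is the sense in which n-grams capture order only locally.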