Tokenisers Factory
Live Comparison
See how the BPE, WordPiece and SentencePiece tokenisers handle the same input.
Try: English, Japanese, Arabic, Chinese
The tokenisation of unbelievable text is fascinating!
BPE: Byte Pair Encoding
WordPiece: BERT-style
SentencePiece: Unigram-based
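
To make the comparison concrete, here is a minimal Python sketch of how each of the three approaches might split the word "unbelievable" from the sample sentence. It is an illustration only, not the code behind this page. The merge list, vocabulary and log-probabilities are hypothetical toy values chosen by hand, and the function names are invented for this example. Real tokenisers learn these tables from a corpus, and SentencePiece additionally marks word boundaries with a "▁" prefix, which is omitted here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy tables; real tokenisers learn these from a corpus.
BPE_MERGES: List[Tuple[str, str]] = [
    ("u", "n"), ("l", "e"), ("a", "b"), ("ab", "le"), ("b", "e"),
    ("l", "i"), ("e", "v"), ("be", "li"), ("beli", "ev"),
]
WORDPIECE_VOCAB: Set[str] = {"un", "##believ", "##able", "##a", "##ble"}
UNIGRAM_LOGP: Dict[str, float] = {
    "un": -2.0, "believ": -4.0, "able": -2.5,
    "believable": -5.0, "unbeliev": -8.0,
}


def bpe_tokenise(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Simplified BPE: start from characters, apply each merge rule in order."""
    symbols = list(word)
    for left, right in merges:
        merged: List[str] = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


def wordpiece_tokenise(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """BERT-style greedy longest-match-first; non-initial pieces carry '##'."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def unigram_tokenise(word: str, logp: Dict[str, float], unk: str = "[UNK]") -> List[str]:
    """Unigram-style: Viterbi search for the highest-scoring segmentation."""
    n = len(word)
    best = [float("-inf")] * (n + 1)  # best[i] = best score for word[:i]
    back = [0] * (n + 1)              # back[i] = start of the last piece
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece in logp and best[start] + logp[piece] > best[end]:
                best[end] = best[start] + logp[piece]
                back[end] = start
    if best[n] == float("-inf"):
        return [unk]
    pieces: List[str] = []
    end = n
    while end > 0:
        pieces.append(word[back[end]:end])
        end = back[end]
    return pieces[::-1]


print(bpe_tokenise("unbelievable", BPE_MERGES))            # ['un', 'believ', 'able']
print(wordpiece_tokenise("unbelievable", WORDPIECE_VOCAB))  # ['un', '##believ', '##able']
print(unigram_tokenise("unbelievable", UNIGRAM_LOGP))       # ['un', 'believable']
```

With these toy tables, BPE and WordPiece arrive at the same three pieces by different routes (bottom-up merging versus greedy prefix matching), while the unigram search prefers two pieces because that segmentation has the higher total log-probability. Different tables would give different splits.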