tok: Fast Text Tokenization

Interfaces with the 'Hugging Face' tokenizers library to provide implementations of today's most used tokenizers, such as the 'Byte-Pair Encoding' algorithm. It is extremely fast for both training new vocabularies and tokenizing texts.
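A minimal sketch of typical usage, assuming the R6-style `tokenizer$from_pretrained()` constructor exposed by the package (downloading a pretrained tokenizer requires network access to the Hugging Face Hub):

```r
library(tok)

# Load a pretrained tokenizer by its Hub identifier (assumed: "gpt2")
tokenizer <- tok::tokenizer$from_pretrained("gpt2")

# Encode a text into token ids, then decode back to a string
enc <- tokenizer$encode("hello world")
enc$ids
tokenizer$decode(enc$ids)
```

`encode()` returns an encoding object whose `$ids` field holds the integer token ids; `decode()` maps ids back to text.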

Version: 0.1.3
Depends: R (≥ 4.2.0)
Imports: R6, cli
Suggests: rmarkdown, testthat (≥ 3.0.0), hfhub (≥ 0.1.1), withr
Published: 2024-07-06
DOI: 10.32614/CRAN.package.tok
Author: Daniel Falbel [aut, cre], Posit [cph]
Maintainer: Daniel Falbel <daniel at>
License: MIT + file LICENSE
NeedsCompilation: yes
SystemRequirements: Rust toolchain with cargo, libclang/llvm-config
Materials: README NEWS
CRAN checks: tok results


Reference manual: tok.pdf


Package source: tok_0.1.3.tar.gz
Windows binaries: r-devel:, r-release:, r-oldrel:
macOS binaries: r-release (arm64): tok_0.1.3.tgz, r-oldrel (arm64): tok_0.1.3.tgz, r-release (x86_64): tok_0.1.3.tgz, r-oldrel (x86_64): tok_0.1.1.tgz
Old sources: tok archive

Reverse dependencies:

Reverse imports: sacRebleu


Please use the canonical form to link to this page.