Installation
Requirements
Python 3.7 or higher
No runtime dependencies
Optional for source Rust builds: Rust toolchain +
maturin
PyPI Installation
The easiest way to install Crosstem is via pip:
pip install crosstem
This installs the base package (~280 MB) with derivational and inflectional data for 15 languages.
Rust Acceleration Backend
Crosstem includes a PyO3-based Rust derivational backend for faster stemming. If your wheel includes the extension, it is used automatically. If not available, Crosstem safely falls back to the pure-Python implementation.
Build locally from source with Rust:
pip install maturin
maturin develop --manifest-path rust/Cargo.toml
Force Python fallback backend (for debugging/parity checks):
from crosstem import DerivationalStemmer
stemmer = DerivationalStemmer("eng", use_rust_backend=False)
Etymology Data (Optional)
The etymology dataset is ~1 GB and hosted separately on GitHub Releases. To download it:
from crosstem import download_etymology
download_etymology()
Or from the command line:
python -m crosstem.download
This downloads etymology.json to the crosstem/data directory with a progress bar.
Check if Etymology is Installed
from crosstem import is_etymology_downloaded
if is_etymology_downloaded():
print("Etymology data is available")
else:
print("Etymology data not found")
Remove Etymology Data
To free up disk space:
from crosstem import remove_etymology
remove_etymology()
Development Installation
To install from source for development:
git clone https://github.com/droidmaximus/crosstem.git
cd crosstem
pip install -e .
If you are working on the Rust extension itself:
pip install maturin
maturin develop --manifest-path rust/Cargo.toml
Verifying Installation
Test your installation:
python -c "from crosstem import DerivationalStemmer; print(DerivationalStemmer('eng').stem('organization'))"
Expected output: organize