Installation

Requirements

  • Python 3.7 or higher

  • No runtime dependencies

  • Optional for source Rust builds: Rust toolchain + maturin

PyPI Installation

The easiest way to install Crosstem is via pip:

pip install crosstem

This installs the base package (~280 MB) with derivational and inflectional data for 15 languages.

Rust Acceleration Backend

Crosstem includes a PyO3-based Rust derivational backend for faster stemming. If your wheel includes the extension, it is used automatically. If not available, Crosstem safely falls back to the pure-Python implementation.

Build locally from source with Rust:

pip install maturin
maturin develop --manifest-path rust/Cargo.toml

Force Python fallback backend (for debugging/parity checks):

from crosstem import DerivationalStemmer
stemmer = DerivationalStemmer("eng", use_rust_backend=False)

Etymology Data (Optional)

The etymology dataset is ~1 GB and hosted separately on GitHub Releases. To download it:

from crosstem import download_etymology
download_etymology()

Or from the command line:

python -m crosstem.download

This downloads etymology.json to the crosstem/data directory with a progress bar.

Check if Etymology is Installed

from crosstem import is_etymology_downloaded

if is_etymology_downloaded():
    print("Etymology data is available")
else:
    print("Etymology data not found")

Remove Etymology Data

To free up disk space:

from crosstem import remove_etymology
remove_etymology()

Development Installation

To install from source for development:

git clone https://github.com/droidmaximus/crosstem.git
cd crosstem
pip install -e .

If you are working on the Rust extension itself:

pip install maturin
maturin develop --manifest-path rust/Cargo.toml

Verifying Installation

Test your installation:

python -c "from crosstem import DerivationalStemmer; print(DerivationalStemmer('eng').stem('organization'))"

Expected output: organize