Quick Start

This guide will get you started with Crosstem in 5 minutes.

Note

Crosstem 1.0 uses a Rust-accelerated derivational backend (PyO3) when available. If the extension is unavailable, it automatically falls back to the pure-Python backend.

Basic Stemming

from crosstem import DerivationalStemmer

# Initialize stemmer for English
stemmer = DerivationalStemmer('eng')

# Stem a single word
print(stemmer.stem('organization'))  # Output: organize
print(stemmer.stem('beautiful'))     # Output: beauty
print(stemmer.stem('happiness'))     # Output: happy

Backend Selection (Optional)

By default, Crosstem uses Rust acceleration when available:

from crosstem import DerivationalStemmer

# Default: Rust backend if installed, Python fallback otherwise
fast_stemmer = DerivationalStemmer('eng')

# Force pure-Python backend (debugging/parity checks)
py_stemmer = DerivationalStemmer('eng', use_rust_backend=False)

Cross-POS Stemming

Unlike traditional stemmers, Crosstem finds roots across parts of speech:

stemmer = DerivationalStemmer('eng')

# Noun → Verb
print(stemmer.stem('organization'))  # organize
print(stemmer.stem('destruction'))   # destruct

# Adjective → Noun
print(stemmer.stem('beautiful'))     # beauty
print(stemmer.stem('organizational')) # organize

Batch Processing

words = ['organization', 'organizational', 'organize', 'organizing']
stems = [stemmer.stem(word) for word in words]
print(stems)  # ['organize', 'organize', 'organize', 'organize']

Word Families

Find all words derived from a root:

stemmer = DerivationalStemmer('eng')
family = stemmer.get_word_family('organize')
print(f"Found {len(family)} related words")
print(family[:10])  # First 10 words

Inflectional Analysis

from crosstem import InflectionAnalyzer

analyzer = InflectionAnalyzer('eng')

# Analyze word inflections
inflections = analyzer.get_inflections('run')
print(inflections)
# Output: {'runs', 'running', 'ran'}

Etymology Tracing

First, download the etymology data:

from crosstem import download_etymology
download_etymology()

Then trace word origins:

from crosstem import EtymologyLinker

linker = EtymologyLinker()

# Find etymology
etymology = linker.get_etymology('English', 'organize')
print(etymology)

Multi-language Support

Crosstem supports 15 languages:

# German
de_stemmer = DerivationalStemmer('deu')
print(de_stemmer.stem('Organisation'))  # organisieren

# French
fr_stemmer = DerivationalStemmer('fra')
print(fr_stemmer.stem('organisation'))  # organiser

# Spanish
es_stemmer = DerivationalStemmer('spa')
print(es_stemmer.stem('organización'))  # organizar

Supported Languages

cat - Catalan
ces - Czech
deu - German
eng - English
fin - Finnish
fra - French
hbs - Serbo-Croatian
hun - Hungarian
ita - Italian
mon - Mongolian
pol - Polish
por - Portuguese
rus - Russian
spa - Spanish
swe - Swedish

Next Steps

Read the User Guide for detailed usage
Learn about the Algorithm behind Crosstem
Check out Examples for real-world use cases
See the API Reference reference for all available methods