Quick Start
This guide walks through Crosstem's core features in about five minutes.
Note
Crosstem 1.0 uses a Rust-accelerated derivational backend (PyO3) when available. If the extension is unavailable, it automatically falls back to the pure-Python backend.
Basic Stemming
from crosstem import DerivationalStemmer
# Initialize stemmer for English
stemmer = DerivationalStemmer('eng')
# Stem a single word
print(stemmer.stem('organization')) # Output: organize
print(stemmer.stem('beautiful')) # Output: beauty
print(stemmer.stem('happiness')) # Output: happy
Backend Selection (Optional)
By default, Crosstem uses Rust acceleration when available:
from crosstem import DerivationalStemmer
# Default: Rust backend if installed, Python fallback otherwise
fast_stemmer = DerivationalStemmer('eng')
# Force pure-Python backend (debugging/parity checks)
py_stemmer = DerivationalStemmer('eng', use_rust_backend=False)
Cross-POS Stemming
Unlike traditional stemmers, Crosstem finds roots across parts of speech:
stemmer = DerivationalStemmer('eng')
# Noun → Verb
print(stemmer.stem('organization')) # organize
print(stemmer.stem('destruction')) # destruct
# Adjective → Noun
print(stemmer.stem('beautiful')) # beauty
# Adjective → Verb
print(stemmer.stem('organizational')) # organize
Batch Processing
words = ['organization', 'organizational', 'organize', 'organizing']
stems = [stemmer.stem(word) for word in words]
print(stems) # ['organize', 'organize', 'organize', 'organize']
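A common follow-up to batch stemming is grouping words that share a stem. The sketch below shows one way to do that; `group_by_stem` is a hypothetical helper rather than a Crosstem function, and the toy lookup table stands in for `stemmer.stem` so the example runs without the library. It also caches results so each distinct word is stemmed only once.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_stem(words: Iterable[str], stem: Callable[[str], str]) -> Dict[str, List[str]]:
    """Map each stem to the input words that produced it, preserving input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    cache: Dict[str, str] = {}
    for word in words:
        # Stem each distinct word once; repeated words reuse the cached result.
        if word not in cache:
            cache[word] = stem(word)
        groups[cache[word]].append(word)
    return dict(groups)


# Toy stand-in for stemmer.stem, using the outputs shown earlier in this guide.
_TOY_STEMS = {
    "organization": "organize",
    "organizational": "organize",
    "beautiful": "beauty",
    "happiness": "happy",
}


def _toy_stem(word: str) -> str:
    return _TOY_STEMS.get(word, word)


groups = group_by_stem(["organization", "organizational", "beautiful", "happiness"], _toy_stem)
print(groups)
```

With a real stemmer, the call would look like `group_by_stem(words, stemmer.stem)`.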
Word Families
Find all words derived from a root:
stemmer = DerivationalStemmer('eng')
family = stemmer.get_word_family('organize')
print(f"Found {len(family)} related words")
print(family[:10]) # First 10 words
Inflectional Analysis
from crosstem import InflectionAnalyzer
analyzer = InflectionAnalyzer('eng')
# Analyze word inflections
inflections = analyzer.get_inflections('run')
print(inflections)
# Output: {'runs', 'running', 'ran'}
Etymology Tracing
First, download the etymology data:
from crosstem import download_etymology
download_etymology()
Then trace word origins:
from crosstem import EtymologyLinker
linker = EtymologyLinker()
# Find etymology
etymology = linker.get_etymology('English', 'organize')
print(etymology)
Multi-language Support
Crosstem supports 15 languages:
# German
de_stemmer = DerivationalStemmer('deu')
print(de_stemmer.stem('Organisation')) # organisieren
# French
fr_stemmer = DerivationalStemmer('fra')
print(fr_stemmer.stem('organisation')) # organiser
# Spanish
es_stemmer = DerivationalStemmer('spa')
print(es_stemmer.stem('organización')) # organizar
Supported Languages
cat - Catalan
ces - Czech
deu - German
eng - English
fin - Finnish
fra - French
hbs - Serbo-Croatian
hun - Hungarian
ita - Italian
mon - Mongolian
pol - Polish
por - Portuguese
rus - Russian
spa - Spanish
swe - Swedish
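Because stemmers are created from three-letter codes, it can help to validate user input against the list above before constructing one. The sketch below is an illustration, not Crosstem API: `SUPPORTED_LANGUAGES` and `resolve_language` are hypothetical names, and the table is simply transcribed from this list.

```python
# Codes and names transcribed from the Supported Languages list above.
SUPPORTED_LANGUAGES = {
    "cat": "Catalan",
    "ces": "Czech",
    "deu": "German",
    "eng": "English",
    "fin": "Finnish",
    "fra": "French",
    "hbs": "Serbo-Croatian",
    "hun": "Hungarian",
    "ita": "Italian",
    "mon": "Mongolian",
    "pol": "Polish",
    "por": "Portuguese",
    "rus": "Russian",
    "spa": "Spanish",
    "swe": "Swedish",
}


def resolve_language(name_or_code: str) -> str:
    """Return the three-letter code for a supported code or English language name."""
    key = name_or_code.strip().lower()
    if key in SUPPORTED_LANGUAGES:
        return key
    for code, name in SUPPORTED_LANGUAGES.items():
        if name.lower() == key:
            return code
    raise ValueError(f"Unsupported language: {name_or_code!r}")


print(resolve_language("German"))
print(resolve_language("spa"))
```

The resolved code can then be passed to the stemmer constructor, for example `DerivationalStemmer(resolve_language("German"))`.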
Next Steps
Read the User Guide for detailed usage
Learn about the Algorithm behind Crosstem
Check out Examples for real-world use cases
See the API Reference for all available methods