Quick Start

This guide gets you started with Crosstem in about five minutes.

Note

Crosstem 1.0 uses a Rust-accelerated derivational backend (PyO3) when available. If the extension is unavailable, it automatically falls back to the pure-Python backend.

Basic Stemming

from crosstem import DerivationalStemmer

# Initialize stemmer for English
stemmer = DerivationalStemmer('eng')

# Stem a single word
print(stemmer.stem('organization'))  # Output: organize
print(stemmer.stem('beautiful'))     # Output: beauty
print(stemmer.stem('happiness'))     # Output: happy

Backend Selection (Optional)

By default, Crosstem uses the Rust backend when it is available. Pass use_rust_backend=False to force the pure-Python backend:

from crosstem import DerivationalStemmer

# Default: Rust backend if installed, Python fallback otherwise
fast_stemmer = DerivationalStemmer('eng')

# Force pure-Python backend (debugging/parity checks)
py_stemmer = DerivationalStemmer('eng', use_rust_backend=False)
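A parity check between the two backends might look like the sketch below. The helper find_mismatches is hypothetical, not part of Crosstem; it accepts any two callables that map a word to a stem. The stand-in functions exist only so the sketch runs without Crosstem installed.

```python
from typing import Callable, Iterable, List, Tuple


def find_mismatches(
    words: Iterable[str],
    stem_a: Callable[[str], str],
    stem_b: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (word, result_a, result_b) for every word where the outputs differ."""
    mismatches = []
    for word in words:
        result_a, result_b = stem_a(word), stem_b(word)
        if result_a != result_b:
            mismatches.append((word, result_a, result_b))
    return mismatches


# Stand-in stemmers (hypothetical). With Crosstem installed, pass
# fast_stemmer.stem and py_stemmer.stem instead.
reference = {"organization": "organize", "happiness": "happy"}


def stem_ref(word: str) -> str:
    return reference.get(word, word)


def stem_buggy(word: str) -> str:
    # Deliberately disagrees on one word to show what a mismatch looks like.
    return "happi" if word == "happiness" else reference.get(word, word)


print(find_mismatches(["organization", "happiness", "cat"], stem_ref, stem_buggy))
# [('happiness', 'happy', 'happi')]
```

An empty result means the two callables agreed on every word in the sample.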

Cross-POS Stemming

Unlike traditional stemmers, Crosstem finds roots across parts of speech:

stemmer = DerivationalStemmer('eng')

# Noun → Verb
print(stemmer.stem('organization'))  # organize
print(stemmer.stem('destruction'))   # destruct

# Adjective → Noun
print(stemmer.stem('beautiful'))     # beauty

# Adjective → Verb
print(stemmer.stem('organizational'))  # organize

Batch Processing

words = ['organization', 'organizational', 'organize', 'organizing']
stems = [stemmer.stem(word) for word in words]
print(stems)  # ['organize', 'organize', 'organize', 'organize']
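For larger word lists, it can help to group words by their stem and to avoid stemming repeated words twice. The following is a minimal sketch, assuming only that the stemming function takes a string and returns a string. group_by_stem and toy_stem are hypothetical names, and the toy table stands in for stemmer.stem so the example runs on its own.

```python
from collections import defaultdict
from functools import lru_cache
from typing import Callable, Dict, Iterable, List


def group_by_stem(
    words: Iterable[str],
    stem_fn: Callable[[str], str],
    cache_size: int = 10_000,
) -> Dict[str, List[str]]:
    """Map each stem to the input words that produced it, in input order."""
    # Cache results so a word that appears many times is stemmed only once.
    cached_stem = lru_cache(maxsize=cache_size)(stem_fn)
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        groups[cached_stem(word)].append(word)
    return dict(groups)


# Stand-in for stemmer.stem (hypothetical data, not Crosstem output).
toy_table = {
    "organization": "organize",
    "organizing": "organize",
    "beautiful": "beauty",
}


def toy_stem(word: str) -> str:
    return toy_table.get(word, word)


groups = group_by_stem(
    ["organization", "beautiful", "organizing", "organize"], toy_stem
)
print(groups)
# {'organize': ['organization', 'organizing', 'organize'], 'beauty': ['beautiful']}
```

With Crosstem installed, the call would be group_by_stem(words, stemmer.stem).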

Word Families

Find all words derived from a root:

stemmer = DerivationalStemmer('eng')
family = stemmer.get_word_family('organize')
print(f"Found {len(family)} related words")
print(family[:10])  # First 10 words

Inflectional Analysis

from crosstem import InflectionAnalyzer

analyzer = InflectionAnalyzer('eng')

# Analyze word inflections
inflections = analyzer.get_inflections('run')
print(inflections)
# Output: {'runs', 'running', 'ran'}
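To go in the other direction, from an inflected form back to its base word, one option is to build a reverse index from the inflection sets. This sketch assumes get_inflections returns an iterable of strings, as the sample output above suggests. build_lemma_index and the toy data are hypothetical and only stand in for analyzer.get_inflections.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set


def build_lemma_index(
    lemmas: Iterable[str],
    get_inflections: Callable[[str], Iterable[str]],
) -> Dict[str, Set[str]]:
    """Map every surface form to the set of base words it may belong to."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for lemma in lemmas:
        index[lemma].add(lemma)  # a base word is also a form of itself
        for form in get_inflections(lemma):
            index[form].add(lemma)
    return dict(index)


# Stand-in for analyzer.get_inflections (hypothetical data).
toy_forms = {
    "run": {"runs", "running", "ran"},
    "leave": {"leaves", "leaving", "left"},
    "leaf": {"leaves"},
}


def toy_get_inflections(lemma: str) -> Set[str]:
    return toy_forms.get(lemma, set())


lemma_index = build_lemma_index(["run", "leave", "leaf"], toy_get_inflections)
print(sorted(lemma_index["ran"]))     # ['run']
print(sorted(lemma_index["leaves"]))  # ['leaf', 'leave']
```

The index stores a set per form because one surface form can belong to more than one base word, as "leaves" does here.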

Etymology Tracing

First, download the etymology data:

from crosstem import download_etymology
download_etymology()

Then trace word origins:

from crosstem import EtymologyLinker

linker = EtymologyLinker()

# Find etymology
etymology = linker.get_etymology('English', 'organize')
print(etymology)

Multi-language Support

Crosstem supports 15 languages, each selected by its three-letter language code:

# German
de_stemmer = DerivationalStemmer('deu')
print(de_stemmer.stem('Organisation'))  # organisieren

# French
fr_stemmer = DerivationalStemmer('fra')
print(fr_stemmer.stem('organisation'))  # organiser

# Spanish
es_stemmer = DerivationalStemmer('spa')
print(es_stemmer.stem('organización'))  # organizar

Supported Languages

  • cat - Catalan
  • ces - Czech
  • deu - German
  • eng - English
  • fin - Finnish
  • fra - French
  • hbs - Serbo-Croatian
  • hun - Hungarian
  • ita - Italian
  • mon - Mongolian
  • pol - Polish
  • por - Portuguese
  • rus - Russian
  • spa - Spanish
  • swe - Swedish
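When language codes come from user input or configuration, it may be useful to validate them against the list above before constructing a stemmer. This is a minimal sketch: SUPPORTED_LANGUAGES and resolve_language are hypothetical names, and how Crosstem itself reacts to an unknown code is not covered here.

```python
# Codes and names copied from the list of supported languages above.
SUPPORTED_LANGUAGES = {
    "cat": "Catalan",
    "ces": "Czech",
    "deu": "German",
    "eng": "English",
    "fin": "Finnish",
    "fra": "French",
    "hbs": "Serbo-Croatian",
    "hun": "Hungarian",
    "ita": "Italian",
    "mon": "Mongolian",
    "pol": "Polish",
    "por": "Portuguese",
    "rus": "Russian",
    "spa": "Spanish",
    "swe": "Swedish",
}


def resolve_language(code: str) -> str:
    """Normalize a language code and check it against the supported list."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        known = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {known}")
    return normalized


print(resolve_language(" DEU "))                       # deu
print(SUPPORTED_LANGUAGES[resolve_language("hbs")])    # Serbo-Croatian
```

The validated code can then be passed on, for example DerivationalStemmer(resolve_language(user_code)).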

Next Steps