API Reference
This page documents all public classes and methods in Crosstem.
DerivationalStemmer
- class DerivationalStemmer(language: str = 'eng', use_rust_backend: bool = True)
Main class for finding morphological roots through derivational relationships.
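- Parameters:
language (str) – ISO 639-3 language code (default 'eng')
use_rust_backend (bool) – Whether to use the Rust backend (default True)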
- Raises:
ValueError – If language is not supported
Example:
from crosstem import DerivationalStemmer

stemmer = DerivationalStemmer('eng')
root = stemmer.stem('organization')
- stem(word: str, use_derivations: bool = True) -> str
Find the morphological root of a word using BFS graph traversal.
- Parameters:
word (str) – The word to stem
- Returns:
The morphological root, or the original word if not in graph
- Return type:
str
Algorithm: Uses breadth-first search through derivational relationships, scoring candidates based on word length, part of speech, and productivity.
Example:
stemmer = DerivationalStemmer('eng')

# Cross-POS stemming
print(stemmer.stem('organization'))  # organize (noun → verb)
print(stemmer.stem('beautiful'))  # beauty (adj → noun)

# Multi-hop traversal
print(stemmer.stem('organizational'))  # organize (2 hops)
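The breadth-first search described for stem() can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: the toy graph, the POS table, the scoring weights, and the names toy_stem and score are hypothetical and are not taken from Crosstem's implementation or data.

```python
from collections import deque

# Hypothetical toy data; real Crosstem graphs and weights are not documented here.
POS = {
    "organizational": "adj",
    "organization": "noun",
    "organize": "verb",
    "organizer": "noun",
}
EDGES = {
    "organizational": ["organization"],
    "organization": ["organize", "organizational"],
    "organize": ["organization", "organizer"],
    "organizer": ["organize"],
}
# Assumed preference: verbs make better roots than nouns, nouns better than adjectives.
POS_BONUS = {"verb": 2.0, "noun": 1.0, "adj": 0.0}


def score(word):
    """Higher is better: shorter word, preferred POS, more derivational links."""
    productivity = len(EDGES.get(word, []))
    return -len(word) + POS_BONUS.get(POS.get(word), 0.0) + 0.5 * productivity


def toy_stem(word, max_hops=3):
    """Breadth-first search from `word`; return the best-scoring reachable word."""
    if word not in EDGES:
        return word  # same fallback as documented: unknown words are returned as-is
    best = word
    seen = {word}
    queue = deque([(word, 0)])
    while queue:
        current, hops = queue.popleft()
        if score(current) > score(best):
            best = current
        if hops < max_hops:
            for neighbor in EDGES.get(current, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, hops + 1))
    return best


print(toy_stem("organizational"))  # organize
print(toy_stem("unknown"))  # unknown
```

In this sketch the start word is itself a candidate, so a word that already scores best is returned unchanged.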
- get_word_family(word: str, max_depth: int = 2) -> list
Get all words derived from the given root word.
- get_derivations(word: str) -> list
Get derivational links for a word.
- Parameters:
word (str) – Input word
- Returns:
List of derivation objects with form, pos, and relation
- Return type:
list
Example:
stemmer = DerivationalStemmer('eng')
family = stemmer.get_word_family('organize')
print(len(family))  # 43 related words
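One way to picture how a max_depth limit might bound a word family is a depth-limited traversal over derivation links. This is only a sketch under assumptions: the DERIVED table and the toy_word_family function are hypothetical, and the library's actual traversal may differ.

```python
from collections import deque

# Hypothetical derivation links (word -> directly derived forms); not Crosstem data.
DERIVED = {
    "organize": ["organizer", "organization", "reorganize"],
    "organization": ["organizational"],
    "organizational": ["organizationally"],
    "reorganize": ["reorganization"],
}


def toy_word_family(root, max_depth=2):
    """Collect words reachable from `root` in at most `max_depth` derivation steps."""
    family = []
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        word, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for derived in DERIVED.get(word, []):
            if derived not in seen:
                seen.add(derived)
                family.append(derived)
                queue.append((derived, depth + 1))
    return family


print(toy_word_family("organize", max_depth=1))
print(toy_word_family("organize", max_depth=2))
```

With max_depth=2 the sketch stops before "organizationally", which sits three steps away from "organize" in the toy table.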
InflectionAnalyzer
- class InflectionAnalyzer(language: str)
Analyzer for inflectional morphology (grammatical variations of the same word).
- Parameters:
language (str) – ISO 639-3 language code
- Raises:
ValueError – If language is not supported
Example:
from crosstem import InflectionAnalyzer

analyzer = InflectionAnalyzer('eng')
inflections = analyzer.get_inflections('run')
EtymologyLinker
- class EtymologyLinker
Class for tracing cross-lingual etymology relationships.
Note
Requires etymology data to be downloaded first using download_etymology().
Example:
from crosstem import EtymologyLinker, download_etymology

download_etymology()  # One-time download
linker = EtymologyLinker()
- get_etymology(language: str, word: str) -> dict
Get etymology information for a word.
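- Parameters:
language (str) – Language of the word, given as a name such as 'English'
word (str) – The word to look up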
- Returns:
Dictionary of etymology relationships
- Return type:
dict
Relationship types:
INHERITED_FROM: Inherited from ancestor language
BORROWED_FROM: Borrowed/loaned from another language
DERIVED_FROM: Derived from another word
ETYMOLOGICAL_ORIGIN_OF: Source of another word
Example:
linker = EtymologyLinker()
etymology = linker.get_etymology('English', 'organize')
print(etymology)
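The exact layout of the returned dictionary is not specified on this page. As a rough sketch, the four relationship types listed above could be used to group etymology records. The record layout (relation, language, word), the group_by_relation helper, and the sample records are all assumptions made for illustration, not output from the library.

```python
from collections import defaultdict

# The four relationship types named in this reference.
RELATION_TYPES = (
    "INHERITED_FROM",
    "BORROWED_FROM",
    "DERIVED_FROM",
    "ETYMOLOGICAL_ORIGIN_OF",
)


def group_by_relation(records):
    """Group hypothetical (relation, language, word) records by relation type."""
    grouped = defaultdict(list)
    for relation, language, word in records:
        if relation not in RELATION_TYPES:
            raise ValueError(f"Unknown relation type: {relation!r}")
        grouped[relation].append((language, word))
    return dict(grouped)


# Illustrative records only; they were written by hand for this sketch.
sample = [
    ("BORROWED_FROM", "French", "organiser"),
    ("DERIVED_FROM", "English", "organ"),
    ("BORROWED_FROM", "Latin", "organizare"),
]
print(group_by_relation(sample))
```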
- get_borrowed_words(target_lang: str, source_lang: str) -> list
Find all words borrowed from one language into another.
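- Parameters:
target_lang (str) – Language that borrowed the words, e.g. 'English'
source_lang (str) – Language the words were borrowed from, e.g. 'French'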
- Returns:
List of borrowed words
- Return type:
list
Example:
linker = EtymologyLinker()
french_loans = linker.get_borrowed_words('English', 'French')
print(f"Found {len(french_loans)} French loanwords")
Helper Functions
- download_etymology() -> None
Download the etymology dataset (~1 GB) from GitHub Releases.
Shows a progress bar during download and validates the file after completion.
Example:
from crosstem import download_etymology

download_etymology()
Supported Languages
- SUPPORTED_LANGUAGES: list
List of supported ISO 639-3 language codes:
[
    'cat',  # Catalan
    'ces',  # Czech
    'deu',  # German
    'eng',  # English
    'fin',  # Finnish
    'fra',  # French
    'hbs',  # Serbo-Croatian
    'hun',  # Hungarian
    'ita',  # Italian
    'mon',  # Mongolian
    'pol',  # Polish
    'por',  # Portuguese
    'rus',  # Russian
    'spa',  # Spanish
    'swe',  # Swedish
]
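A caller may want to check a language code before constructing a stemmer or analyzer. The sketch below copies the list above into a local constant so it runs on its own; the normalize_language helper is hypothetical and not part of Crosstem, and whether the library itself accepts upper-case or padded codes is not stated here.

```python
# Local copy of the codes listed above, so the sketch runs without Crosstem installed.
SUPPORTED = [
    'cat', 'ces', 'deu', 'eng', 'fin', 'fra', 'hbs', 'hun',
    'ita', 'mon', 'pol', 'por', 'rus', 'spa', 'swe',
]


def normalize_language(code):
    """Return a cleaned ISO 639-3 code, or raise ValueError if it is not listed."""
    cleaned = code.strip().lower()
    if cleaned not in SUPPORTED:
        raise ValueError(
            f"Language {code!r} not supported; expected one of: {', '.join(SUPPORTED)}"
        )
    return cleaned


print(normalize_language(' ENG '))  # eng
```

With Crosstem installed, the same check could presumably be written against crosstem's own SUPPORTED_LANGUAGES instead of the local copy.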
Exceptions
- exception ValueError
Raised when an invalid language code is provided:
stemmer = DerivationalStemmer('invalid')
# ValueError: Language 'invalid' not supported
- exception FileNotFoundError
Raised when attempting to use etymology features without downloading data:
linker = EtymologyLinker()  # Without downloading first
# FileNotFoundError: Etymology data not found
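Both exceptions can be handled at construction time. The sketch below shows one possible pattern using stand-in factories, so it runs without Crosstem installed. The build_or_none helper and the two fake_* functions are hypothetical; they only imitate the error behavior documented above.

```python
def build_or_none(factory, *args):
    """Call factory(*args); return None if it raises a documented error type."""
    try:
        return factory(*args)
    except ValueError as exc:
        print(f"Unsupported language: {exc}")
    except FileNotFoundError as exc:
        print(f"Missing data: {exc}")
    return None


# Stand-ins that imitate the documented failures (not the real classes).
def fake_stemmer(language):
    if language != 'eng':
        raise ValueError(f"Language {language!r} not supported")
    return f"stemmer:{language}"


def fake_linker():
    raise FileNotFoundError("Etymology data not found")


print(build_or_none(fake_stemmer, 'eng'))      # stemmer:eng
print(build_or_none(fake_stemmer, 'invalid'))  # prints a message, then None
print(build_or_none(fake_linker))              # prints a message, then None
```

In real use, the factory argument would be DerivationalStemmer or EtymologyLinker, and a FileNotFoundError would be the cue to call download_etymology() once before retrying.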
Constants
Type Hints
All public methods include type hints for better IDE support:
from crosstem import DerivationalStemmer


def process_text(text: str, language: str = 'eng') -> list[str]:
    """Process text and return stems."""
    stemmer = DerivationalStemmer(language)
    words = text.split()
    return [stemmer.stem(word) for word in words]
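Code that only needs a stem() method can also be typed structurally, which makes it easy to test with a stand-in. The following is a sketch under assumptions: the SupportsStem protocol, the LowercaseStemmer stub, and stem_all are hypothetical names introduced here. The only library fact relied on is the documented stem(word: str) -> str signature.

```python
from typing import Protocol


class SupportsStem(Protocol):
    """Any object exposing stem(word) -> str, as DerivationalStemmer is documented to."""

    def stem(self, word: str) -> str: ...


class LowercaseStemmer:
    """Stand-in for the sketch; it only lowercases its input."""

    def stem(self, word: str) -> str:
        return word.lower()


def stem_all(text: str, stemmer: SupportsStem) -> list[str]:
    """Split text on whitespace and stem each token with the given stemmer."""
    return [stemmer.stem(word) for word in text.split()]


print(stem_all("Hello World", LowercaseStemmer()))  # ['hello', 'world']
```

Passing the stemmer in as a parameter also avoids constructing a new one on every call.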