API Reference
=============

This page documents the public classes, functions, and constants in Crosstem.

DerivationalStemmer
-------------------

.. py:class:: DerivationalStemmer(language: str = "eng", use_rust_backend: bool = True)

   Main class for finding morphological roots through derivational relationships.
   
   :param language: ISO 639-3 language code (e.g., 'eng', 'deu', 'fra')
   :type language: str
   :param use_rust_backend: Use the Rust backend when available, falling back to pure Python otherwise
   :type use_rust_backend: bool
   :raises ValueError: If language is not supported
   
   **Example**::
   
      from crosstem import DerivationalStemmer
      
      stemmer = DerivationalStemmer('eng')
      root = stemmer.stem('organization')

   .. py:method:: stem(word: str, use_derivations: bool = True) -> str
   
      Find the morphological root of a word using BFS graph traversal.
      
      :param word: The word to stem
      :type word: str
      :return: The morphological root, or the original word if not in graph
      :rtype: str
      
      **Algorithm**: Uses breadth-first search through derivational relationships,
      scoring candidates based on word length, part of speech, and productivity.
      
      **Example**::
      
         stemmer = DerivationalStemmer('eng')
         
         # Cross-POS stemming
         print(stemmer.stem('organization'))    # organize (noun → verb)
         print(stemmer.stem('beautiful'))       # beauty (adj → noun)
         
         # Multi-hop traversal
         print(stemmer.stem('organizational'))  # organize (2 hops)
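
      The traversal described above can be illustrated with a minimal, self-contained
      sketch. It does not use Crosstem's data or internals: the graph ``DERIVED_FROM``,
      the ``POS`` table, and the ``score`` function are hypothetical stand-ins, and the
      productivity component of the scoring is omitted for brevity::

         from collections import deque

         # Hypothetical toy graph: each word maps to the words it is derived from.
         # These entries are illustrative only, not Crosstem data.
         DERIVED_FROM = {
             "organizational": ["organization"],
             "organization": ["organize"],
             "organize": [],
         }
         POS = {"organizational": "ADJ", "organization": "N", "organize": "V"}

         def score(word):
             """Assumed scoring: prefer verbs first, then shorter words."""
             pos_bonus = 1 if POS.get(word) == "V" else 0
             return (pos_bonus, -len(word))

         def toy_stem(word, max_depth=3):
             """Breadth-first search over DERIVED_FROM; return the best-scoring word."""
             if word not in DERIVED_FROM:
                 return word  # words outside the graph are returned unchanged
             seen = {word}
             queue = deque([(word, 0)])
             best = word
             while queue:
                 current, depth = queue.popleft()
                 if score(current) > score(best):
                     best = current
                 if depth == max_depth:
                     continue  # do not expand beyond the depth limit
                 for parent in DERIVED_FROM.get(current, []):
                     if parent not in seen:
                         seen.add(parent)
                         queue.append((parent, depth + 1))
             return best

         print(toy_stem("organizational"))  # organize
         print(toy_stem("xyzzy"))           # xyzzy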

   .. py:method:: get_word_family(word: str, max_depth: int = 2) -> list
   
      Get all words derived from the given root word.
      
      :param word: The root word
      :type word: str
      :param max_depth: Maximum traversal depth when collecting related words
      :type max_depth: int
      :return: Sorted list of words in the derivational family
      :rtype: list
      
      **Example**::
      
         stemmer = DerivationalStemmer('eng')
         family = stemmer.get_word_family('organize')
         print(len(family))  # 43 related words

   .. py:method:: get_derivations(word: str) -> list

      Get derivational links for a word.

      :param word: Input word
      :type word: str
      :return: List of derivation objects with ``form``, ``pos``, and ``relation``
      :rtype: list
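
      A minimal sketch of post-processing the result of ``get_derivations``. It
      assumes each entry exposes ``form``, ``pos``, and ``relation`` as dictionary
      keys; the actual object type is not specified here, and the sample records and
      relation labels are made up for illustration::

         from collections import defaultdict

         # Hypothetical stand-in for stemmer.get_derivations('organize').
         sample = [
             {"form": "organization", "pos": "N", "relation": "suffix"},
             {"form": "organizer", "pos": "N", "relation": "suffix"},
             {"form": "reorganize", "pos": "V", "relation": "prefix"},
         ]

         def group_by_pos(derivations):
             """Group derived forms by part of speech, sorted within each group."""
             groups = defaultdict(list)
             for entry in derivations:
                 groups[entry["pos"]].append(entry["form"])
             return {pos: sorted(forms) for pos, forms in groups.items()}

         print(group_by_pos(sample))
         # {'N': ['organization', 'organizer'], 'V': ['reorganize']}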

InflectionAnalyzer
------------------

.. py:class:: InflectionAnalyzer(language: str)

   Analyzer for inflectional morphology (grammatical variations of the same word).
   
   :param language: ISO 639-3 language code
   :type language: str
   :raises ValueError: If language is not supported
   
   **Example**::
   
      from crosstem import InflectionAnalyzer
      
      analyzer = InflectionAnalyzer('eng')
      inflections = analyzer.get_inflections('run')

   .. py:method:: get_inflections(word: str) -> set
   
      Get all inflectional forms of a word.
      
      :param word: The base word
      :type word: str
      :return: Set of inflected forms
      :rtype: set
      
      **Example**::
      
         analyzer = InflectionAnalyzer('eng')
         
         print(analyzer.get_inflections('run'))
         # {'run', 'runs', 'running', 'ran'}
         
         print(analyzer.get_inflections('go'))
         # {'go', 'goes', 'going', 'went', 'gone'}
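
      One common use of these sets is a reverse lookup from an inflected form to its
      base word. The sketch below is self-contained: ``INFLECTIONS`` is a hypothetical
      stand-in for calling ``get_inflections`` on each base word, populated with the
      sets shown above, and ``build_lemma_index`` is an illustrative helper, not part
      of Crosstem::

         # Stand-in for {base: analyzer.get_inflections(base)} over a vocabulary.
         INFLECTIONS = {
             "run": {"run", "runs", "running", "ran"},
             "go": {"go", "goes", "going", "went", "gone"},
         }

         def build_lemma_index(inflections):
             """Invert base -> forms into form -> base for constant-time lookup."""
             index = {}
             for base, forms in inflections.items():
                 for form in forms:
                     index[form] = base
             return index

         index = build_lemma_index(INFLECTIONS)
         tokens = ["she", "went", "running"]
         # Unknown tokens are passed through unchanged.
         print([index.get(tok, tok) for tok in tokens])  # ['she', 'go', 'run']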

EtymologyLinker
---------------

.. py:class:: EtymologyLinker()

   Class for tracing cross-lingual etymology relationships.
   
   .. note::
      Requires etymology data to be downloaded first using :func:`download_etymology`.
   
   **Example**::
   
      from crosstem import EtymologyLinker, download_etymology
      
      download_etymology()  # One-time download
      linker = EtymologyLinker()

   .. py:method:: get_etymology(language: str, word: str) -> dict
   
      Get etymology information for a word.
      
      :param language: Full language name (e.g., 'English', 'French')
      :type language: str
      :param word: The word to look up
      :type word: str
      :return: Dictionary of etymology relationships
      :rtype: dict
      
      **Relationship types**:
      
      * ``INHERITED_FROM``: Inherited from ancestor language
      * ``BORROWED_FROM``: Borrowed/loaned from another language
      * ``DERIVED_FROM``: Derived from another word
      * ``ETYMOLOGICAL_ORIGIN_OF``: Source of another word
      
      **Example**::
      
         linker = EtymologyLinker()
         etymology = linker.get_etymology('English', 'organize')
         print(etymology)

   .. py:method:: get_borrowed_words(target_lang: str, source_lang: str) -> list
   
      Find all words borrowed from one language into another.
      
      :param target_lang: Full name of the language that borrowed the words (e.g., 'English')
      :type target_lang: str
      :param source_lang: Full name of the language that provided the words (e.g., 'French')
      :type source_lang: str
      :return: List of borrowed words
      :rtype: list
      
      **Example**::
      
         linker = EtymologyLinker()
         french_loans = linker.get_borrowed_words('English', 'French')
         print(f"Found {len(french_loans)} French loanwords")

Helper Functions
----------------

.. py:function:: download_etymology() -> None

   Download the etymology dataset (~1 GB) from GitHub Releases.
   
   Shows a progress bar during download and validates the file after completion.
   
   **Example**::
   
      from crosstem import download_etymology
      download_etymology()

.. py:function:: is_etymology_downloaded() -> bool

   Check if etymology data is available.
   
   :return: True if etymology.json exists, False otherwise
   :rtype: bool
   
   **Example**::
   
      from crosstem import is_etymology_downloaded
      
      if not is_etymology_downloaded():
          print("Please download etymology data first")
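
   A check-then-download pattern can be sketched as follows. To keep the sketch
   runnable without the ~1 GB dataset, ``ensure_etymology`` (a hypothetical helper,
   not part of Crosstem) takes the two functions as arguments, and stand-in
   callables are used in place of the real ones::

      def ensure_etymology(is_downloaded, download):
          """Call download() only when is_downloaded() reports missing data.

          Returns True if a download was triggered, False otherwise.
          """
          if is_downloaded():
              return False
          download()
          return True

      # Stand-in callables that simulate the dataset appearing on disk.
      state = {"present": False}

      def fake_is_downloaded():
          return state["present"]

      def fake_download():
          state["present"] = True

      print(ensure_etymology(fake_is_downloaded, fake_download))  # True
      print(ensure_etymology(fake_is_downloaded, fake_download))  # False

   In real code, the same helper would presumably be called as
   ``ensure_etymology(is_etymology_downloaded, download_etymology)``.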

.. py:function:: remove_etymology() -> None

   Remove downloaded etymology data to free disk space.
   
   **Example**::
   
      from crosstem import remove_etymology
      remove_etymology()

Supported Languages
-------------------

.. py:data:: SUPPORTED_LANGUAGES
   :type: list

   List of supported ISO 639-3 language codes::
   
      [
          'cat',  # Catalan
          'ces',  # Czech
          'deu',  # German
          'eng',  # English
          'fin',  # Finnish
          'fra',  # French
          'hbs',  # Serbo-Croatian
          'hun',  # Hungarian
          'ita',  # Italian
          'mon',  # Mongolian
          'pol',  # Polish
          'por',  # Portuguese
          'rus',  # Russian
          'spa',  # Spanish
          'swe',  # Swedish
      ]
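
   A minimal sketch of validating user input against this list before constructing
   a stemmer. ``normalize_language`` is a hypothetical helper, and ``SUPPORTED`` is a
   local copy of the codes above so the sketch runs on its own; real code would
   use ``SUPPORTED_LANGUAGES`` itself::

      # Local copy of the codes listed above.
      SUPPORTED = [
          'cat', 'ces', 'deu', 'eng', 'fin', 'fra', 'hbs', 'hun',
          'ita', 'mon', 'pol', 'por', 'rus', 'spa', 'swe',
      ]

      def normalize_language(code, supported=SUPPORTED):
          """Return a cleaned language code, or raise ValueError if unsupported."""
          cleaned = code.strip().lower()
          if cleaned not in supported:
              raise ValueError(f"Language {code!r} not supported")
          return cleaned

      print(normalize_language(" ENG "))  # eng

      try:
          normalize_language("invalid")
      except ValueError as exc:
          print(exc)  # Language 'invalid' not supported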

Exceptions
----------

.. py:exception:: ValueError

   Raised when an invalid language code is provided::
   
      stemmer = DerivationalStemmer('invalid')
      # ValueError: Language 'invalid' not supported

.. py:exception:: FileNotFoundError

   Raised when attempting to use etymology features without downloading data::
   
      linker = EtymologyLinker()  # Without downloading first
      # FileNotFoundError: Etymology data not found

Constants
---------

.. py:data:: MAX_DEPTH
   :type: int
   :value: 3

   Maximum depth for BFS traversal when finding roots.

.. py:data:: PRODUCTIVITY_THRESHOLDS
   :type: dict

   Language-specific productivity thresholds for filtering candidates::
   
      {
          'eng': {'V': 5, 'N': 9},    # English
          'deu': {'V': 4, 'N': 3},    # German
          'fra': {'V': 4, 'N': 5},    # French
          'rus': {'V': 3, 'N': 2},    # Russian
          # ... other languages
      }
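
   How the thresholds are applied is not specified here. The sketch below shows
   one plausible reading, under two loud assumptions: that productivity is a
   per-word count, and that a threshold is a minimum a candidate must meet for its
   part of speech. ``filter_candidates`` and the productivity values are
   hypothetical::

      # Subset of the thresholds shown above.
      THRESHOLDS = {
          'eng': {'V': 5, 'N': 9},
          'deu': {'V': 4, 'N': 3},
      }

      def filter_candidates(candidates, language, thresholds=THRESHOLDS):
          """Keep (word, pos, productivity) tuples that meet the POS threshold.

          ASSUMPTION: the threshold is a minimum productivity count.
          Parts of speech without a configured threshold are always kept.
          """
          limits = thresholds.get(language, {})
          return [
              (word, pos, productivity)
              for word, pos, productivity in candidates
              if productivity >= limits.get(pos, 0)
          ]

      # Made-up productivity counts, for illustration only.
      candidates = [("organize", "V", 12), ("organ", "N", 4), ("organism", "N", 10)]
      print(filter_candidates(candidates, "eng"))
      # [('organize', 'V', 12), ('organism', 'N', 10)]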

Type Hints
----------

All public methods include type hints, so IDEs and type checkers can verify annotated calling code such as::

   from crosstem import DerivationalStemmer
   
   def process_text(text: str, language: str = 'eng') -> list[str]:
       """Process text and return stems."""
       stemmer = DerivationalStemmer(language)
       words = text.split()
       return [stemmer.stem(word) for word in words]