Contributing
============

We welcome contributions to Crossstem! This guide explains how to contribute.

Ways to Contribute
-------------------

* **Bug reports**: Found an issue? Report it on GitHub
* **Feature requests**: Suggest new features or improvements
* **Code contributions**: Submit pull requests with fixes or enhancements
* **Documentation**: Improve docs, add examples, fix typos
* **Language data**: Add support for new languages
* **Testing**: Write tests, report edge cases

Getting Started
---------------

Development Setup
~~~~~~~~~~~~~~~~~

1. Fork the repository on GitHub

2. Clone your fork locally::

       git clone https://github.com/YOUR_USERNAME/crossstem.git
       cd crossstem

3. Create a virtual environment::

       python -m venv venv
       source venv/bin/activate  # On Windows: venv\Scripts\activate

4. Install in development mode::

       pip install -e .

5. Install development dependencies (``pytest-cov`` provides the ``--cov`` option used below)::

       pip install pytest pytest-cov black flake8 mypy

Running Tests
~~~~~~~~~~~~~

::

    # Run all tests
    pytest

    # Run with coverage
    pytest --cov=crossstem

    # Run specific test file
    pytest tests/test_stemmer.py

Code Style
~~~~~~~~~~

We use Black for formatting, flake8 for linting, and mypy for type checking::

    # Format code
    black crossstem/

    # Check linting
    flake8 crossstem/

    # Type checking
    mypy crossstem/

Reporting Bugs
--------------

Before reporting a bug:

1. Check whether it has already been reported in GitHub Issues
2. Make sure you're using the latest version
3. Test with a minimal reproducible example

Bug Report Template
~~~~~~~~~~~~~~~~~~~

::

    **Describe the bug**
    A clear description of what the bug is.

    **To Reproduce**
    Steps to reproduce the behavior:
    1. Import Crossstem
    2. Call stemmer.stem('word')
    3. See error

    **Expected behavior**
    What you expected to happen.

    **Actual behavior**
    What actually happened.

    **Environment**
    - OS: [e.g., Windows 10, Ubuntu 22.04]
    - Python version: [e.g., 3.9.7]
    - Crossstem version: [e.g., 0.2.0]

    **Minimal example**
    ```python
    from crossstem import DerivationalStemmer
    stemmer = DerivationalStemmer('eng')
    print(stemmer.stem('problematic_word'))
    ```

Feature Requests
----------------

We're open to new features! Please describe:

1. **Use case**: Why is this feature needed?
2. **Proposal**: How should it work?
3. **Examples**: Show example usage
4. **Alternatives**: What alternatives exist?

Pull Requests
-------------

PR Checklist
~~~~~~~~~~~~

Before submitting a PR:

- [ ] Code follows the project style (Black + flake8)
- [ ] All tests pass
- [ ] New tests added for new features
- [ ] Documentation updated
- [ ] CHANGELOG.md updated
- [ ] Commit messages are clear

PR Process
~~~~~~~~~~

1. Create a feature branch::

       git checkout -b feature/amazing-feature

2. Make your changes

3. Add tests for new functionality

4. Ensure all tests pass

5. Commit with clear messages::

       git commit -m "Add amazing feature"

6. Push to your fork::

       git push origin feature/amazing-feature

7. Open a Pull Request on GitHub

Adding Languages
----------------

To add support for a new language:

Data Requirements
~~~~~~~~~~~~~~~~~

1. **Derivational data**: MorphyNet-compatible JSON format
2. **Inflectional data**: UniMorph-compatible TSV format
3. **Minimum coverage**: At least 20,000 words
4. **License**: Must use an open license (CC BY-SA or similar)

Format Example
~~~~~~~~~~~~~~

Derivational data (``_derivations.json``)::

    {
      "word1": {
        "derives_from": ["parent1", "parent2"],
        "derives_to": ["child1", "child2"],
        "pos": "V"
      },
      "word2": {
        "derives_from": [],
        "derives_to": ["child3"],
        "pos": "N"
      }
    }

Calibrating Thresholds
~~~~~~~~~~~~~~~~~~~~~~~

1. Analyze productivity distribution::

       python scripts/analyze_productivity.py

2. Set thresholds in ``crossstem/stemmer.py``::

       PRODUCTIVITY_THRESHOLDS = {
           'new': {'V': 3, 'N': 4},  # Your language
           # ... existing languages
       }

3. Test stemming quality::

       python scripts/test_language.py

4. Adjust thresholds based on results

Adding Tests
~~~~~~~~~~~~

Create ``tests/test_.py``::

    from crossstem import DerivationalStemmer


    def test__stemming():
        stemmer = DerivationalStemmer('')

        # Test cases
        assert stemmer.stem('word1') == 'expected_root1'
        assert stemmer.stem('word2') == 'expected_root2'

        # Multi-hop cases
        assert stemmer.stem('derived_word') == 'root'

Documentation Updates
~~~~~~~~~~~~~~~~~~~~~

1. Add language to ``docs/source/languages.rst``
2. Update README.md with new language count
3. Add examples in ``docs/source/examples.rst``

Improving the Algorithm
-----------------------

If you have ideas for improving the BFS algorithm:

1. Open an issue to discuss the approach
2. Provide benchmark results showing improvement
3. Include examples of edge cases it handles better
4. Ensure it doesn't regress existing behavior

Testing Strategy
~~~~~~~~~~~~~~~~

* Benchmark against Porter on common word lists
* Test accuracy on hand-labeled examples
* Measure speed with large corpora
* Verify behavior across all 15 languages

Code Organization
-----------------

Project Structure
~~~~~~~~~~~~~~~~~

::

    crossstem/
    ├── __init__.py           # Package exports
    ├── stemmer.py            # DerivationalStemmer class
    ├── analyzer.py           # InflectionAnalyzer class
    ├── etymology_linker.py   # EtymologyLinker class
    ├── download.py           # Etymology download utilities
    ├── exceptions.py         # Custom exceptions
    └── data/                 # Language data files

Testing Structure
~~~~~~~~~~~~~~~~~

::

    tests/
    ├── test_stemmer.py       # Stemming tests
    ├── test_analyzer.py      # Inflection tests
    ├── test_etymology.py     # Etymology tests
    └── test_.py              # Language-specific tests

Adding Documentation
--------------------

Documentation is built with Sphinx and hosted on Read the Docs.

Local Build
~~~~~~~~~~~

::

    cd docs/
    pip install -r requirements.txt
    make html

View the result at ``docs/build/html/index.html``.

Adding Pages
~~~~~~~~~~~~

1. Create ``docs/source/.rst``
2. Add to ``index.rst`` table of contents
3. Build and verify locally
4. Submit PR

Docstring Style
~~~~~~~~~~~~~~~

Use Google-style docstrings::

    def stem(self, word: str) -> str:
        """Find the morphological root of a word.

        Args:
            word: The word to stem

        Returns:
            The morphological root

        Example:
            >>> stemmer = DerivationalStemmer('eng')
            >>> stemmer.stem('organization')
            'organize'
        """

Code Review Process
-------------------

All PRs are reviewed by maintainers. We look for:

* **Correctness**: Does it work as intended?
* **Tests**: Is it well-tested?
* **Documentation**: Is it documented?
* **Style**: Does it follow conventions?
* **Performance**: Does it maintain speed?

Feedback may include:

* Requests for changes
* Suggestions for improvements
* Questions about design decisions

Please be patient and constructive in discussions.

Release Process
---------------

Versioning
~~~~~~~~~~

We follow Semantic Versioning (semver):

* **MAJOR**: Incompatible API changes
* **MINOR**: New features, backwards-compatible
* **PATCH**: Bug fixes, backwards-compatible

Maintainer Responsibilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Review and merge PRs
2. Update CHANGELOG.md
3. Create GitHub releases
4. Publish to PyPI
5. Update documentation

Community Guidelines
--------------------

* Be respectful and constructive
* Focus on the issue, not the person
* Assume good intentions
* Ask questions when unclear
* Give credit to contributors

License
-------

By contributing, you agree that your contributions will be licensed under the MIT License.

Questions?
----------

* Open an issue on GitHub
* Tag maintainers: @droidmaximus

Thank you for contributing to Crossstem!