# Utility Function Patterns ## Purpose This document establishes coding patterns and conventions specifically for utility functions in StarPunk. These patterns ensure utilities are consistent, testable, and maintainable. ## Philosophy Utility functions should be: - **Pure**: No side effects where possible - **Focused**: One responsibility per function - **Predictable**: Same input always produces same output - **Testable**: Easy to test in isolation - **Documented**: Clear purpose and usage ## Function Design Patterns ### 1. Pure Functions (Preferred) **Pattern**: Functions that don't modify state or have side effects. **Good Example**: ```python def calculate_content_hash(content: str) -> str: """Calculate SHA-256 hash of content.""" return hashlib.sha256(content.encode('utf-8')).hexdigest() ``` **Why**: Easy to test, no hidden dependencies, predictable behavior. **When to Use**: Calculations, transformations, validations. ### 2. Functions with I/O Side Effects **Pattern**: Functions that read/write files or interact with external systems. **Good Example**: ```python def write_note_file(file_path: Path, content: str) -> None: """Write note content to file atomically.""" temp_path = file_path.with_suffix(file_path.suffix + '.tmp') try: temp_path.write_text(content, encoding='utf-8') temp_path.replace(file_path) except Exception: temp_path.unlink(missing_ok=True) raise ``` **Why**: Side effects are isolated, error handling is explicit, cleanup is guaranteed. **When to Use**: File operations, database operations, network calls. ### 3. Validation Functions **Pattern**: Functions that check validity and return boolean or raise exception. **Good Example** (Boolean Return): ```python def validate_slug(slug: str) -> bool: """Validate that slug meets requirements.""" if not slug: return False if len(slug) > MAX_SLUG_LENGTH: return False return bool(SLUG_PATTERN.match(slug)) ``` **Good Example** (Exception Raising): ```python def require_valid_slug(slug: str) -> None: """Require slug to be valid, raise ValueError if not.""" if not validate_slug(slug): raise ValueError( f"Invalid slug '{slug}': must be 1-100 characters, " f"lowercase alphanumeric with hyphens only" ) ``` **When to Use**: Input validation, precondition checking, security checks. ### 4. Generator Functions **Pattern**: Functions that yield values instead of returning lists. **Good Example**: ```python def iter_note_files(notes_dir: Path) -> Iterator[Path]: """Iterate over all note files in directory.""" for year_dir in sorted(notes_dir.iterdir()): if not year_dir.is_dir(): continue for month_dir in sorted(year_dir.iterdir()): if not month_dir.is_dir(): continue for note_file in sorted(month_dir.glob("*.md")): yield note_file ``` **Why**: Memory efficient, lazy evaluation, can be interrupted. **When to Use**: Processing many items, large datasets, streaming operations. ## Error Handling Patterns ### 1. Specific Exceptions **Pattern**: Raise specific exception types with descriptive messages. **Good**: ```python def generate_slug(content: str) -> str: """Generate slug from content.""" if not content or not content.strip(): raise ValueError("Content cannot be empty or whitespace-only") # ... rest of function ``` **Bad**: ```python def generate_slug(content: str) -> str: if not content: raise Exception("Bad input") # Too generic ``` ### 2. Error Message Format **Pattern**: Include context, expected behavior, and actual problem. **Good**: ```python raise ValueError( f"Invalid slug '{slug}': must contain only lowercase letters, " f"numbers, and hyphens (pattern: {SLUG_PATTERN.pattern})" ) raise FileNotFoundError( f"Note file not found: {file_path}. " f"Database may be out of sync with filesystem." ) ``` **Bad**: ```python raise ValueError("Invalid slug") raise FileNotFoundError("File missing") ``` ### 3. Exception Chaining **Pattern**: Preserve original exception context when re-raising. **Good**: ```python def read_note_file(file_path: Path) -> str: """Read note content from file.""" try: return file_path.read_text(encoding='utf-8') except UnicodeDecodeError as e: raise ValueError( f"Failed to read {file_path}: invalid UTF-8 encoding" ) from e ``` **Why**: Preserves stack trace, shows root cause. ### 4. Cleanup on Error **Pattern**: Use try/finally or context managers to ensure cleanup. **Good**: ```python def write_note_file(file_path: Path, content: str) -> None: """Write note content to file atomically.""" temp_path = file_path.with_suffix('.tmp') try: temp_path.write_text(content, encoding='utf-8') temp_path.replace(file_path) except Exception: temp_path.unlink(missing_ok=True) # Cleanup on error raise ``` ## Type Hint Patterns ### 1. Basic Types **Pattern**: Use built-in types where possible. ```python def generate_slug(content: str, created_at: Optional[datetime] = None) -> str: """Generate URL-safe slug from content.""" pass def validate_note_path(file_path: Path, data_dir: Path) -> bool: """Validate file path is within data directory.""" pass ``` ### 2. Collection Types **Pattern**: Specify element types for collections. ```python from typing import List, Dict, Set, Optional def make_slug_unique(base_slug: str, existing_slugs: Set[str]) -> str: """Make slug unique by adding suffix if needed.""" pass def get_note_paths(notes_dir: Path) -> List[Path]: """Get all note file paths.""" pass ``` ### 3. Optional Types **Pattern**: Use Optional[T] for nullable parameters and returns. ```python from typing import Optional def find_note(slug: str) -> Optional[Path]: """Find note file by slug, returns None if not found.""" pass ``` ### 4. Union Types (Use Sparingly) **Pattern**: Use Union when a parameter truly accepts multiple types. ```python from typing import Union from pathlib import Path def ensure_path(path: Union[str, Path]) -> Path: """Convert string or Path to Path object.""" return Path(path) if isinstance(path, str) else path ``` **Note**: Prefer single types. Only use Union when necessary. ## Documentation Patterns ### 1. Function Docstrings **Pattern**: Google-style docstrings with sections. ```python def generate_slug(content: str, created_at: Optional[datetime] = None) -> str: """ Generate URL-safe slug from note content Creates a slug by extracting the first few words from the content and normalizing them to lowercase with hyphens. If content is insufficient, falls back to timestamp-based slug. Args: content: The note content (markdown text) created_at: Optional timestamp for fallback slug (defaults to now) Returns: URL-safe slug string (lowercase, alphanumeric + hyphens only) Raises: ValueError: If content is empty or contains only whitespace Examples: >>> generate_slug("Hello World! This is my first note.") 'hello-world-this-is-my' >>> generate_slug("Testing... with special chars!@#") 'testing-with-special-chars' Notes: - This function does NOT check for uniqueness - Caller must verify slug doesn't exist in database """ ``` **Required Sections**: - Summary (first line) - Description (paragraph after summary) - Args (if any parameters) - Returns (if returns value) - Raises (if raises exceptions) **Optional Sections**: - Examples (highly recommended) - Notes (for important caveats) - References (for external specs) ### 2. Inline Comments **Pattern**: Explain why, not what. **Good**: ```python # Use atomic rename to prevent file corruption if interrupted temp_path.replace(file_path) # Random suffix prevents enumeration attacks suffix = secrets.token_urlsafe(4)[:4] ``` **Bad**: ```python # Rename temp file to final path temp_path.replace(file_path) # Generate suffix suffix = secrets.token_urlsafe(4)[:4] ``` ### 3. Module Docstrings **Pattern**: Describe module purpose and contents. ```python """ Core utility functions for StarPunk This module provides essential utilities for slug generation, file operations, hashing, and date/time handling. These utilities are used throughout the application and have no external dependencies beyond standard library and Flask configuration. Functions: generate_slug: Create URL-safe slug from content calculate_content_hash: Calculate SHA-256 hash write_note_file: Atomically write note to file format_rfc822: Format datetime for RSS feeds Constants: MAX_SLUG_LENGTH: Maximum slug length (100) SLUG_PATTERN: Regex for valid slugs """ ``` ## Testing Patterns ### 1. Test Function Naming **Pattern**: `test_{function_name}_{scenario}` ```python def test_generate_slug_from_content(): """Test basic slug generation from content.""" slug = generate_slug("Hello World This Is My Note") assert slug == "hello-world-this-is-my" def test_generate_slug_empty_content(): """Test slug generation raises error on empty content.""" with pytest.raises(ValueError): generate_slug("") def test_generate_slug_special_characters(): """Test slug generation removes special characters.""" slug = generate_slug("Testing... with!@# special chars") assert slug == "testing-with-special-chars" ``` ### 2. Test Organization **Pattern**: Group related tests in classes. ```python class TestSlugGeneration: """Test slug generation functions""" def test_generate_slug_from_content(self): """Test basic slug generation.""" pass def test_generate_slug_empty_content(self): """Test error on empty content.""" pass class TestContentHashing: """Test content hashing functions""" def test_calculate_hash_consistency(self): """Test hash is consistent.""" pass ``` ### 3. Fixtures for Common Setup **Pattern**: Use pytest fixtures for reusable test data. ```python import pytest from pathlib import Path @pytest.fixture def temp_note_file(tmp_path): """Create temporary note file for testing.""" file_path = tmp_path / "test.md" file_path.write_text("# Test Note") return file_path def test_read_note_file(temp_note_file): """Test reading note file.""" content = read_note_file(temp_note_file) assert content == "# Test Note" ``` ### 4. Parameterized Tests **Pattern**: Test multiple cases with one test function. ```python @pytest.mark.parametrize("content,expected", [ ("Hello World", "hello-world"), ("Testing 123", "testing-123"), ("Special!@# Chars", "special-chars"), ]) def test_generate_slug_variations(content, expected): """Test slug generation with various inputs.""" slug = generate_slug(content) assert slug == expected ``` ## Security Patterns ### 1. Path Validation **Pattern**: Always validate paths before file operations. ```python def safe_file_operation(file_path: Path, data_dir: Path) -> None: """Perform file operation with path validation.""" # ALWAYS validate first if not validate_note_path(file_path, data_dir): raise ValueError(f"Invalid file path: {file_path}") # Now safe to operate file_path.write_text("content") ``` ### 2. Use secrets for Random **Pattern**: Use `secrets` module, not `random` for security-sensitive operations. **Good**: ```python import secrets def generate_random_suffix(length: int = 4) -> str: """Generate cryptographically secure random suffix.""" chars = 'abcdefghijklmnopqrstuvwxyz0123456789' return ''.join(secrets.choice(chars) for _ in range(length)) ``` **Bad**: ```python import random def generate_random_suffix(length: int = 4) -> str: """Generate random suffix.""" chars = 'abcdefghijklmnopqrstuvwxyz0123456789' return ''.join(random.choice(chars) for _ in range(length)) # NOT secure ``` ### 3. Input Sanitization **Pattern**: Validate and sanitize all external input. ```python def generate_slug(content: str) -> str: """Generate slug from content.""" # Validate input if not content or not content.strip(): raise ValueError("Content cannot be empty") # Sanitize by removing dangerous characters normalized = normalize_slug_text(content) # Additional validation if not validate_slug(normalized): # Fallback to safe default normalized = datetime.utcnow().strftime("%Y%m%d-%H%M%S") return normalized ``` ## Performance Patterns ### 1. Lazy Evaluation **Pattern**: Don't compute what you don't need. **Good**: ```python def generate_slug(content: str, created_at: Optional[datetime] = None) -> str: """Generate slug from content.""" slug = normalize_slug_text(content) # Only generate timestamp if needed if not slug: created_at = created_at or datetime.utcnow() slug = created_at.strftime("%Y%m%d-%H%M%S") return slug ``` ### 2. Compile Regex Once **Pattern**: Define regex patterns as module constants. **Good**: ```python # Module level - compiled once SLUG_PATTERN = re.compile(r'^[a-z0-9]+(?:-[a-z0-9]+)*$') SAFE_SLUG_PATTERN = re.compile(r'[^a-z0-9-]') def validate_slug(slug: str) -> bool: """Validate slug.""" return bool(SLUG_PATTERN.match(slug)) def normalize_slug_text(text: str) -> str: """Normalize text for slug.""" return SAFE_SLUG_PATTERN.sub('', text) ``` **Bad**: ```python def validate_slug(slug: str) -> bool: """Validate slug.""" # Compiles regex every call - inefficient return bool(re.match(r'^[a-z0-9]+(?:-[a-z0-9]+)*$', slug)) ``` ### 3. Avoid Premature Optimization **Pattern**: Write clear code first, optimize if needed. **Good**: ```python def extract_first_words(text: str, max_words: int = 5) -> str: """Extract first N words from text.""" words = text.split() return ' '.join(words[:max_words]) ``` **Don't Do This Unless Profiling Shows It's Necessary**: ```python def extract_first_words(text: str, max_words: int = 5) -> str: """Extract first N words from text.""" # Premature optimization - more complex, minimal gain words = [] count = 0 for word in text.split(): words.append(word) count += 1 if count >= max_words: break return ' '.join(words) ``` ## Constants Pattern ### 1. Module-Level Constants **Pattern**: Define configuration as constants at module level. ```python # Slug configuration MAX_SLUG_LENGTH = 100 MIN_SLUG_LENGTH = 1 SLUG_WORDS_COUNT = 5 RANDOM_SUFFIX_LENGTH = 4 # File operations TEMP_FILE_SUFFIX = '.tmp' TRASH_DIR_NAME = '.trash' # Hashing CONTENT_HASH_ALGORITHM = 'sha256' ``` **Why**: Easy to modify, clear intent, compile-time resolution. ### 2. Naming Convention **Pattern**: ALL_CAPS_WITH_UNDERSCORES ```python MAX_NOTE_LENGTH = 10000 # Good DEFAULT_ENCODING = 'utf-8' # Good maxNoteLength = 10000 # Bad max_note_length = 10000 # Bad (looks like variable) ``` ### 3. Magic Numbers **Pattern**: Replace magic numbers with named constants. **Good**: ```python SLUG_WORDS_COUNT = 5 def generate_slug(content: str) -> str: """Generate slug from content.""" words = content.split()[:SLUG_WORDS_COUNT] return normalize_slug_text(' '.join(words)) ``` **Bad**: ```python def generate_slug(content: str) -> str: """Generate slug from content.""" words = content.split()[:5] # What is 5? Why 5? return normalize_slug_text(' '.join(words)) ``` ## Anti-Patterns to Avoid ### 1. God Functions **Bad**: ```python def process_note(content, do_hash=True, do_slug=True, do_path=True, ...): """Do everything with a note.""" # 500 lines of code doing too many things pass ``` **Good**: ```python def generate_slug(content: str) -> str: """Generate slug.""" pass def calculate_hash(content: str) -> str: """Calculate hash.""" pass def generate_path(slug: str, timestamp: datetime) -> Path: """Generate path.""" pass ``` ### 2. Mutable Default Arguments **Bad**: ```python def create_note(content: str, tags: list = []) -> Note: tags.append('note') # Modifies shared default list! return Note(content, tags) ``` **Good**: ```python def create_note(content: str, tags: Optional[List[str]] = None) -> Note: if tags is None: tags = [] tags.append('note') return Note(content, tags) ``` ### 3. Returning Multiple Types **Bad**: ```python def find_note(slug: str) -> Union[Note, bool, None]: """Find note by slug.""" if slug_invalid: return False note = db.query(slug) return note if note else None ``` **Good**: ```python def find_note(slug: str) -> Optional[Note]: """Find note by slug, returns None if not found.""" if not validate_slug(slug): raise ValueError(f"Invalid slug: {slug}") return db.query(slug) # Returns Note or None ``` ### 4. Silent Failures **Bad**: ```python def generate_slug(content: str) -> str: """Generate slug.""" try: slug = normalize_slug_text(content) return slug if slug else "untitled" # Silent fallback except Exception: return "untitled" # Swallowing errors ``` **Good**: ```python def generate_slug(content: str) -> str: """Generate slug.""" if not content or not content.strip(): raise ValueError("Content cannot be empty") slug = normalize_slug_text(content) if not slug: # Explicit fallback with timestamp slug = datetime.utcnow().strftime("%Y%m%d-%H%M%S") return slug ``` ## Summary **Key Principles**: 1. Pure functions are preferred 2. Specific exceptions with clear messages 3. Type hints on all functions 4. Comprehensive docstrings 5. Security-first validation 6. Test everything thoroughly 7. Constants for configuration 8. Clear over clever **Remember**: Utility functions are the foundation. Make them rock-solid. ## References - [Python Coding Standards](/home/phil/Projects/starpunk/docs/standards/python-coding-standards.md) - [PEP 8 - Style Guide](https://peps.python.org/pep-0008/) - [PEP 257 - Docstring Conventions](https://peps.python.org/pep-0257/) - [PEP 484 - Type Hints](https://peps.python.org/pep-0484/) - [Python secrets Documentation](https://docs.python.org/3/library/secrets.html) - [Pytest Best Practices](https://docs.pytest.org/en/stable/goodpractices.html)