Files
StarPunk/docs/standards/utility-function-patterns.md
2025-11-18 19:21:31 -07:00

18 KiB

Utility Function Patterns

Purpose

This document establishes coding patterns and conventions specifically for utility functions in StarPunk. These patterns ensure utilities are consistent, testable, and maintainable.

Philosophy

Utility functions should be:

  • Pure: No side effects where possible
  • Focused: One responsibility per function
  • Predictable: Same input always produces same output
  • Testable: Easy to test in isolation
  • Documented: Clear purpose and usage

Function Design Patterns

1. Pure Functions (Preferred)

Pattern: Functions that don't modify state or have side effects.

Good Example:

def calculate_content_hash(content: str) -> str:
    """Calculate SHA-256 hash of content."""
    return hashlib.sha256(content.encode('utf-8')).hexdigest()

Why: Easy to test, no hidden dependencies, predictable behavior.

When to Use: Calculations, transformations, validations.

2. Functions with I/O Side Effects

Pattern: Functions that read/write files or interact with external systems.

Good Example:

def write_note_file(file_path: Path, content: str) -> None:
    """Write note content to file atomically."""
    temp_path = file_path.with_suffix(file_path.suffix + '.tmp')
    try:
        temp_path.write_text(content, encoding='utf-8')
        temp_path.replace(file_path)
    except Exception:
        temp_path.unlink(missing_ok=True)
        raise

Why: Side effects are isolated, error handling is explicit, cleanup is guaranteed.

When to Use: File operations, database operations, network calls.

3. Validation Functions

Pattern: Functions that check validity and return boolean or raise exception.

Good Example (Boolean Return):

def validate_slug(slug: str) -> bool:
    """Validate that slug meets requirements."""
    if not slug:
        return False
    if len(slug) > MAX_SLUG_LENGTH:
        return False
    return bool(SLUG_PATTERN.match(slug))

Good Example (Exception Raising):

def require_valid_slug(slug: str) -> None:
    """Require slug to be valid, raise ValueError if not."""
    if not validate_slug(slug):
        raise ValueError(
            f"Invalid slug '{slug}': must be 1-100 characters, "
            f"lowercase alphanumeric with hyphens only"
        )

When to Use: Input validation, precondition checking, security checks.

4. Generator Functions

Pattern: Functions that yield values instead of returning lists.

Good Example:

def iter_note_files(notes_dir: Path) -> Iterator[Path]:
    """Iterate over all note files in directory."""
    for year_dir in sorted(notes_dir.iterdir()):
        if not year_dir.is_dir():
            continue
        for month_dir in sorted(year_dir.iterdir()):
            if not month_dir.is_dir():
                continue
            for note_file in sorted(month_dir.glob("*.md")):
                yield note_file

Why: Memory efficient, lazy evaluation, can be interrupted.

When to Use: Processing many items, large datasets, streaming operations.

Error Handling Patterns

1. Specific Exceptions

Pattern: Raise specific exception types with descriptive messages.

Good:

def generate_slug(content: str) -> str:
    """Generate slug from content."""
    if not content or not content.strip():
        raise ValueError("Content cannot be empty or whitespace-only")

    # ... rest of function

Bad:

def generate_slug(content: str) -> str:
    if not content:
        raise Exception("Bad input")  # Too generic

2. Error Message Format

Pattern: Include context, expected behavior, and actual problem.

Good:

raise ValueError(
    f"Invalid slug '{slug}': must contain only lowercase letters, "
    f"numbers, and hyphens (pattern: {SLUG_PATTERN.pattern})"
)

raise FileNotFoundError(
    f"Note file not found: {file_path}. "
    f"Database may be out of sync with filesystem."
)

Bad:

raise ValueError("Invalid slug")
raise FileNotFoundError("File missing")

3. Exception Chaining

Pattern: Preserve original exception context when re-raising.

Good:

def read_note_file(file_path: Path) -> str:
    """Read note content from file."""
    try:
        return file_path.read_text(encoding='utf-8')
    except UnicodeDecodeError as e:
        raise ValueError(
            f"Failed to read {file_path}: invalid UTF-8 encoding"
        ) from e

Why: Preserves stack trace, shows root cause.

4. Cleanup on Error

Pattern: Use try/finally or context managers to ensure cleanup.

Good:

def write_note_file(file_path: Path, content: str) -> None:
    """Write note content to file atomically."""
    temp_path = file_path.with_suffix('.tmp')
    try:
        temp_path.write_text(content, encoding='utf-8')
        temp_path.replace(file_path)
    except Exception:
        temp_path.unlink(missing_ok=True)  # Cleanup on error
        raise

Type Hint Patterns

1. Basic Types

Pattern: Use built-in types where possible.

def generate_slug(content: str, created_at: Optional[datetime] = None) -> str:
    """Generate URL-safe slug from content."""
    pass

def validate_note_path(file_path: Path, data_dir: Path) -> bool:
    """Validate file path is within data directory."""
    pass

2. Collection Types

Pattern: Specify element types for collections.

from typing import List, Dict, Set, Optional

def make_slug_unique(base_slug: str, existing_slugs: Set[str]) -> str:
    """Make slug unique by adding suffix if needed."""
    pass

def get_note_paths(notes_dir: Path) -> List[Path]:
    """Get all note file paths."""
    pass

3. Optional Types

Pattern: Use Optional[T] for nullable parameters and returns.

from typing import Optional

def find_note(slug: str) -> Optional[Path]:
    """Find note file by slug, returns None if not found."""
    pass

4. Union Types (Use Sparingly)

Pattern: Use Union when a parameter truly accepts multiple types.

from typing import Union
from pathlib import Path

def ensure_path(path: Union[str, Path]) -> Path:
    """Convert string or Path to Path object."""
    return Path(path) if isinstance(path, str) else path

Note: Prefer single types. Only use Union when necessary.

Documentation Patterns

1. Function Docstrings

Pattern: Google-style docstrings with sections.

def generate_slug(content: str, created_at: Optional[datetime] = None) -> str:
    """
    Generate URL-safe slug from note content

    Creates a slug by extracting the first few words from the content and
    normalizing them to lowercase with hyphens. If content is insufficient,
    falls back to timestamp-based slug.

    Args:
        content: The note content (markdown text)
        created_at: Optional timestamp for fallback slug (defaults to now)

    Returns:
        URL-safe slug string (lowercase, alphanumeric + hyphens only)

    Raises:
        ValueError: If content is empty or contains only whitespace

    Examples:
        >>> generate_slug("Hello World! This is my first note.")
        'hello-world-this-is-my'

        >>> generate_slug("Testing... with special chars!@#")
        'testing-with-special-chars'

    Notes:
        - This function does NOT check for uniqueness
        - Caller must verify slug doesn't exist in database
    """

Required Sections:

  • Summary (first line)
  • Description (paragraph after summary)
  • Args (if any parameters)
  • Returns (if returns value)
  • Raises (if raises exceptions)

Optional Sections:

  • Examples (highly recommended)
  • Notes (for important caveats)
  • References (for external specs)

2. Inline Comments

Pattern: Explain why, not what.

Good:

# Use atomic rename to prevent file corruption if interrupted
temp_path.replace(file_path)

# Random suffix prevents enumeration attacks
suffix = secrets.token_urlsafe(4)[:4]

Bad:

# Rename temp file to final path
temp_path.replace(file_path)

# Generate suffix
suffix = secrets.token_urlsafe(4)[:4]

3. Module Docstrings

Pattern: Describe module purpose and contents.

"""
Core utility functions for StarPunk

This module provides essential utilities for slug generation, file operations,
hashing, and date/time handling. These utilities are used throughout the
application and have no external dependencies beyond standard library and
Flask configuration.

Functions:
    generate_slug: Create URL-safe slug from content
    calculate_content_hash: Calculate SHA-256 hash
    write_note_file: Atomically write note to file
    format_rfc822: Format datetime for RSS feeds

Constants:
    MAX_SLUG_LENGTH: Maximum slug length (100)
    SLUG_PATTERN: Regex for valid slugs
"""

Testing Patterns

1. Test Function Naming

Pattern: test_{function_name}_{scenario}

def test_generate_slug_from_content():
    """Test basic slug generation from content."""
    slug = generate_slug("Hello World This Is My Note")
    assert slug == "hello-world-this-is-my"

def test_generate_slug_empty_content():
    """Test slug generation raises error on empty content."""
    with pytest.raises(ValueError):
        generate_slug("")

def test_generate_slug_special_characters():
    """Test slug generation removes special characters."""
    slug = generate_slug("Testing... with!@# special chars")
    assert slug == "testing-with-special-chars"

2. Test Organization

Pattern: Group related tests in classes.

class TestSlugGeneration:
    """Test slug generation functions"""

    def test_generate_slug_from_content(self):
        """Test basic slug generation."""
        pass

    def test_generate_slug_empty_content(self):
        """Test error on empty content."""
        pass


class TestContentHashing:
    """Test content hashing functions"""

    def test_calculate_hash_consistency(self):
        """Test hash is consistent."""
        pass

3. Fixtures for Common Setup

Pattern: Use pytest fixtures for reusable test data.

import pytest
from pathlib import Path

@pytest.fixture
def temp_note_file(tmp_path):
    """Create temporary note file for testing."""
    file_path = tmp_path / "test.md"
    file_path.write_text("# Test Note")
    return file_path

def test_read_note_file(temp_note_file):
    """Test reading note file."""
    content = read_note_file(temp_note_file)
    assert content == "# Test Note"

4. Parameterized Tests

Pattern: Test multiple cases with one test function.

@pytest.mark.parametrize("content,expected", [
    ("Hello World", "hello-world"),
    ("Testing 123", "testing-123"),
    ("Special!@# Chars", "special-chars"),
])
def test_generate_slug_variations(content, expected):
    """Test slug generation with various inputs."""
    slug = generate_slug(content)
    assert slug == expected

Security Patterns

1. Path Validation

Pattern: Always validate paths before file operations.

def safe_file_operation(file_path: Path, data_dir: Path) -> None:
    """Perform file operation with path validation."""
    # ALWAYS validate first
    if not validate_note_path(file_path, data_dir):
        raise ValueError(f"Invalid file path: {file_path}")

    # Now safe to operate
    file_path.write_text("content")

2. Use secrets for Random

Pattern: Use secrets module, not random for security-sensitive operations.

Good:

import secrets

def generate_random_suffix(length: int = 4) -> str:
    """Generate cryptographically secure random suffix."""
    chars = 'abcdefghijklmnopqrstuvwxyz0123456789'
    return ''.join(secrets.choice(chars) for _ in range(length))

Bad:

import random

def generate_random_suffix(length: int = 4) -> str:
    """Generate random suffix."""
    chars = 'abcdefghijklmnopqrstuvwxyz0123456789'
    return ''.join(random.choice(chars) for _ in range(length))  # NOT secure

3. Input Sanitization

Pattern: Validate and sanitize all external input.

def generate_slug(content: str) -> str:
    """Generate slug from content."""
    # Validate input
    if not content or not content.strip():
        raise ValueError("Content cannot be empty")

    # Sanitize by removing dangerous characters
    normalized = normalize_slug_text(content)

    # Additional validation
    if not validate_slug(normalized):
        # Fallback to safe default
        normalized = datetime.utcnow().strftime("%Y%m%d-%H%M%S")

    return normalized

Performance Patterns

1. Lazy Evaluation

Pattern: Don't compute what you don't need.

Good:

def generate_slug(content: str, created_at: Optional[datetime] = None) -> str:
    """Generate slug from content."""
    slug = normalize_slug_text(content)

    # Only generate timestamp if needed
    if not slug:
        created_at = created_at or datetime.utcnow()
        slug = created_at.strftime("%Y%m%d-%H%M%S")

    return slug

2. Compile Regex Once

Pattern: Define regex patterns as module constants.

Good:

# Module level - compiled once
SLUG_PATTERN = re.compile(r'^[a-z0-9]+(?:-[a-z0-9]+)*$')
SAFE_SLUG_PATTERN = re.compile(r'[^a-z0-9-]')

def validate_slug(slug: str) -> bool:
    """Validate slug."""
    return bool(SLUG_PATTERN.match(slug))

def normalize_slug_text(text: str) -> str:
    """Normalize text for slug."""
    return SAFE_SLUG_PATTERN.sub('', text)

Bad:

def validate_slug(slug: str) -> bool:
    """Validate slug."""
    # Compiles regex every call - inefficient
    return bool(re.match(r'^[a-z0-9]+(?:-[a-z0-9]+)*$', slug))

3. Avoid Premature Optimization

Pattern: Write clear code first, optimize if needed.

Good:

def extract_first_words(text: str, max_words: int = 5) -> str:
    """Extract first N words from text."""
    words = text.split()
    return ' '.join(words[:max_words])

Don't Do This Unless Profiling Shows It's Necessary:

def extract_first_words(text: str, max_words: int = 5) -> str:
    """Extract first N words from text."""
    # Premature optimization - more complex, minimal gain
    words = []
    count = 0
    for word in text.split():
        words.append(word)
        count += 1
        if count >= max_words:
            break
    return ' '.join(words)

Constants Pattern

1. Module-Level Constants

Pattern: Define configuration as constants at module level.

# Slug configuration
MAX_SLUG_LENGTH = 100
MIN_SLUG_LENGTH = 1
SLUG_WORDS_COUNT = 5
RANDOM_SUFFIX_LENGTH = 4

# File operations
TEMP_FILE_SUFFIX = '.tmp'
TRASH_DIR_NAME = '.trash'

# Hashing
CONTENT_HASH_ALGORITHM = 'sha256'

Why: Easy to modify, clear intent, compile-time resolution.

2. Naming Convention

Pattern: ALL_CAPS_WITH_UNDERSCORES

MAX_NOTE_LENGTH = 10000        # Good
DEFAULT_ENCODING = 'utf-8'     # Good
maxNoteLength = 10000          # Bad
max_note_length = 10000        # Bad (looks like variable)

3. Magic Numbers

Pattern: Replace magic numbers with named constants.

Good:

SLUG_WORDS_COUNT = 5

def generate_slug(content: str) -> str:
    """Generate slug from content."""
    words = content.split()[:SLUG_WORDS_COUNT]
    return normalize_slug_text(' '.join(words))

Bad:

def generate_slug(content: str) -> str:
    """Generate slug from content."""
    words = content.split()[:5]  # What is 5? Why 5?
    return normalize_slug_text(' '.join(words))

Anti-Patterns to Avoid

1. God Functions

Bad:

def process_note(content, do_hash=True, do_slug=True, do_path=True, ...):
    """Do everything with a note."""
    # 500 lines of code doing too many things
    pass

Good:

def generate_slug(content: str) -> str:
    """Generate slug."""
    pass

def calculate_hash(content: str) -> str:
    """Calculate hash."""
    pass

def generate_path(slug: str, timestamp: datetime) -> Path:
    """Generate path."""
    pass

2. Mutable Default Arguments

Bad:

def create_note(content: str, tags: list = []) -> Note:
    tags.append('note')  # Modifies shared default list!
    return Note(content, tags)

Good:

def create_note(content: str, tags: Optional[List[str]] = None) -> Note:
    if tags is None:
        tags = []
    tags.append('note')
    return Note(content, tags)

3. Returning Multiple Types

Bad:

def find_note(slug: str) -> Union[Note, bool, None]:
    """Find note by slug."""
    if slug_invalid:
        return False
    note = db.query(slug)
    return note if note else None

Good:

def find_note(slug: str) -> Optional[Note]:
    """Find note by slug, returns None if not found."""
    if not validate_slug(slug):
        raise ValueError(f"Invalid slug: {slug}")
    return db.query(slug)  # Returns Note or None

4. Silent Failures

Bad:

def generate_slug(content: str) -> str:
    """Generate slug."""
    try:
        slug = normalize_slug_text(content)
        return slug if slug else "untitled"  # Silent fallback
    except Exception:
        return "untitled"  # Swallowing errors

Good:

def generate_slug(content: str) -> str:
    """Generate slug."""
    if not content or not content.strip():
        raise ValueError("Content cannot be empty")

    slug = normalize_slug_text(content)
    if not slug:
        # Explicit fallback with timestamp
        slug = datetime.utcnow().strftime("%Y%m%d-%H%M%S")

    return slug

Summary

Key Principles:

  1. Pure functions are preferred
  2. Specific exceptions with clear messages
  3. Type hints on all functions
  4. Comprehensive docstrings
  5. Security-first validation
  6. Test everything thoroughly
  7. Constants for configuration
  8. Clear over clever

Remember: Utility functions are the foundation. Make them rock-solid.

References