Utility Function Patterns
Purpose
This document establishes coding patterns and conventions specifically for utility functions in StarPunk. These patterns ensure utilities are consistent, testable, and maintainable.
Philosophy
Utility functions should be:
- Pure: No side effects where possible
- Focused: One responsibility per function
- Predictable: Same input always produces same output
- Testable: Easy to test in isolation
- Documented: Clear purpose and usage
Function Design Patterns
1. Pure Functions (Preferred)
Pattern: Functions that don't modify state or have side effects.
Good Example:
import hashlib

def calculate_content_hash(content: str) -> str:
    """Calculate SHA-256 hash of content."""
    return hashlib.sha256(content.encode('utf-8')).hexdigest()
Why: Easy to test, no hidden dependencies, predictable behavior.
When to Use: Calculations, transformations, validations.
2. Functions with I/O Side Effects
Pattern: Functions that read/write files or interact with external systems.
Good Example:
def write_note_file(file_path: Path, content: str) -> None:
    """Write note content to file atomically."""
    temp_path = file_path.with_suffix(file_path.suffix + '.tmp')
    try:
        temp_path.write_text(content, encoding='utf-8')
        temp_path.replace(file_path)
    except Exception:
        temp_path.unlink(missing_ok=True)
        raise
Why: Side effects are isolated, error handling is explicit, cleanup is guaranteed.
When to Use: File operations, database operations, network calls.
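The following sketch shows one way to exercise the atomic write pattern above in a throwaway directory. The `demo_atomic_write` helper and the `hello.md` file name are assumptions made for illustration; they are not part of StarPunk.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def write_note_file(file_path: Path, content: str) -> None:
    """Write note content to file atomically (same pattern as above)."""
    temp_path = file_path.with_suffix(file_path.suffix + ".tmp")
    try:
        temp_path.write_text(content, encoding="utf-8")
        temp_path.replace(file_path)
    except Exception:
        temp_path.unlink(missing_ok=True)
        raise


def demo_atomic_write() -> Tuple[str, List[str]]:
    """Write a note twice; return the final content and the directory listing."""
    with tempfile.TemporaryDirectory() as tmp:
        note = Path(tmp) / "hello.md"  # hypothetical file name
        write_note_file(note, "first draft")
        write_note_file(note, "second draft")  # replaces the file in one rename
        names = sorted(p.name for p in Path(tmp).iterdir())
        return note.read_text(encoding="utf-8"), names
```

After both writes the directory should hold only the final file, because the temporary file is renamed over the target rather than left next to it.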
3. Validation Functions
Pattern: Functions that check validity and either return a boolean or raise an exception.
Good Example (Boolean Return):
def validate_slug(slug: str) -> bool:
    """Validate that slug meets requirements."""
    if not slug:
        return False
    if len(slug) > MAX_SLUG_LENGTH:
        return False
    # fullmatch rejects a trailing newline, which match() with '$' would accept
    return bool(SLUG_PATTERN.fullmatch(slug))
Good Example (Exception Raising):
def require_valid_slug(slug: str) -> None:
    """Require slug to be valid, raise ValueError if not."""
    if not validate_slug(slug):
        raise ValueError(
            f"Invalid slug '{slug}': must be 1-{MAX_SLUG_LENGTH} characters, "
            f"lowercase alphanumeric with hyphens only"
        )
When to Use: Input validation, precondition checking, security checks.
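A minimal sketch of how the two validation styles might sit side by side, assuming the `SLUG_PATTERN` and `MAX_SLUG_LENGTH` values shown elsewhere in this document. `form_error_for` is a hypothetical caller added only to show where the boolean form is the better fit.

```python
import re
from typing import Optional

MAX_SLUG_LENGTH = 100
SLUG_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def validate_slug(slug: str) -> bool:
    """Boolean form: the caller decides what to do with an invalid slug."""
    if not slug or len(slug) > MAX_SLUG_LENGTH:
        return False
    return bool(SLUG_PATTERN.fullmatch(slug))


def require_valid_slug(slug: str) -> None:
    """Raising form: stops execution when a precondition is violated."""
    if not validate_slug(slug):
        raise ValueError(f"Invalid slug {slug!r}")


def form_error_for(slug: str) -> Optional[str]:
    """Hypothetical UI helper: branch on the boolean, never raise."""
    if validate_slug(slug):
        return None
    return "Slug must use lowercase letters, digits, and hyphens"
```

The boolean form suits places where invalid input is expected and handled, while the raising form suits internal preconditions where invalid input indicates a bug or an attack.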
4. Generator Functions
Pattern: Functions that yield values instead of returning lists.
Good Example:
def iter_note_files(notes_dir: Path) -> Iterator[Path]:
    """Iterate over all note files in directory."""
    for year_dir in sorted(notes_dir.iterdir()):
        if not year_dir.is_dir():
            continue
        for month_dir in sorted(year_dir.iterdir()):
            if not month_dir.is_dir():
                continue
            for note_file in sorted(month_dir.glob("*.md")):
                yield note_file
Why: Memory efficient, lazy evaluation, can be interrupted.
When to Use: Processing many items, large datasets, streaming operations.
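To illustrate the "can be interrupted" point, the sketch below consumes only the first few items of the generator with `itertools.islice`. The year/month directory tree and the helper names `first_note_names` and `demo_lazy_iteration` are assumptions made for the example.

```python
import tempfile
from itertools import islice
from pathlib import Path
from typing import Iterator, List


def iter_note_files(notes_dir: Path) -> Iterator[Path]:
    """Iterate over all note files in directory (same generator as above)."""
    for year_dir in sorted(notes_dir.iterdir()):
        if not year_dir.is_dir():
            continue
        for month_dir in sorted(year_dir.iterdir()):
            if not month_dir.is_dir():
                continue
            for note_file in sorted(month_dir.glob("*.md")):
                yield note_file


def first_note_names(notes_dir: Path, limit: int) -> List[str]:
    """Take at most `limit` notes; later month directories are never globbed."""
    return [path.name for path in islice(iter_note_files(notes_dir), limit)]


def demo_lazy_iteration() -> List[str]:
    """Build a small year/month tree (hypothetical layout) and read two notes."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel in ("2024/01/a.md", "2024/01/b.md", "2024/02/c.md"):
            target = root / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text("# note", encoding="utf-8")
        return first_note_names(root, 2)
```

Because the generator pauses at each `yield`, stopping after two items means the remaining directories are not read at all.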
Error Handling Patterns
1. Specific Exceptions
Pattern: Raise specific exception types with descriptive messages.
Good:
def generate_slug(content: str) -> str:
    """Generate slug from content."""
    if not content or not content.strip():
        raise ValueError("Content cannot be empty or whitespace-only")
    # ... rest of function
Bad:
def generate_slug(content: str) -> str:
    if not content:
        raise Exception("Bad input")  # Too generic
2. Error Message Format
Pattern: Include context, expected behavior, and actual problem.
Good:
raise ValueError(
    f"Invalid slug '{slug}': must contain only lowercase letters, "
    f"numbers, and hyphens (pattern: {SLUG_PATTERN.pattern})"
)
raise FileNotFoundError(
    f"Note file not found: {file_path}. "
    f"Database may be out of sync with filesystem."
)
Bad:
raise ValueError("Invalid slug")
raise FileNotFoundError("File missing")
3. Exception Chaining
Pattern: Preserve original exception context when re-raising.
Good:
def read_note_file(file_path: Path) -> str:
    """Read note content from file."""
    try:
        return file_path.read_text(encoding='utf-8')
    except UnicodeDecodeError as e:
        raise ValueError(
            f"Failed to read {file_path}: invalid UTF-8 encoding"
        ) from e
Why: Preserves stack trace, shows root cause.
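As a small illustration of what `from e` preserves, the sketch below triggers the error on purpose and inspects `__cause__`. The `demo_chained_cause` helper and the file name are assumptions added for the example.

```python
import tempfile
from pathlib import Path


def read_note_file(file_path: Path) -> str:
    """Read note content from file (same function as above)."""
    try:
        return file_path.read_text(encoding="utf-8")
    except UnicodeDecodeError as e:
        raise ValueError(
            f"Failed to read {file_path}: invalid UTF-8 encoding"
        ) from e


def demo_chained_cause() -> str:
    """Return the type name of the original exception behind the ValueError."""
    with tempfile.TemporaryDirectory() as tmp:
        bad_file = Path(tmp) / "bad.md"  # hypothetical file name
        bad_file.write_bytes(b"\xff\xfe not valid utf-8")
        try:
            read_note_file(bad_file)
        except ValueError as exc:
            # __cause__ is set by "raise ... from e"
            return type(exc.__cause__).__name__
    return "no error raised"
```

The caller sees a `ValueError` with a readable message, while the original `UnicodeDecodeError` remains available for logs and debuggers.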
4. Cleanup on Error
Pattern: Use try/except with a re-raise, try/finally, or a context manager so that temporary files are removed when an operation fails.
Good:
def write_note_file(file_path: Path, content: str) -> None:
    """Write note content to file atomically."""
    # Append '.tmp' rather than replacing the suffix, so note.md -> note.md.tmp
    temp_path = file_path.with_suffix(file_path.suffix + '.tmp')
    try:
        temp_path.write_text(content, encoding='utf-8')
        temp_path.replace(file_path)
    except Exception:
        temp_path.unlink(missing_ok=True)  # Cleanup on error
        raise
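The same cleanup can also be expressed as a context manager with try/finally. This is only a sketch: `atomic_target` and `demo_cleanup` are hypothetical helpers, not existing StarPunk functions.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, List


@contextmanager
def atomic_target(file_path: Path) -> Iterator[Path]:
    """Yield a temp path; rename it onto file_path only if the body succeeds."""
    temp_path = file_path.with_suffix(file_path.suffix + ".tmp")
    try:
        yield temp_path
        temp_path.replace(file_path)
    finally:
        # After a successful rename the temp file no longer exists,
        # so this is a no-op; after a failure it removes the leftover.
        temp_path.unlink(missing_ok=True)


def demo_cleanup() -> List[str]:
    """Simulate a failure mid-write and list what is left in the directory."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "note.md"
        try:
            with atomic_target(target) as temp_path:
                temp_path.write_text("partial", encoding="utf-8")
                raise RuntimeError("simulated failure")
        except RuntimeError:
            pass
        return sorted(p.name for p in Path(tmp).iterdir())
```

With this shape the caller cannot forget the cleanup step, at the cost of one extra level of indirection.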
Type Hint Patterns
1. Basic Types
Pattern: Use built-in types where possible.
def generate_slug(content: str, created_at: Optional[datetime] = None) -> str:
    """Generate URL-safe slug from content."""
    pass

def validate_note_path(file_path: Path, data_dir: Path) -> bool:
    """Validate file path is within data directory."""
    pass
2. Collection Types
Pattern: Specify element types for collections.
from typing import List, Dict, Set, Optional

def make_slug_unique(base_slug: str, existing_slugs: Set[str]) -> str:
    """Make slug unique by adding suffix if needed."""
    pass

def get_note_paths(notes_dir: Path) -> List[Path]:
    """Get all note file paths."""
    pass
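For a sense of how a typed collection parameter is used in practice, here is one possible body for `make_slug_unique`. The `-xxxx` suffix format, the `SUFFIX_CHARS` alphabet, and the retry loop are assumptions; the document only states that a suffix is added when needed.

```python
import secrets
from typing import Set

RANDOM_SUFFIX_LENGTH = 4
SUFFIX_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789"  # assumed alphabet


def make_slug_unique(base_slug: str, existing_slugs: Set[str]) -> str:
    """Return base_slug, or base_slug plus a random suffix if it is taken."""
    if base_slug not in existing_slugs:
        return base_slug
    while True:
        suffix = "".join(
            secrets.choice(SUFFIX_CHARS) for _ in range(RANDOM_SUFFIX_LENGTH)
        )
        candidate = f"{base_slug}-{suffix}"
        # Retry on the unlikely chance the suffixed slug is also taken
        if candidate not in existing_slugs:
            return candidate
```

Declaring `existing_slugs` as `Set[str]` documents that membership checks are the expected operation and lets a type checker flag callers that pass a list of paths or other objects.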
3. Optional Types
Pattern: Use Optional[T] for nullable parameters and returns.
from typing import Optional

def find_note(slug: str) -> Optional[Path]:
    """Find note file by slug, returns None if not found."""
    pass
4. Union Types (Use Sparingly)
Pattern: Use Union when a parameter truly accepts multiple types.
from typing import Union
from pathlib import Path

def ensure_path(path: Union[str, Path]) -> Path:
    """Convert string or Path to Path object."""
    return Path(path) if isinstance(path, str) else path
Note: Prefer single types. Only use Union when necessary.
Documentation Patterns
1. Function Docstrings
Pattern: Google-style docstrings with sections.
def generate_slug(content: str, created_at: Optional[datetime] = None) -> str:
    """
    Generate URL-safe slug from note content

    Creates a slug by extracting the first few words from the content and
    normalizing them to lowercase with hyphens. If content is insufficient,
    falls back to timestamp-based slug.

    Args:
        content: The note content (markdown text)
        created_at: Optional timestamp for fallback slug (defaults to now)

    Returns:
        URL-safe slug string (lowercase, alphanumeric + hyphens only)

    Raises:
        ValueError: If content is empty or contains only whitespace

    Examples:
        >>> generate_slug("Hello World! This is my first note.")
        'hello-world-this-is-my'
        >>> generate_slug("Testing... with special chars!@#")
        'testing-with-special-chars'

    Notes:
        - This function does NOT check for uniqueness
        - Caller must verify slug doesn't exist in database
    """
Required Sections:
- Summary (first line)
- Description (paragraph after summary)
- Args (if any parameters)
- Returns (if returns value)
- Raises (if raises exceptions)
Optional Sections:
- Examples (highly recommended)
- Notes (for important caveats)
- References (for external specs)
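One reason the Examples section is worth the effort is that examples written in `>>>` form can be executed with the standard `doctest` module. The sketch below checks a small function's docstring; `count_doctest_failures` is a hypothetical helper, and the function under test is the `extract_first_words` utility used elsewhere in this document.

```python
import doctest
from typing import Callable


def extract_first_words(text: str, max_words: int = 5) -> str:
    """Extract first N words from text.

    Examples:
        >>> extract_first_words("one two three", max_words=2)
        'one two'
        >>> extract_first_words("")
        ''
    """
    return " ".join(text.split()[:max_words])


def count_doctest_failures(func: Callable) -> int:
    """Run the docstring examples of one function and return the failure count."""
    runner = doctest.DocTestRunner(verbose=False)
    finder = doctest.DocTestFinder()
    for test in finder.find(func, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False).failed
```

In a real project the same examples are more commonly collected by the test runner than by a helper like this one.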
2. Inline Comments
Pattern: Explain why, not what.
Good:
# Use atomic rename to prevent file corruption if interrupted
temp_path.replace(file_path)
# Random suffix prevents enumeration attacks
suffix = secrets.token_urlsafe(4)[:4]
Bad:
# Rename temp file to final path
temp_path.replace(file_path)
# Generate suffix
suffix = secrets.token_urlsafe(4)[:4]
3. Module Docstrings
Pattern: Describe module purpose and contents.
"""
Core utility functions for StarPunk
This module provides essential utilities for slug generation, file operations,
hashing, and date/time handling. These utilities are used throughout the
application and have no external dependencies beyond standard library and
Flask configuration.
Functions:
    generate_slug: Create URL-safe slug from content
    calculate_content_hash: Calculate SHA-256 hash
    write_note_file: Atomically write note to file
    format_rfc822: Format datetime for RSS feeds

Constants:
    MAX_SLUG_LENGTH: Maximum slug length (100)
    SLUG_PATTERN: Regex for valid slugs
"""
Testing Patterns
1. Test Function Naming
Pattern: test_{function_name}_{scenario}
def test_generate_slug_from_content():
    """Test basic slug generation from content."""
    slug = generate_slug("Hello World This Is My Note")
    assert slug == "hello-world-this-is-my"

def test_generate_slug_empty_content():
    """Test slug generation raises error on empty content."""
    with pytest.raises(ValueError):
        generate_slug("")

def test_generate_slug_special_characters():
    """Test slug generation removes special characters."""
    slug = generate_slug("Testing... with!@# special chars")
    assert slug == "testing-with-special-chars"
2. Test Organization
Pattern: Group related tests in classes.
class TestSlugGeneration:
    """Test slug generation functions"""

    def test_generate_slug_from_content(self):
        """Test basic slug generation."""
        pass

    def test_generate_slug_empty_content(self):
        """Test error on empty content."""
        pass

class TestContentHashing:
    """Test content hashing functions"""

    def test_calculate_hash_consistency(self):
        """Test hash is consistent."""
        pass
3. Fixtures for Common Setup
Pattern: Use pytest fixtures for reusable test data.
import pytest
from pathlib import Path

@pytest.fixture
def temp_note_file(tmp_path):
    """Create temporary note file for testing."""
    file_path = tmp_path / "test.md"
    file_path.write_text("# Test Note")
    return file_path

def test_read_note_file(temp_note_file):
    """Test reading note file."""
    content = read_note_file(temp_note_file)
    assert content == "# Test Note"
4. Parameterized Tests
Pattern: Test multiple cases with one test function.
@pytest.mark.parametrize("content,expected", [
    ("Hello World", "hello-world"),
    ("Testing 123", "testing-123"),
    ("Special!@# Chars", "special-chars"),
])
def test_generate_slug_variations(content, expected):
    """Test slug generation with various inputs."""
    slug = generate_slug(content)
    assert slug == expected
Security Patterns
1. Path Validation
Pattern: Always validate paths before file operations.
def safe_file_operation(file_path: Path, data_dir: Path) -> None:
    """Perform file operation with path validation."""
    # ALWAYS validate first
    if not validate_note_path(file_path, data_dir):
        raise ValueError(f"Invalid file path: {file_path}")
    # Now safe to operate
    file_path.write_text("content")
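The example above relies on `validate_note_path`, whose body is not shown in this document. One plausible sketch, assuming the goal is to reject any path that resolves outside `data_dir`, is given below; the actual StarPunk implementation may differ.

```python
from pathlib import Path


def validate_note_path(file_path: Path, data_dir: Path) -> bool:
    """Return True only if file_path resolves to a location inside data_dir."""
    # resolve() collapses ".." segments and follows symlinks, so traversal
    # tricks are judged by where the path really points
    resolved_file = file_path.resolve()
    resolved_dir = data_dir.resolve()
    try:
        resolved_file.relative_to(resolved_dir)
    except ValueError:
        # relative_to raises ValueError when the path is outside resolved_dir
        return False
    return True
```

For instance, `data/2024/note.md` would pass for a data directory of `data`, while `data/../secret.md` would be rejected. On Python 3.9 and later, `Path.is_relative_to` could replace the try/except.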
2. Use secrets for Random
Pattern: Use the secrets module, not random, for security-sensitive operations.
Good:
import secrets

def generate_random_suffix(length: int = 4) -> str:
    """Generate cryptographically secure random suffix."""
    chars = 'abcdefghijklmnopqrstuvwxyz0123456789'
    return ''.join(secrets.choice(chars) for _ in range(length))
Bad:
import random

def generate_random_suffix(length: int = 4) -> str:
    """Generate random suffix."""
    chars = 'abcdefghijklmnopqrstuvwxyz0123456789'
    return ''.join(random.choice(chars) for _ in range(length))  # NOT secure
3. Input Sanitization
Pattern: Validate and sanitize all external input.
def generate_slug(content: str) -> str:
    """Generate slug from content."""
    # Validate input
    if not content or not content.strip():
        raise ValueError("Content cannot be empty")
    # Sanitize by removing dangerous characters
    normalized = normalize_slug_text(content)
    # Additional validation
    if not validate_slug(normalized):
        # Fallback to safe default (datetime.utcnow() is deprecated since Python 3.12)
        normalized = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return normalized
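The validate, sanitize, re-validate flow above can be made runnable with a few helper definitions. In this sketch the body of `normalize_slug_text` and the word limit applied before normalizing are assumptions chosen to reproduce the example slugs given in this document.

```python
import re
from datetime import datetime, timezone
from typing import Optional

MAX_SLUG_LENGTH = 100
SLUG_WORDS_COUNT = 5
SLUG_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
SAFE_SLUG_PATTERN = re.compile(r"[^a-z0-9-]")


def validate_slug(slug: str) -> bool:
    """Accept 1..MAX_SLUG_LENGTH chars of lowercase alphanumerics and hyphens."""
    if not slug or len(slug) > MAX_SLUG_LENGTH:
        return False
    return bool(SLUG_PATTERN.fullmatch(slug))


def normalize_slug_text(text: str) -> str:
    """Assumed helper: lowercase, strip disallowed characters, hyphenate words."""
    words = (SAFE_SLUG_PATTERN.sub("", word) for word in text.lower().split())
    return "-".join(word for word in words if word)


def generate_slug(content: str, created_at: Optional[datetime] = None) -> str:
    """Generate a slug, falling back to a timestamp when nothing usable remains."""
    # 1. Validate the raw input
    if not content or not content.strip():
        raise ValueError("Content cannot be empty or whitespace-only")
    # 2. Sanitize: keep the first few words and drop unsafe characters
    first_words = " ".join(content.split()[:SLUG_WORDS_COUNT])
    slug = normalize_slug_text(first_words)
    # 3. Re-validate the result; fall back to a safe timestamp slug
    if not validate_slug(slug):
        created_at = created_at or datetime.now(timezone.utc)
        slug = created_at.strftime("%Y%m%d-%H%M%S")
    return slug
```

Validating the sanitized result, and not only the raw input, is what catches content such as `"!!!"`, which is non-empty but leaves nothing after sanitization.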
Performance Patterns
1. Lazy Evaluation
Pattern: Don't compute what you don't need.
Good:
def generate_slug(content: str, created_at: Optional[datetime] = None) -> str:
    """Generate slug from content."""
    slug = normalize_slug_text(content)
    # Only generate timestamp if needed
    if not slug:
        # datetime.utcnow() is deprecated since Python 3.12
        created_at = created_at or datetime.now(timezone.utc)
        slug = created_at.strftime("%Y%m%d-%H%M%S")
    return slug
2. Compile Regex Once
Pattern: Define regex patterns as module constants.
Good:
# Module level - compiled once
SLUG_PATTERN = re.compile(r'^[a-z0-9]+(?:-[a-z0-9]+)*$')
SAFE_SLUG_PATTERN = re.compile(r'[^a-z0-9-]')

def validate_slug(slug: str) -> bool:
    """Validate slug."""
    return bool(SLUG_PATTERN.fullmatch(slug))

def normalize_slug_text(text: str) -> str:
    """Normalize text for slug."""
    # Lowercase and hyphenate first; the substitution alone would also
    # strip spaces and uppercase letters
    hyphenated = '-'.join(text.lower().split())
    return SAFE_SLUG_PATTERN.sub('', hyphenated)
Bad:
def validate_slug(slug: str) -> bool:
    """Validate slug."""
    # Pattern is buried in the function and looked up in re's internal cache on every call
    return bool(re.match(r'^[a-z0-9]+(?:-[a-z0-9]+)*$', slug))
3. Avoid Premature Optimization
Pattern: Write clear code first, optimize if needed.
Good:
def extract_first_words(text: str, max_words: int = 5) -> str:
    """Extract first N words from text."""
    words = text.split()
    return ' '.join(words[:max_words])
Don't Do This Unless Profiling Shows It's Necessary:
def extract_first_words(text: str, max_words: int = 5) -> str:
    """Extract first N words from text."""
    # Premature optimization - more complex, minimal gain
    words = []
    count = 0
    for word in text.split():
        words.append(word)
        count += 1
        if count >= max_words:
            break
    return ' '.join(words)
Constants Pattern
1. Module-Level Constants
Pattern: Define configuration as constants at module level.
# Slug configuration
MAX_SLUG_LENGTH = 100
MIN_SLUG_LENGTH = 1
SLUG_WORDS_COUNT = 5
RANDOM_SUFFIX_LENGTH = 4
# File operations
TEMP_FILE_SUFFIX = '.tmp'
TRASH_DIR_NAME = '.trash'
# Hashing
CONTENT_HASH_ALGORITHM = 'sha256'
Why: Easy to modify, clear intent, each value defined in exactly one place.
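As a brief sketch of how such constants might be consumed, the hashing helper below reads the algorithm name from `CONTENT_HASH_ALGORITHM` instead of hard-coding it. Using `hashlib.new` for this is an assumption; the earlier example in this document calls `hashlib.sha256` directly.

```python
import hashlib

CONTENT_HASH_ALGORITHM = "sha256"
DEFAULT_ENCODING = "utf-8"


def calculate_content_hash(content: str) -> str:
    """Hash content with the algorithm named by the module constant."""
    hasher = hashlib.new(CONTENT_HASH_ALGORITHM)
    hasher.update(content.encode(DEFAULT_ENCODING))
    return hasher.hexdigest()
```

Changing the algorithm then means editing one constant, although stored hashes would of course need to be recomputed.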
2. Naming Convention
Pattern: ALL_CAPS_WITH_UNDERSCORES
MAX_NOTE_LENGTH = 10000 # Good
DEFAULT_ENCODING = 'utf-8' # Good
maxNoteLength = 10000 # Bad
max_note_length = 10000 # Bad (looks like variable)
3. Magic Numbers
Pattern: Replace magic numbers with named constants.
Good:
SLUG_WORDS_COUNT = 5

def generate_slug(content: str) -> str:
    """Generate slug from content."""
    words = content.split()[:SLUG_WORDS_COUNT]
    return normalize_slug_text(' '.join(words))
Bad:
def generate_slug(content: str) -> str:
    """Generate slug from content."""
    words = content.split()[:5]  # What is 5? Why 5?
    return normalize_slug_text(' '.join(words))
Anti-Patterns to Avoid
1. God Functions
Bad:
def process_note(content, do_hash=True, do_slug=True, do_path=True, ...):
    """Do everything with a note."""
    # 500 lines of code doing too many things
    pass
Good:
def generate_slug(content: str) -> str:
    """Generate slug."""
    pass

def calculate_hash(content: str) -> str:
    """Calculate hash."""
    pass

def generate_path(slug: str, timestamp: datetime) -> Path:
    """Generate path."""
    pass
2. Mutable Default Arguments
Bad:
def create_note(content: str, tags: list = []) -> Note:
    tags.append('note')  # Modifies shared default list!
    return Note(content, tags)
Good:
def create_note(content: str, tags: Optional[List[str]] = None) -> Note:
    # Copy so the caller's list is never modified
    tags = list(tags) if tags is not None else []
    tags.append('note')
    return Note(content, tags)
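The effect of a shared default list is easiest to see by calling the function twice. The two helpers below are simplified stand-ins for `create_note`, written only to demonstrate the behavior.

```python
from typing import List, Optional


def add_tag_shared(tag: str, tags: list = []) -> list:  # bug kept on purpose
    """The default list is created once, at function definition time."""
    tags.append(tag)
    return tags


def add_tag_fresh(tag: str, tags: Optional[List[str]] = None) -> List[str]:
    """A new list is built on every call; the caller's list is left alone."""
    result = list(tags) if tags is not None else []
    result.append(tag)
    return result


first = add_tag_shared("a")
second = add_tag_shared("b")
# first and second are the same object, so both now read ['a', 'b']

fresh_first = add_tag_fresh("a")
fresh_second = add_tag_fresh("b")
# fresh_first is ['a'] and fresh_second is ['b']
```

The second call to `add_tag_shared` silently inherits state from the first, which is the kind of hidden dependency the pure-function guidance in this document is meant to rule out.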
3. Returning Multiple Types
Bad:
def find_note(slug: str) -> Union[Note, bool, None]:
    """Find note by slug."""
    if slug_invalid:
        return False
    note = db.query(slug)
    return note if note else None
Good:
def find_note(slug: str) -> Optional[Note]:
    """Find note by slug, returns None if not found."""
    if not validate_slug(slug):
        raise ValueError(f"Invalid slug: {slug}")
    return db.query(slug)  # Returns Note or None
4. Silent Failures
Bad:
def generate_slug(content: str) -> str:
    """Generate slug."""
    try:
        slug = normalize_slug_text(content)
        return slug if slug else "untitled"  # Silent fallback
    except Exception:
        return "untitled"  # Swallowing errors
Good:
def generate_slug(content: str) -> str:
    """Generate slug."""
    if not content or not content.strip():
        raise ValueError("Content cannot be empty")
    slug = normalize_slug_text(content)
    if not slug:
        # Explicit fallback with timestamp (datetime.utcnow() is deprecated since Python 3.12)
        slug = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return slug
Summary
Key Principles:
- Pure functions are preferred
- Specific exceptions with clear messages
- Type hints on all functions
- Comprehensive docstrings
- Security-first validation
- Test everything thoroughly
- Constants for configuration
- Clear over clever
Remember: Utility functions are the foundation. Make them rock-solid.