735 lines
18 KiB
Markdown
735 lines
18 KiB
Markdown
# Utility Function Patterns
|
|
|
|
## Purpose
|
|
|
|
This document establishes coding patterns and conventions specifically for utility functions in StarPunk. These patterns ensure utilities are consistent, testable, and maintainable.
|
|
|
|
## Philosophy
|
|
|
|
Utility functions should be:
|
|
- **Pure**: No side effects where possible
|
|
- **Focused**: One responsibility per function
|
|
- **Predictable**: Same input always produces same output
|
|
- **Testable**: Easy to test in isolation
|
|
- **Documented**: Clear purpose and usage
|
|
|
|
## Function Design Patterns
|
|
|
|
### 1. Pure Functions (Preferred)
|
|
|
|
**Pattern**: Functions that don't modify state or have side effects.
|
|
|
|
**Good Example**:
|
|
```python
|
|
def calculate_content_hash(content: str) -> str:
|
|
"""Calculate SHA-256 hash of content."""
|
|
return hashlib.sha256(content.encode('utf-8')).hexdigest()
|
|
```
|
|
|
|
**Why**: Easy to test, no hidden dependencies, predictable behavior.
|
|
|
|
**When to Use**: Calculations, transformations, validations.
|
|
|
|
### 2. Functions with I/O Side Effects
|
|
|
|
**Pattern**: Functions that read/write files or interact with external systems.
|
|
|
|
**Good Example**:
|
|
```python
|
|
def write_note_file(file_path: Path, content: str) -> None:
|
|
"""Write note content to file atomically."""
|
|
temp_path = file_path.with_suffix(file_path.suffix + '.tmp')
|
|
try:
|
|
temp_path.write_text(content, encoding='utf-8')
|
|
temp_path.replace(file_path)
|
|
except Exception:
|
|
temp_path.unlink(missing_ok=True)
|
|
raise
|
|
```
|
|
|
|
**Why**: Side effects are isolated, error handling is explicit, cleanup is guaranteed.
|
|
|
|
**When to Use**: File operations, database operations, network calls.
|
|
|
|
### 3. Validation Functions
|
|
|
|
**Pattern**: Functions that check validity and return boolean or raise exception.
|
|
|
|
**Good Example** (Boolean Return):
|
|
```python
|
|
def validate_slug(slug: str) -> bool:
|
|
"""Validate that slug meets requirements."""
|
|
if not slug:
|
|
return False
|
|
if len(slug) > MAX_SLUG_LENGTH:
|
|
return False
|
|
return bool(SLUG_PATTERN.match(slug))
|
|
```
|
|
|
|
**Good Example** (Exception Raising):
|
|
```python
|
|
def require_valid_slug(slug: str) -> None:
|
|
"""Require slug to be valid, raise ValueError if not."""
|
|
if not validate_slug(slug):
|
|
raise ValueError(
|
|
f"Invalid slug '{slug}': must be 1-100 characters, "
|
|
f"lowercase alphanumeric with hyphens only"
|
|
)
|
|
```
|
|
|
|
**When to Use**: Input validation, precondition checking, security checks.
|
|
|
|
### 4. Generator Functions
|
|
|
|
**Pattern**: Functions that yield values instead of returning lists.
|
|
|
|
**Good Example**:
|
|
```python
|
|
def iter_note_files(notes_dir: Path) -> Iterator[Path]:
|
|
"""Iterate over all note files in directory."""
|
|
for year_dir in sorted(notes_dir.iterdir()):
|
|
if not year_dir.is_dir():
|
|
continue
|
|
for month_dir in sorted(year_dir.iterdir()):
|
|
if not month_dir.is_dir():
|
|
continue
|
|
for note_file in sorted(month_dir.glob("*.md")):
|
|
yield note_file
|
|
```
|
|
|
|
**Why**: Memory efficient, lazy evaluation, can be interrupted.
|
|
|
|
**When to Use**: Processing many items, large datasets, streaming operations.
|
|
|
|
## Error Handling Patterns
|
|
|
|
### 1. Specific Exceptions
|
|
|
|
**Pattern**: Raise specific exception types with descriptive messages.
|
|
|
|
**Good**:
|
|
```python
|
|
def generate_slug(content: str) -> str:
|
|
"""Generate slug from content."""
|
|
if not content or not content.strip():
|
|
raise ValueError("Content cannot be empty or whitespace-only")
|
|
|
|
# ... rest of function
|
|
```
|
|
|
|
**Bad**:
|
|
```python
|
|
def generate_slug(content: str) -> str:
|
|
if not content:
|
|
raise Exception("Bad input") # Too generic
|
|
```
|
|
|
|
### 2. Error Message Format
|
|
|
|
**Pattern**: Include context, expected behavior, and actual problem.
|
|
|
|
**Good**:
|
|
```python
|
|
raise ValueError(
|
|
f"Invalid slug '{slug}': must contain only lowercase letters, "
|
|
f"numbers, and hyphens (pattern: {SLUG_PATTERN.pattern})"
|
|
)
|
|
|
|
raise FileNotFoundError(
|
|
f"Note file not found: {file_path}. "
|
|
f"Database may be out of sync with filesystem."
|
|
)
|
|
```
|
|
|
|
**Bad**:
|
|
```python
|
|
raise ValueError("Invalid slug")
|
|
raise FileNotFoundError("File missing")
|
|
```
|
|
|
|
### 3. Exception Chaining
|
|
|
|
**Pattern**: Preserve original exception context when re-raising.
|
|
|
|
**Good**:
|
|
```python
|
|
def read_note_file(file_path: Path) -> str:
|
|
"""Read note content from file."""
|
|
try:
|
|
return file_path.read_text(encoding='utf-8')
|
|
except UnicodeDecodeError as e:
|
|
raise ValueError(
|
|
f"Failed to read {file_path}: invalid UTF-8 encoding"
|
|
) from e
|
|
```
|
|
|
|
**Why**: Preserves stack trace, shows root cause.
|
|
|
|
### 4. Cleanup on Error
|
|
|
|
**Pattern**: Use try/finally or context managers to ensure cleanup.
|
|
|
|
**Good**:
|
|
```python
|
|
def write_note_file(file_path: Path, content: str) -> None:
|
|
"""Write note content to file atomically."""
|
|
temp_path = file_path.with_suffix('.tmp')
|
|
try:
|
|
temp_path.write_text(content, encoding='utf-8')
|
|
temp_path.replace(file_path)
|
|
except Exception:
|
|
temp_path.unlink(missing_ok=True) # Cleanup on error
|
|
raise
|
|
```
|
|
|
|
## Type Hint Patterns
|
|
|
|
### 1. Basic Types
|
|
|
|
**Pattern**: Use built-in types where possible.
|
|
|
|
```python
|
|
def generate_slug(content: str, created_at: Optional[datetime] = None) -> str:
|
|
"""Generate URL-safe slug from content."""
|
|
pass
|
|
|
|
def validate_note_path(file_path: Path, data_dir: Path) -> bool:
|
|
"""Validate file path is within data directory."""
|
|
pass
|
|
```
|
|
|
|
### 2. Collection Types
|
|
|
|
**Pattern**: Specify element types for collections.
|
|
|
|
```python
|
|
from typing import List, Dict, Set, Optional
|
|
|
|
def make_slug_unique(base_slug: str, existing_slugs: Set[str]) -> str:
|
|
"""Make slug unique by adding suffix if needed."""
|
|
pass
|
|
|
|
def get_note_paths(notes_dir: Path) -> List[Path]:
|
|
"""Get all note file paths."""
|
|
pass
|
|
```
|
|
|
|
### 3. Optional Types
|
|
|
|
**Pattern**: Use Optional[T] for nullable parameters and returns.
|
|
|
|
```python
|
|
from typing import Optional
|
|
|
|
def find_note(slug: str) -> Optional[Path]:
|
|
"""Find note file by slug, returns None if not found."""
|
|
pass
|
|
```
|
|
|
|
### 4. Union Types (Use Sparingly)
|
|
|
|
**Pattern**: Use Union when a parameter truly accepts multiple types.
|
|
|
|
```python
|
|
from typing import Union
|
|
from pathlib import Path
|
|
|
|
def ensure_path(path: Union[str, Path]) -> Path:
|
|
"""Convert string or Path to Path object."""
|
|
return Path(path) if isinstance(path, str) else path
|
|
```
|
|
|
|
**Note**: Prefer single types. Only use Union when necessary.
|
|
|
|
## Documentation Patterns
|
|
|
|
### 1. Function Docstrings
|
|
|
|
**Pattern**: Google-style docstrings with sections.
|
|
|
|
```python
|
|
def generate_slug(content: str, created_at: Optional[datetime] = None) -> str:
|
|
"""
|
|
Generate URL-safe slug from note content
|
|
|
|
Creates a slug by extracting the first few words from the content and
|
|
normalizing them to lowercase with hyphens. If content is insufficient,
|
|
falls back to timestamp-based slug.
|
|
|
|
Args:
|
|
content: The note content (markdown text)
|
|
created_at: Optional timestamp for fallback slug (defaults to now)
|
|
|
|
Returns:
|
|
URL-safe slug string (lowercase, alphanumeric + hyphens only)
|
|
|
|
Raises:
|
|
ValueError: If content is empty or contains only whitespace
|
|
|
|
Examples:
|
|
>>> generate_slug("Hello World! This is my first note.")
|
|
'hello-world-this-is-my'
|
|
|
|
>>> generate_slug("Testing... with special chars!@#")
|
|
'testing-with-special-chars'
|
|
|
|
Notes:
|
|
- This function does NOT check for uniqueness
|
|
- Caller must verify slug doesn't exist in database
|
|
"""
|
|
```
|
|
|
|
**Required Sections**:
|
|
- Summary (first line)
|
|
- Description (paragraph after summary)
|
|
- Args (if any parameters)
|
|
- Returns (if returns value)
|
|
- Raises (if raises exceptions)
|
|
|
|
**Optional Sections**:
|
|
- Examples (highly recommended)
|
|
- Notes (for important caveats)
|
|
- References (for external specs)
|
|
|
|
### 2. Inline Comments
|
|
|
|
**Pattern**: Explain why, not what.
|
|
|
|
**Good**:
|
|
```python
|
|
# Use atomic rename to prevent file corruption if interrupted
|
|
temp_path.replace(file_path)
|
|
|
|
# Random suffix prevents enumeration attacks
|
|
suffix = secrets.token_urlsafe(4)[:4]
|
|
```
|
|
|
|
**Bad**:
|
|
```python
|
|
# Rename temp file to final path
|
|
temp_path.replace(file_path)
|
|
|
|
# Generate suffix
|
|
suffix = secrets.token_urlsafe(4)[:4]
|
|
```
|
|
|
|
### 3. Module Docstrings
|
|
|
|
**Pattern**: Describe module purpose and contents.
|
|
|
|
```python
|
|
"""
|
|
Core utility functions for StarPunk
|
|
|
|
This module provides essential utilities for slug generation, file operations,
|
|
hashing, and date/time handling. These utilities are used throughout the
|
|
application and have no external dependencies beyond standard library and
|
|
Flask configuration.
|
|
|
|
Functions:
|
|
generate_slug: Create URL-safe slug from content
|
|
calculate_content_hash: Calculate SHA-256 hash
|
|
write_note_file: Atomically write note to file
|
|
format_rfc822: Format datetime for RSS feeds
|
|
|
|
Constants:
|
|
MAX_SLUG_LENGTH: Maximum slug length (100)
|
|
SLUG_PATTERN: Regex for valid slugs
|
|
"""
|
|
```
|
|
|
|
## Testing Patterns
|
|
|
|
### 1. Test Function Naming
|
|
|
|
**Pattern**: `test_{function_name}_{scenario}`
|
|
|
|
```python
|
|
def test_generate_slug_from_content():
|
|
"""Test basic slug generation from content."""
|
|
slug = generate_slug("Hello World This Is My Note")
|
|
assert slug == "hello-world-this-is-my"
|
|
|
|
def test_generate_slug_empty_content():
|
|
"""Test slug generation raises error on empty content."""
|
|
with pytest.raises(ValueError):
|
|
generate_slug("")
|
|
|
|
def test_generate_slug_special_characters():
|
|
"""Test slug generation removes special characters."""
|
|
slug = generate_slug("Testing... with!@# special chars")
|
|
assert slug == "testing-with-special-chars"
|
|
```
|
|
|
|
### 2. Test Organization
|
|
|
|
**Pattern**: Group related tests in classes.
|
|
|
|
```python
|
|
class TestSlugGeneration:
|
|
"""Test slug generation functions"""
|
|
|
|
def test_generate_slug_from_content(self):
|
|
"""Test basic slug generation."""
|
|
pass
|
|
|
|
def test_generate_slug_empty_content(self):
|
|
"""Test error on empty content."""
|
|
pass
|
|
|
|
|
|
class TestContentHashing:
|
|
"""Test content hashing functions"""
|
|
|
|
def test_calculate_hash_consistency(self):
|
|
"""Test hash is consistent."""
|
|
pass
|
|
```
|
|
|
|
### 3. Fixtures for Common Setup
|
|
|
|
**Pattern**: Use pytest fixtures for reusable test data.
|
|
|
|
```python
|
|
import pytest
|
|
from pathlib import Path
|
|
|
|
@pytest.fixture
|
|
def temp_note_file(tmp_path):
|
|
"""Create temporary note file for testing."""
|
|
file_path = tmp_path / "test.md"
|
|
file_path.write_text("# Test Note")
|
|
return file_path
|
|
|
|
def test_read_note_file(temp_note_file):
|
|
"""Test reading note file."""
|
|
content = read_note_file(temp_note_file)
|
|
assert content == "# Test Note"
|
|
```
|
|
|
|
### 4. Parameterized Tests
|
|
|
|
**Pattern**: Test multiple cases with one test function.
|
|
|
|
```python
|
|
@pytest.mark.parametrize("content,expected", [
|
|
("Hello World", "hello-world"),
|
|
("Testing 123", "testing-123"),
|
|
("Special!@# Chars", "special-chars"),
|
|
])
|
|
def test_generate_slug_variations(content, expected):
|
|
"""Test slug generation with various inputs."""
|
|
slug = generate_slug(content)
|
|
assert slug == expected
|
|
```
|
|
|
|
## Security Patterns
|
|
|
|
### 1. Path Validation
|
|
|
|
**Pattern**: Always validate paths before file operations.
|
|
|
|
```python
|
|
def safe_file_operation(file_path: Path, data_dir: Path) -> None:
|
|
"""Perform file operation with path validation."""
|
|
# ALWAYS validate first
|
|
if not validate_note_path(file_path, data_dir):
|
|
raise ValueError(f"Invalid file path: {file_path}")
|
|
|
|
# Now safe to operate
|
|
file_path.write_text("content")
|
|
```
|
|
|
|
### 2. Use secrets for Random
|
|
|
|
**Pattern**: Use `secrets` module, not `random` for security-sensitive operations.
|
|
|
|
**Good**:
|
|
```python
|
|
import secrets
|
|
|
|
def generate_random_suffix(length: int = 4) -> str:
|
|
"""Generate cryptographically secure random suffix."""
|
|
chars = 'abcdefghijklmnopqrstuvwxyz0123456789'
|
|
return ''.join(secrets.choice(chars) for _ in range(length))
|
|
```
|
|
|
|
**Bad**:
|
|
```python
|
|
import random
|
|
|
|
def generate_random_suffix(length: int = 4) -> str:
|
|
"""Generate random suffix."""
|
|
chars = 'abcdefghijklmnopqrstuvwxyz0123456789'
|
|
return ''.join(random.choice(chars) for _ in range(length)) # NOT secure
|
|
```
|
|
|
|
### 3. Input Sanitization
|
|
|
|
**Pattern**: Validate and sanitize all external input.
|
|
|
|
```python
|
|
def generate_slug(content: str) -> str:
|
|
"""Generate slug from content."""
|
|
# Validate input
|
|
if not content or not content.strip():
|
|
raise ValueError("Content cannot be empty")
|
|
|
|
# Sanitize by removing dangerous characters
|
|
normalized = normalize_slug_text(content)
|
|
|
|
# Additional validation
|
|
if not validate_slug(normalized):
|
|
# Fallback to safe default
|
|
normalized = datetime.utcnow().strftime("%Y%m%d-%H%M%S")
|
|
|
|
return normalized
|
|
```
|
|
|
|
## Performance Patterns
|
|
|
|
### 1. Lazy Evaluation
|
|
|
|
**Pattern**: Don't compute what you don't need.
|
|
|
|
**Good**:
|
|
```python
|
|
def generate_slug(content: str, created_at: Optional[datetime] = None) -> str:
|
|
"""Generate slug from content."""
|
|
slug = normalize_slug_text(content)
|
|
|
|
# Only generate timestamp if needed
|
|
if not slug:
|
|
created_at = created_at or datetime.utcnow()
|
|
slug = created_at.strftime("%Y%m%d-%H%M%S")
|
|
|
|
return slug
|
|
```
|
|
|
|
### 2. Compile Regex Once
|
|
|
|
**Pattern**: Define regex patterns as module constants.
|
|
|
|
**Good**:
|
|
```python
|
|
# Module level - compiled once
|
|
SLUG_PATTERN = re.compile(r'^[a-z0-9]+(?:-[a-z0-9]+)*$')
|
|
SAFE_SLUG_PATTERN = re.compile(r'[^a-z0-9-]')
|
|
|
|
def validate_slug(slug: str) -> bool:
|
|
"""Validate slug."""
|
|
return bool(SLUG_PATTERN.match(slug))
|
|
|
|
def normalize_slug_text(text: str) -> str:
|
|
"""Normalize text for slug."""
|
|
return SAFE_SLUG_PATTERN.sub('', text)
|
|
```
|
|
|
|
**Bad**:
|
|
```python
|
|
def validate_slug(slug: str) -> bool:
|
|
"""Validate slug."""
|
|
# Compiles regex every call - inefficient
|
|
return bool(re.match(r'^[a-z0-9]+(?:-[a-z0-9]+)*$', slug))
|
|
```
|
|
|
|
### 3. Avoid Premature Optimization
|
|
|
|
**Pattern**: Write clear code first, optimize if needed.
|
|
|
|
**Good**:
|
|
```python
|
|
def extract_first_words(text: str, max_words: int = 5) -> str:
|
|
"""Extract first N words from text."""
|
|
words = text.split()
|
|
return ' '.join(words[:max_words])
|
|
```
|
|
|
|
**Don't Do This Unless Profiling Shows It's Necessary**:
|
|
```python
|
|
def extract_first_words(text: str, max_words: int = 5) -> str:
|
|
"""Extract first N words from text."""
|
|
# Premature optimization - more complex, minimal gain
|
|
words = []
|
|
count = 0
|
|
for word in text.split():
|
|
words.append(word)
|
|
count += 1
|
|
if count >= max_words:
|
|
break
|
|
return ' '.join(words)
|
|
```
|
|
|
|
## Constants Pattern
|
|
|
|
### 1. Module-Level Constants
|
|
|
|
**Pattern**: Define configuration as constants at module level.
|
|
|
|
```python
|
|
# Slug configuration
|
|
MAX_SLUG_LENGTH = 100
|
|
MIN_SLUG_LENGTH = 1
|
|
SLUG_WORDS_COUNT = 5
|
|
RANDOM_SUFFIX_LENGTH = 4
|
|
|
|
# File operations
|
|
TEMP_FILE_SUFFIX = '.tmp'
|
|
TRASH_DIR_NAME = '.trash'
|
|
|
|
# Hashing
|
|
CONTENT_HASH_ALGORITHM = 'sha256'
|
|
```
|
|
|
|
**Why**: Easy to modify, clear intent, compile-time resolution.
|
|
|
|
### 2. Naming Convention
|
|
|
|
**Pattern**: ALL_CAPS_WITH_UNDERSCORES
|
|
|
|
```python
|
|
MAX_NOTE_LENGTH = 10000 # Good
|
|
DEFAULT_ENCODING = 'utf-8' # Good
|
|
maxNoteLength = 10000 # Bad
|
|
max_note_length = 10000 # Bad (looks like variable)
|
|
```
|
|
|
|
### 3. Magic Numbers
|
|
|
|
**Pattern**: Replace magic numbers with named constants.
|
|
|
|
**Good**:
|
|
```python
|
|
SLUG_WORDS_COUNT = 5
|
|
|
|
def generate_slug(content: str) -> str:
|
|
"""Generate slug from content."""
|
|
words = content.split()[:SLUG_WORDS_COUNT]
|
|
return normalize_slug_text(' '.join(words))
|
|
```
|
|
|
|
**Bad**:
|
|
```python
|
|
def generate_slug(content: str) -> str:
|
|
"""Generate slug from content."""
|
|
words = content.split()[:5] # What is 5? Why 5?
|
|
return normalize_slug_text(' '.join(words))
|
|
```
|
|
|
|
## Anti-Patterns to Avoid
|
|
|
|
### 1. God Functions
|
|
|
|
**Bad**:
|
|
```python
|
|
def process_note(content, do_hash=True, do_slug=True, do_path=True, ...):
|
|
"""Do everything with a note."""
|
|
# 500 lines of code doing too many things
|
|
pass
|
|
```
|
|
|
|
**Good**:
|
|
```python
|
|
def generate_slug(content: str) -> str:
|
|
"""Generate slug."""
|
|
pass
|
|
|
|
def calculate_hash(content: str) -> str:
|
|
"""Calculate hash."""
|
|
pass
|
|
|
|
def generate_path(slug: str, timestamp: datetime) -> Path:
|
|
"""Generate path."""
|
|
pass
|
|
```
|
|
|
|
### 2. Mutable Default Arguments
|
|
|
|
**Bad**:
|
|
```python
|
|
def create_note(content: str, tags: list = []) -> Note:
|
|
tags.append('note') # Modifies shared default list!
|
|
return Note(content, tags)
|
|
```
|
|
|
|
**Good**:
|
|
```python
|
|
def create_note(content: str, tags: Optional[List[str]] = None) -> Note:
|
|
if tags is None:
|
|
tags = []
|
|
tags.append('note')
|
|
return Note(content, tags)
|
|
```
|
|
|
|
### 3. Returning Multiple Types
|
|
|
|
**Bad**:
|
|
```python
|
|
def find_note(slug: str) -> Union[Note, bool, None]:
|
|
"""Find note by slug."""
|
|
if slug_invalid:
|
|
return False
|
|
note = db.query(slug)
|
|
return note if note else None
|
|
```
|
|
|
|
**Good**:
|
|
```python
|
|
def find_note(slug: str) -> Optional[Note]:
|
|
"""Find note by slug, returns None if not found."""
|
|
if not validate_slug(slug):
|
|
raise ValueError(f"Invalid slug: {slug}")
|
|
return db.query(slug) # Returns Note or None
|
|
```
|
|
|
|
### 4. Silent Failures
|
|
|
|
**Bad**:
|
|
```python
|
|
def generate_slug(content: str) -> str:
|
|
"""Generate slug."""
|
|
try:
|
|
slug = normalize_slug_text(content)
|
|
return slug if slug else "untitled" # Silent fallback
|
|
except Exception:
|
|
return "untitled" # Swallowing errors
|
|
```
|
|
|
|
**Good**:
|
|
```python
|
|
def generate_slug(content: str) -> str:
|
|
"""Generate slug."""
|
|
if not content or not content.strip():
|
|
raise ValueError("Content cannot be empty")
|
|
|
|
slug = normalize_slug_text(content)
|
|
if not slug:
|
|
# Explicit fallback with timestamp
|
|
slug = datetime.utcnow().strftime("%Y%m%d-%H%M%S")
|
|
|
|
return slug
|
|
```
|
|
|
|
## Summary
|
|
|
|
**Key Principles**:
|
|
1. Pure functions are preferred
|
|
2. Specific exceptions with clear messages
|
|
3. Type hints on all functions
|
|
4. Comprehensive docstrings
|
|
5. Security-first validation
|
|
6. Test everything thoroughly
|
|
7. Constants for configuration
|
|
8. Clear over clever
|
|
|
|
**Remember**: Utility functions are the foundation. Make them rock-solid.
|
|
|
|
## References
|
|
|
|
- [Python Coding Standards](/home/phil/Projects/starpunk/docs/standards/python-coding-standards.md)
|
|
- [PEP 8 - Style Guide](https://peps.python.org/pep-0008/)
|
|
- [PEP 257 - Docstring Conventions](https://peps.python.org/pep-0257/)
|
|
- [PEP 484 - Type Hints](https://peps.python.org/pep-0484/)
|
|
- [Python secrets Documentation](https://docs.python.org/3/library/secrets.html)
|
|
- [Pytest Best Practices](https://docs.pytest.org/en/stable/goodpractices.html)
|