that initial commit

2025-11-18 19:21:31 -07:00
commit a68fd570c7
69 changed files with 31070 additions and 0 deletions

# Utility Function Patterns
## Purpose
This document establishes coding patterns and conventions specifically for utility functions in StarPunk. These patterns ensure utilities are consistent, testable, and maintainable.
## Philosophy
Utility functions should be:
- **Pure**: No side effects where possible
- **Focused**: One responsibility per function
- **Predictable**: Same input always produces same output
- **Testable**: Easy to test in isolation
- **Documented**: Clear purpose and usage
## Function Design Patterns
### 1. Pure Functions (Preferred)
**Pattern**: Functions that don't modify state or have side effects.
**Good Example**:
```python
import hashlib


def calculate_content_hash(content: str) -> str:
    """Calculate SHA-256 hash of content."""
    return hashlib.sha256(content.encode('utf-8')).hexdigest()
```
**Why**: Easy to test, no hidden dependencies, predictable behavior.
**When to Use**: Calculations, transformations, validations.
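
One way to keep clock-dependent logic pure is to pass the timestamp in as an argument and read the clock only in a thin wrapper. The sketch below is illustrative only; `timestamp_slug` and `timestamp_slug_now` are hypothetical names, not existing StarPunk functions.

```python
from datetime import datetime, timezone


def timestamp_slug(created_at: datetime) -> str:
    """Pure: the result depends only on the argument."""
    return created_at.strftime("%Y%m%d-%H%M%S")


def timestamp_slug_now() -> str:
    """Impure wrapper: reads the clock, then delegates to the pure function."""
    return timestamp_slug(datetime.now(timezone.utc))
```

Tests can then target `timestamp_slug` with fixed datetimes and leave the wrapper as a one-line shell.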
### 2. Functions with I/O Side Effects
**Pattern**: Functions that read/write files or interact with external systems.
**Good Example**:
```python
from pathlib import Path


def write_note_file(file_path: Path, content: str) -> None:
    """Write note content to file atomically."""
    temp_path = file_path.with_suffix(file_path.suffix + '.tmp')
    try:
        temp_path.write_text(content, encoding='utf-8')
        temp_path.replace(file_path)
    except Exception:
        temp_path.unlink(missing_ok=True)
        raise
```
**Why**: Side effects are isolated, error handling is explicit, and the temporary file is removed if the write or rename fails.
**When to Use**: File operations, database operations, network calls.
### 3. Validation Functions
**Pattern**: Functions that check validity and return boolean or raise exception.
**Good Example** (Boolean Return):
```python
def validate_slug(slug: str) -> bool:
    """Validate that slug meets requirements."""
    if not slug:
        return False
    if len(slug) > MAX_SLUG_LENGTH:
        return False
    return bool(SLUG_PATTERN.match(slug))
```
**Good Example** (Exception Raising):
```python
def require_valid_slug(slug: str) -> None:
    """Require slug to be valid, raise ValueError if not."""
    if not validate_slug(slug):
        raise ValueError(
            f"Invalid slug '{slug}': must be 1-100 characters, "
            f"lowercase alphanumeric with hyphens only"
        )
```
**When to Use**: Input validation, precondition checking, security checks.
### 4. Generator Functions
**Pattern**: Functions that yield values instead of returning lists.
**Good Example**:
```python
from pathlib import Path
from typing import Iterator


def iter_note_files(notes_dir: Path) -> Iterator[Path]:
    """Iterate over all note files in directory."""
    for year_dir in sorted(notes_dir.iterdir()):
        if not year_dir.is_dir():
            continue
        for month_dir in sorted(year_dir.iterdir()):
            if not month_dir.is_dir():
                continue
            for note_file in sorted(month_dir.glob("*.md")):
                yield note_file
```
**Why**: Memory efficient, lazy evaluation, can be interrupted.
**When to Use**: Processing many items, large datasets, streaming operations.
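
To illustrate the "can be interrupted" point, the sketch below consumes only the first two results with `itertools.islice`, so later month directories are never scanned. The temporary `year/month` tree and the file names are made up for the demonstration.

```python
import tempfile
from itertools import islice
from pathlib import Path
from typing import Iterator


def iter_note_files(notes_dir: Path) -> Iterator[Path]:
    """Iterate over all note files in directory (same shape as the example above)."""
    for year_dir in sorted(notes_dir.iterdir()):
        if not year_dir.is_dir():
            continue
        for month_dir in sorted(year_dir.iterdir()):
            if not month_dir.is_dir():
                continue
            for note_file in sorted(month_dir.glob("*.md")):
                yield note_file


with tempfile.TemporaryDirectory() as tmp:
    notes_dir = Path(tmp)
    # Build a small, hypothetical notes tree: 2025/01, 2025/02, 2025/03
    for month in ("01", "02", "03"):
        month_dir = notes_dir / "2025" / month
        month_dir.mkdir(parents=True)
        (month_dir / f"note-{month}.md").write_text("# Note", encoding="utf-8")

    # Stop after two files; the generator never reaches the third month
    first_two = [path.name for path in islice(iter_note_files(notes_dir), 2)]
```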
## Error Handling Patterns
### 1. Specific Exceptions
**Pattern**: Raise specific exception types with descriptive messages.
**Good**:
```python
def generate_slug(content: str) -> str:
    """Generate slug from content."""
    if not content or not content.strip():
        raise ValueError("Content cannot be empty or whitespace-only")
    # ... rest of function
```
**Bad**:
```python
def generate_slug(content: str) -> str:
    if not content:
        raise Exception("Bad input")  # Too generic
```
### 2. Error Message Format
**Pattern**: Include context, expected behavior, and actual problem.
**Good**:
```python
raise ValueError(
f"Invalid slug '{slug}': must contain only lowercase letters, "
f"numbers, and hyphens (pattern: {SLUG_PATTERN.pattern})"
)
raise FileNotFoundError(
f"Note file not found: {file_path}. "
f"Database may be out of sync with filesystem."
)
```
**Bad**:
```python
raise ValueError("Invalid slug")
raise FileNotFoundError("File missing")
```
### 3. Exception Chaining
**Pattern**: Preserve original exception context when re-raising.
**Good**:
```python
def read_note_file(file_path: Path) -> str:
    """Read note content from file."""
    try:
        return file_path.read_text(encoding='utf-8')
    except UnicodeDecodeError as e:
        raise ValueError(
            f"Failed to read {file_path}: invalid UTF-8 encoding"
        ) from e
```
**Why**: Preserves stack trace, shows root cause.
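
A small sketch of what chaining buys the caller: the original exception stays reachable through `__cause__`. The invalid bytes and the temporary file are illustrative test data.

```python
import tempfile
from pathlib import Path


def read_note_file(file_path: Path) -> str:
    """Read note content from file, translating decode errors."""
    try:
        return file_path.read_text(encoding="utf-8")
    except UnicodeDecodeError as e:
        raise ValueError(
            f"Failed to read {file_path}: invalid UTF-8 encoding"
        ) from e


root_cause = None
with tempfile.TemporaryDirectory() as tmp:
    bad_file = Path(tmp) / "bad.md"
    bad_file.write_bytes(b"\xff\xfe not utf-8")  # 0xff is never valid in UTF-8
    try:
        read_note_file(bad_file)
    except ValueError as err:
        # "from e" stored the original error on the new exception
        root_cause = err.__cause__
```

A traceback for the `ValueError` would also print the `UnicodeDecodeError` first, followed by "The above exception was the direct cause of the following exception".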
### 4. Cleanup on Error
**Pattern**: Use try/except with a re-raise, try/finally, or context managers to ensure cleanup.
**Good**:
```python
def write_note_file(file_path: Path, content: str) -> None:
    """Write note content to file atomically."""
    # Append '.tmp' to the existing suffix instead of replacing it
    temp_path = file_path.with_suffix(file_path.suffix + '.tmp')
    try:
        temp_path.write_text(content, encoding='utf-8')
        temp_path.replace(file_path)
    except Exception:
        temp_path.unlink(missing_ok=True)  # Cleanup on error
        raise
```
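
The same cleanup can be written with `try/finally`. This is a sketch of an alternative, not the project's implementation; `write_note_file_finally` is a hypothetical name. It relies on `Path.unlink(missing_ok=True)` (Python 3.8+) being a no-op once the temp file has been renamed away.

```python
import tempfile
from pathlib import Path


def write_note_file_finally(file_path: Path, content: str) -> None:
    """Write note content atomically, cleaning up in a finally block."""
    temp_path = file_path.with_suffix(file_path.suffix + ".tmp")
    try:
        temp_path.write_text(content, encoding="utf-8")
        temp_path.replace(file_path)
    finally:
        # After a successful replace the temp file no longer exists, so this
        # does nothing; after a failure it removes the leftover temp file.
        temp_path.unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    note_path = Path(tmp) / "note.md"
    write_note_file_finally(note_path, "# Hello")
    saved_content = note_path.read_text(encoding="utf-8")
    leftover_temp_files = list(Path(tmp).glob("*.tmp"))
```

Unlike `except Exception`, the `finally` block also runs on `KeyboardInterrupt`, at the cost of one extra `unlink` call on the success path.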
## Type Hint Patterns
### 1. Basic Types
**Pattern**: Use built-in types where possible.
```python
def generate_slug(content: str, created_at: Optional[datetime] = None) -> str:
    """Generate URL-safe slug from content."""
    pass


def validate_note_path(file_path: Path, data_dir: Path) -> bool:
    """Validate file path is within data directory."""
    pass
```
### 2. Collection Types
**Pattern**: Specify element types for collections.
```python
from pathlib import Path
from typing import List, Set


def make_slug_unique(base_slug: str, existing_slugs: Set[str]) -> str:
    """Make slug unique by adding suffix if needed."""
    pass


def get_note_paths(notes_dir: Path) -> List[Path]:
    """Get all note file paths."""
    pass
```
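
For illustration, `make_slug_unique` could be filled in as follows. This is a sketch under assumptions: the `-xxxx` suffix format, the retry limit `MAX_ATTEMPTS`, and the `SUFFIX_CHARS` alphabet are not specified by this document.

```python
import secrets
from typing import Set

RANDOM_SUFFIX_LENGTH = 4
SUFFIX_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789"
MAX_ATTEMPTS = 100  # Assumed safety limit, not from the original document


def make_slug_unique(base_slug: str, existing_slugs: Set[str]) -> str:
    """Make slug unique by adding a random suffix if needed."""
    if base_slug not in existing_slugs:
        return base_slug
    for _ in range(MAX_ATTEMPTS):
        suffix = "".join(
            secrets.choice(SUFFIX_CHARS) for _ in range(RANDOM_SUFFIX_LENGTH)
        )
        candidate = f"{base_slug}-{suffix}"
        if candidate not in existing_slugs:
            return candidate
    raise RuntimeError(
        f"Could not find a unique slug for '{base_slug}' "
        f"after {MAX_ATTEMPTS} attempts"
    )
```

Typing `existing_slugs` as `Set[str]` documents that membership checks are expected to be cheap, which matters inside the retry loop.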
### 3. Optional Types
**Pattern**: Use Optional[T] for nullable parameters and returns.
```python
from pathlib import Path
from typing import Optional


def find_note(slug: str) -> Optional[Path]:
    """Find note file by slug, returns None if not found."""
    pass
```
### 4. Union Types (Use Sparingly)
**Pattern**: Use Union when a parameter truly accepts multiple types.
```python
from pathlib import Path
from typing import Union


def ensure_path(path: Union[str, Path]) -> Path:
    """Convert string or Path to Path object."""
    return Path(path) if isinstance(path, str) else path
```
**Note**: Prefer single types. Only use Union when necessary.
## Documentation Patterns
### 1. Function Docstrings
**Pattern**: Google-style docstrings with sections.
```python
def generate_slug(content: str, created_at: Optional[datetime] = None) -> str:
    """
    Generate URL-safe slug from note content

    Creates a slug by extracting the first few words from the content and
    normalizing them to lowercase with hyphens. If content is insufficient,
    falls back to timestamp-based slug.

    Args:
        content: The note content (markdown text)
        created_at: Optional timestamp for fallback slug (defaults to now)

    Returns:
        URL-safe slug string (lowercase, alphanumeric + hyphens only)

    Raises:
        ValueError: If content is empty or contains only whitespace

    Examples:
        >>> generate_slug("Hello World! This is my first note.")
        'hello-world-this-is-my'
        >>> generate_slug("Testing... with special chars!@#")
        'testing-with-special-chars'

    Notes:
        - This function does NOT check for uniqueness
        - Caller must verify slug doesn't exist in database
    """
```
**Required Sections**:
- Summary (first line)
- Description (paragraph after summary)
- Args (if any parameters)
- Returns (if returns value)
- Raises (if raises exceptions)
**Optional Sections**:
- Examples (highly recommended)
- Notes (for important caveats)
- References (for external specs)
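
One reason the Examples section is worth writing is that the standard `doctest` module can execute it. The sketch below runs the examples from a single docstring; `extract_first_words` is reused from the Performance Patterns section, and wiring doctest up this way is an assumption about workflow, not a stated project convention.

```python
import doctest


def extract_first_words(text: str, max_words: int = 5) -> str:
    """
    Extract the first N words from text.

    Examples:
        >>> extract_first_words("Hello World This Is My Note")
        'Hello World This Is My'
        >>> extract_first_words("one two three", max_words=2)
        'one two'
    """
    return " ".join(text.split()[:max_words])


# Collect and run the examples embedded in the docstring
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
globs = {"extract_first_words": extract_first_words}
for doc_test in finder.find(extract_first_words, "extract_first_words", globs=globs):
    runner.run(doc_test)
results = runner.summarize(verbose=False)
```

`results.failed` and `results.attempted` report how many examples failed and ran. For a whole module, `python -m doctest path/to/module.py` performs the same check from the command line.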
### 2. Inline Comments
**Pattern**: Explain why, not what.
**Good**:
```python
# Use atomic rename to prevent file corruption if interrupted
temp_path.replace(file_path)
# Random suffix prevents enumeration attacks
suffix = secrets.token_urlsafe(4)[:4]
```
**Bad**:
```python
# Rename temp file to final path
temp_path.replace(file_path)
# Generate suffix
suffix = secrets.token_urlsafe(4)[:4]
```
### 3. Module Docstrings
**Pattern**: Describe module purpose and contents.
```python
"""
Core utility functions for StarPunk

This module provides essential utilities for slug generation, file operations,
hashing, and date/time handling. These utilities are used throughout the
application and have no external dependencies beyond standard library and
Flask configuration.

Functions:
    generate_slug: Create URL-safe slug from content
    calculate_content_hash: Calculate SHA-256 hash
    write_note_file: Atomically write note to file
    format_rfc822: Format datetime for RSS feeds

Constants:
    MAX_SLUG_LENGTH: Maximum slug length (100)
    SLUG_PATTERN: Regex for valid slugs
"""
```
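
The docstring above lists `format_rfc822` without showing it. A minimal sketch using the standard library's `email.utils.format_datetime` might look like this; treating naive datetimes as UTC is an assumption made here, not documented StarPunk behavior.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def format_rfc822(dt: datetime) -> str:
    """Format a datetime as an RFC 822-style date string for RSS feeds."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return format_datetime(dt)


pub_date = format_rfc822(datetime(2025, 11, 18, 19, 21, 31, tzinfo=timezone.utc))
```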
## Testing Patterns
### 1. Test Function Naming
**Pattern**: `test_{function_name}_{scenario}`
```python
def test_generate_slug_from_content():
    """Test basic slug generation from content."""
    slug = generate_slug("Hello World This Is My Note")
    assert slug == "hello-world-this-is-my"


def test_generate_slug_empty_content():
    """Test slug generation raises error on empty content."""
    with pytest.raises(ValueError):
        generate_slug("")


def test_generate_slug_special_characters():
    """Test slug generation removes special characters."""
    slug = generate_slug("Testing... with!@# special chars")
    assert slug == "testing-with-special-chars"
```
### 2. Test Organization
**Pattern**: Group related tests in classes.
```python
class TestSlugGeneration:
    """Test slug generation functions"""

    def test_generate_slug_from_content(self):
        """Test basic slug generation."""
        pass

    def test_generate_slug_empty_content(self):
        """Test error on empty content."""
        pass


class TestContentHashing:
    """Test content hashing functions"""

    def test_calculate_hash_consistency(self):
        """Test hash is consistent."""
        pass
```
### 3. Fixtures for Common Setup
**Pattern**: Use pytest fixtures for reusable test data.
```python
import pytest
from pathlib import Path


@pytest.fixture
def temp_note_file(tmp_path):
    """Create temporary note file for testing."""
    file_path = tmp_path / "test.md"
    file_path.write_text("# Test Note")
    return file_path


def test_read_note_file(temp_note_file):
    """Test reading note file."""
    content = read_note_file(temp_note_file)
    assert content == "# Test Note"
```
### 4. Parameterized Tests
**Pattern**: Test multiple cases with one test function.
```python
@pytest.mark.parametrize("content,expected", [
    ("Hello World", "hello-world"),
    ("Testing 123", "testing-123"),
    ("Special!@# Chars", "special-chars"),
])
def test_generate_slug_variations(content, expected):
    """Test slug generation with various inputs."""
    slug = generate_slug(content)
    assert slug == expected
```
## Security Patterns
### 1. Path Validation
**Pattern**: Always validate paths before file operations.
```python
def safe_file_operation(file_path: Path, data_dir: Path) -> None:
    """Perform file operation with path validation."""
    # ALWAYS validate first
    if not validate_note_path(file_path, data_dir):
        raise ValueError(f"Invalid file path: {file_path}")
    # Now safe to operate
    file_path.write_text("content")
```
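
`validate_note_path` is used above but not defined in this document. One possible sketch resolves both paths and checks containment with `Path.relative_to`; the real StarPunk implementation may differ, and the `/srv/starpunk/data` directory is a made-up example.

```python
from pathlib import Path


def validate_note_path(file_path: Path, data_dir: Path) -> bool:
    """Return True if file_path resolves to a location inside data_dir."""
    # resolve() collapses '..' segments and follows symlinks where they exist
    resolved_file = file_path.resolve()
    resolved_dir = data_dir.resolve()
    try:
        resolved_file.relative_to(resolved_dir)
    except ValueError:
        # relative_to raises ValueError when the path is outside data_dir
        return False
    return True


data_dir = Path("/srv/starpunk/data")  # Hypothetical data directory
inside_ok = validate_note_path(data_dir / "notes" / "2025" / "11" / "hello.md", data_dir)
traversal_ok = validate_note_path(data_dir / ".." / "etc" / "passwd", data_dir)
```

Note that this sketch also accepts `data_dir` itself; a stricter check could reject that case explicitly.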
### 2. Use secrets for Random
**Pattern**: Use `secrets` module, not `random` for security-sensitive operations.
**Good**:
```python
import secrets


def generate_random_suffix(length: int = 4) -> str:
    """Generate cryptographically secure random suffix."""
    chars = 'abcdefghijklmnopqrstuvwxyz0123456789'
    return ''.join(secrets.choice(chars) for _ in range(length))
```
**Bad**:
```python
import random


def generate_random_suffix(length: int = 4) -> str:
    """Generate random suffix."""
    chars = 'abcdefghijklmnopqrstuvwxyz0123456789'
    return ''.join(random.choice(chars) for _ in range(length))  # NOT secure
```
### 3. Input Sanitization
**Pattern**: Validate and sanitize all external input.
```python
from datetime import datetime, timezone


def generate_slug(content: str) -> str:
    """Generate slug from content."""
    # Validate input
    if not content or not content.strip():
        raise ValueError("Content cannot be empty")
    # Sanitize by removing dangerous characters
    normalized = normalize_slug_text(content)
    # Additional validation
    if not validate_slug(normalized):
        # Fallback to safe default (datetime.utcnow() is deprecated since Python 3.12)
        normalized = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return normalized
```
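
`normalize_slug_text` is called throughout this document but never shown. The sketch below is one implementation that reproduces the slugs given in the docstring examples; the ASCII-only character policy and the `NON_SLUG_CHARS` name are assumptions.

```python
import re

SLUG_WORDS_COUNT = 5
# Runs of anything other than lowercase ASCII letters and digits
NON_SLUG_CHARS = re.compile(r"[^a-z0-9]+")


def normalize_slug_text(text: str) -> str:
    """Lowercase text and collapse every non-alphanumeric run into one hyphen."""
    return NON_SLUG_CHARS.sub("-", text.lower()).strip("-")


def first_words_slug(content: str) -> str:
    """Hypothetical helper: slug built from the first SLUG_WORDS_COUNT words."""
    words = content.split()[:SLUG_WORDS_COUNT]
    return normalize_slug_text(" ".join(words))
```

With this policy, non-ASCII letters are dropped rather than transliterated, so content written entirely in another script yields an empty string and falls through to the timestamp fallback.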
## Performance Patterns
### 1. Lazy Evaluation
**Pattern**: Don't compute what you don't need.
**Good**:
```python
def generate_slug(content: str, created_at: Optional[datetime] = None) -> str:
    """Generate slug from content."""
    slug = normalize_slug_text(content)
    # Only generate timestamp if needed
    if not slug:
        # datetime.utcnow() is deprecated since Python 3.12
        created_at = created_at or datetime.now(timezone.utc)
        slug = created_at.strftime("%Y%m%d-%H%M%S")
    return slug
```
### 2. Compile Regex Once
**Pattern**: Define regex patterns as module constants.
**Good**:
```python
# Module level - compiled once
SLUG_PATTERN = re.compile(r'^[a-z0-9]+(?:-[a-z0-9]+)*$')
SAFE_SLUG_PATTERN = re.compile(r'[^a-z0-9-]')


def validate_slug(slug: str) -> bool:
    """Validate slug."""
    return bool(SLUG_PATTERN.match(slug))


def normalize_slug_text(text: str) -> str:
    """Normalize text for slug."""
    return SAFE_SLUG_PATTERN.sub('', text)
```
**Bad**:
```python
def validate_slug(slug: str) -> bool:
    """Validate slug."""
    # Pattern is buried in the function and looked up in re's cache on every call
    return bool(re.match(r'^[a-z0-9]+(?:-[a-z0-9]+)*$', slug))
```
### 3. Avoid Premature Optimization
**Pattern**: Write clear code first, optimize if needed.
**Good**:
```python
def extract_first_words(text: str, max_words: int = 5) -> str:
    """Extract first N words from text."""
    words = text.split()
    return ' '.join(words[:max_words])
```
**Don't Do This Unless Profiling Shows It's Necessary**:
```python
def extract_first_words(text: str, max_words: int = 5) -> str:
    """Extract first N words from text."""
    # Premature optimization - more complex, minimal gain
    words = []
    count = 0
    for word in text.split():
        words.append(word)
        count += 1
        if count >= max_words:
            break
    return ' '.join(words)
```
## Constants Pattern
### 1. Module-Level Constants
**Pattern**: Define configuration as constants at module level.
```python
# Slug configuration
MAX_SLUG_LENGTH = 100
MIN_SLUG_LENGTH = 1
SLUG_WORDS_COUNT = 5
RANDOM_SUFFIX_LENGTH = 4
# File operations
TEMP_FILE_SUFFIX = '.tmp'
TRASH_DIR_NAME = '.trash'
# Hashing
CONTENT_HASH_ALGORITHM = 'sha256'
```
**Why**: Easy to modify, clear intent, and a single place to change each value.
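
As a sketch of how such a constant pays off, the hashing helper can select its algorithm through `hashlib.new` instead of hard-coding `sha256`. Wiring `calculate_content_hash` to `CONTENT_HASH_ALGORITHM` this way is an assumption for illustration.

```python
import hashlib

CONTENT_HASH_ALGORITHM = "sha256"
DEFAULT_ENCODING = "utf-8"


def calculate_content_hash(content: str) -> str:
    """Hash content with the algorithm named by CONTENT_HASH_ALGORITHM."""
    hasher = hashlib.new(CONTENT_HASH_ALGORITHM)
    hasher.update(content.encode(DEFAULT_ENCODING))
    return hasher.hexdigest()
```

Changing the algorithm then means editing one line, although existing stored hashes would still need a migration.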
### 2. Naming Convention
**Pattern**: ALL_CAPS_WITH_UNDERSCORES
```python
MAX_NOTE_LENGTH = 10000 # Good
DEFAULT_ENCODING = 'utf-8' # Good
maxNoteLength = 10000 # Bad
max_note_length = 10000 # Bad (looks like variable)
```
### 3. Magic Numbers
**Pattern**: Replace magic numbers with named constants.
**Good**:
```python
SLUG_WORDS_COUNT = 5


def generate_slug(content: str) -> str:
    """Generate slug from content."""
    words = content.split()[:SLUG_WORDS_COUNT]
    return normalize_slug_text(' '.join(words))
```
**Bad**:
```python
def generate_slug(content: str) -> str:
    """Generate slug from content."""
    words = content.split()[:5]  # What is 5? Why 5?
    return normalize_slug_text(' '.join(words))
```
## Anti-Patterns to Avoid
### 1. God Functions
**Bad**:
```python
def process_note(content, do_hash=True, do_slug=True, do_path=True, ...):
    """Do everything with a note."""
    # 500 lines of code doing too many things
    pass
```
**Good**:
```python
def generate_slug(content: str) -> str:
    """Generate slug."""
    pass


def calculate_hash(content: str) -> str:
    """Calculate hash."""
    pass


def generate_path(slug: str, timestamp: datetime) -> Path:
    """Generate path."""
    pass
```
### 2. Mutable Default Arguments
**Bad**:
```python
def create_note(content: str, tags: list = []) -> Note:
    tags.append('note')  # Modifies shared default list!
    return Note(content, tags)
```
**Good**:
```python
def create_note(content: str, tags: Optional[List[str]] = None) -> Note:
    # Copy so neither a shared default nor the caller's list is mutated
    tags = list(tags) if tags is not None else []
    tags.append('note')
    return Note(content, tags)
```
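
The effect of a mutable default is easiest to see by calling the function twice. The sketch below uses simplified, hypothetical helpers (`add_note_tag_shared`, `add_note_tag_safe`) that return the tag list directly instead of building a `Note`.

```python
from typing import List, Optional


def add_note_tag_shared(tags: list = []) -> list:
    """Anti-pattern: the default list is created once and reused across calls."""
    tags.append("note")
    return tags


def add_note_tag_safe(tags: Optional[List[str]] = None) -> List[str]:
    """Copy the input so neither the default nor the caller's list is mutated."""
    result = list(tags) if tags is not None else []
    result.append("note")
    return result


first_call = list(add_note_tag_shared())   # Snapshot of the shared default
second_call = list(add_note_tag_shared())  # The same list has grown

caller_tags = ["draft"]
safe_result = add_note_tag_safe(caller_tags)
```

After the second call the shared default holds two entries, while the safe version leaves `caller_tags` untouched.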
### 3. Returning Multiple Types
**Bad**:
```python
def find_note(slug: str) -> Union[Note, bool, None]:
    """Find note by slug."""
    if slug_invalid:
        return False
    note = db.query(slug)
    return note if note else None
```
**Good**:
```python
def find_note(slug: str) -> Optional[Note]:
    """Find note by slug, returns None if not found."""
    if not validate_slug(slug):
        raise ValueError(f"Invalid slug: {slug}")
    return db.query(slug)  # Returns Note or None
```
### 4. Silent Failures
**Bad**:
```python
def generate_slug(content: str) -> str:
    """Generate slug."""
    try:
        slug = normalize_slug_text(content)
        return slug if slug else "untitled"  # Silent fallback
    except Exception:
        return "untitled"  # Swallowing errors
```
**Good**:
```python
def generate_slug(content: str) -> str:
    """Generate slug."""
    if not content or not content.strip():
        raise ValueError("Content cannot be empty")
    slug = normalize_slug_text(content)
    if not slug:
        # Explicit fallback with timestamp (utcnow() is deprecated since Python 3.12)
        slug = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return slug
```
## Summary
**Key Principles**:
1. Pure functions are preferred
2. Specific exceptions with clear messages
3. Type hints on all functions
4. Comprehensive docstrings
5. Security-first validation
6. Test everything thoroughly
7. Constants for configuration
8. Clear over clever
**Remember**: Utility functions are the foundation. Make them rock-solid.
## References
- [Python Coding Standards](/home/phil/Projects/starpunk/docs/standards/python-coding-standards.md)
- [PEP 8 - Style Guide](https://peps.python.org/pep-0008/)
- [PEP 257 - Docstring Conventions](https://peps.python.org/pep-0257/)
- [PEP 484 - Type Hints](https://peps.python.org/pep-0484/)
- [Python secrets Documentation](https://docs.python.org/3/library/secrets.html)
- [Pytest Best Practices](https://docs.pytest.org/en/stable/goodpractices.html)