8.3 KiB
8.3 KiB
Phase 1.1 Quick Reference: Core Utilities
Quick Start
File: starpunk/utils.py
Tests: tests/test_utils.py
Estimated Time: 2-3 hours
Implementation Order
- Constants and imports
- Helper functions (extract_first_words, normalize_slug_text, generate_random_suffix)
- Slug functions (generate_slug, make_slug_unique, validate_slug)
- Content hashing (calculate_content_hash)
- Path functions (generate_note_path, ensure_note_directory, validate_note_path)
- File operations (write_note_file, read_note_file, delete_note_file)
- Date/time functions (format_rfc822, format_iso8601, parse_iso8601)
Function Checklist
Slug Generation (3 functions)
generate_slug(content: str, created_at: Optional[datetime] = None) -> strmake_slug_unique(base_slug: str, existing_slugs: Set[str]) -> strvalidate_slug(slug: str) -> bool
Content Hashing (1 function)
calculate_content_hash(content: str) -> str
Path Operations (3 functions)
generate_note_path(slug: str, created_at: datetime, data_dir: Path) -> Pathensure_note_directory(note_path: Path) -> Pathvalidate_note_path(file_path: Path, data_dir: Path) -> bool
File Operations (3 functions)
write_note_file(file_path: Path, content: str) -> Noneread_note_file(file_path: Path) -> strdelete_note_file(file_path: Path, soft: bool = False, data_dir: Optional[Path] = None) -> None
Date/Time (3 functions)
format_rfc822(dt: datetime) -> strformat_iso8601(dt: datetime) -> strparse_iso8601(date_string: str) -> datetime
Helper Functions (3 functions)
extract_first_words(text: str, max_words: int = 5) -> strnormalize_slug_text(text: str) -> strgenerate_random_suffix(length: int = 4) -> str
Total: 16 functions
Constants Required
# Slug configuration
MAX_SLUG_LENGTH = 100
MIN_SLUG_LENGTH = 1
SLUG_WORDS_COUNT = 5
RANDOM_SUFFIX_LENGTH = 4
# File operations
TEMP_FILE_SUFFIX = '.tmp'
TRASH_DIR_NAME = '.trash'
# Hashing
CONTENT_HASH_ALGORITHM = 'sha256'
# Regex patterns
SLUG_PATTERN = re.compile(r'^[a-z0-9]+(?:-[a-z0-9]+)*$')
SAFE_SLUG_PATTERN = re.compile(r'[^a-z0-9-]')
MULTIPLE_HYPHENS_PATTERN = re.compile(r'-+')
# Character set
RANDOM_CHARS = 'abcdefghijklmnopqrstuvwxyz0123456789'
Key Algorithms
Slug Generation Algorithm
1. Extract first 5 words from content
2. Convert to lowercase
3. Replace spaces with hyphens
4. Remove all characters except a-z, 0-9, hyphens
5. Collapse multiple hyphens to single hyphen
6. Strip leading/trailing hyphens
7. Truncate to 100 characters
8. If empty or too short → timestamp fallback (YYYYMMDD-HHMMSS)
9. Return slug
Atomic File Write Algorithm
1. Create temp file path: file_path.with_suffix('.tmp')
2. Write content to temp file
3. Atomically rename temp to final path
4. On error: delete temp file, re-raise exception
Path Validation Algorithm
1. Resolve both paths to absolute
2. Check if file_path.is_relative_to(data_dir)
3. Return boolean
Test Coverage Requirements
- Minimum 90% code coverage
- Test all functions
- Test edge cases (empty, whitespace, unicode, special chars)
- Test error cases (invalid input, file errors)
- Test security (path traversal)
Example Test Structure
class TestSlugGeneration:
def test_generate_slug_from_content(self): pass
def test_generate_slug_empty_content(self): pass
def test_generate_slug_special_characters(self): pass
def test_make_slug_unique_no_collision(self): pass
def test_make_slug_unique_with_collision(self): pass
def test_validate_slug_valid(self): pass
def test_validate_slug_invalid(self): pass
class TestContentHashing:
def test_calculate_content_hash_consistency(self): pass
def test_calculate_content_hash_different(self): pass
def test_calculate_content_hash_empty(self): pass
class TestFilePathOperations:
def test_generate_note_path(self): pass
def test_validate_note_path_safe(self): pass
def test_validate_note_path_traversal(self): pass
class TestAtomicFileOperations:
def test_write_and_read_note_file(self): pass
def test_write_note_file_atomic(self): pass
def test_delete_note_file_hard(self): pass
def test_delete_note_file_soft(self): pass
class TestDateTimeFormatting:
def test_format_rfc822(self): pass
def test_format_iso8601(self): pass
def test_parse_iso8601(self): pass
Common Pitfalls to Avoid
- Don't use
randommodule → Usesecretsfor security - Don't forget path validation → Always validate before file operations
- Don't use magic numbers → Define as constants
- Don't skip temp file cleanup → Use try/finally
- Don't use bare
except:→ Catch specific exceptions - Don't forget type hints → All functions need type hints
- Don't skip docstrings → All functions need docstrings with examples
- Don't forget edge cases → Test empty, whitespace, unicode, special chars
Security Checklist
- Path validation prevents directory traversal
- Use
secretsmodule for random generation - Validate all external input
- Use atomic file writes
- Handle symlinks correctly (resolve paths)
- No hardcoded credentials or paths
- Error messages don't leak sensitive info
Performance Targets
- Slug generation: < 1ms
- File write: < 10ms
- File read: < 5ms
- Path validation: < 1ms
- Hash calculation: < 5ms for 10KB content
Module Structure Template
"""
Core utility functions for StarPunk
This module provides essential utilities for slug generation, file operations,
hashing, and date/time handling.
"""
# Standard library
import hashlib
import re
import secrets
from datetime import datetime
from pathlib import Path
from typing import Optional
# Third-party
# (none for utils.py)
# Constants
MAX_SLUG_LENGTH = 100
# ... more constants
# Helper functions
def extract_first_words(text: str, max_words: int = 5) -> str:
"""Extract first N words from text."""
pass
# ... more helpers
# Slug functions
def generate_slug(content: str, created_at: Optional[datetime] = None) -> str:
"""Generate URL-safe slug from content."""
pass
# ... more slug functions
# Content hashing
def calculate_content_hash(content: str) -> str:
"""Calculate SHA-256 hash of content."""
pass
# Path operations
def generate_note_path(slug: str, created_at: datetime, data_dir: Path) -> Path:
"""Generate file path for note."""
pass
# ... more path functions
# File operations
def write_note_file(file_path: Path, content: str) -> None:
"""Write note content to file atomically."""
pass
# ... more file functions
# Date/time functions
def format_rfc822(dt: datetime) -> str:
"""Format datetime as RFC-822 string."""
pass
# ... more date/time functions
Verification Checklist
Before marking Phase 1.1 complete:
- All 16 functions implemented
- All functions have type hints
- All functions have docstrings with examples
- All constants defined
- Test file created with >90% coverage
- All tests pass
- Code formatted with Black
- Code passes flake8
- No security issues
- No hardcoded values
- Error messages are clear
- Performance targets met
Next Steps After Implementation
Once starpunk/utils.py is complete:
- Move to Phase 1.2: Data Models (
starpunk/models.py) - Models will import and use these utilities
- Integration tests will verify utilities work with models
References
- Full design:
/home/phil/Projects/starpunk/docs/design/phase-1.1-core-utilities.md - ADR-007: Slug generation algorithm
- Python coding standards
- Utility function patterns
Quick Command Reference
# Run tests
pytest tests/test_utils.py -v
# Run tests with coverage
pytest tests/test_utils.py --cov=starpunk.utils --cov-report=term-missing
# Format code
black starpunk/utils.py tests/test_utils.py
# Lint code
flake8 starpunk/utils.py tests/test_utils.py
# Type check (optional)
mypy starpunk/utils.py
Estimated Time Breakdown
- Constants and imports: 10 minutes
- Helper functions: 20 minutes
- Slug functions: 30 minutes
- Content hashing: 10 minutes
- Path functions: 25 minutes
- File operations: 35 minutes
- Date/time functions: 15 minutes
- Tests: 60-90 minutes
- Documentation review: 15 minutes
Total: 2-3 hours