# Phase 1.1 Quick Reference: Core Utilities ## Quick Start **File**: `starpunk/utils.py` **Tests**: `tests/test_utils.py` **Estimated Time**: 2-3 hours ## Implementation Order 1. Constants and imports 2. Helper functions (extract_first_words, normalize_slug_text, generate_random_suffix) 3. Slug functions (generate_slug, make_slug_unique, validate_slug) 4. Content hashing (calculate_content_hash) 5. Path functions (generate_note_path, ensure_note_directory, validate_note_path) 6. File operations (write_note_file, read_note_file, delete_note_file) 7. Date/time functions (format_rfc822, format_iso8601, parse_iso8601) ## Function Checklist ### Slug Generation (3 functions) - [ ] `generate_slug(content: str, created_at: Optional[datetime] = None) -> str` - [ ] `make_slug_unique(base_slug: str, existing_slugs: Set[str]) -> str` - [ ] `validate_slug(slug: str) -> bool` ### Content Hashing (1 function) - [ ] `calculate_content_hash(content: str) -> str` ### Path Operations (3 functions) - [ ] `generate_note_path(slug: str, created_at: datetime, data_dir: Path) -> Path` - [ ] `ensure_note_directory(note_path: Path) -> Path` - [ ] `validate_note_path(file_path: Path, data_dir: Path) -> bool` ### File Operations (3 functions) - [ ] `write_note_file(file_path: Path, content: str) -> None` - [ ] `read_note_file(file_path: Path) -> str` - [ ] `delete_note_file(file_path: Path, soft: bool = False, data_dir: Optional[Path] = None) -> None` ### Date/Time (3 functions) - [ ] `format_rfc822(dt: datetime) -> str` - [ ] `format_iso8601(dt: datetime) -> str` - [ ] `parse_iso8601(date_string: str) -> datetime` ### Helper Functions (3 functions) - [ ] `extract_first_words(text: str, max_words: int = 5) -> str` - [ ] `normalize_slug_text(text: str) -> str` - [ ] `generate_random_suffix(length: int = 4) -> str` **Total**: 16 functions ## Constants Required ```python # Slug configuration MAX_SLUG_LENGTH = 100 MIN_SLUG_LENGTH = 1 SLUG_WORDS_COUNT = 5 RANDOM_SUFFIX_LENGTH = 4 # File operations TEMP_FILE_SUFFIX = '.tmp' TRASH_DIR_NAME = '.trash' # Hashing CONTENT_HASH_ALGORITHM = 'sha256' # Regex patterns SLUG_PATTERN = re.compile(r'^[a-z0-9]+(?:-[a-z0-9]+)*$') SAFE_SLUG_PATTERN = re.compile(r'[^a-z0-9-]') MULTIPLE_HYPHENS_PATTERN = re.compile(r'-+') # Character set RANDOM_CHARS = 'abcdefghijklmnopqrstuvwxyz0123456789' ``` ## Key Algorithms ### Slug Generation Algorithm ``` 1. Extract first 5 words from content 2. Convert to lowercase 3. Replace spaces with hyphens 4. Remove all characters except a-z, 0-9, hyphens 5. Collapse multiple hyphens to single hyphen 6. Strip leading/trailing hyphens 7. Truncate to 100 characters 8. If empty or too short → timestamp fallback (YYYYMMDD-HHMMSS) 9. Return slug ``` ### Atomic File Write Algorithm ``` 1. Create temp file path: file_path.with_suffix('.tmp') 2. Write content to temp file 3. Atomically rename temp to final path 4. On error: delete temp file, re-raise exception ``` ### Path Validation Algorithm ``` 1. Resolve both paths to absolute 2. Check if file_path.is_relative_to(data_dir) 3. Return boolean ``` ## Test Coverage Requirements - Minimum 90% code coverage - Test all functions - Test edge cases (empty, whitespace, unicode, special chars) - Test error cases (invalid input, file errors) - Test security (path traversal) ## Example Test Structure ```python class TestSlugGeneration: def test_generate_slug_from_content(self): pass def test_generate_slug_empty_content(self): pass def test_generate_slug_special_characters(self): pass def test_make_slug_unique_no_collision(self): pass def test_make_slug_unique_with_collision(self): pass def test_validate_slug_valid(self): pass def test_validate_slug_invalid(self): pass class TestContentHashing: def test_calculate_content_hash_consistency(self): pass def test_calculate_content_hash_different(self): pass def test_calculate_content_hash_empty(self): pass class TestFilePathOperations: def test_generate_note_path(self): pass def test_validate_note_path_safe(self): pass def test_validate_note_path_traversal(self): pass class TestAtomicFileOperations: def test_write_and_read_note_file(self): pass def test_write_note_file_atomic(self): pass def test_delete_note_file_hard(self): pass def test_delete_note_file_soft(self): pass class TestDateTimeFormatting: def test_format_rfc822(self): pass def test_format_iso8601(self): pass def test_parse_iso8601(self): pass ``` ## Common Pitfalls to Avoid 1. **Don't use `random` module** → Use `secrets` for security 2. **Don't forget path validation** → Always validate before file operations 3. **Don't use magic numbers** → Define as constants 4. **Don't skip temp file cleanup** → Use try/finally 5. **Don't use bare `except:`** → Catch specific exceptions 6. **Don't forget type hints** → All functions need type hints 7. **Don't skip docstrings** → All functions need docstrings with examples 8. **Don't forget edge cases** → Test empty, whitespace, unicode, special chars ## Security Checklist - [ ] Path validation prevents directory traversal - [ ] Use `secrets` module for random generation - [ ] Validate all external input - [ ] Use atomic file writes - [ ] Handle symlinks correctly (resolve paths) - [ ] No hardcoded credentials or paths - [ ] Error messages don't leak sensitive info ## Performance Targets - Slug generation: < 1ms - File write: < 10ms - File read: < 5ms - Path validation: < 1ms - Hash calculation: < 5ms for 10KB content ## Module Structure Template ```python """ Core utility functions for StarPunk This module provides essential utilities for slug generation, file operations, hashing, and date/time handling. """ # Standard library import hashlib import re import secrets from datetime import datetime from pathlib import Path from typing import Optional # Third-party # (none for utils.py) # Constants MAX_SLUG_LENGTH = 100 # ... more constants # Helper functions def extract_first_words(text: str, max_words: int = 5) -> str: """Extract first N words from text.""" pass # ... more helpers # Slug functions def generate_slug(content: str, created_at: Optional[datetime] = None) -> str: """Generate URL-safe slug from content.""" pass # ... more slug functions # Content hashing def calculate_content_hash(content: str) -> str: """Calculate SHA-256 hash of content.""" pass # Path operations def generate_note_path(slug: str, created_at: datetime, data_dir: Path) -> Path: """Generate file path for note.""" pass # ... more path functions # File operations def write_note_file(file_path: Path, content: str) -> None: """Write note content to file atomically.""" pass # ... more file functions # Date/time functions def format_rfc822(dt: datetime) -> str: """Format datetime as RFC-822 string.""" pass # ... more date/time functions ``` ## Verification Checklist Before marking Phase 1.1 complete: - [ ] All 16 functions implemented - [ ] All functions have type hints - [ ] All functions have docstrings with examples - [ ] All constants defined - [ ] Test file created with >90% coverage - [ ] All tests pass - [ ] Code formatted with Black - [ ] Code passes flake8 - [ ] No security issues - [ ] No hardcoded values - [ ] Error messages are clear - [ ] Performance targets met ## Next Steps After Implementation Once `starpunk/utils.py` is complete: 1. Move to Phase 1.2: Data Models (`starpunk/models.py`) 2. Models will import and use these utilities 3. Integration tests will verify utilities work with models ## References - Full design: `/home/phil/Projects/starpunk/docs/design/phase-1.1-core-utilities.md` - ADR-007: Slug generation algorithm - Python coding standards - Utility function patterns ## Quick Command Reference ```bash # Run tests pytest tests/test_utils.py -v # Run tests with coverage pytest tests/test_utils.py --cov=starpunk.utils --cov-report=term-missing # Format code black starpunk/utils.py tests/test_utils.py # Lint code flake8 starpunk/utils.py tests/test_utils.py # Type check (optional) mypy starpunk/utils.py ``` ## Estimated Time Breakdown - Constants and imports: 10 minutes - Helper functions: 20 minutes - Slug functions: 30 minutes - Content hashing: 10 minutes - Path functions: 25 minutes - File operations: 35 minutes - Date/time functions: 15 minutes - Tests: 60-90 minutes - Documentation review: 15 minutes **Total**: 2-3 hours