Files
StarPunk/docs/design/phase-1.1-quick-reference.md
2025-11-18 19:21:31 -07:00

310 lines
8.3 KiB
Markdown

# Phase 1.1 Quick Reference: Core Utilities
## Quick Start
**File**: `starpunk/utils.py`
**Tests**: `tests/test_utils.py`
**Estimated Time**: 2-3 hours
## Implementation Order
1. Constants and imports
2. Helper functions (extract_first_words, normalize_slug_text, generate_random_suffix)
3. Slug functions (generate_slug, make_slug_unique, validate_slug)
4. Content hashing (calculate_content_hash)
5. Path functions (generate_note_path, ensure_note_directory, validate_note_path)
6. File operations (write_note_file, read_note_file, delete_note_file)
7. Date/time functions (format_rfc822, format_iso8601, parse_iso8601)
## Function Checklist
### Slug Generation (3 functions)
- [ ] `generate_slug(content: str, created_at: Optional[datetime] = None) -> str`
- [ ] `make_slug_unique(base_slug: str, existing_slugs: Set[str]) -> str`
- [ ] `validate_slug(slug: str) -> bool`
### Content Hashing (1 function)
- [ ] `calculate_content_hash(content: str) -> str`
### Path Operations (3 functions)
- [ ] `generate_note_path(slug: str, created_at: datetime, data_dir: Path) -> Path`
- [ ] `ensure_note_directory(note_path: Path) -> Path`
- [ ] `validate_note_path(file_path: Path, data_dir: Path) -> bool`
### File Operations (3 functions)
- [ ] `write_note_file(file_path: Path, content: str) -> None`
- [ ] `read_note_file(file_path: Path) -> str`
- [ ] `delete_note_file(file_path: Path, soft: bool = False, data_dir: Optional[Path] = None) -> None`
### Date/Time (3 functions)
- [ ] `format_rfc822(dt: datetime) -> str`
- [ ] `format_iso8601(dt: datetime) -> str`
- [ ] `parse_iso8601(date_string: str) -> datetime`
### Helper Functions (3 functions)
- [ ] `extract_first_words(text: str, max_words: int = 5) -> str`
- [ ] `normalize_slug_text(text: str) -> str`
- [ ] `generate_random_suffix(length: int = 4) -> str`
**Total**: 16 functions
## Constants Required
```python
# Slug configuration
MAX_SLUG_LENGTH = 100
MIN_SLUG_LENGTH = 1
SLUG_WORDS_COUNT = 5
RANDOM_SUFFIX_LENGTH = 4
# File operations
TEMP_FILE_SUFFIX = '.tmp'
TRASH_DIR_NAME = '.trash'
# Hashing
CONTENT_HASH_ALGORITHM = 'sha256'
# Regex patterns
SLUG_PATTERN = re.compile(r'^[a-z0-9]+(?:-[a-z0-9]+)*$')
SAFE_SLUG_PATTERN = re.compile(r'[^a-z0-9-]')
MULTIPLE_HYPHENS_PATTERN = re.compile(r'-+')
# Character set
RANDOM_CHARS = 'abcdefghijklmnopqrstuvwxyz0123456789'
```
## Key Algorithms
### Slug Generation Algorithm
```
1. Extract first 5 words from content
2. Convert to lowercase
3. Replace spaces with hyphens
4. Remove all characters except a-z, 0-9, hyphens
5. Collapse multiple hyphens to single hyphen
6. Strip leading/trailing hyphens
7. Truncate to 100 characters
8. If empty or too short → timestamp fallback (YYYYMMDD-HHMMSS)
9. Return slug
```
### Atomic File Write Algorithm
```
1. Create temp file path: file_path.with_suffix('.tmp')
2. Write content to temp file
3. Atomically rename temp to final path
4. On error: delete temp file, re-raise exception
```
### Path Validation Algorithm
```
1. Resolve both paths to absolute
2. Check if file_path.is_relative_to(data_dir)
3. Return boolean
```
## Test Coverage Requirements
- Minimum 90% code coverage
- Test all functions
- Test edge cases (empty, whitespace, unicode, special chars)
- Test error cases (invalid input, file errors)
- Test security (path traversal)
## Example Test Structure
```python
class TestSlugGeneration:
def test_generate_slug_from_content(self): pass
def test_generate_slug_empty_content(self): pass
def test_generate_slug_special_characters(self): pass
def test_make_slug_unique_no_collision(self): pass
def test_make_slug_unique_with_collision(self): pass
def test_validate_slug_valid(self): pass
def test_validate_slug_invalid(self): pass
class TestContentHashing:
def test_calculate_content_hash_consistency(self): pass
def test_calculate_content_hash_different(self): pass
def test_calculate_content_hash_empty(self): pass
class TestFilePathOperations:
def test_generate_note_path(self): pass
def test_validate_note_path_safe(self): pass
def test_validate_note_path_traversal(self): pass
class TestAtomicFileOperations:
def test_write_and_read_note_file(self): pass
def test_write_note_file_atomic(self): pass
def test_delete_note_file_hard(self): pass
def test_delete_note_file_soft(self): pass
class TestDateTimeFormatting:
def test_format_rfc822(self): pass
def test_format_iso8601(self): pass
def test_parse_iso8601(self): pass
```
## Common Pitfalls to Avoid
1. **Don't use `random` module** → Use `secrets` for security
2. **Don't forget path validation** → Always validate before file operations
3. **Don't use magic numbers** → Define as constants
4. **Don't skip temp file cleanup** → Use try/finally
5. **Don't use bare `except:`** → Catch specific exceptions
6. **Don't forget type hints** → All functions need type hints
7. **Don't skip docstrings** → All functions need docstrings with examples
8. **Don't forget edge cases** → Test empty, whitespace, unicode, special chars
## Security Checklist
- [ ] Path validation prevents directory traversal
- [ ] Use `secrets` module for random generation
- [ ] Validate all external input
- [ ] Use atomic file writes
- [ ] Handle symlinks correctly (resolve paths)
- [ ] No hardcoded credentials or paths
- [ ] Error messages don't leak sensitive info
## Performance Targets
- Slug generation: < 1ms
- File write: < 10ms
- File read: < 5ms
- Path validation: < 1ms
- Hash calculation: < 5ms for 10KB content
## Module Structure Template
```python
"""
Core utility functions for StarPunk
This module provides essential utilities for slug generation, file operations,
hashing, and date/time handling.
"""
# Standard library
import hashlib
import re
import secrets
from datetime import datetime
from pathlib import Path
from typing import Optional
# Third-party
# (none for utils.py)
# Constants
MAX_SLUG_LENGTH = 100
# ... more constants
# Helper functions
def extract_first_words(text: str, max_words: int = 5) -> str:
"""Extract first N words from text."""
pass
# ... more helpers
# Slug functions
def generate_slug(content: str, created_at: Optional[datetime] = None) -> str:
"""Generate URL-safe slug from content."""
pass
# ... more slug functions
# Content hashing
def calculate_content_hash(content: str) -> str:
"""Calculate SHA-256 hash of content."""
pass
# Path operations
def generate_note_path(slug: str, created_at: datetime, data_dir: Path) -> Path:
"""Generate file path for note."""
pass
# ... more path functions
# File operations
def write_note_file(file_path: Path, content: str) -> None:
"""Write note content to file atomically."""
pass
# ... more file functions
# Date/time functions
def format_rfc822(dt: datetime) -> str:
"""Format datetime as RFC-822 string."""
pass
# ... more date/time functions
```
## Verification Checklist
Before marking Phase 1.1 complete:
- [ ] All 16 functions implemented
- [ ] All functions have type hints
- [ ] All functions have docstrings with examples
- [ ] All constants defined
- [ ] Test file created with >90% coverage
- [ ] All tests pass
- [ ] Code formatted with Black
- [ ] Code passes flake8
- [ ] No security issues
- [ ] No hardcoded values
- [ ] Error messages are clear
- [ ] Performance targets met
## Next Steps After Implementation
Once `starpunk/utils.py` is complete:
1. Move to Phase 1.2: Data Models (`starpunk/models.py`)
2. Models will import and use these utilities
3. Integration tests will verify utilities work with models
## References
- Full design: `/home/phil/Projects/starpunk/docs/design/phase-1.1-core-utilities.md`
- ADR-007: Slug generation algorithm
- Python coding standards
- Utility function patterns
## Quick Command Reference
```bash
# Run tests
pytest tests/test_utils.py -v
# Run tests with coverage
pytest tests/test_utils.py --cov=starpunk.utils --cov-report=term-missing
# Format code
black starpunk/utils.py tests/test_utils.py
# Lint code
flake8 starpunk/utils.py tests/test_utils.py
# Type check (optional)
mypy starpunk/utils.py
```
## Estimated Time Breakdown
- Constants and imports: 10 minutes
- Helper functions: 20 minutes
- Slug functions: 30 minutes
- Content hashing: 10 minutes
- Path functions: 25 minutes
- File operations: 35 minutes
- Date/time functions: 15 minutes
- Tests: 60-90 minutes
- Documentation review: 15 minutes
**Total**: 2-3 hours