StarPunk/docs/design/phase-1.1-quick-reference.md
2025-11-18 19:21:31 -07:00

Phase 1.1 Quick Reference: Core Utilities

Quick Start

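  • File: starpunk/utils.py
  • Tests: tests/test_utils.py
  • Estimated Time: 2-3 hours of implementation, or about 4 hours including tests and documentation review (see Estimated Time Breakdown)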

Implementation Order

  1. Constants and imports
  2. Helper functions (extract_first_words, normalize_slug_text, generate_random_suffix)
  3. Slug functions (generate_slug, make_slug_unique, validate_slug)
  4. Content hashing (calculate_content_hash)
  5. Path functions (generate_note_path, ensure_note_directory, validate_note_path)
  6. File operations (write_note_file, read_note_file, delete_note_file)
  7. Date/time functions (format_rfc822, format_iso8601, parse_iso8601)

Function Checklist

Slug Generation (3 functions)

  • generate_slug(content: str, created_at: Optional[datetime] = None) -> str
  • make_slug_unique(base_slug: str, existing_slugs: Set[str]) -> str
  • validate_slug(slug: str) -> bool
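
A minimal sketch of how validate_slug and make_slug_unique might fit together, using the constants listed under Constants Required. The "base-suffix" format for collisions and the retry loop are assumptions for illustration, not taken from the full design. fullmatch is used instead of match because `$` also matches just before a trailing newline.

```python
import re
import secrets
from typing import Set

MAX_SLUG_LENGTH = 100
MIN_SLUG_LENGTH = 1
RANDOM_SUFFIX_LENGTH = 4
RANDOM_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789"
SLUG_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def validate_slug(slug: str) -> bool:
    """Return True if slug is lowercase alphanumerics separated by single hyphens."""
    if not MIN_SLUG_LENGTH <= len(slug) <= MAX_SLUG_LENGTH:
        return False
    # fullmatch rejects "abc\n", which match() with a `$` anchor would accept
    return SLUG_PATTERN.fullmatch(slug) is not None


def make_slug_unique(base_slug: str, existing_slugs: Set[str]) -> str:
    """Return base_slug, or base_slug plus a random suffix if it is taken."""
    if base_slug not in existing_slugs:
        return base_slug
    # Assumption: keep drawing suffixes until one is free
    while True:
        suffix = "".join(
            secrets.choice(RANDOM_CHARS) for _ in range(RANDOM_SUFFIX_LENGTH)
        )
        candidate = f"{base_slug}-{suffix}"
        if candidate not in existing_slugs:
            return candidate
```

This sketch does not re-check MAX_SLUG_LENGTH after appending the suffix; a real implementation may want to truncate base_slug first.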

Content Hashing (1 function)

  • calculate_content_hash(content: str) -> str
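
One possible shape for calculate_content_hash, shown as a sketch. Encoding the text as UTF-8 and returning the lowercase hex digest are assumptions; the checklist only fixes the signature and the SHA-256 algorithm.

```python
import hashlib

CONTENT_HASH_ALGORITHM = "sha256"


def calculate_content_hash(content: str) -> str:
    """Return the hex digest of content (assumed UTF-8 encoded)."""
    digest = hashlib.new(CONTENT_HASH_ALGORITHM)
    digest.update(content.encode("utf-8"))
    return digest.hexdigest()
```

The same input always yields the same 64-character digest, which is what the consistency and empty-content tests can rely on.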

Path Operations (3 functions)

  • generate_note_path(slug: str, created_at: datetime, data_dir: Path) -> Path
  • ensure_note_directory(note_path: Path) -> Path
  • validate_note_path(file_path: Path, data_dir: Path) -> bool

File Operations (3 functions)

  • write_note_file(file_path: Path, content: str) -> None
  • read_note_file(file_path: Path) -> str
  • delete_note_file(file_path: Path, soft: bool = False, data_dir: Optional[Path] = None) -> None
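
The soft flag suggests that delete_note_file can move a note into the trash directory instead of removing it. The sketch below illustrates that idea under stated assumptions: the trash lives at data_dir/TRASH_DIR_NAME, files keep their original name there, and a missing data_dir raises ValueError. None of these details are confirmed by this quick reference.

```python
from pathlib import Path
from typing import Optional

TRASH_DIR_NAME = ".trash"


def delete_note_file(
    file_path: Path, soft: bool = False, data_dir: Optional[Path] = None
) -> None:
    """Delete a note, or move it to the trash directory when soft=True."""
    if not soft:
        file_path.unlink()  # hard delete; raises FileNotFoundError if missing
        return
    if data_dir is None:
        # Assumption: a soft delete needs data_dir to locate the trash
        raise ValueError("data_dir is required for soft delete")
    trash_dir = data_dir / TRASH_DIR_NAME
    trash_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: flat trash layout; an existing file of the same name is replaced
    file_path.replace(trash_dir / file_path.name)
```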

Date/Time (3 functions)

  • format_rfc822(dt: datetime) -> str
  • format_iso8601(dt: datetime) -> str
  • parse_iso8601(date_string: str) -> datetime
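
The three date/time helpers can lean on the standard library. This is a sketch, not the specified implementation: treating naive datetimes as UTC and accepting a trailing "Z" are assumptions, and the helper name _as_utc is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def _as_utc(dt: datetime) -> datetime:
    """Hypothetical helper: treat naive datetimes as UTC (an assumption)."""
    return dt.replace(tzinfo=timezone.utc) if dt.tzinfo is None else dt


def format_rfc822(dt: datetime) -> str:
    """Format for RSS feeds, e.g. 'Mon, 18 Nov 2024 14:30:45 +0000'."""
    return format_datetime(_as_utc(dt))


def format_iso8601(dt: datetime) -> str:
    """Format as ISO 8601 with an explicit UTC offset."""
    return _as_utc(dt).isoformat()


def parse_iso8601(date_string: str) -> datetime:
    """Parse an ISO 8601 string; raises ValueError on invalid input."""
    # datetime.fromisoformat only accepts a trailing 'Z' from Python 3.11 on
    if date_string.endswith("Z"):
        date_string = date_string[:-1] + "+00:00"
    return datetime.fromisoformat(date_string)
```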

Helper Functions (3 functions)

  • extract_first_words(text: str, max_words: int = 5) -> str
  • normalize_slug_text(text: str) -> str
  • generate_random_suffix(length: int = 4) -> str

Total: 16 functions

Constants Required

import re  # needed for the compiled patterns below

# Slug configuration
MAX_SLUG_LENGTH = 100
MIN_SLUG_LENGTH = 1
SLUG_WORDS_COUNT = 5
RANDOM_SUFFIX_LENGTH = 4

# File operations
TEMP_FILE_SUFFIX = '.tmp'
TRASH_DIR_NAME = '.trash'

# Hashing
CONTENT_HASH_ALGORITHM = 'sha256'

# Regex patterns
SLUG_PATTERN = re.compile(r'^[a-z0-9]+(?:-[a-z0-9]+)*$')
SAFE_SLUG_PATTERN = re.compile(r'[^a-z0-9-]')
MULTIPLE_HYPHENS_PATTERN = re.compile(r'-+')

# Character set
RANDOM_CHARS = 'abcdefghijklmnopqrstuvwxyz0123456789'

Key Algorithms

Slug Generation Algorithm

1. Extract first 5 words from content
2. Convert to lowercase
3. Replace spaces with hyphens
4. Remove all characters except a-z, 0-9, hyphens
5. Collapse multiple hyphens to single hyphen
6. Strip leading/trailing hyphens
7. Truncate to 100 characters
8. If empty or too short → timestamp fallback (YYYYMMDD-HHMMSS)
9. Return slug
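
The nine steps above can be sketched as a single function. This is an illustration rather than the reference implementation: it inlines the helper functions for brevity, and falling back to the current UTC time when created_at is None is an assumption.

```python
import re
from datetime import datetime, timezone
from typing import Optional

MAX_SLUG_LENGTH = 100
SLUG_WORDS_COUNT = 5
SAFE_SLUG_PATTERN = re.compile(r"[^a-z0-9-]")
MULTIPLE_HYPHENS_PATTERN = re.compile(r"-+")


def generate_slug(content: str, created_at: Optional[datetime] = None) -> str:
    """Generate a URL-safe slug from the first words of content."""
    # Steps 1-3: first five words, lowercased, joined with hyphens
    words = content.split()[:SLUG_WORDS_COUNT]
    slug = "-".join(words).lower()
    # Step 4: drop everything except a-z, 0-9 and hyphens
    slug = SAFE_SLUG_PATTERN.sub("", slug)
    # Steps 5-6: collapse hyphen runs, strip leading/trailing hyphens
    slug = MULTIPLE_HYPHENS_PATTERN.sub("-", slug).strip("-")
    # Step 7: truncate, then strip again in case the cut left a trailing hyphen
    slug = slug[:MAX_SLUG_LENGTH].rstrip("-")
    # Step 8: timestamp fallback when nothing usable remains
    if not slug:
        moment = created_at or datetime.now(timezone.utc)  # assumption: default to now
        slug = moment.strftime("%Y%m%d-%H%M%S")
    # Step 9
    return slug


print(generate_slug("Hello World! This is my first note."))
# hello-world-this-is-my
print(generate_slug("!!!", datetime(2024, 11, 18, 14, 30, 45)))
# 20241118-143045
```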

Atomic File Write Algorithm

1. Create temp file path: file_path.with_suffix(TEMP_FILE_SUFFIX) (this replaces the existing extension, so note.md becomes note.tmp)
2. Write content to temp file
3. Atomically rename temp to final path
4. On error: delete temp file, re-raise exception
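
A minimal sketch of the four steps above. Writing UTF-8 and using Path.replace for the rename are assumptions. Because with_suffix swaps the extension rather than appending to it, two notes that share a stem in the same directory would also share a temp path.

```python
from pathlib import Path

TEMP_FILE_SUFFIX = ".tmp"


def write_note_file(file_path: Path, content: str) -> None:
    """Write content to file_path via a temp file and an atomic rename."""
    # Step 1: note.md -> note.tmp in the same directory
    temp_path = file_path.with_suffix(TEMP_FILE_SUFFIX)
    try:
        # Step 2: write the full content to the temp file
        temp_path.write_text(content, encoding="utf-8")
        # Step 3: rename over the target; atomic on POSIX within one filesystem
        temp_path.replace(file_path)
    except Exception:
        # Step 4: remove the partial temp file, then re-raise
        temp_path.unlink(missing_ok=True)
        raise
```

Readers of file_path see either the old content or the new content, never a half-written file, because the target only changes at the rename.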

Path Validation Algorithm

1. Resolve both paths to absolute
2. Check if the resolved file_path.is_relative_to(resolved data_dir) (requires Python 3.9+)
3. Return boolean
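
The three steps above map almost directly onto pathlib. The sketch assumes Python 3.9 or later for Path.is_relative_to; the example paths are hypothetical.

```python
from pathlib import Path


def validate_note_path(file_path: Path, data_dir: Path) -> bool:
    """Return True only if file_path resolves to a location inside data_dir."""
    # Step 1: resolve() makes both paths absolute and collapses ".." and symlinks
    resolved_file = file_path.resolve()
    resolved_dir = data_dir.resolve()
    # Steps 2-3: containment check on the resolved paths
    return resolved_file.is_relative_to(resolved_dir)


data_dir = Path("/srv/starpunk-data")  # hypothetical data directory
print(validate_note_path(data_dir / "notes" / "hello.md", data_dir))
print(validate_note_path(data_dir / ".." / "etc" / "passwd", data_dir))
```

Resolving before comparing is what defeats traversal: a purely textual prefix check would accept the second path because it starts with the data directory.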

Test Coverage Requirements

  • Minimum 90% code coverage
  • Test all functions
  • Test edge cases (empty, whitespace, unicode, special chars)
  • Test error cases (invalid input, file errors)
  • Test security (path traversal)

Example Test Structure

class TestSlugGeneration:
    def test_generate_slug_from_content(self): pass
    def test_generate_slug_empty_content(self): pass
    def test_generate_slug_special_characters(self): pass
    def test_make_slug_unique_no_collision(self): pass
    def test_make_slug_unique_with_collision(self): pass
    def test_validate_slug_valid(self): pass
    def test_validate_slug_invalid(self): pass

class TestContentHashing:
    def test_calculate_content_hash_consistency(self): pass
    def test_calculate_content_hash_different(self): pass
    def test_calculate_content_hash_empty(self): pass

class TestFilePathOperations:
    def test_generate_note_path(self): pass
    def test_validate_note_path_safe(self): pass
    def test_validate_note_path_traversal(self): pass

class TestAtomicFileOperations:
    def test_write_and_read_note_file(self): pass
    def test_write_note_file_atomic(self): pass
    def test_delete_note_file_hard(self): pass
    def test_delete_note_file_soft(self): pass

class TestDateTimeFormatting:
    def test_format_rfc822(self): pass
    def test_format_iso8601(self): pass
    def test_parse_iso8601(self): pass

Common Pitfalls to Avoid

  1. Don't use random module → Use secrets for security
  2. Don't forget path validation → Always validate before file operations
  3. Don't use magic numbers → Define as constants
  4. Don't skip temp file cleanup → Use try/finally
  5. Don't use bare except: → Catch specific exceptions
  6. Don't forget type hints → All functions need type hints
  7. Don't skip docstrings → All functions need docstrings with examples
  8. Don't forget edge cases → Test empty, whitespace, unicode, special chars

Security Checklist

  • Path validation prevents directory traversal
  • Use secrets module for random generation
  • Validate all external input
  • Use atomic file writes
  • Handle symlinks correctly (resolve paths)
  • No hardcoded credentials or paths
  • Error messages don't leak sensitive info

Performance Targets

  • Slug generation: < 1ms
  • File write: < 10ms
  • File read: < 5ms
  • Path validation: < 1ms
  • Hash calculation: < 5ms for 10KB content
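
One way to check a target like these is to average many calls with timeit. The sketch below measures SHA-256 over a 10 KB payload; the helper name average_ms and the run count are hypothetical, and results will vary by machine.

```python
import hashlib
import timeit
from typing import Callable


def average_ms(func: Callable[[], object], runs: int = 1000) -> float:
    """Return the mean wall-clock time of func in milliseconds."""
    total_seconds = timeit.timeit(func, number=runs)
    return total_seconds / runs * 1000.0


# 10 KB of content, matching the hash calculation target above
payload = ("x" * 10 * 1024).encode("utf-8")
hash_ms = average_ms(lambda: hashlib.sha256(payload).hexdigest())
print(f"hash of 10 KB: {hash_ms:.4f} ms per call (target: < 5 ms)")
```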

Module Structure Template

"""
Core utility functions for StarPunk

This module provides essential utilities for slug generation, file operations,
hashing, and date/time handling.
"""

# Standard library
import hashlib
import re
import secrets
from datetime import datetime
from pathlib import Path
from typing import Optional, Set

# Third-party
# (none for utils.py)

# Constants
MAX_SLUG_LENGTH = 100
# ... more constants

# Helper functions
def extract_first_words(text: str, max_words: int = 5) -> str:
    """Extract first N words from text."""
    pass

# ... more helpers

# Slug functions
def generate_slug(content: str, created_at: Optional[datetime] = None) -> str:
    """Generate URL-safe slug from content."""
    pass

# ... more slug functions

# Content hashing
def calculate_content_hash(content: str) -> str:
    """Calculate SHA-256 hash of content."""
    pass

# Path operations
def generate_note_path(slug: str, created_at: datetime, data_dir: Path) -> Path:
    """Generate file path for note."""
    pass

# ... more path functions

# File operations
def write_note_file(file_path: Path, content: str) -> None:
    """Write note content to file atomically."""
    pass

# ... more file functions

# Date/time functions
def format_rfc822(dt: datetime) -> str:
    """Format datetime as RFC-822 string."""
    pass

# ... more date/time functions

Verification Checklist

Before marking Phase 1.1 complete:

  • All 16 functions implemented
  • All functions have type hints
  • All functions have docstrings with examples
  • All constants defined
  • Test file created with at least 90% coverage
  • All tests pass
  • Code formatted with Black
  • Code passes flake8
  • No security issues
  • No hardcoded values
  • Error messages are clear
  • Performance targets met

Next Steps After Implementation

Once starpunk/utils.py is complete:

  1. Move to Phase 1.2: Data Models (starpunk/models.py)
  2. Models will import and use these utilities
  3. Integration tests will verify utilities work with models

References

  • Full design: docs/design/phase-1.1-core-utilities.md
  • ADR-007: Slug generation algorithm
  • Python coding standards
  • Utility function patterns

Quick Command Reference

# Run tests
pytest tests/test_utils.py -v

# Run tests with coverage
pytest tests/test_utils.py --cov=starpunk.utils --cov-report=term-missing

# Format code
black starpunk/utils.py tests/test_utils.py

# Lint code
flake8 starpunk/utils.py tests/test_utils.py

# Type check (optional)
mypy starpunk/utils.py

Estimated Time Breakdown

  • Constants and imports: 10 minutes
  • Helper functions: 20 minutes
  • Slug functions: 30 minutes
  • Content hashing: 10 minutes
  • Path functions: 25 minutes
  • File operations: 35 minutes
  • Date/time functions: 15 minutes
  • Tests: 60-90 minutes
  • Documentation review: 15 minutes

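Total: about 2.5 hours of implementation (145 minutes), or 220-250 minutes (about 4 hours) including tests and documentation review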