# Phase 1.1: Core Utilities Design ## Overview This document provides a complete, implementation-ready design for Phase 1.1 of the StarPunk V1 implementation plan: Core Utilities. The utilities module (`starpunk/utils.py`) provides foundational functions used by all other components of the system. **Priority**: CRITICAL - Required by all other features **Estimated Effort**: 2-3 hours **Dependencies**: None **File**: `starpunk/utils.py` ## Design Principles 1. **Pure functions** - No side effects where possible 2. **Type safety** - Full type hints on all functions 3. **Error handling** - Specific exceptions with clear messages 4. **Security first** - Validate all inputs, prevent path traversal 5. **Testable** - Small, focused functions that are easy to test 6. **Well documented** - Comprehensive docstrings with examples ## Module Structure ```python """ Core utility functions for StarPunk This module provides essential utilities for slug generation, file operations, hashing, and date/time handling. These utilities are used throughout the application and have no external dependencies beyond standard library and Flask configuration. """ # Standard library imports import hashlib import re import secrets from datetime import datetime from pathlib import Path from typing import Optional # Third-party imports from flask import current_app # Constants MAX_SLUG_LENGTH = 100 MIN_SLUG_LENGTH = 1 SLUG_WORDS_COUNT = 5 RANDOM_SUFFIX_LENGTH = 4 CONTENT_HASH_ALGORITHM = "sha256" SAFE_SLUG_PATTERN = re.compile(r'[^a-z0-9-]') MULTIPLE_HYPHENS_PATTERN = re.compile(r'-+') # Utility functions (defined below) ``` ## Function Specifications ### 1. Slug Generation #### `generate_slug(content: str, created_at: Optional[datetime] = None) -> str` **Purpose**: Generate a URL-safe slug from note content or timestamp. **Algorithm**: 1. Extract first N words from content (configurable, default 5) 2. Convert to lowercase 3. Replace spaces with hyphens 4. Remove all characters except a-z, 0-9, and hyphens 5. Remove leading/trailing hyphens 6. Collapse multiple consecutive hyphens to single hyphen 7. Truncate to maximum length 8. If result is empty or too short, fall back to timestamp-based slug 9. Return slug (uniqueness check handled by caller) **Type Signature**: ```python def generate_slug(content: str, created_at: Optional[datetime] = None) -> str: """ Generate URL-safe slug from note content Creates a slug by extracting the first few words from the content and normalizing them to lowercase with hyphens. If content is insufficient, falls back to timestamp-based slug. Args: content: The note content (markdown text) created_at: Optional timestamp for fallback slug (defaults to now) Returns: URL-safe slug string (lowercase, alphanumeric + hyphens only) Raises: ValueError: If content is empty or contains only whitespace Examples: >>> generate_slug("Hello World! This is my first note.") 'hello-world-this-is-my' >>> generate_slug("Testing... with special chars!@#") 'testing-with-special-chars' >>> generate_slug("A") # Too short, uses timestamp '20241118-143022' >>> generate_slug(" ") # Empty content ValueError: Content cannot be empty Notes: - This function does NOT check for uniqueness - Caller must verify slug doesn't exist in database - Use make_slug_unique() to add random suffix if needed """ ``` **Implementation Details**: - Use regex to remove non-alphanumeric characters (except hyphens) - Handle Unicode: normalize to ASCII first, remove non-ASCII characters - Maximum slug length: 100 characters (configurable constant) - Minimum slug length: 1 character (if shorter, use timestamp fallback) - Timestamp format: `YYYYMMDD-HHMMSS` (e.g., `20241118-143022`) - Extract words: split on whitespace, take first N non-empty tokens **Edge Cases**: - Empty content → ValueError - Content with only special characters → timestamp fallback - Content with only 1-2 characters → timestamp fallback - Very long first words → truncate to max length - Unicode/emoji content → strip to ASCII alphanumeric #### `make_slug_unique(base_slug: str, existing_slugs: set[str]) -> str` **Purpose**: Add random suffix to slug if it already exists. **Type Signature**: ```python def make_slug_unique(base_slug: str, existing_slugs: set[str]) -> str: """ Make a slug unique by adding random suffix if needed If the base_slug already exists in the provided set, appends a random alphanumeric suffix until a unique slug is found. Args: base_slug: The base slug to make unique existing_slugs: Set of existing slugs to check against Returns: Unique slug (base_slug or base_slug-{random}) Examples: >>> make_slug_unique("test-note", set()) 'test-note' >>> make_slug_unique("test-note", {"test-note"}) 'test-note-a7c9' # Random suffix >>> make_slug_unique("test-note", {"test-note", "test-note-a7c9"}) 'test-note-x3k2' # Different random suffix Notes: - Random suffix is 4 lowercase alphanumeric characters - Extremely low collision probability (36^4 = 1.6M combinations) - Will retry up to 100 times if collision occurs (should never happen) """ ``` **Implementation Details**: - Check if base_slug exists in set - If not, return base_slug unchanged - If exists, append `-{random}` where random is 4 alphanumeric chars - Use `secrets.token_urlsafe()` for random generation (secure) - Retry if collision occurs (max 100 attempts, raise error if all fail) #### `validate_slug(slug: str) -> bool` **Purpose**: Validate that a slug meets all requirements. **Type Signature**: ```python def validate_slug(slug: str) -> bool: """ Validate that a slug meets all requirements Checks that slug contains only allowed characters and is within length limits. Args: slug: The slug to validate Returns: True if slug is valid, False otherwise Rules: - Must contain only: a-z, 0-9, hyphen (-) - Must be between 1 and 100 characters - Cannot start or end with hyphen - Cannot contain consecutive hyphens Examples: >>> validate_slug("hello-world") True >>> validate_slug("Hello-World") # Uppercase False >>> validate_slug("-hello") # Leading hyphen False >>> validate_slug("hello--world") # Double hyphen False """ ``` **Implementation Details**: - Regex pattern: `^[a-z0-9]+(?:-[a-z0-9]+)*$` - Length check: 1 <= len(slug) <= 100 - No additional normalization (validation only) --- ### 2. Content Hashing #### `calculate_content_hash(content: str) -> str` **Purpose**: Calculate SHA-256 hash of note content for change detection. **Type Signature**: ```python def calculate_content_hash(content: str) -> str: """ Calculate SHA-256 hash of content Generates a cryptographic hash of the content for change detection and cache invalidation. Uses UTF-8 encoding. Args: content: The content to hash (markdown text) Returns: Hexadecimal hash string (64 characters) Examples: >>> calculate_content_hash("Hello World") 'a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e' >>> calculate_content_hash("") 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855' Notes: - Same content always produces same hash - Hash is deterministic across systems - Useful for detecting external file modifications - SHA-256 chosen for security and wide support """ ``` **Implementation Details**: - Use `hashlib.sha256()` - Encode content as UTF-8 bytes - Return hexadecimal digest (lowercase) - Handle empty content (valid use case) --- ### 3. File Path Operations #### `generate_note_path(slug: str, created_at: datetime, data_dir: Path) -> Path` **Purpose**: Generate file path for note using year/month directory structure. **Type Signature**: ```python def generate_note_path(slug: str, created_at: datetime, data_dir: Path) -> Path: """ Generate file path for a note Creates path following pattern: data/notes/YYYY/MM/slug.md Args: slug: URL-safe slug for the note created_at: Creation timestamp (determines YYYY/MM) data_dir: Base data directory path Returns: Full Path object for the note file Raises: ValueError: If slug is invalid Examples: >>> from datetime import datetime >>> from pathlib import Path >>> dt = datetime(2024, 11, 18, 14, 30) >>> generate_note_path("test-note", dt, Path("data")) PosixPath('data/notes/2024/11/test-note.md') Notes: - Does NOT create directories (use ensure_note_directory) - Does NOT check if file exists - Validates slug before generating path """ ``` **Implementation Details**: - Validate slug using `validate_slug()` - Extract year and month from created_at: `created_at.strftime("%Y")`, `"%m"` - Build path: `data_dir / "notes" / year / month / f"{slug}.md"` - Return Path object (not string) #### `ensure_note_directory(note_path: Path) -> Path` **Purpose**: Create directory structure for note if it doesn't exist. **Type Signature**: ```python def ensure_note_directory(note_path: Path) -> Path: """ Ensure directory exists for note file Creates parent directories if they don't exist. Safe to call even if directories already exist. Args: note_path: Full path to note file Returns: Parent directory path Raises: OSError: If directory cannot be created (permissions, etc.) Examples: >>> note_path = Path("data/notes/2024/11/test-note.md") >>> ensure_note_directory(note_path) PosixPath('data/notes/2024/11') """ ``` **Implementation Details**: - Use `note_path.parent.mkdir(parents=True, exist_ok=True)` - Return parent directory path - Let OSError propagate if creation fails (permissions, disk full, etc.) #### `validate_note_path(file_path: Path, data_dir: Path) -> bool` **Purpose**: Validate that file path is within data directory (prevent path traversal). **Type Signature**: ```python def validate_note_path(file_path: Path, data_dir: Path) -> bool: """ Validate that file path is within data directory Security check to prevent path traversal attacks. Ensures the resolved path is within the allowed data directory. Args: file_path: Path to validate data_dir: Base data directory that must contain file_path Returns: True if path is safe, False otherwise Examples: >>> validate_note_path( ... Path("data/notes/2024/11/note.md"), ... Path("data") ... ) True >>> validate_note_path( ... Path("data/notes/../../etc/passwd"), ... Path("data") ... ) False Security: - Resolves symlinks and relative paths - Checks if resolved path is child of data_dir - Prevents directory traversal attacks """ ``` **Implementation Details**: - Resolve both paths to absolute: `file_path.resolve()`, `data_dir.resolve()` - Check if file_path is relative to data_dir: use `file_path.is_relative_to(data_dir)` - Return boolean (no exceptions) - Critical for security - MUST be called before any file operations --- ### 4. Atomic File Operations #### `write_note_file(file_path: Path, content: str) -> None` **Purpose**: Write note content to file atomically. **Type Signature**: ```python def write_note_file(file_path: Path, content: str) -> None: """ Write note content to file atomically Writes to temporary file first, then atomically renames to final path. This prevents corruption if write is interrupted. Args: file_path: Destination file path content: Content to write (markdown text) Raises: OSError: If file cannot be written ValueError: If file_path is invalid Examples: >>> write_note_file(Path("data/notes/2024/11/test.md"), "# Test") Implementation: 1. Create temp file: {file_path}.tmp 2. Write content to temp file 3. Atomically rename temp to final path 4. If any step fails, clean up temp file Notes: - Atomic rename is guaranteed on POSIX systems - Temp file created in same directory as target - UTF-8 encoding used for all text """ ``` **Implementation Details**: - Create temp path: `temp_path = file_path.with_suffix(file_path.suffix + '.tmp')` - Write to temp file: `temp_path.write_text(content, encoding='utf-8')` - Atomic rename: `temp_path.replace(file_path)` - On exception: delete temp file if exists, re-raise - Use context manager or try/finally for cleanup #### `read_note_file(file_path: Path) -> str` **Purpose**: Read note content from file. **Type Signature**: ```python def read_note_file(file_path: Path) -> str: """ Read note content from file Args: file_path: Path to note file Returns: File content as string Raises: FileNotFoundError: If file doesn't exist OSError: If file cannot be read Examples: >>> content = read_note_file(Path("data/notes/2024/11/test.md")) >>> print(content) # Test Note """ ``` **Implementation Details**: - Use `file_path.read_text(encoding='utf-8')` - Let exceptions propagate (FileNotFoundError, OSError, etc.) - No additional error handling needed #### `delete_note_file(file_path: Path, soft: bool = False, data_dir: Optional[Path] = None) -> None` **Purpose**: Delete note file (soft or hard delete). **Type Signature**: ```python def delete_note_file(file_path: Path, soft: bool = False, data_dir: Optional[Path] = None) -> None: """ Delete note file from filesystem Supports soft delete (move to trash) or hard delete (permanent removal). Args: file_path: Path to note file soft: If True, move to .trash/ directory; if False, delete permanently data_dir: Required if soft=True, base data directory Raises: FileNotFoundError: If file doesn't exist ValueError: If soft=True but data_dir not provided OSError: If file cannot be deleted or moved Examples: >>> # Hard delete >>> delete_note_file(Path("data/notes/2024/11/test.md")) >>> # Soft delete (move to trash) >>> delete_note_file( ... Path("data/notes/2024/11/test.md"), ... soft=True, ... data_dir=Path("data") ... ) """ ``` **Implementation Details**: - If soft delete: - Require data_dir parameter - Create trash directory: `data_dir / ".trash" / year / month` - Move file to trash with same filename - Use `shutil.move()` or `file_path.replace()` - If hard delete: - Use `file_path.unlink()` - Handle FileNotFoundError gracefully (file already gone is success) --- ### 5. Date/Time Utilities #### `format_rfc822(dt: datetime) -> str` **Purpose**: Format datetime as RFC-822 string for RSS feeds. **Type Signature**: ```python def format_rfc822(dt: datetime) -> str: """ Format datetime as RFC-822 string Converts datetime to RFC-822 format required by RSS 2.0 specification. Assumes UTC timezone. Args: dt: Datetime to format (assumed UTC) Returns: RFC-822 formatted string Examples: >>> from datetime import datetime >>> dt = datetime(2024, 11, 18, 14, 30, 45) >>> format_rfc822(dt) 'Mon, 18 Nov 2024 14:30:45 +0000' References: - RSS 2.0 spec: https://www.rssboard.org/rss-specification - RFC-822 date format """ ``` **Implementation Details**: - Use `dt.strftime()` with format: `"%a, %d %b %Y %H:%M:%S +0000"` - Assume UTC timezone (all timestamps stored as UTC) - Format pattern: "DDD, DD MMM YYYY HH:MM:SS +0000" #### `format_iso8601(dt: datetime) -> str` **Purpose**: Format datetime as ISO 8601 string. **Type Signature**: ```python def format_iso8601(dt: datetime) -> str: """ Format datetime as ISO 8601 string Converts datetime to ISO 8601 format for timestamps and APIs. Args: dt: Datetime to format Returns: ISO 8601 formatted string Examples: >>> from datetime import datetime >>> dt = datetime(2024, 11, 18, 14, 30, 45) >>> format_iso8601(dt) '2024-11-18T14:30:45Z' """ ``` **Implementation Details**: - Use `dt.isoformat()` with 'Z' suffix for UTC - Format: `YYYY-MM-DDTHH:MM:SSZ` - Or use `dt.strftime("%Y-%m-%dT%H:%M:%SZ")` #### `parse_iso8601(date_string: str) -> datetime` **Purpose**: Parse ISO 8601 string to datetime. **Type Signature**: ```python def parse_iso8601(date_string: str) -> datetime: """ Parse ISO 8601 string to datetime Args: date_string: ISO 8601 formatted string Returns: Datetime object (UTC) Raises: ValueError: If string is not valid ISO 8601 format Examples: >>> parse_iso8601("2024-11-18T14:30:45Z") datetime.datetime(2024, 11, 18, 14, 30, 45) """ ``` **Implementation Details**: - Use `datetime.fromisoformat()` (remove 'Z' suffix first if present) - Or use `datetime.strptime()` with format string - Let ValueError propagate if invalid format --- ### 6. Helper Functions #### `extract_first_words(text: str, max_words: int = 5) -> str` **Purpose**: Extract first N words from text (helper for slug generation). **Type Signature**: ```python def extract_first_words(text: str, max_words: int = 5) -> str: """ Extract first N words from text Helper function for slug generation. Splits text on whitespace and returns first N non-empty words. Args: text: Text to extract words from max_words: Maximum number of words to extract Returns: Space-separated string of first N words Examples: >>> extract_first_words("Hello world this is a test", 3) 'Hello world this' >>> extract_first_words(" Multiple spaces ", 2) 'Multiple spaces' """ ``` **Implementation Details**: - Strip whitespace from text - Split on whitespace: `text.split()` - Filter empty strings - Take first max_words: `words[:max_words]` - Join with single space: `" ".join(words)` #### `normalize_slug_text(text: str) -> str` **Purpose**: Normalize text for slug (lowercase, hyphens, alphanumeric only). **Type Signature**: ```python def normalize_slug_text(text: str) -> str: """ Normalize text for use in slug Converts to lowercase, replaces spaces with hyphens, removes special characters, and collapses multiple hyphens. Args: text: Text to normalize Returns: Normalized slug-safe text Examples: >>> normalize_slug_text("Hello World!") 'hello-world' >>> normalize_slug_text("Testing... with -- special chars!") 'testing-with-special-chars' """ ``` **Implementation Details**: - Convert to lowercase: `text.lower()` - Replace spaces with hyphens: `text.replace(' ', '-')` - Remove non-alphanumeric (except hyphens): regex `[^a-z0-9-]` → `''` - Collapse multiple hyphens: regex `-+` → `-` - Strip leading/trailing hyphens: `text.strip('-')` #### `generate_random_suffix(length: int = 4) -> str` **Purpose**: Generate random alphanumeric suffix for slug uniqueness. **Type Signature**: ```python def generate_random_suffix(length: int = 4) -> str: """ Generate random alphanumeric suffix Creates a secure random string for making slugs unique. Uses lowercase letters and numbers only. Args: length: Length of suffix (default 4) Returns: Random alphanumeric string Examples: >>> suffix = generate_random_suffix() >>> len(suffix) 4 >>> suffix.isalnum() True """ ``` **Implementation Details**: - Use `secrets.token_urlsafe()` or `secrets.choice()` - Character set: `a-z0-9` (36 characters) - Generate random string of specified length - Example: `'a7c9'`, `'x3k2'`, `'p9m1'` --- ## Constants ```python # Slug configuration MAX_SLUG_LENGTH = 100 # Maximum slug length in characters MIN_SLUG_LENGTH = 1 # Minimum slug length in characters SLUG_WORDS_COUNT = 5 # Number of words to extract for slug RANDOM_SUFFIX_LENGTH = 4 # Length of random suffix for uniqueness # File operations TEMP_FILE_SUFFIX = '.tmp' # Suffix for temporary files TRASH_DIR_NAME = '.trash' # Directory name for soft-deleted files # Hashing CONTENT_HASH_ALGORITHM = 'sha256' # Hash algorithm for content # Regex patterns SLUG_PATTERN = re.compile(r'^[a-z0-9]+(?:-[a-z0-9]+)*$') SAFE_SLUG_PATTERN = re.compile(r'[^a-z0-9-]') MULTIPLE_HYPHENS_PATTERN = re.compile(r'-+') WHITESPACE_PATTERN = re.compile(r'\s+') # Character sets RANDOM_CHARS = 'abcdefghijklmnopqrstuvwxyz0123456789' ``` --- ## Error Handling ### Exception Types The module uses standard Python exceptions: - **ValueError**: Invalid input (empty content, invalid slug, etc.) - **FileNotFoundError**: File doesn't exist when expected - **FileExistsError**: File already exists (not used in utils, but caller may use) - **OSError**: General I/O errors (permissions, disk full, etc.) ### Error Messages Error messages should be specific and actionable: **Good**: ```python raise ValueError( f"Invalid slug '{slug}': must contain only lowercase letters, " f"numbers, and hyphens" ) raise ValueError("Content cannot be empty or whitespace-only") raise OSError( f"Failed to write note file: {file_path}. " f"Check permissions and disk space." ) ``` **Bad**: ```python raise ValueError("Invalid slug") raise ValueError("Bad input") raise OSError("Write failed") ``` --- ## Testing Strategy ### Test Coverage Requirements - Minimum 90% code coverage for utils.py - Test all functions with multiple scenarios - Test edge cases and error conditions - Test security (path traversal, etc.) ### Test Organization File: `tests/test_utils.py` ```python """ Tests for utility functions Organized by function category: - Slug generation tests - Content hashing tests - File path tests - Atomic file operations tests - Date/time formatting tests """ import pytest from pathlib import Path from datetime import datetime from starpunk.utils import ( generate_slug, make_slug_unique, validate_slug, calculate_content_hash, generate_note_path, # ... etc ) class TestSlugGeneration: """Test slug generation functions""" def test_generate_slug_from_content(self): """Test basic slug generation from content""" slug = generate_slug("Hello World This Is My Note") assert slug == "hello-world-this-is-my" def test_generate_slug_special_characters(self): """Test slug generation removes special characters""" slug = generate_slug("Testing... with!@# special chars") assert slug == "testing-with-special-chars" def test_generate_slug_empty_content(self): """Test slug generation raises error on empty content""" with pytest.raises(ValueError): generate_slug("") def test_generate_slug_whitespace_only(self): """Test slug generation raises error on whitespace-only content""" with pytest.raises(ValueError): generate_slug(" \n\t ") def test_generate_slug_short_content_fallback(self): """Test slug generation falls back to timestamp for short content""" slug = generate_slug("A", created_at=datetime(2024, 11, 18, 14, 30)) assert slug == "20241118-143000" # Timestamp format def test_generate_slug_unicode_content(self): """Test slug generation handles unicode""" slug = generate_slug("Hello 世界 World") # Should strip non-ASCII and keep ASCII words assert "hello" in slug assert "world" in slug def test_make_slug_unique_no_collision(self): """Test make_slug_unique returns original if no collision""" slug = make_slug_unique("test-note", set()) assert slug == "test-note" def test_make_slug_unique_with_collision(self): """Test make_slug_unique adds suffix on collision""" slug = make_slug_unique("test-note", {"test-note"}) assert slug.startswith("test-note-") assert len(slug) == len("test-note-") + 4 # 4 char suffix def test_validate_slug_valid(self): """Test validate_slug accepts valid slugs""" assert validate_slug("hello-world") is True assert validate_slug("test-note-123") is True assert validate_slug("a") is True def test_validate_slug_invalid(self): """Test validate_slug rejects invalid slugs""" assert validate_slug("Hello-World") is False # Uppercase assert validate_slug("-hello") is False # Leading hyphen assert validate_slug("hello-") is False # Trailing hyphen assert validate_slug("hello--world") is False # Double hyphen assert validate_slug("hello_world") is False # Underscore assert validate_slug("") is False # Empty class TestContentHashing: """Test content hashing functions""" def test_calculate_content_hash_consistency(self): """Test hash is consistent for same content""" hash1 = calculate_content_hash("Test content") hash2 = calculate_content_hash("Test content") assert hash1 == hash2 def test_calculate_content_hash_different(self): """Test different content produces different hash""" hash1 = calculate_content_hash("Test content 1") hash2 = calculate_content_hash("Test content 2") assert hash1 != hash2 def test_calculate_content_hash_empty(self): """Test hash of empty string""" hash_empty = calculate_content_hash("") assert len(hash_empty) == 64 # SHA-256 produces 64 hex chars def test_calculate_content_hash_unicode(self): """Test hash handles unicode correctly""" hash_val = calculate_content_hash("Hello 世界") assert len(hash_val) == 64 class TestFilePathOperations: """Test file path generation and validation""" def test_generate_note_path(self): """Test note path generation""" dt = datetime(2024, 11, 18, 14, 30) path = generate_note_path("test-note", dt, Path("data")) assert path == Path("data/notes/2024/11/test-note.md") def test_generate_note_path_invalid_slug(self): """Test note path generation rejects invalid slug""" dt = datetime(2024, 11, 18) with pytest.raises(ValueError): generate_note_path("Invalid Slug!", dt, Path("data")) def test_validate_note_path_safe(self): """Test path validation accepts safe paths""" assert validate_note_path( Path("data/notes/2024/11/note.md"), Path("data") ) is True def test_validate_note_path_traversal(self): """Test path validation rejects traversal attempts""" assert validate_note_path( Path("data/notes/../../etc/passwd"), Path("data") ) is False def test_validate_note_path_absolute_outside(self): """Test path validation rejects paths outside data dir""" assert validate_note_path( Path("/etc/passwd"), Path("data") ) is False class TestAtomicFileOperations: """Test atomic file write/read/delete operations""" def test_write_and_read_note_file(self, tmp_path): """Test writing and reading note file""" file_path = tmp_path / "test.md" content = "# Test Note\n\nThis is a test." write_note_file(file_path, content) assert file_path.exists() read_content = read_note_file(file_path) assert read_content == content def test_write_note_file_atomic(self, tmp_path): """Test write is atomic (temp file used)""" file_path = tmp_path / "test.md" temp_path = Path(str(file_path) + '.tmp') # Temp file should not exist after write write_note_file(file_path, "Test") assert not temp_path.exists() assert file_path.exists() def test_read_note_file_not_found(self, tmp_path): """Test reading non-existent file raises error""" file_path = tmp_path / "nonexistent.md" with pytest.raises(FileNotFoundError): read_note_file(file_path) def test_delete_note_file_hard(self, tmp_path): """Test hard delete removes file""" file_path = tmp_path / "test.md" file_path.write_text("Test") delete_note_file(file_path, soft=False) assert not file_path.exists() def test_delete_note_file_soft(self, tmp_path): """Test soft delete moves file to trash""" # Create note file notes_dir = tmp_path / "notes" / "2024" / "11" notes_dir.mkdir(parents=True) file_path = notes_dir / "test.md" file_path.write_text("Test") # Soft delete delete_note_file(file_path, soft=True, data_dir=tmp_path) # Original should be gone assert not file_path.exists() # Should be in trash trash_path = tmp_path / ".trash" / "2024" / "11" / "test.md" assert trash_path.exists() class TestDateTimeFormatting: """Test date/time formatting functions""" def test_format_rfc822(self): """Test RFC-822 date formatting""" dt = datetime(2024, 11, 18, 14, 30, 45) formatted = format_rfc822(dt) assert formatted == "Mon, 18 Nov 2024 14:30:45 +0000" def test_format_iso8601(self): """Test ISO 8601 date formatting""" dt = datetime(2024, 11, 18, 14, 30, 45) formatted = format_iso8601(dt) assert formatted == "2024-11-18T14:30:45Z" def test_parse_iso8601(self): """Test ISO 8601 date parsing""" dt = parse_iso8601("2024-11-18T14:30:45Z") assert dt.year == 2024 assert dt.month == 11 assert dt.day == 18 assert dt.hour == 14 assert dt.minute == 30 assert dt.second == 45 def test_parse_iso8601_invalid(self): """Test ISO 8601 parsing rejects invalid format""" with pytest.raises(ValueError): parse_iso8601("not-a-date") ``` ### Additional Test Cases **Security Tests**: - Path traversal attempts with various patterns - Symlink handling - Unicode/emoji in slugs - Very long input strings - Malicious input patterns **Edge Cases**: - Empty content - Whitespace-only content - Very long content (>1MB) - Unicode normalization issues - Concurrent file operations (if applicable) **Performance Tests**: - Slug generation with 10,000 character input - Hash calculation with large content - Path validation with deep directory structures --- ## Usage Examples ### Creating a Note ```python from datetime import datetime from pathlib import Path from starpunk.utils import ( generate_slug, make_slug_unique, generate_note_path, ensure_note_directory, write_note_file, calculate_content_hash ) # Note content content = "This is my first note about Python programming" created_at = datetime.utcnow() data_dir = Path("data") # Generate slug base_slug = generate_slug(content, created_at) # "this-is-my-first-note" # Check uniqueness (pseudo-code, actual check via database) existing_slugs = get_existing_slugs_from_db() # Returns set of slugs slug = make_slug_unique(base_slug, existing_slugs) # Generate file path note_path = generate_note_path(slug, created_at, data_dir) # Result: data/notes/2024/11/this-is-my-first-note.md # Create directory ensure_note_directory(note_path) # Write file write_note_file(note_path, content) # Calculate hash for database content_hash = calculate_content_hash(content) # Save metadata to database (pseudo-code) save_note_metadata( slug=slug, file_path=str(note_path.relative_to(data_dir)), created_at=created_at, content_hash=content_hash ) ``` ### Reading a Note ```python from pathlib import Path from starpunk.utils import read_note_file, validate_note_path # Get file path from database (pseudo-code) note = get_note_from_db(slug="my-first-note") file_path = Path("data") / note.file_path # Validate path (security check) if not validate_note_path(file_path, Path("data")): raise ValueError("Invalid file path") # Read content content = read_note_file(file_path) ``` ### Deleting a Note ```python from pathlib import Path from starpunk.utils import delete_note_file # Get note path from database note = get_note_from_db(slug="old-note") file_path = Path("data") / note.file_path # Soft delete (move to trash) delete_note_file(file_path, soft=True, data_dir=Path("data")) # Update database to mark as deleted (pseudo-code) mark_note_deleted(note.id) ``` --- ## Integration with Other Modules ### Database Integration The database module will use utilities like this: ```python # In starpunk/notes.py from starpunk.utils import ( generate_slug, make_slug_unique, generate_note_path, ensure_note_directory, write_note_file, calculate_content_hash ) from starpunk.database import get_db def create_note(content: str, published: bool = False) -> Note: """Create a new note""" db = get_db() created_at = datetime.utcnow() # Generate unique slug base_slug = generate_slug(content, created_at) existing_slugs = set(row[0] for row in db.execute("SELECT slug FROM notes").fetchall()) slug = make_slug_unique(base_slug, existing_slugs) # Generate path and write file note_path = generate_note_path(slug, created_at, Path(current_app.config['DATA_PATH'])) ensure_note_directory(note_path) write_note_file(note_path, content) # Calculate hash content_hash = calculate_content_hash(content) # Insert into database db.execute( "INSERT INTO notes (slug, file_path, published, created_at, updated_at, content_hash) " "VALUES (?, ?, ?, ?, ?, ?)", (slug, str(note_path.relative_to(Path(current_app.config['DATA_PATH']))), published, created_at, created_at, content_hash) ) db.commit() return Note(slug=slug, content=content, created_at=created_at, ...) ``` ### RSS Feed Integration The feed module will use date formatting: ```python # In starpunk/feed.py from starpunk.utils import format_rfc822 def generate_rss_feed(): """Generate RSS feed""" fg = FeedGenerator() for note in get_published_notes(): fe = fg.add_entry() fe.pubDate(format_rfc822(note.created_at)) ``` --- ## Performance Considerations ### Slug Generation Performance - Expected input: 50-1000 characters - Target: <1ms per slug generation - Bottleneck: None (string operations are fast) ### File Operations Performance - Write operation: <10ms for typical note (1-10KB) - Read operation: <5ms for typical note - Atomic rename: ~1ms (OS-level operation) ### Path Validation Performance - Target: <1ms per validation - Use cached absolute paths where possible --- ## Security Considerations ### Path Traversal Prevention CRITICAL: All file paths MUST be validated before use. ```python # Always validate before file operations if not validate_note_path(file_path, data_dir): raise ValueError("Invalid file path: potential directory traversal") ``` ### Slug Validation Prevent injection attacks through slugs: ```python # Validate slug before using in paths if not validate_slug(slug): raise ValueError(f"Invalid slug: {slug}") ``` ### Random Suffix Security Use cryptographically secure random: ```python # Use secrets module, not random module import secrets suffix = secrets.token_urlsafe(4)[:4].lower() ``` --- ## Dependencies ### Standard Library - `hashlib` - SHA-256 hashing - `re` - Regular expressions - `secrets` - Secure random generation - `pathlib` - Path operations - `datetime` - Date/time handling - `shutil` - File operations (move) ### Third-Party - `flask` - For `current_app` context (configuration access) ### Internal - None (this is the foundation module) --- ## Future Enhancements (V2+) ### Slug Customization Allow user to specify custom slug: ```python def generate_slug(content: str, custom_slug: Optional[str] = None, ...) -> str: if custom_slug: if validate_slug(custom_slug): return custom_slug raise ValueError("Invalid custom slug") # ... existing logic ``` ### Unicode Slug Support Support non-ASCII characters in slugs: ```python def generate_unicode_slug(content: str) -> str: """Generate slug with unicode support""" # Use unicode normalization # Support CJK characters # Support diacritics ``` ### Content Preview Extraction Extract preview/excerpt from content: ```python def extract_preview(content: str, max_length: int = 200) -> str: """Extract preview text from content""" # Strip markdown formatting # Truncate to max_length # Add ellipsis if truncated ``` ### Title Extraction Extract title from first line: ```python def extract_title(content: str) -> Optional[str]: """Extract title from content (first line or # heading)""" # Check for # heading # Or use first line # Return None if empty ``` --- ## Acceptance Criteria - [ ] All functions implemented with type hints - [ ] All functions have comprehensive docstrings - [ ] Slug generation creates valid, URL-safe slugs - [ ] Slug uniqueness enforcement works correctly - [ ] Content hashing is consistent and correct - [ ] Path validation prevents directory traversal - [ ] Atomic file writes prevent corruption - [ ] Date formatting matches RFC-822 and ISO 8601 specs - [ ] All security checks are in place - [ ] Test coverage >90% - [ ] All tests pass - [ ] Code formatted with Black - [ ] Code passes flake8 linting - [ ] No hardcoded values (use constants) - [ ] Error messages are clear and actionable --- ## References - [ADR-004: File-Based Note Storage](/home/phil/Projects/starpunk/docs/decisions/ADR-004-file-based-note-storage.md) - [Python Coding Standards](/home/phil/Projects/starpunk/docs/standards/python-coding-standards.md) - [Project Structure](/home/phil/Projects/starpunk/docs/design/project-structure.md) - [CommonMark Specification](https://spec.commonmark.org/) - [RSS 2.0 Specification](https://www.rssboard.org/rss-specification) - [ISO 8601 Date Format](https://en.wikipedia.org/wiki/ISO_8601) - [RFC-822 Date Format](https://www.rfc-editor.org/rfc/rfc822) - [Python pathlib Documentation](https://docs.python.org/3/library/pathlib.html) - [Python secrets Documentation](https://docs.python.org/3/library/secrets.html) --- ## Implementation Checklist When implementing `starpunk/utils.py`, complete in this order: 1. [ ] Create file with module docstring 2. [ ] Add imports and constants 3. [ ] Implement helper functions (extract_first_words, normalize_slug_text, generate_random_suffix) 4. [ ] Implement slug functions (generate_slug, make_slug_unique, validate_slug) 5. [ ] Implement content hashing (calculate_content_hash) 6. [ ] Implement path functions (generate_note_path, ensure_note_directory, validate_note_path) 7. [ ] Implement file operations (write_note_file, read_note_file, delete_note_file) 8. [ ] Implement date/time functions (format_rfc822, format_iso8601, parse_iso8601) 9. [ ] Create tests/test_utils.py 10. [ ] Write tests for all functions 11. [ ] Run tests and achieve >90% coverage 12. [ ] Format with Black 13. [ ] Lint with flake8 14. [ ] Review all docstrings 15. [ ] Review all error messages **Estimated Time**: 2-3 hours for implementation + tests