StarPunk/docs/design/phase-1.1-core-utilities.md

# Phase 1.1: Core Utilities Design

## Overview

This document provides a complete, implementation-ready design for Phase 1.1 of the StarPunk V1 implementation plan: Core Utilities. The utilities module (`starpunk/utils.py`) provides foundational functions used by all other components of the system.

**Priority**: CRITICAL - Required by all other features
**Estimated Effort**: 2-3 hours
**Dependencies**: None
**File**: `starpunk/utils.py`

## Design Principles

1. **Pure functions** - No side effects where possible
2. **Type safety** - Full type hints on all functions
3. **Error handling** - Specific exceptions with clear messages
4. **Security first** - Validate all inputs, prevent path traversal
5. **Testable** - Small, focused functions that are easy to test
6. **Well documented** - Comprehensive docstrings with examples

## Module Structure

```python
"""
Core utility functions for StarPunk

This module provides essential utilities for slug generation, file operations,
hashing, and date/time handling. These utilities are used throughout the
application and have no external dependencies beyond standard library and
Flask configuration.
"""

# Standard library imports
import hashlib
import re
import secrets
from datetime import datetime
from pathlib import Path
from typing import Optional

# Third-party imports
from flask import current_app

# Constants
MAX_SLUG_LENGTH = 100
MIN_SLUG_LENGTH = 1
SLUG_WORDS_COUNT = 5
RANDOM_SUFFIX_LENGTH = 4
CONTENT_HASH_ALGORITHM = "sha256"
SAFE_SLUG_PATTERN = re.compile(r'[^a-z0-9-]')
MULTIPLE_HYPHENS_PATTERN = re.compile(r'-+')

# Utility functions (defined below)
```

## Function Specifications

### 1. Slug Generation

#### `generate_slug(content: str, created_at: Optional[datetime] = None) -> str`

**Purpose**: Generate a URL-safe slug from note content or timestamp.

**Algorithm**:
1. Extract first N words from content (configurable, default 5)
2. Convert to lowercase
3. Replace spaces with hyphens
4. Remove all characters except a-z, 0-9, and hyphens
5. Remove leading/trailing hyphens
6. Collapse multiple consecutive hyphens to single hyphen
7. Truncate to maximum length
8. If result is empty or too short, fall back to timestamp-based slug
9. Return slug (uniqueness check handled by caller)

**Type Signature**:
```python
def generate_slug(content: str, created_at: Optional[datetime] = None) -> str:
    """
    Generate URL-safe slug from note content

    Creates a slug by extracting the first few words from the content and
    normalizing them to lowercase with hyphens. If content is insufficient,
    falls back to timestamp-based slug.

    Args:
        content: The note content (markdown text)
        created_at: Optional timestamp for fallback slug (defaults to now)

    Returns:
        URL-safe slug string (lowercase, alphanumeric + hyphens only)

    Raises:
        ValueError: If content is empty or contains only whitespace

    Examples:
        >>> generate_slug("Hello World! This is my first note.")
        'hello-world-this-is-my'

        >>> generate_slug("Testing... with special chars!@#")
        'testing-with-special-chars'

        >>> generate_slug("A")  # Too short, uses timestamp
        '20241118-143022'

        >>> generate_slug("   ")  # Empty content
        ValueError: Content cannot be empty

    Notes:
        - This function does NOT check for uniqueness
        - Caller must verify slug doesn't exist in database
        - Use make_slug_unique() to add random suffix if needed
    """
```

**Implementation Details**:
- Use regex to remove non-alphanumeric characters (except hyphens)
- Handle Unicode: normalize to ASCII first, remove non-ASCII characters
- Maximum slug length: 100 characters (configurable constant)
- Minimum slug length: 1 character (if shorter, use timestamp fallback)
- Timestamp format: `YYYYMMDD-HHMMSS` (e.g., `20241118-143022`)
- Extract words: split on whitespace, take first N non-empty tokens

**Edge Cases**:
- Empty content → ValueError
- Content with only special characters → timestamp fallback
- Content with only 1-2 characters → timestamp fallback
- Very long first words → truncate to max length
- Unicode/emoji content → strip to ASCII alphanumeric

#### `make_slug_unique(base_slug: str, existing_slugs: set[str]) -> str`

**Purpose**: Add random suffix to slug if it already exists.

**Type Signature**:
```python
def make_slug_unique(base_slug: str, existing_slugs: set[str]) -> str:
    """
    Make a slug unique by adding random suffix if needed

    If the base_slug already exists in the provided set, appends a random
    alphanumeric suffix until a unique slug is found.

    Args:
        base_slug: The base slug to make unique
        existing_slugs: Set of existing slugs to check against

    Returns:
        Unique slug (base_slug or base_slug-{random})

    Examples:
        >>> make_slug_unique("test-note", set())
        'test-note'

        >>> make_slug_unique("test-note", {"test-note"})
        'test-note-a7c9'  # Random suffix

        >>> make_slug_unique("test-note", {"test-note", "test-note-a7c9"})
        'test-note-x3k2'  # Different random suffix

    Notes:
        - Random suffix is 4 lowercase alphanumeric characters
        - Extremely low collision probability (36^4 = 1.6M combinations)
        - Will retry up to 100 times if collision occurs (should never happen)
    """
```

**Implementation Details**:
- Check if base_slug exists in set
- If not, return base_slug unchanged
- If exists, append `-{random}` where random is 4 alphanumeric chars
- Use `secrets.token_urlsafe()` for random generation (secure)
- Retry if collision occurs (max 100 attempts, raise error if all fail)

#### `validate_slug(slug: str) -> bool`

**Purpose**: Validate that a slug meets all requirements.

**Type Signature**:
```python
def validate_slug(slug: str) -> bool:
    """
    Validate that a slug meets all requirements

    Checks that slug contains only allowed characters and is within
    length limits.

    Args:
        slug: The slug to validate

    Returns:
        True if slug is valid, False otherwise

    Rules:
        - Must contain only: a-z, 0-9, hyphen (-)
        - Must be between 1 and 100 characters
        - Cannot start or end with hyphen
        - Cannot contain consecutive hyphens

    Examples:
        >>> validate_slug("hello-world")
        True

        >>> validate_slug("Hello-World")  # Uppercase
        False

        >>> validate_slug("-hello")  # Leading hyphen
        False

        >>> validate_slug("hello--world")  # Double hyphen
        False
    """
```

**Implementation Details**:
- Regex pattern: `^[a-z0-9]+(?:-[a-z0-9]+)*$`
- Length check: 1 <= len(slug) <= 100
- No additional normalization (validation only)

---

### 2. Content Hashing

#### `calculate_content_hash(content: str) -> str`

**Purpose**: Calculate SHA-256 hash of note content for change detection.

**Type Signature**:
```python
def calculate_content_hash(content: str) -> str:
    """
    Calculate SHA-256 hash of content

    Generates a cryptographic hash of the content for change detection
    and cache invalidation. Uses UTF-8 encoding.

    Args:
        content: The content to hash (markdown text)

    Returns:
        Hexadecimal hash string (64 characters)

    Examples:
        >>> calculate_content_hash("Hello World")
        'a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e'

        >>> calculate_content_hash("")
        'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'

    Notes:
        - Same content always produces same hash
        - Hash is deterministic across systems
        - Useful for detecting external file modifications
        - SHA-256 chosen for security and wide support
    """
```

**Implementation Details**:
- Use `hashlib.sha256()`
- Encode content as UTF-8 bytes
- Return hexadecimal digest (lowercase)
- Handle empty content (valid use case)

---

### 3. File Path Operations

#### `generate_note_path(slug: str, created_at: datetime, data_dir: Path) -> Path`

**Purpose**: Generate file path for note using year/month directory structure.

**Type Signature**:
```python
def generate_note_path(slug: str, created_at: datetime, data_dir: Path) -> Path:
    """
    Generate file path for a note

    Creates path following pattern: data/notes/YYYY/MM/slug.md

    Args:
        slug: URL-safe slug for the note
        created_at: Creation timestamp (determines YYYY/MM)
        data_dir: Base data directory path

    Returns:
        Full Path object for the note file

    Raises:
        ValueError: If slug is invalid

    Examples:
        >>> from datetime import datetime
        >>> from pathlib import Path
        >>> dt = datetime(2024, 11, 18, 14, 30)
        >>> generate_note_path("test-note", dt, Path("data"))
        PosixPath('data/notes/2024/11/test-note.md')

    Notes:
        - Does NOT create directories (use ensure_note_directory)
        - Does NOT check if file exists
        - Validates slug before generating path
    """
```

**Implementation Details**:
- Validate slug using `validate_slug()`
- Extract year and month from created_at: `created_at.strftime("%Y")`, `"%m"`
- Build path: `data_dir / "notes" / year / month / f"{slug}.md"`
- Return Path object (not string)

#### `ensure_note_directory(note_path: Path) -> Path`

**Purpose**: Create directory structure for note if it doesn't exist.

**Type Signature**:
```python
def ensure_note_directory(note_path: Path) -> Path:
    """
    Ensure directory exists for note file

    Creates parent directories if they don't exist. Safe to call
    even if directories already exist.

    Args:
        note_path: Full path to note file

    Returns:
        Parent directory path

    Raises:
        OSError: If directory cannot be created (permissions, etc.)

    Examples:
        >>> note_path = Path("data/notes/2024/11/test-note.md")
        >>> ensure_note_directory(note_path)
        PosixPath('data/notes/2024/11')
    """
```

**Implementation Details**:
- Use `note_path.parent.mkdir(parents=True, exist_ok=True)`
- Return parent directory path
- Let OSError propagate if creation fails (permissions, disk full, etc.)

#### `validate_note_path(file_path: Path, data_dir: Path) -> bool`

**Purpose**: Validate that file path is within data directory (prevent path traversal).

**Type Signature**:
```python
def validate_note_path(file_path: Path, data_dir: Path) -> bool:
    """
    Validate that file path is within data directory

    Security check to prevent path traversal attacks. Ensures the
    resolved path is within the allowed data directory.

    Args:
        file_path: Path to validate
        data_dir: Base data directory that must contain file_path

    Returns:
        True if path is safe, False otherwise

    Examples:
        >>> validate_note_path(
        ...     Path("data/notes/2024/11/note.md"),
        ...     Path("data")
        ... )
        True

        >>> validate_note_path(
        ...     Path("data/notes/../../etc/passwd"),
        ...     Path("data")
        ... )
        False

    Security:
        - Resolves symlinks and relative paths
        - Checks if resolved path is child of data_dir
        - Prevents directory traversal attacks
    """
```

**Implementation Details**:
- Resolve both paths to absolute: `file_path.resolve()`, `data_dir.resolve()`
- Check if file_path is relative to data_dir: use `file_path.is_relative_to(data_dir)`
- Return boolean (no exceptions)
- Critical for security - MUST be called before any file operations

---

### 4. Atomic File Operations

#### `write_note_file(file_path: Path, content: str) -> None`

**Purpose**: Write note content to file atomically.

**Type Signature**:
```python
def write_note_file(file_path: Path, content: str) -> None:
    """
    Write note content to file atomically

    Writes to temporary file first, then atomically renames to final path.
    This prevents corruption if write is interrupted.

    Args:
        file_path: Destination file path
        content: Content to write (markdown text)

    Raises:
        OSError: If file cannot be written
        ValueError: If file_path is invalid

    Examples:
        >>> write_note_file(Path("data/notes/2024/11/test.md"), "# Test")

    Implementation:
        1. Create temp file: {file_path}.tmp
        2. Write content to temp file
        3. Atomically rename temp to final path
        4. If any step fails, clean up temp file

    Notes:
        - Atomic rename is guaranteed on POSIX systems
        - Temp file created in same directory as target
        - UTF-8 encoding used for all text
    """
```

**Implementation Details**:
- Create temp path: `temp_path = file_path.with_suffix(file_path.suffix + '.tmp')`
- Write to temp file: `temp_path.write_text(content, encoding='utf-8')`
- Atomic rename: `temp_path.replace(file_path)`
- On exception: delete temp file if exists, re-raise
- Use context manager or try/finally for cleanup

#### `read_note_file(file_path: Path) -> str`

**Purpose**: Read note content from file.

**Type Signature**:
```python
def read_note_file(file_path: Path) -> str:
    """
    Read note content from file

    Args:
        file_path: Path to note file

    Returns:
        File content as string

    Raises:
        FileNotFoundError: If file doesn't exist
        OSError: If file cannot be read

    Examples:
        >>> content = read_note_file(Path("data/notes/2024/11/test.md"))
        >>> print(content)
        # Test Note
    """
```

**Implementation Details**:
- Use `file_path.read_text(encoding='utf-8')`
- Let exceptions propagate (FileNotFoundError, OSError, etc.)
- No additional error handling needed

#### `delete_note_file(file_path: Path, soft: bool = False, data_dir: Optional[Path] = None) -> None`

**Purpose**: Delete note file (soft or hard delete).

**Type Signature**:
```python
def delete_note_file(file_path: Path, soft: bool = False, data_dir: Optional[Path] = None) -> None:
    """
    Delete note file from filesystem

    Supports soft delete (move to trash) or hard delete (permanent removal).

    Args:
        file_path: Path to note file
        soft: If True, move to .trash/ directory; if False, delete permanently
        data_dir: Required if soft=True, base data directory

    Raises:
        FileNotFoundError: If file doesn't exist
        ValueError: If soft=True but data_dir not provided
        OSError: If file cannot be deleted or moved

    Examples:
        >>> # Hard delete
        >>> delete_note_file(Path("data/notes/2024/11/test.md"))

        >>> # Soft delete (move to trash)
        >>> delete_note_file(
        ...     Path("data/notes/2024/11/test.md"),
        ...     soft=True,
        ...     data_dir=Path("data")
        ... )
    """
```

**Implementation Details**:
- If soft delete:
  - Require data_dir parameter
  - Create trash directory: `data_dir / ".trash" / year / month`
  - Move file to trash with same filename
  - Use `shutil.move()` or `file_path.replace()`
- If hard delete:
  - Use `file_path.unlink()`
- Handle FileNotFoundError gracefully (file already gone is success)

---

### 5. Date/Time Utilities

#### `format_rfc822(dt: datetime) -> str`

**Purpose**: Format datetime as RFC-822 string for RSS feeds.

**Type Signature**:
```python
def format_rfc822(dt: datetime) -> str:
    """
    Format datetime as RFC-822 string

    Converts datetime to RFC-822 format required by RSS 2.0 specification.
    Assumes UTC timezone.

    Args:
        dt: Datetime to format (assumed UTC)

    Returns:
        RFC-822 formatted string

    Examples:
        >>> from datetime import datetime
        >>> dt = datetime(2024, 11, 18, 14, 30, 45)
        >>> format_rfc822(dt)
        'Mon, 18 Nov 2024 14:30:45 +0000'

    References:
        - RSS 2.0 spec: https://www.rssboard.org/rss-specification
        - RFC-822 date format
    """
```

**Implementation Details**:
- Use `dt.strftime()` with format: `"%a, %d %b %Y %H:%M:%S +0000"`
- Assume UTC timezone (all timestamps stored as UTC)
- Format pattern: "DDD, DD MMM YYYY HH:MM:SS +0000"

#### `format_iso8601(dt: datetime) -> str`

**Purpose**: Format datetime as ISO 8601 string.

**Type Signature**:
```python
def format_iso8601(dt: datetime) -> str:
    """
    Format datetime as ISO 8601 string

    Converts datetime to ISO 8601 format for timestamps and APIs.

    Args:
        dt: Datetime to format

    Returns:
        ISO 8601 formatted string

    Examples:
        >>> from datetime import datetime
        >>> dt = datetime(2024, 11, 18, 14, 30, 45)
        >>> format_iso8601(dt)
        '2024-11-18T14:30:45Z'
    """
```

**Implementation Details**:
- Use `dt.isoformat()` with 'Z' suffix for UTC
- Format: `YYYY-MM-DDTHH:MM:SSZ`
- Or use `dt.strftime("%Y-%m-%dT%H:%M:%SZ")`

#### `parse_iso8601(date_string: str) -> datetime`

**Purpose**: Parse ISO 8601 string to datetime.

**Type Signature**:
```python
def parse_iso8601(date_string: str) -> datetime:
    """
    Parse ISO 8601 string to datetime

    Args:
        date_string: ISO 8601 formatted string

    Returns:
        Datetime object (UTC)

    Raises:
        ValueError: If string is not valid ISO 8601 format

    Examples:
        >>> parse_iso8601("2024-11-18T14:30:45Z")
        datetime.datetime(2024, 11, 18, 14, 30, 45)
    """
```

**Implementation Details**:
- Use `datetime.fromisoformat()` (remove 'Z' suffix first if present)
- Or use `datetime.strptime()` with format string
- Let ValueError propagate if invalid format

---

### 6. Helper Functions

#### `extract_first_words(text: str, max_words: int = 5) -> str`

**Purpose**: Extract first N words from text (helper for slug generation).

**Type Signature**:
```python
def extract_first_words(text: str, max_words: int = 5) -> str:
    """
    Extract first N words from text

    Helper function for slug generation. Splits text on whitespace
    and returns first N non-empty words.

    Args:
        text: Text to extract words from
        max_words: Maximum number of words to extract

    Returns:
        Space-separated string of first N words

    Examples:
        >>> extract_first_words("Hello world this is a test", 3)
        'Hello world this'

        >>> extract_first_words("  Multiple   spaces  ", 2)
        'Multiple spaces'
    """
```

**Implementation Details**:
- Strip whitespace from text
- Split on whitespace: `text.split()`
- Filter empty strings
- Take first max_words: `words[:max_words]`
- Join with single space: `" ".join(words)`

#### `normalize_slug_text(text: str) -> str`

**Purpose**: Normalize text for slug (lowercase, hyphens, alphanumeric only).

**Type Signature**:
```python
def normalize_slug_text(text: str) -> str:
    """
    Normalize text for use in slug

    Converts to lowercase, replaces spaces with hyphens, removes
    special characters, and collapses multiple hyphens.

    Args:
        text: Text to normalize

    Returns:
        Normalized slug-safe text

    Examples:
        >>> normalize_slug_text("Hello World!")
        'hello-world'

        >>> normalize_slug_text("Testing... with -- special chars!")
        'testing-with-special-chars'
    """
```

**Implementation Details**:
- Convert to lowercase: `text.lower()`
- Replace spaces with hyphens: `text.replace(' ', '-')`
- Remove non-alphanumeric (except hyphens): regex `[^a-z0-9-]` → `''`
- Collapse multiple hyphens: regex `-+` → `-`
- Strip leading/trailing hyphens: `text.strip('-')`

#### `generate_random_suffix(length: int = 4) -> str`

**Purpose**: Generate random alphanumeric suffix for slug uniqueness.

**Type Signature**:
```python
def generate_random_suffix(length: int = 4) -> str:
    """
    Generate random alphanumeric suffix

    Creates a secure random string for making slugs unique.
    Uses lowercase letters and numbers only.

    Args:
        length: Length of suffix (default 4)

    Returns:
        Random alphanumeric string

    Examples:
        >>> suffix = generate_random_suffix()
        >>> len(suffix)
        4
        >>> suffix.isalnum()
        True
    """
```

**Implementation Details**:
- Use `secrets.token_urlsafe()` or `secrets.choice()`
- Character set: `a-z0-9` (36 characters)
- Generate random string of specified length
- Example: `'a7c9'`, `'x3k2'`, `'p9m1'`

---

## Constants

```python
# Slug configuration
MAX_SLUG_LENGTH = 100           # Maximum slug length in characters
MIN_SLUG_LENGTH = 1             # Minimum slug length in characters
SLUG_WORDS_COUNT = 5            # Number of words to extract for slug
RANDOM_SUFFIX_LENGTH = 4        # Length of random suffix for uniqueness

# File operations
TEMP_FILE_SUFFIX = '.tmp'       # Suffix for temporary files
TRASH_DIR_NAME = '.trash'       # Directory name for soft-deleted files

# Hashing
CONTENT_HASH_ALGORITHM = 'sha256'  # Hash algorithm for content

# Regex patterns
SLUG_PATTERN = re.compile(r'^[a-z0-9]+(?:-[a-z0-9]+)*$')
SAFE_SLUG_PATTERN = re.compile(r'[^a-z0-9-]')
MULTIPLE_HYPHENS_PATTERN = re.compile(r'-+')
WHITESPACE_PATTERN = re.compile(r'\s+')

# Character sets
RANDOM_CHARS = 'abcdefghijklmnopqrstuvwxyz0123456789'
```

---

## Error Handling

### Exception Types

The module uses standard Python exceptions:

- **ValueError**: Invalid input (empty content, invalid slug, etc.)
- **FileNotFoundError**: File doesn't exist when expected
- **FileExistsError**: File already exists (not used in utils, but caller may use)
- **OSError**: General I/O errors (permissions, disk full, etc.)

### Error Messages

Error messages should be specific and actionable:

**Good**:
```python
raise ValueError(
    f"Invalid slug '{slug}': must contain only lowercase letters, "
    f"numbers, and hyphens"
)

raise ValueError("Content cannot be empty or whitespace-only")

raise OSError(
    f"Failed to write note file: {file_path}. "
    f"Check permissions and disk space."
)
```

**Bad**:
```python
raise ValueError("Invalid slug")
raise ValueError("Bad input")
raise OSError("Write failed")
```

---

## Testing Strategy

### Test Coverage Requirements

- Minimum 90% code coverage for utils.py
- Test all functions with multiple scenarios
- Test edge cases and error conditions
- Test security (path traversal, etc.)

### Test Organization

File: `tests/test_utils.py`

```python
"""
Tests for utility functions

Organized by function category:
- Slug generation tests
- Content hashing tests
- File path tests
- Atomic file operations tests
- Date/time formatting tests
"""

import pytest
from pathlib import Path
from datetime import datetime
from starpunk.utils import (
    generate_slug,
    make_slug_unique,
    validate_slug,
    calculate_content_hash,
    generate_note_path,
    # ... etc
)

class TestSlugGeneration:
    """Test slug generation functions"""

    def test_generate_slug_from_content(self):
        """Test basic slug generation from content"""
        slug = generate_slug("Hello World This Is My Note")
        assert slug == "hello-world-this-is-my"

    def test_generate_slug_special_characters(self):
        """Test slug generation removes special characters"""
        slug = generate_slug("Testing... with!@# special chars")
        assert slug == "testing-with-special-chars"

    def test_generate_slug_empty_content(self):
        """Test slug generation raises error on empty content"""
        with pytest.raises(ValueError):
            generate_slug("")

    def test_generate_slug_whitespace_only(self):
        """Test slug generation raises error on whitespace-only content"""
        with pytest.raises(ValueError):
            generate_slug("   \n\t   ")

    def test_generate_slug_short_content_fallback(self):
        """Test slug generation falls back to timestamp for short content"""
        slug = generate_slug("A", created_at=datetime(2024, 11, 18, 14, 30))
        assert slug == "20241118-143000"  # Timestamp format

    def test_generate_slug_unicode_content(self):
        """Test slug generation handles unicode"""
        slug = generate_slug("Hello 世界 World")
        # Should strip non-ASCII and keep ASCII words
        assert "hello" in slug
        assert "world" in slug

    def test_make_slug_unique_no_collision(self):
        """Test make_slug_unique returns original if no collision"""
        slug = make_slug_unique("test-note", set())
        assert slug == "test-note"

    def test_make_slug_unique_with_collision(self):
        """Test make_slug_unique adds suffix on collision"""
        slug = make_slug_unique("test-note", {"test-note"})
        assert slug.startswith("test-note-")
        assert len(slug) == len("test-note-") + 4  # 4 char suffix

    def test_validate_slug_valid(self):
        """Test validate_slug accepts valid slugs"""
        assert validate_slug("hello-world") is True
        assert validate_slug("test-note-123") is True
        assert validate_slug("a") is True

    def test_validate_slug_invalid(self):
        """Test validate_slug rejects invalid slugs"""
        assert validate_slug("Hello-World") is False  # Uppercase
        assert validate_slug("-hello") is False  # Leading hyphen
        assert validate_slug("hello-") is False  # Trailing hyphen
        assert validate_slug("hello--world") is False  # Double hyphen
        assert validate_slug("hello_world") is False  # Underscore
        assert validate_slug("") is False  # Empty


class TestContentHashing:
    """Test content hashing functions"""

    def test_calculate_content_hash_consistency(self):
        """Test hash is consistent for same content"""
        hash1 = calculate_content_hash("Test content")
        hash2 = calculate_content_hash("Test content")
        assert hash1 == hash2

    def test_calculate_content_hash_different(self):
        """Test different content produces different hash"""
        hash1 = calculate_content_hash("Test content 1")
        hash2 = calculate_content_hash("Test content 2")
        assert hash1 != hash2

    def test_calculate_content_hash_empty(self):
        """Test hash of empty string"""
        hash_empty = calculate_content_hash("")
        assert len(hash_empty) == 64  # SHA-256 produces 64 hex chars

    def test_calculate_content_hash_unicode(self):
        """Test hash handles unicode correctly"""
        hash_val = calculate_content_hash("Hello 世界")
        assert len(hash_val) == 64


class TestFilePathOperations:
    """Test file path generation and validation"""

    def test_generate_note_path(self):
        """Test note path generation"""
        dt = datetime(2024, 11, 18, 14, 30)
        path = generate_note_path("test-note", dt, Path("data"))
        assert path == Path("data/notes/2024/11/test-note.md")

    def test_generate_note_path_invalid_slug(self):
        """Test note path generation rejects invalid slug"""
        dt = datetime(2024, 11, 18)
        with pytest.raises(ValueError):
            generate_note_path("Invalid Slug!", dt, Path("data"))

    def test_validate_note_path_safe(self):
        """Test path validation accepts safe paths"""
        assert validate_note_path(
            Path("data/notes/2024/11/note.md"),
            Path("data")
        ) is True

    def test_validate_note_path_traversal(self):
        """Test path validation rejects traversal attempts"""
        assert validate_note_path(
            Path("data/notes/../../etc/passwd"),
            Path("data")
        ) is False

    def test_validate_note_path_absolute_outside(self):
        """Test path validation rejects paths outside data dir"""
        assert validate_note_path(
            Path("/etc/passwd"),
            Path("data")
        ) is False


class TestAtomicFileOperations:
    """Test atomic file write/read/delete operations"""

    def test_write_and_read_note_file(self, tmp_path):
        """Test writing and reading note file"""
        file_path = tmp_path / "test.md"
        content = "# Test Note\n\nThis is a test."

        write_note_file(file_path, content)
        assert file_path.exists()

        read_content = read_note_file(file_path)
        assert read_content == content

    def test_write_note_file_atomic(self, tmp_path):
        """Test write is atomic (temp file used)"""
        file_path = tmp_path / "test.md"
        temp_path = Path(str(file_path) + '.tmp')

        # Temp file should not exist after write
        write_note_file(file_path, "Test")
        assert not temp_path.exists()
        assert file_path.exists()

    def test_read_note_file_not_found(self, tmp_path):
        """Test reading non-existent file raises error"""
        file_path = tmp_path / "nonexistent.md"
        with pytest.raises(FileNotFoundError):
            read_note_file(file_path)

    def test_delete_note_file_hard(self, tmp_path):
        """Test hard delete removes file"""
        file_path = tmp_path / "test.md"
        file_path.write_text("Test")

        delete_note_file(file_path, soft=False)
        assert not file_path.exists()

    def test_delete_note_file_soft(self, tmp_path):
        """Test soft delete moves file to trash"""
        # Create note file
        notes_dir = tmp_path / "notes" / "2024" / "11"
        notes_dir.mkdir(parents=True)
        file_path = notes_dir / "test.md"
        file_path.write_text("Test")

        # Soft delete
        delete_note_file(file_path, soft=True, data_dir=tmp_path)

        # Original should be gone
        assert not file_path.exists()

        # Should be in trash
        trash_path = tmp_path / ".trash" / "2024" / "11" / "test.md"
        assert trash_path.exists()


class TestDateTimeFormatting:
    """Test date/time formatting functions"""

    def test_format_rfc822(self):
        """Test RFC-822 date formatting"""
        dt = datetime(2024, 11, 18, 14, 30, 45)
        formatted = format_rfc822(dt)
        assert formatted == "Mon, 18 Nov 2024 14:30:45 +0000"

    def test_format_iso8601(self):
        """Test ISO 8601 date formatting"""
        dt = datetime(2024, 11, 18, 14, 30, 45)
        formatted = format_iso8601(dt)
        assert formatted == "2024-11-18T14:30:45Z"

    def test_parse_iso8601(self):
        """Test ISO 8601 date parsing"""
        dt = parse_iso8601("2024-11-18T14:30:45Z")
        assert dt.year == 2024
        assert dt.month == 11
        assert dt.day == 18
        assert dt.hour == 14
        assert dt.minute == 30
        assert dt.second == 45

    def test_parse_iso8601_invalid(self):
        """Test ISO 8601 parsing rejects invalid format"""
        with pytest.raises(ValueError):
            parse_iso8601("not-a-date")
```

### Additional Test Cases

**Security Tests**:
- Path traversal attempts with various patterns
- Symlink handling
- Unicode/emoji in slugs
- Very long input strings
- Malicious input patterns

**Edge Cases**:
- Empty content
- Whitespace-only content
- Very long content (>1MB)
- Unicode normalization issues
- Concurrent file operations (if applicable)

**Performance Tests**:
- Slug generation with 10,000 character input
- Hash calculation with large content
- Path validation with deep directory structures

---

## Usage Examples

### Creating a Note

```python
from datetime import datetime
from pathlib import Path
from starpunk.utils import (
    generate_slug,
    make_slug_unique,
    generate_note_path,
    ensure_note_directory,
    write_note_file,
    calculate_content_hash
)

# Note content
content = "This is my first note about Python programming"
created_at = datetime.utcnow()
data_dir = Path("data")

# Generate slug
base_slug = generate_slug(content, created_at)  # "this-is-my-first-note"

# Check uniqueness (pseudo-code, actual check via database)
existing_slugs = get_existing_slugs_from_db()  # Returns set of slugs
slug = make_slug_unique(base_slug, existing_slugs)

# Generate file path
note_path = generate_note_path(slug, created_at, data_dir)
# Result: data/notes/2024/11/this-is-my-first-note.md

# Create directory
ensure_note_directory(note_path)

# Write file
write_note_file(note_path, content)

# Calculate hash for database
content_hash = calculate_content_hash(content)

# Save metadata to database (pseudo-code)
save_note_metadata(
    slug=slug,
    file_path=str(note_path.relative_to(data_dir)),
    created_at=created_at,
    content_hash=content_hash
)
```

### Reading a Note

```python
from pathlib import Path
from starpunk.utils import read_note_file, validate_note_path

# Get file path from database (pseudo-code)
note = get_note_from_db(slug="my-first-note")
file_path = Path("data") / note.file_path

# Validate path (security check)
if not validate_note_path(file_path, Path("data")):
    raise ValueError("Invalid file path")

# Read content
content = read_note_file(file_path)
```

### Deleting a Note

```python
from pathlib import Path
from starpunk.utils import delete_note_file

# Get note path from database
note = get_note_from_db(slug="old-note")
file_path = Path("data") / note.file_path

# Soft delete (move to trash)
delete_note_file(file_path, soft=True, data_dir=Path("data"))

# Update database to mark as deleted (pseudo-code)
mark_note_deleted(note.id)
```

---

## Integration with Other Modules

### Database Integration

The database module will use utilities like this:

```python
# In starpunk/notes.py
from starpunk.utils import (
    generate_slug, make_slug_unique, generate_note_path,
    ensure_note_directory, write_note_file, calculate_content_hash
)
from starpunk.database import get_db

def create_note(content: str, published: bool = False) -> Note:
    """Create a new note"""
    db = get_db()
    created_at = datetime.utcnow()

    # Generate unique slug
    base_slug = generate_slug(content, created_at)
    existing_slugs = set(row[0] for row in db.execute("SELECT slug FROM notes").fetchall())
    slug = make_slug_unique(base_slug, existing_slugs)

    # Generate path and write file
    note_path = generate_note_path(slug, created_at, Path(current_app.config['DATA_PATH']))
    ensure_note_directory(note_path)
    write_note_file(note_path, content)

    # Calculate hash
    content_hash = calculate_content_hash(content)

    # Insert into database
    db.execute(
        "INSERT INTO notes (slug, file_path, published, created_at, updated_at, content_hash) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (slug, str(note_path.relative_to(Path(current_app.config['DATA_PATH']))),
         published, created_at, created_at, content_hash)
    )
    db.commit()

    return Note(slug=slug, content=content, created_at=created_at, ...)
```

### RSS Feed Integration

The feed module will use date formatting:

```python
# In starpunk/feed.py
from starpunk.utils import format_rfc822

def generate_rss_feed():
    """Generate RSS feed"""
    fg = FeedGenerator()

    for note in get_published_notes():
        fe = fg.add_entry()
        fe.pubDate(format_rfc822(note.created_at))
```

---

## Performance Considerations

### Slug Generation Performance

- Expected input: 50-1000 characters
- Target: <1ms per slug generation
- Bottleneck: None (string operations are fast)

### File Operations Performance

- Write operation: <10ms for typical note (1-10KB)
- Read operation: <5ms for typical note
- Atomic rename: ~1ms (OS-level operation)

### Path Validation Performance

- Target: <1ms per validation
- Use cached absolute paths where possible

---

## Security Considerations

### Path Traversal Prevention

CRITICAL: All file paths MUST be validated before use.

```python
# Always validate before file operations
if not validate_note_path(file_path, data_dir):
    raise ValueError("Invalid file path: potential directory traversal")
```

### Slug Validation

Prevent injection attacks through slugs:

```python
# Validate slug before using in paths
if not validate_slug(slug):
    raise ValueError(f"Invalid slug: {slug}")
```

### Random Suffix Security

Use cryptographically secure random:

```python
# Use secrets module, not random module
import secrets
suffix = secrets.token_urlsafe(4)[:4].lower()
```

---

## Dependencies

### Standard Library

- `hashlib` - SHA-256 hashing
- `re` - Regular expressions
- `secrets` - Secure random generation
- `pathlib` - Path operations
- `datetime` - Date/time handling
- `shutil` - File operations (move)

### Third-Party

- `flask` - For `current_app` context (configuration access)

### Internal

- None (this is the foundation module)

---

## Future Enhancements (V2+)

### Slug Customization

Allow user to specify custom slug:

```python
def generate_slug(content: str, custom_slug: Optional[str] = None, ...) -> str:
    if custom_slug:
        if validate_slug(custom_slug):
            return custom_slug
        raise ValueError("Invalid custom slug")
    # ... existing logic
```

### Unicode Slug Support

Support non-ASCII characters in slugs:

```python
def generate_unicode_slug(content: str) -> str:
    """Generate slug with unicode support"""
    # Use unicode normalization
    # Support CJK characters
    # Support diacritics
```

### Content Preview Extraction

Extract preview/excerpt from content:

```python
def extract_preview(content: str, max_length: int = 200) -> str:
    """Extract preview text from content"""
    # Strip markdown formatting
    # Truncate to max_length
    # Add ellipsis if truncated
```

### Title Extraction

Extract title from first line:

```python
def extract_title(content: str) -> Optional[str]:
    """Extract title from content (first line or # heading)"""
    # Check for # heading
    # Or use first line
    # Return None if empty
```

---

## Acceptance Criteria

- [ ] All functions implemented with type hints
- [ ] All functions have comprehensive docstrings
- [ ] Slug generation creates valid, URL-safe slugs
- [ ] Slug uniqueness enforcement works correctly
- [ ] Content hashing is consistent and correct
- [ ] Path validation prevents directory traversal
- [ ] Atomic file writes prevent corruption
- [ ] Date formatting matches RFC-822 and ISO 8601 specs
- [ ] All security checks are in place
- [ ] Test coverage >90%
- [ ] All tests pass
- [ ] Code formatted with Black
- [ ] Code passes flake8 linting
- [ ] No hardcoded values (use constants)
- [ ] Error messages are clear and actionable

---

## References

- [ADR-004: File-Based Note Storage](/home/phil/Projects/starpunk/docs/decisions/ADR-004-file-based-note-storage.md)
- [Python Coding Standards](/home/phil/Projects/starpunk/docs/standards/python-coding-standards.md)
- [Project Structure](/home/phil/Projects/starpunk/docs/design/project-structure.md)
- [CommonMark Specification](https://spec.commonmark.org/)
- [RSS 2.0 Specification](https://www.rssboard.org/rss-specification)
- [ISO 8601 Date Format](https://en.wikipedia.org/wiki/ISO_8601)
- [RFC-822 Date Format](https://www.rfc-editor.org/rfc/rfc822)
- [Python pathlib Documentation](https://docs.python.org/3/library/pathlib.html)
- [Python secrets Documentation](https://docs.python.org/3/library/secrets.html)

---

## Implementation Checklist

When implementing `starpunk/utils.py`, complete in this order:

1. [ ] Create file with module docstring
2. [ ] Add imports and constants
3. [ ] Implement helper functions (extract_first_words, normalize_slug_text, generate_random_suffix)
4. [ ] Implement slug functions (generate_slug, make_slug_unique, validate_slug)
5. [ ] Implement content hashing (calculate_content_hash)
6. [ ] Implement path functions (generate_note_path, ensure_note_directory, validate_note_path)
7. [ ] Implement file operations (write_note_file, read_note_file, delete_note_file)
8. [ ] Implement date/time functions (format_rfc822, format_iso8601, parse_iso8601)
9. [ ] Create tests/test_utils.py
10. [ ] Write tests for all functions
11. [ ] Run tests and achieve >90% coverage
12. [ ] Format with Black
13. [ ] Lint with flake8
14. [ ] Review all docstrings
15. [ ] Review all error messages

**Estimated Time**: 2-3 hours for implementation + tests