Files

Phil Skentelbery a68fd570c7 that initial commit

2025-11-18 19:21:31 -07:00

39 KiB

Raw Blame History

Phase 1.1: Core Utilities Design

Overview

This document provides a complete, implementation-ready design for Phase 1.1 of the StarPunk V1 implementation plan: Core Utilities. The utilities module (starpunk/utils.py) provides foundational functions used by all other components of the system.

Priority: CRITICAL - Required by all other features Estimated Effort: 2-3 hours Dependencies: None File: starpunk/utils.py

Design Principles

Pure functions - No side effects where possible
Type safety - Full type hints on all functions
Error handling - Specific exceptions with clear messages
Security first - Validate all inputs, prevent path traversal
Testable - Small, focused functions that are easy to test
Well documented - Comprehensive docstrings with examples

Module Structure

"""
Core utility functions for StarPunk

This module provides essential utilities for slug generation, file operations,
hashing, and date/time handling. These utilities are used throughout the
application and have no external dependencies beyond standard library and
Flask configuration.
"""

# Standard library imports
import hashlib
import re
import secrets
from datetime import datetime
from pathlib import Path
from typing import Optional

# Third-party imports
from flask import current_app

# Constants
MAX_SLUG_LENGTH = 100
MIN_SLUG_LENGTH = 1
SLUG_WORDS_COUNT = 5
RANDOM_SUFFIX_LENGTH = 4
CONTENT_HASH_ALGORITHM = "sha256"
SAFE_SLUG_PATTERN = re.compile(r'[^a-z0-9-]')
MULTIPLE_HYPHENS_PATTERN = re.compile(r'-+')

# Utility functions (defined below)

Function Specifications

1. Slug Generation

`generate_slug(content: str, created_at: Optional[datetime] = None) -> str`

Purpose: Generate a URL-safe slug from note content or timestamp.

Algorithm:

Extract first N words from content (configurable, default 5)
Convert to lowercase
Replace spaces with hyphens
Remove all characters except a-z, 0-9, and hyphens
Remove leading/trailing hyphens
Collapse multiple consecutive hyphens to single hyphen
Truncate to maximum length
If result is empty or too short, fall back to timestamp-based slug
Return slug (uniqueness check handled by caller)

Type Signature:

def generate_slug(content: str, created_at: Optional[datetime] = None) -> str:
    """
    Generate URL-safe slug from note content

    Creates a slug by extracting the first few words from the content and
    normalizing them to lowercase with hyphens. If content is insufficient,
    falls back to timestamp-based slug.

    Args:
        content: The note content (markdown text)
        created_at: Optional timestamp for fallback slug (defaults to now)

    Returns:
        URL-safe slug string (lowercase, alphanumeric + hyphens only)

    Raises:
        ValueError: If content is empty or contains only whitespace

    Examples:
        >>> generate_slug("Hello World! This is my first note.")
        'hello-world-this-is-my'

        >>> generate_slug("Testing... with special chars!@#")
        'testing-with-special-chars'

        >>> generate_slug("A")  # Too short, uses timestamp
        '20241118-143022'

        >>> generate_slug("   ")  # Empty content
        ValueError: Content cannot be empty

    Notes:
        - This function does NOT check for uniqueness
        - Caller must verify slug doesn't exist in database
        - Use make_slug_unique() to add random suffix if needed
    """

Implementation Details:

Use regex to remove non-alphanumeric characters (except hyphens)
Handle Unicode: normalize to ASCII first, remove non-ASCII characters
Maximum slug length: 100 characters (configurable constant)
Minimum slug length: 1 character (if shorter, use timestamp fallback)
Timestamp format: YYYYMMDD-HHMMSS (e.g., 20241118-143022)
Extract words: split on whitespace, take first N non-empty tokens

Edge Cases:

Empty content → ValueError
Content with only special characters → timestamp fallback
Content with only 1-2 characters → timestamp fallback
Very long first words → truncate to max length
Unicode/emoji content → strip to ASCII alphanumeric

`make_slug_unique(base_slug: str, existing_slugs: set[str]) -> str`

Purpose: Add random suffix to slug if it already exists.

Type Signature:

def make_slug_unique(base_slug: str, existing_slugs: set[str]) -> str:
    """
    Make a slug unique by adding random suffix if needed

    If the base_slug already exists in the provided set, appends a random
    alphanumeric suffix until a unique slug is found.

    Args:
        base_slug: The base slug to make unique
        existing_slugs: Set of existing slugs to check against

    Returns:
        Unique slug (base_slug or base_slug-{random})

    Examples:
        >>> make_slug_unique("test-note", set())
        'test-note'

        >>> make_slug_unique("test-note", {"test-note"})
        'test-note-a7c9'  # Random suffix

        >>> make_slug_unique("test-note", {"test-note", "test-note-a7c9"})
        'test-note-x3k2'  # Different random suffix

    Notes:
        - Random suffix is 4 lowercase alphanumeric characters
        - Extremely low collision probability (36^4 = 1.6M combinations)
        - Will retry up to 100 times if collision occurs (should never happen)
    """

Implementation Details:

Check if base_slug exists in set
If not, return base_slug unchanged
If exists, append -{random} where random is 4 alphanumeric chars
Use secrets.token_urlsafe() for random generation (secure)
Retry if collision occurs (max 100 attempts, raise error if all fail)

`validate_slug(slug: str) -> bool`

Purpose: Validate that a slug meets all requirements.

Type Signature:

def validate_slug(slug: str) -> bool:
    """
    Validate that a slug meets all requirements

    Checks that slug contains only allowed characters and is within
    length limits.

    Args:
        slug: The slug to validate

    Returns:
        True if slug is valid, False otherwise

    Rules:
        - Must contain only: a-z, 0-9, hyphen (-)
        - Must be between 1 and 100 characters
        - Cannot start or end with hyphen
        - Cannot contain consecutive hyphens

    Examples:
        >>> validate_slug("hello-world")
        True

        >>> validate_slug("Hello-World")  # Uppercase
        False

        >>> validate_slug("-hello")  # Leading hyphen
        False

        >>> validate_slug("hello--world")  # Double hyphen
        False
    """

Implementation Details:

Regex pattern: ^[a-z0-9]+(?:-[a-z0-9]+)*$
Length check: 1 <= len(slug) <= 100
No additional normalization (validation only)

2. Content Hashing

`calculate_content_hash(content: str) -> str`

Purpose: Calculate SHA-256 hash of note content for change detection.

Type Signature:

def calculate_content_hash(content: str) -> str:
    """
    Calculate SHA-256 hash of content

    Generates a cryptographic hash of the content for change detection
    and cache invalidation. Uses UTF-8 encoding.

    Args:
        content: The content to hash (markdown text)

    Returns:
        Hexadecimal hash string (64 characters)

    Examples:
        >>> calculate_content_hash("Hello World")
        'a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e'

        >>> calculate_content_hash("")
        'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'

    Notes:
        - Same content always produces same hash
        - Hash is deterministic across systems
        - Useful for detecting external file modifications
        - SHA-256 chosen for security and wide support
    """

Implementation Details:

Use hashlib.sha256()
Encode content as UTF-8 bytes
Return hexadecimal digest (lowercase)
Handle empty content (valid use case)

3. File Path Operations

`generate_note_path(slug: str, created_at: datetime, data_dir: Path) -> Path`

Purpose: Generate file path for note using year/month directory structure.

Type Signature:

def generate_note_path(slug: str, created_at: datetime, data_dir: Path) -> Path:
    """
    Generate file path for a note

    Creates path following pattern: data/notes/YYYY/MM/slug.md

    Args:
        slug: URL-safe slug for the note
        created_at: Creation timestamp (determines YYYY/MM)
        data_dir: Base data directory path

    Returns:
        Full Path object for the note file

    Raises:
        ValueError: If slug is invalid

    Examples:
        >>> from datetime import datetime
        >>> from pathlib import Path
        >>> dt = datetime(2024, 11, 18, 14, 30)
        >>> generate_note_path("test-note", dt, Path("data"))
        PosixPath('data/notes/2024/11/test-note.md')

    Notes:
        - Does NOT create directories (use ensure_note_directory)
        - Does NOT check if file exists
        - Validates slug before generating path
    """

Implementation Details:

Validate slug using validate_slug()
Extract year and month from created_at: created_at.strftime("%Y"), "%m"
Build path: data_dir / "notes" / year / month / f"{slug}.md"
Return Path object (not string)

`ensure_note_directory(note_path: Path) -> Path`

Purpose: Create directory structure for note if it doesn't exist.

Type Signature:

def ensure_note_directory(note_path: Path) -> Path:
    """
    Ensure directory exists for note file

    Creates parent directories if they don't exist. Safe to call
    even if directories already exist.

    Args:
        note_path: Full path to note file

    Returns:
        Parent directory path

    Raises:
        OSError: If directory cannot be created (permissions, etc.)

    Examples:
        >>> note_path = Path("data/notes/2024/11/test-note.md")
        >>> ensure_note_directory(note_path)
        PosixPath('data/notes/2024/11')
    """

Implementation Details:

Use note_path.parent.mkdir(parents=True, exist_ok=True)
Return parent directory path
Let OSError propagate if creation fails (permissions, disk full, etc.)

`validate_note_path(file_path: Path, data_dir: Path) -> bool`

Purpose: Validate that file path is within data directory (prevent path traversal).

Type Signature:

def validate_note_path(file_path: Path, data_dir: Path) -> bool:
    """
    Validate that file path is within data directory

    Security check to prevent path traversal attacks. Ensures the
    resolved path is within the allowed data directory.

    Args:
        file_path: Path to validate
        data_dir: Base data directory that must contain file_path

    Returns:
        True if path is safe, False otherwise

    Examples:
        >>> validate_note_path(
        ...     Path("data/notes/2024/11/note.md"),
        ...     Path("data")
        ... )
        True

        >>> validate_note_path(
        ...     Path("data/notes/../../etc/passwd"),
        ...     Path("data")
        ... )
        False

    Security:
        - Resolves symlinks and relative paths
        - Checks if resolved path is child of data_dir
        - Prevents directory traversal attacks
    """

Implementation Details:

Resolve both paths to absolute: file_path.resolve(), data_dir.resolve()
Check if file_path is relative to data_dir: use file_path.is_relative_to(data_dir)
Return boolean (no exceptions)
Critical for security - MUST be called before any file operations

4. Atomic File Operations

`write_note_file(file_path: Path, content: str) -> None`

Purpose: Write note content to file atomically.

Type Signature:

def write_note_file(file_path: Path, content: str) -> None:
    """
    Write note content to file atomically

    Writes to temporary file first, then atomically renames to final path.
    This prevents corruption if write is interrupted.

    Args:
        file_path: Destination file path
        content: Content to write (markdown text)

    Raises:
        OSError: If file cannot be written
        ValueError: If file_path is invalid

    Examples:
        >>> write_note_file(Path("data/notes/2024/11/test.md"), "# Test")

    Implementation:
        1. Create temp file: {file_path}.tmp
        2. Write content to temp file
        3. Atomically rename temp to final path
        4. If any step fails, clean up temp file

    Notes:
        - Atomic rename is guaranteed on POSIX systems
        - Temp file created in same directory as target
        - UTF-8 encoding used for all text
    """

Implementation Details:

Create temp path: temp_path = file_path.with_suffix(file_path.suffix + '.tmp')
Write to temp file: temp_path.write_text(content, encoding='utf-8')
Atomic rename: temp_path.replace(file_path)
On exception: delete temp file if exists, re-raise
Use context manager or try/finally for cleanup

`read_note_file(file_path: Path) -> str`

Purpose: Read note content from file.

Type Signature:

def read_note_file(file_path: Path) -> str:
    """
    Read note content from file

    Args:
        file_path: Path to note file

    Returns:
        File content as string

    Raises:
        FileNotFoundError: If file doesn't exist
        OSError: If file cannot be read

    Examples:
        >>> content = read_note_file(Path("data/notes/2024/11/test.md"))
        >>> print(content)
        # Test Note
    """

Implementation Details:

Use file_path.read_text(encoding='utf-8')
Let exceptions propagate (FileNotFoundError, OSError, etc.)
No additional error handling needed

`delete_note_file(file_path: Path, soft: bool = False, data_dir: Optional[Path] = None) -> None`

Purpose: Delete note file (soft or hard delete).

Type Signature:

def delete_note_file(file_path: Path, soft: bool = False, data_dir: Optional[Path] = None) -> None:
    """
    Delete note file from filesystem

    Supports soft delete (move to trash) or hard delete (permanent removal).

    Args:
        file_path: Path to note file
        soft: If True, move to .trash/ directory; if False, delete permanently
        data_dir: Required if soft=True, base data directory

    Raises:
        FileNotFoundError: If file doesn't exist
        ValueError: If soft=True but data_dir not provided
        OSError: If file cannot be deleted or moved

    Examples:
        >>> # Hard delete
        >>> delete_note_file(Path("data/notes/2024/11/test.md"))

        >>> # Soft delete (move to trash)
        >>> delete_note_file(
        ...     Path("data/notes/2024/11/test.md"),
        ...     soft=True,
        ...     data_dir=Path("data")
        ... )
    """

Implementation Details:

If soft delete:
- Require data_dir parameter
- Create trash directory: data_dir / ".trash" / year / month
- Move file to trash with same filename
- Use shutil.move() or file_path.replace()
If hard delete:
- Use file_path.unlink()
Handle FileNotFoundError gracefully (file already gone is success)

5. Date/Time Utilities

`format_rfc822(dt: datetime) -> str`

Purpose: Format datetime as RFC-822 string for RSS feeds.

Type Signature:

def format_rfc822(dt: datetime) -> str:
    """
    Format datetime as RFC-822 string

    Converts datetime to RFC-822 format required by RSS 2.0 specification.
    Assumes UTC timezone.

    Args:
        dt: Datetime to format (assumed UTC)

    Returns:
        RFC-822 formatted string

    Examples:
        >>> from datetime import datetime
        >>> dt = datetime(2024, 11, 18, 14, 30, 45)
        >>> format_rfc822(dt)
        'Mon, 18 Nov 2024 14:30:45 +0000'

    References:
        - RSS 2.0 spec: https://www.rssboard.org/rss-specification
        - RFC-822 date format
    """

Implementation Details:

Use dt.strftime() with format: "%a, %d %b %Y %H:%M:%S +0000"
Assume UTC timezone (all timestamps stored as UTC)
Format pattern: "DDD, DD MMM YYYY HH:MM:SS +0000"

`format_iso8601(dt: datetime) -> str`

Purpose: Format datetime as ISO 8601 string.

Type Signature:

def format_iso8601(dt: datetime) -> str:
    """
    Format datetime as ISO 8601 string

    Converts datetime to ISO 8601 format for timestamps and APIs.

    Args:
        dt: Datetime to format

    Returns:
        ISO 8601 formatted string

    Examples:
        >>> from datetime import datetime
        >>> dt = datetime(2024, 11, 18, 14, 30, 45)
        >>> format_iso8601(dt)
        '2024-11-18T14:30:45Z'
    """

Implementation Details:

Use dt.isoformat() with 'Z' suffix for UTC
Format: YYYY-MM-DDTHH:MM:SSZ
Or use dt.strftime("%Y-%m-%dT%H:%M:%SZ")

`parse_iso8601(date_string: str) -> datetime`

Purpose: Parse ISO 8601 string to datetime.

Type Signature:

def parse_iso8601(date_string: str) -> datetime:
    """
    Parse ISO 8601 string to datetime

    Args:
        date_string: ISO 8601 formatted string

    Returns:
        Datetime object (UTC)

    Raises:
        ValueError: If string is not valid ISO 8601 format

    Examples:
        >>> parse_iso8601("2024-11-18T14:30:45Z")
        datetime.datetime(2024, 11, 18, 14, 30, 45)
    """

Implementation Details:

Use datetime.fromisoformat() (remove 'Z' suffix first if present)
Or use datetime.strptime() with format string
Let ValueError propagate if invalid format

6. Helper Functions

`extract_first_words(text: str, max_words: int = 5) -> str`

Purpose: Extract first N words from text (helper for slug generation).

Type Signature:

def extract_first_words(text: str, max_words: int = 5) -> str:
    """
    Extract first N words from text

    Helper function for slug generation. Splits text on whitespace
    and returns first N non-empty words.

    Args:
        text: Text to extract words from
        max_words: Maximum number of words to extract

    Returns:
        Space-separated string of first N words

    Examples:
        >>> extract_first_words("Hello world this is a test", 3)
        'Hello world this'

        >>> extract_first_words("  Multiple   spaces  ", 2)
        'Multiple spaces'
    """

Implementation Details:

Strip whitespace from text
Split on whitespace: text.split()
Filter empty strings
Take first max_words: words[:max_words]
Join with single space: " ".join(words)

`normalize_slug_text(text: str) -> str`

Purpose: Normalize text for slug (lowercase, hyphens, alphanumeric only).

Type Signature:

def normalize_slug_text(text: str) -> str:
    """
    Normalize text for use in slug

    Converts to lowercase, replaces spaces with hyphens, removes
    special characters, and collapses multiple hyphens.

    Args:
        text: Text to normalize

    Returns:
        Normalized slug-safe text

    Examples:
        >>> normalize_slug_text("Hello World!")
        'hello-world'

        >>> normalize_slug_text("Testing... with -- special chars!")
        'testing-with-special-chars'
    """

Implementation Details:

Convert to lowercase: text.lower()
Replace spaces with hyphens: text.replace(' ', '-')
Remove non-alphanumeric (except hyphens): regex [^a-z0-9-] → ''
Collapse multiple hyphens: regex -+ → -
Strip leading/trailing hyphens: text.strip('-')

`generate_random_suffix(length: int = 4) -> str`

Purpose: Generate random alphanumeric suffix for slug uniqueness.

Type Signature:

def generate_random_suffix(length: int = 4) -> str:
    """
    Generate random alphanumeric suffix

    Creates a secure random string for making slugs unique.
    Uses lowercase letters and numbers only.

    Args:
        length: Length of suffix (default 4)

    Returns:
        Random alphanumeric string

    Examples:
        >>> suffix = generate_random_suffix()
        >>> len(suffix)
        4
        >>> suffix.isalnum()
        True
    """

Implementation Details:

Use secrets.token_urlsafe() or secrets.choice()
Character set: a-z0-9 (36 characters)
Generate random string of specified length
Example: 'a7c9', 'x3k2', 'p9m1'

Constants

# Slug configuration
MAX_SLUG_LENGTH = 100           # Maximum slug length in characters
MIN_SLUG_LENGTH = 1             # Minimum slug length in characters
SLUG_WORDS_COUNT = 5            # Number of words to extract for slug
RANDOM_SUFFIX_LENGTH = 4        # Length of random suffix for uniqueness

# File operations
TEMP_FILE_SUFFIX = '.tmp'       # Suffix for temporary files
TRASH_DIR_NAME = '.trash'       # Directory name for soft-deleted files

# Hashing
CONTENT_HASH_ALGORITHM = 'sha256'  # Hash algorithm for content

# Regex patterns
SLUG_PATTERN = re.compile(r'^[a-z0-9]+(?:-[a-z0-9]+)*$')
SAFE_SLUG_PATTERN = re.compile(r'[^a-z0-9-]')
MULTIPLE_HYPHENS_PATTERN = re.compile(r'-+')
WHITESPACE_PATTERN = re.compile(r'\s+')

# Character sets
RANDOM_CHARS = 'abcdefghijklmnopqrstuvwxyz0123456789'

Error Handling

Exception Types

The module uses standard Python exceptions:

ValueError: Invalid input (empty content, invalid slug, etc.)
FileNotFoundError: File doesn't exist when expected
FileExistsError: File already exists (not used in utils, but caller may use)
OSError: General I/O errors (permissions, disk full, etc.)

Error Messages

Error messages should be specific and actionable:

Good:

raise ValueError(
    f"Invalid slug '{slug}': must contain only lowercase letters, "
    f"numbers, and hyphens"
)

raise ValueError("Content cannot be empty or whitespace-only")

raise OSError(
    f"Failed to write note file: {file_path}. "
    f"Check permissions and disk space."
)

Bad:

raise ValueError("Invalid slug")
raise ValueError("Bad input")
raise OSError("Write failed")

Testing Strategy

Test Coverage Requirements

Minimum 90% code coverage for utils.py
Test all functions with multiple scenarios
Test edge cases and error conditions
Test security (path traversal, etc.)

Test Organization

File: tests/test_utils.py

"""
Tests for utility functions

Organized by function category:
- Slug generation tests
- Content hashing tests
- File path tests
- Atomic file operations tests
- Date/time formatting tests
"""

import pytest
from pathlib import Path
from datetime import datetime
from starpunk.utils import (
    generate_slug,
    make_slug_unique,
    validate_slug,
    calculate_content_hash,
    generate_note_path,
    # ... etc
)

class TestSlugGeneration:
    """Test slug generation functions"""

    def test_generate_slug_from_content(self):
        """Test basic slug generation from content"""
        slug = generate_slug("Hello World This Is My Note")
        assert slug == "hello-world-this-is-my"

    def test_generate_slug_special_characters(self):
        """Test slug generation removes special characters"""
        slug = generate_slug("Testing... with!@# special chars")
        assert slug == "testing-with-special-chars"

    def test_generate_slug_empty_content(self):
        """Test slug generation raises error on empty content"""
        with pytest.raises(ValueError):
            generate_slug("")

    def test_generate_slug_whitespace_only(self):
        """Test slug generation raises error on whitespace-only content"""
        with pytest.raises(ValueError):
            generate_slug("   \n\t   ")

    def test_generate_slug_short_content_fallback(self):
        """Test slug generation falls back to timestamp for short content"""
        slug = generate_slug("A", created_at=datetime(2024, 11, 18, 14, 30))
        assert slug == "20241118-143000"  # Timestamp format

    def test_generate_slug_unicode_content(self):
        """Test slug generation handles unicode"""
        slug = generate_slug("Hello 世界 World")
        # Should strip non-ASCII and keep ASCII words
        assert "hello" in slug
        assert "world" in slug

    def test_make_slug_unique_no_collision(self):
        """Test make_slug_unique returns original if no collision"""
        slug = make_slug_unique("test-note", set())
        assert slug == "test-note"

    def test_make_slug_unique_with_collision(self):
        """Test make_slug_unique adds suffix on collision"""
        slug = make_slug_unique("test-note", {"test-note"})
        assert slug.startswith("test-note-")
        assert len(slug) == len("test-note-") + 4  # 4 char suffix

    def test_validate_slug_valid(self):
        """Test validate_slug accepts valid slugs"""
        assert validate_slug("hello-world") is True
        assert validate_slug("test-note-123") is True
        assert validate_slug("a") is True

    def test_validate_slug_invalid(self):
        """Test validate_slug rejects invalid slugs"""
        assert validate_slug("Hello-World") is False  # Uppercase
        assert validate_slug("-hello") is False  # Leading hyphen
        assert validate_slug("hello-") is False  # Trailing hyphen
        assert validate_slug("hello--world") is False  # Double hyphen
        assert validate_slug("hello_world") is False  # Underscore
        assert validate_slug("") is False  # Empty


class TestContentHashing:
    """Test content hashing functions"""

    def test_calculate_content_hash_consistency(self):
        """Test hash is consistent for same content"""
        hash1 = calculate_content_hash("Test content")
        hash2 = calculate_content_hash("Test content")
        assert hash1 == hash2

    def test_calculate_content_hash_different(self):
        """Test different content produces different hash"""
        hash1 = calculate_content_hash("Test content 1")
        hash2 = calculate_content_hash("Test content 2")
        assert hash1 != hash2

    def test_calculate_content_hash_empty(self):
        """Test hash of empty string"""
        hash_empty = calculate_content_hash("")
        assert len(hash_empty) == 64  # SHA-256 produces 64 hex chars

    def test_calculate_content_hash_unicode(self):
        """Test hash handles unicode correctly"""
        hash_val = calculate_content_hash("Hello 世界")
        assert len(hash_val) == 64


class TestFilePathOperations:
    """Test file path generation and validation"""

    def test_generate_note_path(self):
        """Test note path generation"""
        dt = datetime(2024, 11, 18, 14, 30)
        path = generate_note_path("test-note", dt, Path("data"))
        assert path == Path("data/notes/2024/11/test-note.md")

    def test_generate_note_path_invalid_slug(self):
        """Test note path generation rejects invalid slug"""
        dt = datetime(2024, 11, 18)
        with pytest.raises(ValueError):
            generate_note_path("Invalid Slug!", dt, Path("data"))

    def test_validate_note_path_safe(self):
        """Test path validation accepts safe paths"""
        assert validate_note_path(
            Path("data/notes/2024/11/note.md"),
            Path("data")
        ) is True

    def test_validate_note_path_traversal(self):
        """Test path validation rejects traversal attempts"""
        assert validate_note_path(
            Path("data/notes/../../etc/passwd"),
            Path("data")
        ) is False

    def test_validate_note_path_absolute_outside(self):
        """Test path validation rejects paths outside data dir"""
        assert validate_note_path(
            Path("/etc/passwd"),
            Path("data")
        ) is False


class TestAtomicFileOperations:
    """Test atomic file write/read/delete operations"""

    def test_write_and_read_note_file(self, tmp_path):
        """Test writing and reading note file"""
        file_path = tmp_path / "test.md"
        content = "# Test Note\n\nThis is a test."

        write_note_file(file_path, content)
        assert file_path.exists()

        read_content = read_note_file(file_path)
        assert read_content == content

    def test_write_note_file_atomic(self, tmp_path):
        """Test write is atomic (temp file used)"""
        file_path = tmp_path / "test.md"
        temp_path = Path(str(file_path) + '.tmp')

        # Temp file should not exist after write
        write_note_file(file_path, "Test")
        assert not temp_path.exists()
        assert file_path.exists()

    def test_read_note_file_not_found(self, tmp_path):
        """Test reading non-existent file raises error"""
        file_path = tmp_path / "nonexistent.md"
        with pytest.raises(FileNotFoundError):
            read_note_file(file_path)

    def test_delete_note_file_hard(self, tmp_path):
        """Test hard delete removes file"""
        file_path = tmp_path / "test.md"
        file_path.write_text("Test")

        delete_note_file(file_path, soft=False)
        assert not file_path.exists()

    def test_delete_note_file_soft(self, tmp_path):
        """Test soft delete moves file to trash"""
        # Create note file
        notes_dir = tmp_path / "notes" / "2024" / "11"
        notes_dir.mkdir(parents=True)
        file_path = notes_dir / "test.md"
        file_path.write_text("Test")

        # Soft delete
        delete_note_file(file_path, soft=True, data_dir=tmp_path)

        # Original should be gone
        assert not file_path.exists()

        # Should be in trash
        trash_path = tmp_path / ".trash" / "2024" / "11" / "test.md"
        assert trash_path.exists()


class TestDateTimeFormatting:
    """Test date/time formatting functions"""

    def test_format_rfc822(self):
        """Test RFC-822 date formatting"""
        dt = datetime(2024, 11, 18, 14, 30, 45)
        formatted = format_rfc822(dt)
        assert formatted == "Mon, 18 Nov 2024 14:30:45 +0000"

    def test_format_iso8601(self):
        """Test ISO 8601 date formatting"""
        dt = datetime(2024, 11, 18, 14, 30, 45)
        formatted = format_iso8601(dt)
        assert formatted == "2024-11-18T14:30:45Z"

    def test_parse_iso8601(self):
        """Test ISO 8601 date parsing"""
        dt = parse_iso8601("2024-11-18T14:30:45Z")
        assert dt.year == 2024
        assert dt.month == 11
        assert dt.day == 18
        assert dt.hour == 14
        assert dt.minute == 30
        assert dt.second == 45

    def test_parse_iso8601_invalid(self):
        """Test ISO 8601 parsing rejects invalid format"""
        with pytest.raises(ValueError):
            parse_iso8601("not-a-date")

Additional Test Cases

Security Tests:

Path traversal attempts with various patterns
Symlink handling
Unicode/emoji in slugs
Very long input strings
Malicious input patterns

Edge Cases:

Empty content
Whitespace-only content
Very long content (>1MB)
Unicode normalization issues
Concurrent file operations (if applicable)

Performance Tests:

Slug generation with 10,000 character input
Hash calculation with large content
Path validation with deep directory structures

Usage Examples

Creating a Note

from datetime import datetime
from pathlib import Path
from starpunk.utils import (
    generate_slug,
    make_slug_unique,
    generate_note_path,
    ensure_note_directory,
    write_note_file,
    calculate_content_hash
)

# Note content
content = "This is my first note about Python programming"
created_at = datetime.utcnow()
data_dir = Path("data")

# Generate slug
base_slug = generate_slug(content, created_at)  # "this-is-my-first-note"

# Check uniqueness (pseudo-code, actual check via database)
existing_slugs = get_existing_slugs_from_db()  # Returns set of slugs
slug = make_slug_unique(base_slug, existing_slugs)

# Generate file path
note_path = generate_note_path(slug, created_at, data_dir)
# Result: data/notes/2024/11/this-is-my-first-note.md

# Create directory
ensure_note_directory(note_path)

# Write file
write_note_file(note_path, content)

# Calculate hash for database
content_hash = calculate_content_hash(content)

# Save metadata to database (pseudo-code)
save_note_metadata(
    slug=slug,
    file_path=str(note_path.relative_to(data_dir)),
    created_at=created_at,
    content_hash=content_hash
)

Reading a Note

from pathlib import Path
from starpunk.utils import read_note_file, validate_note_path

# Get file path from database (pseudo-code)
note = get_note_from_db(slug="my-first-note")
file_path = Path("data") / note.file_path

# Validate path (security check)
if not validate_note_path(file_path, Path("data")):
    raise ValueError("Invalid file path")

# Read content
content = read_note_file(file_path)

Deleting a Note

from pathlib import Path
from starpunk.utils import delete_note_file

# Get note path from database
note = get_note_from_db(slug="old-note")
file_path = Path("data") / note.file_path

# Soft delete (move to trash)
delete_note_file(file_path, soft=True, data_dir=Path("data"))

# Update database to mark as deleted (pseudo-code)
mark_note_deleted(note.id)

Integration with Other Modules

Database Integration

The database module will use utilities like this:

# In starpunk/notes.py
from starpunk.utils import (
    generate_slug, make_slug_unique, generate_note_path,
    ensure_note_directory, write_note_file, calculate_content_hash
)
from starpunk.database import get_db

def create_note(content: str, published: bool = False) -> Note:
    """Create a new note"""
    db = get_db()
    created_at = datetime.utcnow()

    # Generate unique slug
    base_slug = generate_slug(content, created_at)
    existing_slugs = set(row[0] for row in db.execute("SELECT slug FROM notes").fetchall())
    slug = make_slug_unique(base_slug, existing_slugs)

    # Generate path and write file
    note_path = generate_note_path(slug, created_at, Path(current_app.config['DATA_PATH']))
    ensure_note_directory(note_path)
    write_note_file(note_path, content)

    # Calculate hash
    content_hash = calculate_content_hash(content)

    # Insert into database
    db.execute(
        "INSERT INTO notes (slug, file_path, published, created_at, updated_at, content_hash) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (slug, str(note_path.relative_to(Path(current_app.config['DATA_PATH']))),
         published, created_at, created_at, content_hash)
    )
    db.commit()

    return Note(slug=slug, content=content, created_at=created_at, ...)

RSS Feed Integration

The feed module will use date formatting:

# In starpunk/feed.py
from starpunk.utils import format_rfc822

def generate_rss_feed():
    """Generate RSS feed"""
    fg = FeedGenerator()

    for note in get_published_notes():
        fe = fg.add_entry()
        fe.pubDate(format_rfc822(note.created_at))

Performance Considerations

Slug Generation Performance

Expected input: 50-1000 characters
Target: <1ms per slug generation
Bottleneck: None (string operations are fast)

File Operations Performance

Write operation: <10ms for typical note (1-10KB)
Read operation: <5ms for typical note
Atomic rename: ~1ms (OS-level operation)

Path Validation Performance

Target: <1ms per validation
Use cached absolute paths where possible

Security Considerations

Path Traversal Prevention

CRITICAL: All file paths MUST be validated before use.

# Always validate before file operations
if not validate_note_path(file_path, data_dir):
    raise ValueError("Invalid file path: potential directory traversal")

Slug Validation

Prevent injection attacks through slugs:

# Validate slug before using in paths
if not validate_slug(slug):
    raise ValueError(f"Invalid slug: {slug}")

Random Suffix Security

Use cryptographically secure random:

# Use secrets module, not random module
import secrets
suffix = secrets.token_urlsafe(4)[:4].lower()

Dependencies

Standard Library

hashlib - SHA-256 hashing
re - Regular expressions
secrets - Secure random generation
pathlib - Path operations
datetime - Date/time handling
shutil - File operations (move)

Third-Party

flask - For current_app context (configuration access)

Internal

None (this is the foundation module)

Future Enhancements (V2+)

Slug Customization

Allow user to specify custom slug:

def generate_slug(content: str, custom_slug: Optional[str] = None, ...) -> str:
    if custom_slug:
        if validate_slug(custom_slug):
            return custom_slug
        raise ValueError("Invalid custom slug")
    # ... existing logic

Unicode Slug Support

Support non-ASCII characters in slugs:

def generate_unicode_slug(content: str) -> str:
    """Generate slug with unicode support"""
    # Use unicode normalization
    # Support CJK characters
    # Support diacritics

Content Preview Extraction

Extract preview/excerpt from content:

def extract_preview(content: str, max_length: int = 200) -> str:
    """Extract preview text from content"""
    # Strip markdown formatting
    # Truncate to max_length
    # Add ellipsis if truncated

Title Extraction

Extract title from first line:

def extract_title(content: str) -> Optional[str]:
    """Extract title from content (first line or # heading)"""
    # Check for # heading
    # Or use first line
    # Return None if empty

Acceptance Criteria

All functions implemented with type hints
All functions have comprehensive docstrings
Slug generation creates valid, URL-safe slugs
Slug uniqueness enforcement works correctly
Content hashing is consistent and correct
Path validation prevents directory traversal
Atomic file writes prevent corruption
Date formatting matches RFC-822 and ISO 8601 specs
All security checks are in place
Test coverage >90%
All tests pass
Code formatted with Black
Code passes flake8 linting
No hardcoded values (use constants)
Error messages are clear and actionable

References

Implementation Checklist

When implementing starpunk/utils.py, complete in this order:

Create file with module docstring
Add imports and constants
Implement helper functions (extract_first_words, normalize_slug_text, generate_random_suffix)
Implement slug functions (generate_slug, make_slug_unique, validate_slug)
Implement content hashing (calculate_content_hash)
Implement path functions (generate_note_path, ensure_note_directory, validate_note_path)
Implement file operations (write_note_file, read_note_file, delete_note_file)
Implement date/time functions (format_rfc822, format_iso8601, parse_iso8601)
Create tests/test_utils.py
Write tests for all functions
Run tests and achieve >90% coverage
Format with Black
Lint with flake8
Review all docstrings
Review all error messages

Estimated Time: 2-3 hours for implementation + tests

39 KiB Raw Blame History

Phase 1.1: Core Utilities Design

Overview

Design Principles

Module Structure

Function Specifications

1. Slug Generation

generate_slug(content: str, created_at: Optional[datetime] = None) -> str

make_slug_unique(base_slug: str, existing_slugs: set[str]) -> str

validate_slug(slug: str) -> bool

2. Content Hashing

calculate_content_hash(content: str) -> str

3. File Path Operations

generate_note_path(slug: str, created_at: datetime, data_dir: Path) -> Path

ensure_note_directory(note_path: Path) -> Path

validate_note_path(file_path: Path, data_dir: Path) -> bool

4. Atomic File Operations

write_note_file(file_path: Path, content: str) -> None

read_note_file(file_path: Path) -> str

delete_note_file(file_path: Path, soft: bool = False, data_dir: Optional[Path] = None) -> None

5. Date/Time Utilities

format_rfc822(dt: datetime) -> str

format_iso8601(dt: datetime) -> str

parse_iso8601(date_string: str) -> datetime

6. Helper Functions

extract_first_words(text: str, max_words: int = 5) -> str

normalize_slug_text(text: str) -> str

generate_random_suffix(length: int = 4) -> str

Constants

Error Handling

Exception Types

Error Messages

Testing Strategy

Test Coverage Requirements

Test Organization

Additional Test Cases

Usage Examples

Creating a Note

Reading a Note

Deleting a Note

Integration with Other Modules

Database Integration

RSS Feed Integration

Performance Considerations

Slug Generation Performance

File Operations Performance

Path Validation Performance

Security Considerations

Path Traversal Prevention

Slug Validation

Random Suffix Security

Dependencies

Standard Library

Third-Party

Internal

Future Enhancements (V2+)

Slug Customization

Unicode Slug Support

Content Preview Extraction

Title Extraction

Acceptance Criteria

References

Implementation Checklist

39 KiB

Raw Blame History

`generate_slug(content: str, created_at: Optional[datetime] = None) -> str`

`make_slug_unique(base_slug: str, existing_slugs: set[str]) -> str`

`validate_slug(slug: str) -> bool`

`calculate_content_hash(content: str) -> str`

`generate_note_path(slug: str, created_at: datetime, data_dir: Path) -> Path`

`ensure_note_directory(note_path: Path) -> Path`

`validate_note_path(file_path: Path, data_dir: Path) -> bool`

`write_note_file(file_path: Path, content: str) -> None`

`read_note_file(file_path: Path) -> str`

`delete_note_file(file_path: Path, soft: bool = False, data_dir: Optional[Path] = None) -> None`

`format_rfc822(dt: datetime) -> str`

`format_iso8601(dt: datetime) -> str`

`parse_iso8601(date_string: str) -> datetime`

`extract_first_words(text: str, max_words: int = 5) -> str`

`normalize_slug_text(text: str) -> str`

`generate_random_suffix(length: int = 4) -> str`