39 KiB
Phase 1.1: Core Utilities Design
Overview
This document provides a complete, implementation-ready design for Phase 1.1 of the StarPunk V1 implementation plan: Core Utilities. The utilities module (starpunk/utils.py) provides foundational functions used by all other components of the system.
Priority: CRITICAL - Required by all other features
Estimated Effort: 2-3 hours
Dependencies: None
File: starpunk/utils.py
Design Principles
- Pure functions - No side effects where possible
- Type safety - Full type hints on all functions
- Error handling - Specific exceptions with clear messages
- Security first - Validate all inputs, prevent path traversal
- Testable - Small, focused functions that are easy to test
- Well documented - Comprehensive docstrings with examples
Module Structure
"""
Core utility functions for StarPunk
This module provides essential utilities for slug generation, file operations,
hashing, and date/time handling. These utilities are used throughout the
application and have no external dependencies beyond standard library and
Flask configuration.
"""
# Standard library imports
import hashlib
import re
import secrets
from datetime import datetime
from pathlib import Path
from typing import Optional
# Third-party imports
from flask import current_app
# Constants
MAX_SLUG_LENGTH = 100
MIN_SLUG_LENGTH = 1
SLUG_WORDS_COUNT = 5
RANDOM_SUFFIX_LENGTH = 4
CONTENT_HASH_ALGORITHM = "sha256"
SAFE_SLUG_PATTERN = re.compile(r'[^a-z0-9-]')
MULTIPLE_HYPHENS_PATTERN = re.compile(r'-+')
# Utility functions (defined below)
Function Specifications
1. Slug Generation
generate_slug(content: str, created_at: Optional[datetime] = None) -> str
Purpose: Generate a URL-safe slug from note content or timestamp.
Algorithm:
- Extract first N words from content (configurable, default 5)
- Convert to lowercase
- Replace spaces with hyphens
- Remove all characters except a-z, 0-9, and hyphens
- Remove leading/trailing hyphens
- Collapse multiple consecutive hyphens to single hyphen
- Truncate to maximum length
- If result is empty or too short, fall back to timestamp-based slug
- Return slug (uniqueness check handled by caller)
Type Signature:
def generate_slug(content: str, created_at: Optional[datetime] = None) -> str:
"""
Generate URL-safe slug from note content
Creates a slug by extracting the first few words from the content and
normalizing them to lowercase with hyphens. If content is insufficient,
falls back to timestamp-based slug.
Args:
content: The note content (markdown text)
created_at: Optional timestamp for fallback slug (defaults to now)
Returns:
URL-safe slug string (lowercase, alphanumeric + hyphens only)
Raises:
ValueError: If content is empty or contains only whitespace
Examples:
>>> generate_slug("Hello World! This is my first note.")
'hello-world-this-is-my'
>>> generate_slug("Testing... with special chars!@#")
'testing-with-special-chars'
>>> generate_slug("A") # Too short, uses timestamp
'20241118-143022'
>>> generate_slug(" ") # Empty content
ValueError: Content cannot be empty
Notes:
- This function does NOT check for uniqueness
- Caller must verify slug doesn't exist in database
- Use make_slug_unique() to add random suffix if needed
"""
Implementation Details:
- Use regex to remove non-alphanumeric characters (except hyphens)
- Handle Unicode: normalize to ASCII first, remove non-ASCII characters
- Maximum slug length: 100 characters (configurable constant)
- Minimum slug length: 1 character (if shorter, use timestamp fallback)
- Timestamp format:
YYYYMMDD-HHMMSS(e.g.,20241118-143022) - Extract words: split on whitespace, take first N non-empty tokens
Edge Cases:
- Empty content → ValueError
- Content with only special characters → timestamp fallback
- Content with only 1-2 characters → timestamp fallback
- Very long first words → truncate to max length
- Unicode/emoji content → strip to ASCII alphanumeric
make_slug_unique(base_slug: str, existing_slugs: set[str]) -> str
Purpose: Add random suffix to slug if it already exists.
Type Signature:
def make_slug_unique(base_slug: str, existing_slugs: set[str]) -> str:
"""
Make a slug unique by adding random suffix if needed
If the base_slug already exists in the provided set, appends a random
alphanumeric suffix until a unique slug is found.
Args:
base_slug: The base slug to make unique
existing_slugs: Set of existing slugs to check against
Returns:
Unique slug (base_slug or base_slug-{random})
Examples:
>>> make_slug_unique("test-note", set())
'test-note'
>>> make_slug_unique("test-note", {"test-note"})
'test-note-a7c9' # Random suffix
>>> make_slug_unique("test-note", {"test-note", "test-note-a7c9"})
'test-note-x3k2' # Different random suffix
Notes:
- Random suffix is 4 lowercase alphanumeric characters
- Extremely low collision probability (36^4 = 1.6M combinations)
- Will retry up to 100 times if collision occurs (should never happen)
"""
Implementation Details:
- Check if base_slug exists in set
- If not, return base_slug unchanged
- If exists, append
-{random}where random is 4 alphanumeric chars - Use
secrets.token_urlsafe()for random generation (secure) - Retry if collision occurs (max 100 attempts, raise error if all fail)
validate_slug(slug: str) -> bool
Purpose: Validate that a slug meets all requirements.
Type Signature:
def validate_slug(slug: str) -> bool:
"""
Validate that a slug meets all requirements
Checks that slug contains only allowed characters and is within
length limits.
Args:
slug: The slug to validate
Returns:
True if slug is valid, False otherwise
Rules:
- Must contain only: a-z, 0-9, hyphen (-)
- Must be between 1 and 100 characters
- Cannot start or end with hyphen
- Cannot contain consecutive hyphens
Examples:
>>> validate_slug("hello-world")
True
>>> validate_slug("Hello-World") # Uppercase
False
>>> validate_slug("-hello") # Leading hyphen
False
>>> validate_slug("hello--world") # Double hyphen
False
"""
Implementation Details:
- Regex pattern:
^[a-z0-9]+(?:-[a-z0-9]+)*$ - Length check: 1 <= len(slug) <= 100
- No additional normalization (validation only)
2. Content Hashing
calculate_content_hash(content: str) -> str
Purpose: Calculate SHA-256 hash of note content for change detection.
Type Signature:
def calculate_content_hash(content: str) -> str:
"""
Calculate SHA-256 hash of content
Generates a cryptographic hash of the content for change detection
and cache invalidation. Uses UTF-8 encoding.
Args:
content: The content to hash (markdown text)
Returns:
Hexadecimal hash string (64 characters)
Examples:
>>> calculate_content_hash("Hello World")
'a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e'
>>> calculate_content_hash("")
'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
Notes:
- Same content always produces same hash
- Hash is deterministic across systems
- Useful for detecting external file modifications
- SHA-256 chosen for security and wide support
"""
Implementation Details:
- Use
hashlib.sha256() - Encode content as UTF-8 bytes
- Return hexadecimal digest (lowercase)
- Handle empty content (valid use case)
3. File Path Operations
generate_note_path(slug: str, created_at: datetime, data_dir: Path) -> Path
Purpose: Generate file path for note using year/month directory structure.
Type Signature:
def generate_note_path(slug: str, created_at: datetime, data_dir: Path) -> Path:
"""
Generate file path for a note
Creates path following pattern: data/notes/YYYY/MM/slug.md
Args:
slug: URL-safe slug for the note
created_at: Creation timestamp (determines YYYY/MM)
data_dir: Base data directory path
Returns:
Full Path object for the note file
Raises:
ValueError: If slug is invalid
Examples:
>>> from datetime import datetime
>>> from pathlib import Path
>>> dt = datetime(2024, 11, 18, 14, 30)
>>> generate_note_path("test-note", dt, Path("data"))
PosixPath('data/notes/2024/11/test-note.md')
Notes:
- Does NOT create directories (use ensure_note_directory)
- Does NOT check if file exists
- Validates slug before generating path
"""
Implementation Details:
- Validate slug using
validate_slug() - Extract year and month from created_at:
created_at.strftime("%Y"),"%m" - Build path:
data_dir / "notes" / year / month / f"{slug}.md" - Return Path object (not string)
ensure_note_directory(note_path: Path) -> Path
Purpose: Create directory structure for note if it doesn't exist.
Type Signature:
def ensure_note_directory(note_path: Path) -> Path:
"""
Ensure directory exists for note file
Creates parent directories if they don't exist. Safe to call
even if directories already exist.
Args:
note_path: Full path to note file
Returns:
Parent directory path
Raises:
OSError: If directory cannot be created (permissions, etc.)
Examples:
>>> note_path = Path("data/notes/2024/11/test-note.md")
>>> ensure_note_directory(note_path)
PosixPath('data/notes/2024/11')
"""
Implementation Details:
- Use
note_path.parent.mkdir(parents=True, exist_ok=True) - Return parent directory path
- Let OSError propagate if creation fails (permissions, disk full, etc.)
validate_note_path(file_path: Path, data_dir: Path) -> bool
Purpose: Validate that file path is within data directory (prevent path traversal).
Type Signature:
def validate_note_path(file_path: Path, data_dir: Path) -> bool:
"""
Validate that file path is within data directory
Security check to prevent path traversal attacks. Ensures the
resolved path is within the allowed data directory.
Args:
file_path: Path to validate
data_dir: Base data directory that must contain file_path
Returns:
True if path is safe, False otherwise
Examples:
>>> validate_note_path(
... Path("data/notes/2024/11/note.md"),
... Path("data")
... )
True
>>> validate_note_path(
... Path("data/notes/../../etc/passwd"),
... Path("data")
... )
False
Security:
- Resolves symlinks and relative paths
- Checks if resolved path is child of data_dir
- Prevents directory traversal attacks
"""
Implementation Details:
- Resolve both paths to absolute:
file_path.resolve(),data_dir.resolve() - Check if file_path is relative to data_dir: use
file_path.is_relative_to(data_dir) - Return boolean (no exceptions)
- Critical for security - MUST be called before any file operations
4. Atomic File Operations
write_note_file(file_path: Path, content: str) -> None
Purpose: Write note content to file atomically.
Type Signature:
def write_note_file(file_path: Path, content: str) -> None:
"""
Write note content to file atomically
Writes to temporary file first, then atomically renames to final path.
This prevents corruption if write is interrupted.
Args:
file_path: Destination file path
content: Content to write (markdown text)
Raises:
OSError: If file cannot be written
ValueError: If file_path is invalid
Examples:
>>> write_note_file(Path("data/notes/2024/11/test.md"), "# Test")
Implementation:
1. Create temp file: {file_path}.tmp
2. Write content to temp file
3. Atomically rename temp to final path
4. If any step fails, clean up temp file
Notes:
- Atomic rename is guaranteed on POSIX systems
- Temp file created in same directory as target
- UTF-8 encoding used for all text
"""
Implementation Details:
- Create temp path:
temp_path = file_path.with_suffix(file_path.suffix + '.tmp') - Write to temp file:
temp_path.write_text(content, encoding='utf-8') - Atomic rename:
temp_path.replace(file_path) - On exception: delete temp file if exists, re-raise
- Use context manager or try/finally for cleanup
read_note_file(file_path: Path) -> str
Purpose: Read note content from file.
Type Signature:
def read_note_file(file_path: Path) -> str:
"""
Read note content from file
Args:
file_path: Path to note file
Returns:
File content as string
Raises:
FileNotFoundError: If file doesn't exist
OSError: If file cannot be read
Examples:
>>> content = read_note_file(Path("data/notes/2024/11/test.md"))
>>> print(content)
# Test Note
"""
Implementation Details:
- Use
file_path.read_text(encoding='utf-8') - Let exceptions propagate (FileNotFoundError, OSError, etc.)
- No additional error handling needed
delete_note_file(file_path: Path, soft: bool = False, data_dir: Optional[Path] = None) -> None
Purpose: Delete note file (soft or hard delete).
Type Signature:
def delete_note_file(file_path: Path, soft: bool = False, data_dir: Optional[Path] = None) -> None:
"""
Delete note file from filesystem
Supports soft delete (move to trash) or hard delete (permanent removal).
Args:
file_path: Path to note file
soft: If True, move to .trash/ directory; if False, delete permanently
data_dir: Required if soft=True, base data directory
Raises:
FileNotFoundError: If file doesn't exist
ValueError: If soft=True but data_dir not provided
OSError: If file cannot be deleted or moved
Examples:
>>> # Hard delete
>>> delete_note_file(Path("data/notes/2024/11/test.md"))
>>> # Soft delete (move to trash)
>>> delete_note_file(
... Path("data/notes/2024/11/test.md"),
... soft=True,
... data_dir=Path("data")
... )
"""
Implementation Details:
- If soft delete:
- Require data_dir parameter
- Create trash directory:
data_dir / ".trash" / year / month - Move file to trash with same filename
- Use
shutil.move()orfile_path.replace()
- If hard delete:
- Use
file_path.unlink()
- Use
- Handle FileNotFoundError gracefully (file already gone is success)
5. Date/Time Utilities
format_rfc822(dt: datetime) -> str
Purpose: Format datetime as RFC-822 string for RSS feeds.
Type Signature:
def format_rfc822(dt: datetime) -> str:
"""
Format datetime as RFC-822 string
Converts datetime to RFC-822 format required by RSS 2.0 specification.
Assumes UTC timezone.
Args:
dt: Datetime to format (assumed UTC)
Returns:
RFC-822 formatted string
Examples:
>>> from datetime import datetime
>>> dt = datetime(2024, 11, 18, 14, 30, 45)
>>> format_rfc822(dt)
'Mon, 18 Nov 2024 14:30:45 +0000'
References:
- RSS 2.0 spec: https://www.rssboard.org/rss-specification
- RFC-822 date format
"""
Implementation Details:
- Use
dt.strftime()with format:"%a, %d %b %Y %H:%M:%S +0000" - Assume UTC timezone (all timestamps stored as UTC)
- Format pattern: "DDD, DD MMM YYYY HH:MM:SS +0000"
format_iso8601(dt: datetime) -> str
Purpose: Format datetime as ISO 8601 string.
Type Signature:
def format_iso8601(dt: datetime) -> str:
"""
Format datetime as ISO 8601 string
Converts datetime to ISO 8601 format for timestamps and APIs.
Args:
dt: Datetime to format
Returns:
ISO 8601 formatted string
Examples:
>>> from datetime import datetime
>>> dt = datetime(2024, 11, 18, 14, 30, 45)
>>> format_iso8601(dt)
'2024-11-18T14:30:45Z'
"""
Implementation Details:
- Use
dt.isoformat()with 'Z' suffix for UTC - Format:
YYYY-MM-DDTHH:MM:SSZ - Or use
dt.strftime("%Y-%m-%dT%H:%M:%SZ")
parse_iso8601(date_string: str) -> datetime
Purpose: Parse ISO 8601 string to datetime.
Type Signature:
def parse_iso8601(date_string: str) -> datetime:
"""
Parse ISO 8601 string to datetime
Args:
date_string: ISO 8601 formatted string
Returns:
Datetime object (UTC)
Raises:
ValueError: If string is not valid ISO 8601 format
Examples:
>>> parse_iso8601("2024-11-18T14:30:45Z")
datetime.datetime(2024, 11, 18, 14, 30, 45)
"""
Implementation Details:
- Use
datetime.fromisoformat()(remove 'Z' suffix first if present) - Or use
datetime.strptime()with format string - Let ValueError propagate if invalid format
6. Helper Functions
extract_first_words(text: str, max_words: int = 5) -> str
Purpose: Extract first N words from text (helper for slug generation).
Type Signature:
def extract_first_words(text: str, max_words: int = 5) -> str:
"""
Extract first N words from text
Helper function for slug generation. Splits text on whitespace
and returns first N non-empty words.
Args:
text: Text to extract words from
max_words: Maximum number of words to extract
Returns:
Space-separated string of first N words
Examples:
>>> extract_first_words("Hello world this is a test", 3)
'Hello world this'
>>> extract_first_words(" Multiple spaces ", 2)
'Multiple spaces'
"""
Implementation Details:
- Strip whitespace from text
- Split on whitespace:
text.split() - Filter empty strings
- Take first max_words:
words[:max_words] - Join with single space:
" ".join(words)
normalize_slug_text(text: str) -> str
Purpose: Normalize text for slug (lowercase, hyphens, alphanumeric only).
Type Signature:
def normalize_slug_text(text: str) -> str:
"""
Normalize text for use in slug
Converts to lowercase, replaces spaces with hyphens, removes
special characters, and collapses multiple hyphens.
Args:
text: Text to normalize
Returns:
Normalized slug-safe text
Examples:
>>> normalize_slug_text("Hello World!")
'hello-world'
>>> normalize_slug_text("Testing... with -- special chars!")
'testing-with-special-chars'
"""
Implementation Details:
- Convert to lowercase:
text.lower() - Replace spaces with hyphens:
text.replace(' ', '-') - Remove non-alphanumeric (except hyphens): regex
[^a-z0-9-]→'' - Collapse multiple hyphens: regex
-+→- - Strip leading/trailing hyphens:
text.strip('-')
generate_random_suffix(length: int = 4) -> str
Purpose: Generate random alphanumeric suffix for slug uniqueness.
Type Signature:
def generate_random_suffix(length: int = 4) -> str:
"""
Generate random alphanumeric suffix
Creates a secure random string for making slugs unique.
Uses lowercase letters and numbers only.
Args:
length: Length of suffix (default 4)
Returns:
Random alphanumeric string
Examples:
>>> suffix = generate_random_suffix()
>>> len(suffix)
4
>>> suffix.isalnum()
True
"""
Implementation Details:
- Use
secrets.token_urlsafe()orsecrets.choice() - Character set:
a-z0-9(36 characters) - Generate random string of specified length
- Example:
'a7c9','x3k2','p9m1'
Constants
# Slug configuration
MAX_SLUG_LENGTH = 100 # Maximum slug length in characters
MIN_SLUG_LENGTH = 1 # Minimum slug length in characters
SLUG_WORDS_COUNT = 5 # Number of words to extract for slug
RANDOM_SUFFIX_LENGTH = 4 # Length of random suffix for uniqueness
# File operations
TEMP_FILE_SUFFIX = '.tmp' # Suffix for temporary files
TRASH_DIR_NAME = '.trash' # Directory name for soft-deleted files
# Hashing
CONTENT_HASH_ALGORITHM = 'sha256' # Hash algorithm for content
# Regex patterns
SLUG_PATTERN = re.compile(r'^[a-z0-9]+(?:-[a-z0-9]+)*$')
SAFE_SLUG_PATTERN = re.compile(r'[^a-z0-9-]')
MULTIPLE_HYPHENS_PATTERN = re.compile(r'-+')
WHITESPACE_PATTERN = re.compile(r'\s+')
# Character sets
RANDOM_CHARS = 'abcdefghijklmnopqrstuvwxyz0123456789'
Error Handling
Exception Types
The module uses standard Python exceptions:
- ValueError: Invalid input (empty content, invalid slug, etc.)
- FileNotFoundError: File doesn't exist when expected
- FileExistsError: File already exists (not used in utils, but caller may use)
- OSError: General I/O errors (permissions, disk full, etc.)
Error Messages
Error messages should be specific and actionable:
Good:
raise ValueError(
f"Invalid slug '{slug}': must contain only lowercase letters, "
f"numbers, and hyphens"
)
raise ValueError("Content cannot be empty or whitespace-only")
raise OSError(
f"Failed to write note file: {file_path}. "
f"Check permissions and disk space."
)
Bad:
raise ValueError("Invalid slug")
raise ValueError("Bad input")
raise OSError("Write failed")
Testing Strategy
Test Coverage Requirements
- Minimum 90% code coverage for utils.py
- Test all functions with multiple scenarios
- Test edge cases and error conditions
- Test security (path traversal, etc.)
Test Organization
File: tests/test_utils.py
"""
Tests for utility functions
Organized by function category:
- Slug generation tests
- Content hashing tests
- File path tests
- Atomic file operations tests
- Date/time formatting tests
"""
import pytest
from pathlib import Path
from datetime import datetime
from starpunk.utils import (
generate_slug,
make_slug_unique,
validate_slug,
calculate_content_hash,
generate_note_path,
# ... etc
)
class TestSlugGeneration:
"""Test slug generation functions"""
def test_generate_slug_from_content(self):
"""Test basic slug generation from content"""
slug = generate_slug("Hello World This Is My Note")
assert slug == "hello-world-this-is-my"
def test_generate_slug_special_characters(self):
"""Test slug generation removes special characters"""
slug = generate_slug("Testing... with!@# special chars")
assert slug == "testing-with-special-chars"
def test_generate_slug_empty_content(self):
"""Test slug generation raises error on empty content"""
with pytest.raises(ValueError):
generate_slug("")
def test_generate_slug_whitespace_only(self):
"""Test slug generation raises error on whitespace-only content"""
with pytest.raises(ValueError):
generate_slug(" \n\t ")
def test_generate_slug_short_content_fallback(self):
"""Test slug generation falls back to timestamp for short content"""
slug = generate_slug("A", created_at=datetime(2024, 11, 18, 14, 30))
assert slug == "20241118-143000" # Timestamp format
def test_generate_slug_unicode_content(self):
"""Test slug generation handles unicode"""
slug = generate_slug("Hello 世界 World")
# Should strip non-ASCII and keep ASCII words
assert "hello" in slug
assert "world" in slug
def test_make_slug_unique_no_collision(self):
"""Test make_slug_unique returns original if no collision"""
slug = make_slug_unique("test-note", set())
assert slug == "test-note"
def test_make_slug_unique_with_collision(self):
"""Test make_slug_unique adds suffix on collision"""
slug = make_slug_unique("test-note", {"test-note"})
assert slug.startswith("test-note-")
assert len(slug) == len("test-note-") + 4 # 4 char suffix
def test_validate_slug_valid(self):
"""Test validate_slug accepts valid slugs"""
assert validate_slug("hello-world") is True
assert validate_slug("test-note-123") is True
assert validate_slug("a") is True
def test_validate_slug_invalid(self):
"""Test validate_slug rejects invalid slugs"""
assert validate_slug("Hello-World") is False # Uppercase
assert validate_slug("-hello") is False # Leading hyphen
assert validate_slug("hello-") is False # Trailing hyphen
assert validate_slug("hello--world") is False # Double hyphen
assert validate_slug("hello_world") is False # Underscore
assert validate_slug("") is False # Empty
class TestContentHashing:
"""Test content hashing functions"""
def test_calculate_content_hash_consistency(self):
"""Test hash is consistent for same content"""
hash1 = calculate_content_hash("Test content")
hash2 = calculate_content_hash("Test content")
assert hash1 == hash2
def test_calculate_content_hash_different(self):
"""Test different content produces different hash"""
hash1 = calculate_content_hash("Test content 1")
hash2 = calculate_content_hash("Test content 2")
assert hash1 != hash2
def test_calculate_content_hash_empty(self):
"""Test hash of empty string"""
hash_empty = calculate_content_hash("")
assert len(hash_empty) == 64 # SHA-256 produces 64 hex chars
def test_calculate_content_hash_unicode(self):
"""Test hash handles unicode correctly"""
hash_val = calculate_content_hash("Hello 世界")
assert len(hash_val) == 64
class TestFilePathOperations:
"""Test file path generation and validation"""
def test_generate_note_path(self):
"""Test note path generation"""
dt = datetime(2024, 11, 18, 14, 30)
path = generate_note_path("test-note", dt, Path("data"))
assert path == Path("data/notes/2024/11/test-note.md")
def test_generate_note_path_invalid_slug(self):
"""Test note path generation rejects invalid slug"""
dt = datetime(2024, 11, 18)
with pytest.raises(ValueError):
generate_note_path("Invalid Slug!", dt, Path("data"))
def test_validate_note_path_safe(self):
"""Test path validation accepts safe paths"""
assert validate_note_path(
Path("data/notes/2024/11/note.md"),
Path("data")
) is True
def test_validate_note_path_traversal(self):
"""Test path validation rejects traversal attempts"""
assert validate_note_path(
Path("data/notes/../../etc/passwd"),
Path("data")
) is False
def test_validate_note_path_absolute_outside(self):
"""Test path validation rejects paths outside data dir"""
assert validate_note_path(
Path("/etc/passwd"),
Path("data")
) is False
class TestAtomicFileOperations:
"""Test atomic file write/read/delete operations"""
def test_write_and_read_note_file(self, tmp_path):
"""Test writing and reading note file"""
file_path = tmp_path / "test.md"
content = "# Test Note\n\nThis is a test."
write_note_file(file_path, content)
assert file_path.exists()
read_content = read_note_file(file_path)
assert read_content == content
def test_write_note_file_atomic(self, tmp_path):
"""Test write is atomic (temp file used)"""
file_path = tmp_path / "test.md"
temp_path = Path(str(file_path) + '.tmp')
# Temp file should not exist after write
write_note_file(file_path, "Test")
assert not temp_path.exists()
assert file_path.exists()
def test_read_note_file_not_found(self, tmp_path):
"""Test reading non-existent file raises error"""
file_path = tmp_path / "nonexistent.md"
with pytest.raises(FileNotFoundError):
read_note_file(file_path)
def test_delete_note_file_hard(self, tmp_path):
"""Test hard delete removes file"""
file_path = tmp_path / "test.md"
file_path.write_text("Test")
delete_note_file(file_path, soft=False)
assert not file_path.exists()
def test_delete_note_file_soft(self, tmp_path):
"""Test soft delete moves file to trash"""
# Create note file
notes_dir = tmp_path / "notes" / "2024" / "11"
notes_dir.mkdir(parents=True)
file_path = notes_dir / "test.md"
file_path.write_text("Test")
# Soft delete
delete_note_file(file_path, soft=True, data_dir=tmp_path)
# Original should be gone
assert not file_path.exists()
# Should be in trash
trash_path = tmp_path / ".trash" / "2024" / "11" / "test.md"
assert trash_path.exists()
class TestDateTimeFormatting:
"""Test date/time formatting functions"""
def test_format_rfc822(self):
"""Test RFC-822 date formatting"""
dt = datetime(2024, 11, 18, 14, 30, 45)
formatted = format_rfc822(dt)
assert formatted == "Mon, 18 Nov 2024 14:30:45 +0000"
def test_format_iso8601(self):
"""Test ISO 8601 date formatting"""
dt = datetime(2024, 11, 18, 14, 30, 45)
formatted = format_iso8601(dt)
assert formatted == "2024-11-18T14:30:45Z"
def test_parse_iso8601(self):
"""Test ISO 8601 date parsing"""
dt = parse_iso8601("2024-11-18T14:30:45Z")
assert dt.year == 2024
assert dt.month == 11
assert dt.day == 18
assert dt.hour == 14
assert dt.minute == 30
assert dt.second == 45
def test_parse_iso8601_invalid(self):
"""Test ISO 8601 parsing rejects invalid format"""
with pytest.raises(ValueError):
parse_iso8601("not-a-date")
Additional Test Cases
Security Tests:
- Path traversal attempts with various patterns
- Symlink handling
- Unicode/emoji in slugs
- Very long input strings
- Malicious input patterns
Edge Cases:
- Empty content
- Whitespace-only content
- Very long content (>1MB)
- Unicode normalization issues
- Concurrent file operations (if applicable)
Performance Tests:
- Slug generation with 10,000 character input
- Hash calculation with large content
- Path validation with deep directory structures
Usage Examples
Creating a Note
from datetime import datetime
from pathlib import Path
from starpunk.utils import (
generate_slug,
make_slug_unique,
generate_note_path,
ensure_note_directory,
write_note_file,
calculate_content_hash
)
# Note content
content = "This is my first note about Python programming"
created_at = datetime.utcnow()
data_dir = Path("data")
# Generate slug
base_slug = generate_slug(content, created_at) # "this-is-my-first-note"
# Check uniqueness (pseudo-code, actual check via database)
existing_slugs = get_existing_slugs_from_db() # Returns set of slugs
slug = make_slug_unique(base_slug, existing_slugs)
# Generate file path
note_path = generate_note_path(slug, created_at, data_dir)
# Result: data/notes/2024/11/this-is-my-first-note.md
# Create directory
ensure_note_directory(note_path)
# Write file
write_note_file(note_path, content)
# Calculate hash for database
content_hash = calculate_content_hash(content)
# Save metadata to database (pseudo-code)
save_note_metadata(
slug=slug,
file_path=str(note_path.relative_to(data_dir)),
created_at=created_at,
content_hash=content_hash
)
Reading a Note
from pathlib import Path
from starpunk.utils import read_note_file, validate_note_path
# Get file path from database (pseudo-code)
note = get_note_from_db(slug="my-first-note")
file_path = Path("data") / note.file_path
# Validate path (security check)
if not validate_note_path(file_path, Path("data")):
raise ValueError("Invalid file path")
# Read content
content = read_note_file(file_path)
Deleting a Note
from pathlib import Path
from starpunk.utils import delete_note_file
# Get note path from database
note = get_note_from_db(slug="old-note")
file_path = Path("data") / note.file_path
# Soft delete (move to trash)
delete_note_file(file_path, soft=True, data_dir=Path("data"))
# Update database to mark as deleted (pseudo-code)
mark_note_deleted(note.id)
Integration with Other Modules
Database Integration
The database module will use utilities like this:
# In starpunk/notes.py
from starpunk.utils import (
generate_slug, make_slug_unique, generate_note_path,
ensure_note_directory, write_note_file, calculate_content_hash
)
from starpunk.database import get_db
def create_note(content: str, published: bool = False) -> Note:
"""Create a new note"""
db = get_db()
created_at = datetime.utcnow()
# Generate unique slug
base_slug = generate_slug(content, created_at)
existing_slugs = set(row[0] for row in db.execute("SELECT slug FROM notes").fetchall())
slug = make_slug_unique(base_slug, existing_slugs)
# Generate path and write file
note_path = generate_note_path(slug, created_at, Path(current_app.config['DATA_PATH']))
ensure_note_directory(note_path)
write_note_file(note_path, content)
# Calculate hash
content_hash = calculate_content_hash(content)
# Insert into database
db.execute(
"INSERT INTO notes (slug, file_path, published, created_at, updated_at, content_hash) "
"VALUES (?, ?, ?, ?, ?, ?)",
(slug, str(note_path.relative_to(Path(current_app.config['DATA_PATH']))),
published, created_at, created_at, content_hash)
)
db.commit()
return Note(slug=slug, content=content, created_at=created_at, ...)
RSS Feed Integration
The feed module will use date formatting:
# In starpunk/feed.py
from starpunk.utils import format_rfc822
def generate_rss_feed():
"""Generate RSS feed"""
fg = FeedGenerator()
for note in get_published_notes():
fe = fg.add_entry()
fe.pubDate(format_rfc822(note.created_at))
Performance Considerations
Slug Generation Performance
- Expected input: 50-1000 characters
- Target: <1ms per slug generation
- Bottleneck: None (string operations are fast)
File Operations Performance
- Write operation: <10ms for typical note (1-10KB)
- Read operation: <5ms for typical note
- Atomic rename: ~1ms (OS-level operation)
Path Validation Performance
- Target: <1ms per validation
- Use cached absolute paths where possible
Security Considerations
Path Traversal Prevention
CRITICAL: All file paths MUST be validated before use.
# Always validate before file operations
if not validate_note_path(file_path, data_dir):
raise ValueError("Invalid file path: potential directory traversal")
Slug Validation
Prevent injection attacks through slugs:
# Validate slug before using in paths
if not validate_slug(slug):
raise ValueError(f"Invalid slug: {slug}")
Random Suffix Security
Use cryptographically secure random:
# Use secrets module, not random module
import secrets
suffix = secrets.token_urlsafe(4)[:4].lower()
Dependencies
Standard Library
hashlib- SHA-256 hashingre- Regular expressionssecrets- Secure random generationpathlib- Path operationsdatetime- Date/time handlingshutil- File operations (move)
Third-Party
flask- Forcurrent_appcontext (configuration access)
Internal
- None (this is the foundation module)
Future Enhancements (V2+)
Slug Customization
Allow user to specify custom slug:
def generate_slug(content: str, custom_slug: Optional[str] = None, ...) -> str:
if custom_slug:
if validate_slug(custom_slug):
return custom_slug
raise ValueError("Invalid custom slug")
# ... existing logic
Unicode Slug Support
Support non-ASCII characters in slugs:
def generate_unicode_slug(content: str) -> str:
"""Generate slug with unicode support"""
# Use unicode normalization
# Support CJK characters
# Support diacritics
Content Preview Extraction
Extract preview/excerpt from content:
def extract_preview(content: str, max_length: int = 200) -> str:
"""Extract preview text from content"""
# Strip markdown formatting
# Truncate to max_length
# Add ellipsis if truncated
Title Extraction
Extract title from first line:
def extract_title(content: str) -> Optional[str]:
"""Extract title from content (first line or # heading)"""
# Check for # heading
# Or use first line
# Return None if empty
Acceptance Criteria
- All functions implemented with type hints
- All functions have comprehensive docstrings
- Slug generation creates valid, URL-safe slugs
- Slug uniqueness enforcement works correctly
- Content hashing is consistent and correct
- Path validation prevents directory traversal
- Atomic file writes prevent corruption
- Date formatting matches RFC-822 and ISO 8601 specs
- All security checks are in place
- Test coverage >90%
- All tests pass
- Code formatted with Black
- Code passes flake8 linting
- No hardcoded values (use constants)
- Error messages are clear and actionable
References
- ADR-004: File-Based Note Storage
- Python Coding Standards
- Project Structure
- CommonMark Specification
- RSS 2.0 Specification
- ISO 8601 Date Format
- RFC-822 Date Format
- Python pathlib Documentation
- Python secrets Documentation
Implementation Checklist
When implementing starpunk/utils.py, complete in this order:
- Create file with module docstring
- Add imports and constants
- Implement helper functions (extract_first_words, normalize_slug_text, generate_random_suffix)
- Implement slug functions (generate_slug, make_slug_unique, validate_slug)
- Implement content hashing (calculate_content_hash)
- Implement path functions (generate_note_path, ensure_note_directory, validate_note_path)
- Implement file operations (write_note_file, read_note_file, delete_note_file)
- Implement date/time functions (format_rfc822, format_iso8601, parse_iso8601)
- Create tests/test_utils.py
- Write tests for all functions
- Run tests and achieve >90% coverage
- Format with Black
- Lint with flake8
- Review all docstrings
- Review all error messages
Estimated Time: 2-3 hours for implementation + tests