Files
StarPunk/docs/design/phase-1.2-quick-reference.md
2025-11-18 19:21:31 -07:00

15 KiB

Phase 1.2 Quick Reference: Data Models

Quick Start

File: starpunk/models.py Tests: tests/test_models.py Estimated Time: 3-4 hours Dependencies: starpunk/utils.py, starpunk/database.py

Implementation Order

  1. Module docstring, imports, and constants
  2. Note model (most complex)
  3. Session model
  4. Token model
  5. AuthState model (simplest)
  6. Tests for all models

Model Checklist

Note Model (10 items)

  • Dataclass structure with all fields
  • from_row(row: dict, data_dir: Path) -> Note class method
  • content property (lazy loading from file)
  • html property (markdown rendering + caching)
  • title property (extract from content)
  • excerpt property (first 200 chars)
  • permalink property (URL path)
  • is_published property (alias)
  • to_dict(include_content, include_html) -> dict method
  • verify_integrity() -> bool method

Session Model (6 items)

  • Dataclass structure with all fields
  • from_row(row: dict) -> Session class method
  • is_expired property
  • is_valid() -> bool method
  • with_updated_last_used() -> Session method
  • to_dict() -> dict method

Token Model (6 items)

  • Dataclass structure with all fields
  • from_row(row: dict) -> Token class method
  • scopes property (list of scope strings)
  • has_scope(required_scope: str) -> bool method
  • is_valid(required_scope: Optional[str]) -> bool method
  • to_dict() -> dict method

AuthState Model (4 items)

  • Dataclass structure with all fields
  • from_row(row: dict) -> AuthState class method
  • is_expired property
  • is_valid() -> bool method
  • to_dict() -> dict method

Total: 4 models, 26 methods/properties

Constants Required

# Session configuration
DEFAULT_SESSION_EXPIRY_DAYS = 30
SESSION_EXTENSION_ON_USE = True

# Auth state configuration
DEFAULT_AUTH_STATE_EXPIRY_MINUTES = 5

# Token configuration
DEFAULT_TOKEN_EXPIRY_DAYS = 90

# Markdown rendering
MARKDOWN_EXTENSIONS = ['extra', 'codehilite', 'nl2br']

# Content limits
MAX_TITLE_LENGTH = 200
EXCERPT_LENGTH = 200

Key Design Patterns

Frozen Dataclasses

@dataclass(frozen=True)
class Note:
    # Core fields
    id: int
    slug: str
    # ... more fields

    # Internal fields (not from database)
    _data_dir: Path = field(repr=False, compare=False)
    _cached_content: Optional[str] = field(
        default=None,
        repr=False,
        compare=False,
        init=False
    )

Lazy Loading Pattern

@property
def content(self) -> str:
    """Lazy-load content from file"""
    if self._cached_content is None:
        # Read from file
        file_path = self._data_dir / self.file_path
        content = read_note_file(file_path)
        # Cache it (use object.__setattr__ for frozen dataclass)
        object.__setattr__(self, '_cached_content', content)
    return self._cached_content

from_row Pattern

@classmethod
def from_row(cls, row: dict, data_dir: Path = None) -> 'Note':
    """Create instance from database row"""
    # Handle sqlite3.Row or dict
    if hasattr(row, 'keys'):
        data = {key: row[key] for key in row.keys()}
    else:
        data = row

    # Convert timestamps if needed
    if isinstance(data['created_at'], str):
        data['created_at'] = datetime.fromisoformat(data['created_at'])

    return cls(
        id=data['id'],
        slug=data['slug'],
        # ... more fields
        _data_dir=data_dir
    )

Immutable Update Pattern

def with_updated_last_used(self) -> 'Session':
    """Create new session with updated timestamp"""
    from dataclasses import replace
    return replace(self, last_used_at=datetime.utcnow())

Test Coverage Requirements

  • Minimum 90% code coverage
  • Test all model creation (from_row)
  • Test all properties and methods
  • Test lazy loading behavior
  • Test caching behavior
  • Test edge cases (empty content, expired sessions, etc.)
  • Test error cases (file not found, invalid data)

Example Test Structure

class TestNoteModel:
    def test_from_row(self): pass
    def test_content_lazy_loading(self, tmp_path): pass
    def test_content_caching(self, tmp_path): pass
    def test_html_rendering(self, tmp_path): pass
    def test_html_caching(self, tmp_path): pass
    def test_title_extraction(self, tmp_path): pass
    def test_title_fallback_to_slug(self): pass
    def test_excerpt_generation(self, tmp_path): pass
    def test_permalink(self): pass
    def test_to_dict_basic(self): pass
    def test_to_dict_with_content(self, tmp_path): pass
    def test_verify_integrity_success(self, tmp_path): pass
    def test_verify_integrity_failure(self, tmp_path): pass

class TestSessionModel:
    def test_from_row(self): pass
    def test_is_expired_false(self): pass
    def test_is_expired_true(self): pass
    def test_is_valid_active(self): pass
    def test_is_valid_expired(self): pass
    def test_with_updated_last_used(self): pass
    def test_to_dict(self): pass

class TestTokenModel:
    def test_from_row(self): pass
    def test_scopes_property(self): pass
    def test_scopes_empty(self): pass
    def test_has_scope_true(self): pass
    def test_has_scope_false(self): pass
    def test_is_expired_never(self): pass
    def test_is_expired_yes(self): pass
    def test_is_valid(self): pass
    def test_is_valid_with_scope(self): pass
    def test_to_dict(self): pass

class TestAuthStateModel:
    def test_from_row(self): pass
    def test_is_expired_false(self): pass
    def test_is_expired_true(self): pass
    def test_is_valid(self): pass
    def test_to_dict(self): pass

Common Pitfalls to Avoid

  1. Don't modify frozen dataclasses directly → Use object.__setattr__() for caching
  2. Don't load content in init → Use lazy loading properties
  3. Don't forget to cache expensive operations → HTML rendering should be cached
  4. Don't validate paths in models → Models trust caller has validated
  5. Don't put business logic in models → Models are data only
  6. Don't forget datetime conversion → Database may return strings
  7. Don't expose sensitive data in to_dict() → Exclude tokens, passwords
  8. Don't forget to test with tmp_path → Use pytest tmp_path fixture

Security Checklist

  • Session token excluded from to_dict()
  • Token value excluded from to_dict()
  • File paths not directly exposed (use properties)
  • Expiry checked before using sessions/tokens/states
  • No SQL injection (models don't query, but be aware)
  • File reading errors propagate (don't hide)
  • Datetime comparisons use UTC

Performance Targets

  • Model creation (from_row): < 1ms
  • Content loading (first access): < 5ms
  • HTML rendering (first access): < 10ms
  • Cached property access: < 0.1ms
  • to_dict() serialization: < 1ms

Module Structure Template

"""
Data models for StarPunk

This module provides data model classes that wrap database rows and provide
clean interfaces for working with notes, sessions, tokens, and authentication
state. All models are immutable and use dataclasses.
"""

# Standard library
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional

# Third-party
import markdown

# Local
from starpunk.utils import read_note_file, calculate_content_hash

# Constants
DEFAULT_SESSION_EXPIRY_DAYS = 30
# ... more constants

# Models

@dataclass(frozen=True)
class Note:
    """Represents a note/post"""
    # Fields here
    pass

@dataclass(frozen=True)
class Session:
    """Represents an authenticated session"""
    # Fields here
    pass

@dataclass(frozen=True)
class Token:
    """Represents a Micropub access token"""
    # Fields here
    pass

@dataclass(frozen=True)
class AuthState:
    """Represents an OAuth state token"""
    # Fields here
    pass

Quick Algorithm Reference

Title Extraction Algorithm

1. Get content (lazy load if needed)
2. Split on newlines
3. Take first non-empty line
4. Strip markdown heading syntax (# , ## , etc.)
5. Limit to MAX_TITLE_LENGTH
6. If empty, use slug as fallback

Excerpt Generation Algorithm

1. Get content (lazy load if needed)
2. Remove markdown syntax (simple regex)
3. Take first EXCERPT_LENGTH characters
4. Truncate to word boundary
5. Add ellipsis if truncated

Lazy Loading Algorithm

1. Check if cached value is None
2. If None:
   a. Perform expensive operation (file read, HTML render)
   b. Store result in cache using object.__setattr__()
3. Return cached value

Session Validation Algorithm

1. Check if session is expired
2. Check if session_token is not empty
3. Check if 'me' URL is valid (basic check)
4. Return True only if all checks pass

Token Scope Checking Algorithm

1. Parse scope string into list (split on whitespace)
2. Check if required_scope in list
3. Return boolean

Database Row Format Reference

Note Row

{
    'id': 1,
    'slug': 'my-note',
    'file_path': 'notes/2024/11/my-note.md',
    'published': 1,  # or True
    'created_at': '2024-11-18T14:30:00' or datetime(...),
    'updated_at': '2024-11-18T14:30:00' or datetime(...),
    'content_hash': 'abc123...'
}

Session Row

{
    'id': 1,
    'session_token': 'xyz789...',
    'me': 'https://alice.example.com',
    'created_at': datetime(...),
    'expires_at': datetime(...),
    'last_used_at': datetime(...) or None
}

Token Row

{
    'token': 'abc123...',
    'me': 'https://alice.example.com',
    'client_id': 'https://quill.p3k.io',
    'scope': 'create update',
    'created_at': datetime(...),
    'expires_at': datetime(...) or None
}

AuthState Row

{
    'state': 'random123...',
    'created_at': datetime(...),
    'expires_at': datetime(...)
}

Verification Checklist

Before marking Phase 1.2 complete:

  • All 4 models implemented
  • All models are frozen dataclasses
  • All models have from_row() class method
  • All models have to_dict() method
  • Note model lazy-loads content
  • Note model renders HTML with caching
  • Session model validates expiry
  • Token model validates scopes
  • All properties have type hints
  • All methods have docstrings with examples
  • Test file created with >90% coverage
  • All tests pass
  • Code formatted with Black
  • Code passes flake8
  • No security issues
  • Integration with utils.py works

Usage Quick Examples

Note Model

# Create from database
row = db.execute("SELECT * FROM notes WHERE slug = ?", (slug,)).fetchone()
note = Note.from_row(row, data_dir=Path("data"))

# Access metadata (fast)
print(note.slug, note.published)

# Lazy load content (slow first time, cached after)
content = note.content

# Render HTML (slow first time, cached after)
html = note.html

# Extract metadata
title = note.title
permalink = note.permalink

# Serialize for JSON/templates
data = note.to_dict(include_content=True)

Session Model

# Create from database
row = db.execute("SELECT * FROM sessions WHERE session_token = ?", (token,)).fetchone()
session = Session.from_row(row)

# Validate
if session.is_valid():
    # Update last used
    updated = session.with_updated_last_used()
    # Save to database
    db.execute("UPDATE sessions SET last_used_at = ? WHERE id = ?",
               (updated.last_used_at, updated.id))

Token Model

# Create from database
row = db.execute("SELECT * FROM tokens WHERE token = ?", (token,)).fetchone()
token_obj = Token.from_row(row)

# Validate with required scope
if token_obj.is_valid(required_scope='create'):
    # Allow request
    pass

AuthState Model

# Create from database
row = db.execute("SELECT * FROM auth_state WHERE state = ?", (state,)).fetchone()
auth_state = AuthState.from_row(row)

# Validate
if auth_state.is_valid():
    # Delete (single-use)
    db.execute("DELETE FROM auth_state WHERE state = ?", (state,))

Next Steps After Implementation

Once starpunk/models.py is complete:

  1. Move to Phase 2.1: Notes Management (starpunk/notes.py)
  2. Notes module will use Note model extensively
  3. Integration tests will verify models work with database

References

  • Full design: /home/phil/Projects/starpunk/docs/design/phase-1.2-data-models.md
  • Database schema: /home/phil/Projects/starpunk/starpunk/database.py
  • Utilities: /home/phil/Projects/starpunk/starpunk/utils.py
  • Python dataclasses: https://docs.python.org/3/library/dataclasses.html

Quick Command Reference

# Run tests
pytest tests/test_models.py -v

# Run tests with coverage
pytest tests/test_models.py --cov=starpunk.models --cov-report=term-missing

# Format code
black starpunk/models.py tests/test_models.py

# Lint code
flake8 starpunk/models.py tests/test_models.py

# Type check (optional)
mypy starpunk/models.py

Estimated Time Breakdown

  • Module docstring and imports: 10 minutes
  • Constants: 5 minutes
  • Note model: 80 minutes
    • Basic structure: 15 minutes
    • from_row: 10 minutes
    • Lazy loading properties: 20 minutes
    • HTML rendering: 15 minutes
    • Metadata extraction: 15 minutes
    • Other methods: 5 minutes
  • Session model: 30 minutes
  • Token model: 30 minutes
  • AuthState model: 20 minutes
  • Tests (all models): 90-120 minutes
  • Documentation review: 15 minutes

Total: 3-4 hours

Implementation Tips

Frozen Dataclass Caching

Since dataclasses with frozen=True are immutable, use this pattern for caching:

@property
def content(self) -> str:
    if self._cached_content is None:
        content = read_note_file(self._data_dir / self.file_path)
        # Use object.__setattr__ to bypass frozen restriction
        object.__setattr__(self, '_cached_content', content)
    return self._cached_content

Datetime Handling

Database may return strings or datetime objects:

@classmethod
def from_row(cls, row: dict) -> 'Note':
    created_at = row['created_at']
    if isinstance(created_at, str):
        created_at = datetime.fromisoformat(created_at.replace('Z', '+00:00'))
    # ... rest of method

Testing with tmp_path

Use pytest's tmp_path fixture for file operations:

def test_content_loading(tmp_path):
    # Create test file
    note_file = tmp_path / 'notes' / '2024' / '11' / 'test.md'
    note_file.parent.mkdir(parents=True)
    note_file.write_text('# Test')

    # Create note with tmp_path as data_dir
    note = Note(
        id=1,
        slug='test',
        file_path='notes/2024/11/test.md',
        published=True,
        created_at=datetime.utcnow(),
        updated_at=datetime.utcnow(),
        _data_dir=tmp_path
    )

    assert '# Test' in note.content

Markdown Extension Configuration

import markdown

html = markdown.markdown(
    content,
    extensions=['extra', 'codehilite', 'nl2br'],
    extension_configs={
        'codehilite': {'css_class': 'highlight'}
    }
)