# Phase 1.2 Quick Reference: Data Models ## Quick Start **File**: `starpunk/models.py` **Tests**: `tests/test_models.py` **Estimated Time**: 3-4 hours **Dependencies**: `starpunk/utils.py`, `starpunk/database.py` ## Implementation Order 1. Module docstring, imports, and constants 2. Note model (most complex) 3. Session model 4. Token model 5. AuthState model (simplest) 6. Tests for all models ## Model Checklist ### Note Model (10 items) - [ ] Dataclass structure with all fields - [ ] `from_row(row: dict, data_dir: Path) -> Note` class method - [ ] `content` property (lazy loading from file) - [ ] `html` property (markdown rendering + caching) - [ ] `title` property (extract from content) - [ ] `excerpt` property (first 200 chars) - [ ] `permalink` property (URL path) - [ ] `is_published` property (alias) - [ ] `to_dict(include_content, include_html) -> dict` method - [ ] `verify_integrity() -> bool` method ### Session Model (6 items) - [ ] Dataclass structure with all fields - [ ] `from_row(row: dict) -> Session` class method - [ ] `is_expired` property - [ ] `is_valid() -> bool` method - [ ] `with_updated_last_used() -> Session` method - [ ] `to_dict() -> dict` method ### Token Model (6 items) - [ ] Dataclass structure with all fields - [ ] `from_row(row: dict) -> Token` class method - [ ] `scopes` property (list of scope strings) - [ ] `has_scope(required_scope: str) -> bool` method - [ ] `is_valid(required_scope: Optional[str]) -> bool` method - [ ] `to_dict() -> dict` method ### AuthState Model (4 items) - [ ] Dataclass structure with all fields - [ ] `from_row(row: dict) -> AuthState` class method - [ ] `is_expired` property - [ ] `is_valid() -> bool` method - [ ] `to_dict() -> dict` method **Total**: 4 models, 26 methods/properties ## Constants Required ```python # Session configuration DEFAULT_SESSION_EXPIRY_DAYS = 30 SESSION_EXTENSION_ON_USE = True # Auth state configuration DEFAULT_AUTH_STATE_EXPIRY_MINUTES = 5 # Token configuration DEFAULT_TOKEN_EXPIRY_DAYS = 90 # Markdown rendering MARKDOWN_EXTENSIONS = ['extra', 'codehilite', 'nl2br'] # Content limits MAX_TITLE_LENGTH = 200 EXCERPT_LENGTH = 200 ``` ## Key Design Patterns ### Frozen Dataclasses ```python @dataclass(frozen=True) class Note: # Core fields id: int slug: str # ... more fields # Internal fields (not from database) _data_dir: Path = field(repr=False, compare=False) _cached_content: Optional[str] = field( default=None, repr=False, compare=False, init=False ) ``` ### Lazy Loading Pattern ```python @property def content(self) -> str: """Lazy-load content from file""" if self._cached_content is None: # Read from file file_path = self._data_dir / self.file_path content = read_note_file(file_path) # Cache it (use object.__setattr__ for frozen dataclass) object.__setattr__(self, '_cached_content', content) return self._cached_content ``` ### from_row Pattern ```python @classmethod def from_row(cls, row: dict, data_dir: Path = None) -> 'Note': """Create instance from database row""" # Handle sqlite3.Row or dict if hasattr(row, 'keys'): data = {key: row[key] for key in row.keys()} else: data = row # Convert timestamps if needed if isinstance(data['created_at'], str): data['created_at'] = datetime.fromisoformat(data['created_at']) return cls( id=data['id'], slug=data['slug'], # ... more fields _data_dir=data_dir ) ``` ### Immutable Update Pattern ```python def with_updated_last_used(self) -> 'Session': """Create new session with updated timestamp""" from dataclasses import replace return replace(self, last_used_at=datetime.utcnow()) ``` ## Test Coverage Requirements - Minimum 90% code coverage - Test all model creation (from_row) - Test all properties and methods - Test lazy loading behavior - Test caching behavior - Test edge cases (empty content, expired sessions, etc.) - Test error cases (file not found, invalid data) ## Example Test Structure ```python class TestNoteModel: def test_from_row(self): pass def test_content_lazy_loading(self, tmp_path): pass def test_content_caching(self, tmp_path): pass def test_html_rendering(self, tmp_path): pass def test_html_caching(self, tmp_path): pass def test_title_extraction(self, tmp_path): pass def test_title_fallback_to_slug(self): pass def test_excerpt_generation(self, tmp_path): pass def test_permalink(self): pass def test_to_dict_basic(self): pass def test_to_dict_with_content(self, tmp_path): pass def test_verify_integrity_success(self, tmp_path): pass def test_verify_integrity_failure(self, tmp_path): pass class TestSessionModel: def test_from_row(self): pass def test_is_expired_false(self): pass def test_is_expired_true(self): pass def test_is_valid_active(self): pass def test_is_valid_expired(self): pass def test_with_updated_last_used(self): pass def test_to_dict(self): pass class TestTokenModel: def test_from_row(self): pass def test_scopes_property(self): pass def test_scopes_empty(self): pass def test_has_scope_true(self): pass def test_has_scope_false(self): pass def test_is_expired_never(self): pass def test_is_expired_yes(self): pass def test_is_valid(self): pass def test_is_valid_with_scope(self): pass def test_to_dict(self): pass class TestAuthStateModel: def test_from_row(self): pass def test_is_expired_false(self): pass def test_is_expired_true(self): pass def test_is_valid(self): pass def test_to_dict(self): pass ``` ## Common Pitfalls to Avoid 1. **Don't modify frozen dataclasses directly** → Use `object.__setattr__()` for caching 2. **Don't load content in __init__** → Use lazy loading properties 3. **Don't forget to cache expensive operations** → HTML rendering should be cached 4. **Don't validate paths in models** → Models trust caller has validated 5. **Don't put business logic in models** → Models are data only 6. **Don't forget datetime conversion** → Database may return strings 7. **Don't expose sensitive data in to_dict()** → Exclude tokens, passwords 8. **Don't forget to test with tmp_path** → Use pytest tmp_path fixture ## Security Checklist - [ ] Session token excluded from to_dict() - [ ] Token value excluded from to_dict() - [ ] File paths not directly exposed (use properties) - [ ] Expiry checked before using sessions/tokens/states - [ ] No SQL injection (models don't query, but be aware) - [ ] File reading errors propagate (don't hide) - [ ] Datetime comparisons use UTC ## Performance Targets - Model creation (from_row): < 1ms - Content loading (first access): < 5ms - HTML rendering (first access): < 10ms - Cached property access: < 0.1ms - to_dict() serialization: < 1ms ## Module Structure Template ```python """ Data models for StarPunk This module provides data model classes that wrap database rows and provide clean interfaces for working with notes, sessions, tokens, and authentication state. All models are immutable and use dataclasses. """ # Standard library from dataclasses import dataclass, field from datetime import datetime, timedelta from pathlib import Path from typing import Optional # Third-party import markdown # Local from starpunk.utils import read_note_file, calculate_content_hash # Constants DEFAULT_SESSION_EXPIRY_DAYS = 30 # ... more constants # Models @dataclass(frozen=True) class Note: """Represents a note/post""" # Fields here pass @dataclass(frozen=True) class Session: """Represents an authenticated session""" # Fields here pass @dataclass(frozen=True) class Token: """Represents a Micropub access token""" # Fields here pass @dataclass(frozen=True) class AuthState: """Represents an OAuth state token""" # Fields here pass ``` ## Quick Algorithm Reference ### Title Extraction Algorithm ``` 1. Get content (lazy load if needed) 2. Split on newlines 3. Take first non-empty line 4. Strip markdown heading syntax (# , ## , etc.) 5. Limit to MAX_TITLE_LENGTH 6. If empty, use slug as fallback ``` ### Excerpt Generation Algorithm ``` 1. Get content (lazy load if needed) 2. Remove markdown syntax (simple regex) 3. Take first EXCERPT_LENGTH characters 4. Truncate to word boundary 5. Add ellipsis if truncated ``` ### Lazy Loading Algorithm ``` 1. Check if cached value is None 2. If None: a. Perform expensive operation (file read, HTML render) b. Store result in cache using object.__setattr__() 3. Return cached value ``` ### Session Validation Algorithm ``` 1. Check if session is expired 2. Check if session_token is not empty 3. Check if 'me' URL is valid (basic check) 4. Return True only if all checks pass ``` ### Token Scope Checking Algorithm ``` 1. Parse scope string into list (split on whitespace) 2. Check if required_scope in list 3. Return boolean ``` ## Database Row Format Reference ### Note Row ```python { 'id': 1, 'slug': 'my-note', 'file_path': 'notes/2024/11/my-note.md', 'published': 1, # or True 'created_at': '2024-11-18T14:30:00' or datetime(...), 'updated_at': '2024-11-18T14:30:00' or datetime(...), 'content_hash': 'abc123...' } ``` ### Session Row ```python { 'id': 1, 'session_token': 'xyz789...', 'me': 'https://alice.example.com', 'created_at': datetime(...), 'expires_at': datetime(...), 'last_used_at': datetime(...) or None } ``` ### Token Row ```python { 'token': 'abc123...', 'me': 'https://alice.example.com', 'client_id': 'https://quill.p3k.io', 'scope': 'create update', 'created_at': datetime(...), 'expires_at': datetime(...) or None } ``` ### AuthState Row ```python { 'state': 'random123...', 'created_at': datetime(...), 'expires_at': datetime(...) } ``` ## Verification Checklist Before marking Phase 1.2 complete: - [ ] All 4 models implemented - [ ] All models are frozen dataclasses - [ ] All models have from_row() class method - [ ] All models have to_dict() method - [ ] Note model lazy-loads content - [ ] Note model renders HTML with caching - [ ] Session model validates expiry - [ ] Token model validates scopes - [ ] All properties have type hints - [ ] All methods have docstrings with examples - [ ] Test file created with >90% coverage - [ ] All tests pass - [ ] Code formatted with Black - [ ] Code passes flake8 - [ ] No security issues - [ ] Integration with utils.py works ## Usage Quick Examples ### Note Model ```python # Create from database row = db.execute("SELECT * FROM notes WHERE slug = ?", (slug,)).fetchone() note = Note.from_row(row, data_dir=Path("data")) # Access metadata (fast) print(note.slug, note.published) # Lazy load content (slow first time, cached after) content = note.content # Render HTML (slow first time, cached after) html = note.html # Extract metadata title = note.title permalink = note.permalink # Serialize for JSON/templates data = note.to_dict(include_content=True) ``` ### Session Model ```python # Create from database row = db.execute("SELECT * FROM sessions WHERE session_token = ?", (token,)).fetchone() session = Session.from_row(row) # Validate if session.is_valid(): # Update last used updated = session.with_updated_last_used() # Save to database db.execute("UPDATE sessions SET last_used_at = ? WHERE id = ?", (updated.last_used_at, updated.id)) ``` ### Token Model ```python # Create from database row = db.execute("SELECT * FROM tokens WHERE token = ?", (token,)).fetchone() token_obj = Token.from_row(row) # Validate with required scope if token_obj.is_valid(required_scope='create'): # Allow request pass ``` ### AuthState Model ```python # Create from database row = db.execute("SELECT * FROM auth_state WHERE state = ?", (state,)).fetchone() auth_state = AuthState.from_row(row) # Validate if auth_state.is_valid(): # Delete (single-use) db.execute("DELETE FROM auth_state WHERE state = ?", (state,)) ``` ## Next Steps After Implementation Once `starpunk/models.py` is complete: 1. Move to Phase 2.1: Notes Management (`starpunk/notes.py`) 2. Notes module will use Note model extensively 3. Integration tests will verify models work with database ## References - Full design: `/home/phil/Projects/starpunk/docs/design/phase-1.2-data-models.md` - Database schema: `/home/phil/Projects/starpunk/starpunk/database.py` - Utilities: `/home/phil/Projects/starpunk/starpunk/utils.py` - Python dataclasses: https://docs.python.org/3/library/dataclasses.html ## Quick Command Reference ```bash # Run tests pytest tests/test_models.py -v # Run tests with coverage pytest tests/test_models.py --cov=starpunk.models --cov-report=term-missing # Format code black starpunk/models.py tests/test_models.py # Lint code flake8 starpunk/models.py tests/test_models.py # Type check (optional) mypy starpunk/models.py ``` ## Estimated Time Breakdown - Module docstring and imports: 10 minutes - Constants: 5 minutes - Note model: 80 minutes - Basic structure: 15 minutes - from_row: 10 minutes - Lazy loading properties: 20 minutes - HTML rendering: 15 minutes - Metadata extraction: 15 minutes - Other methods: 5 minutes - Session model: 30 minutes - Token model: 30 minutes - AuthState model: 20 minutes - Tests (all models): 90-120 minutes - Documentation review: 15 minutes **Total**: 3-4 hours ## Implementation Tips ### Frozen Dataclass Caching Since dataclasses with `frozen=True` are immutable, use this pattern for caching: ```python @property def content(self) -> str: if self._cached_content is None: content = read_note_file(self._data_dir / self.file_path) # Use object.__setattr__ to bypass frozen restriction object.__setattr__(self, '_cached_content', content) return self._cached_content ``` ### Datetime Handling Database may return strings or datetime objects: ```python @classmethod def from_row(cls, row: dict) -> 'Note': created_at = row['created_at'] if isinstance(created_at, str): created_at = datetime.fromisoformat(created_at.replace('Z', '+00:00')) # ... rest of method ``` ### Testing with tmp_path Use pytest's tmp_path fixture for file operations: ```python def test_content_loading(tmp_path): # Create test file note_file = tmp_path / 'notes' / '2024' / '11' / 'test.md' note_file.parent.mkdir(parents=True) note_file.write_text('# Test') # Create note with tmp_path as data_dir note = Note( id=1, slug='test', file_path='notes/2024/11/test.md', published=True, created_at=datetime.utcnow(), updated_at=datetime.utcnow(), _data_dir=tmp_path ) assert '# Test' in note.content ``` ### Markdown Extension Configuration ```python import markdown html = markdown.markdown( content, extensions=['extra', 'codehilite', 'nl2br'], extension_configs={ 'codehilite': {'css_class': 'highlight'} } ) ```