that initial commit

2025-11-18 19:21:31 -07:00
commit a68fd570c7
69 changed files with 31070 additions and 0 deletions

# Phase 1.2 Quick Reference: Data Models
## Quick Start
**File**: `starpunk/models.py`
**Tests**: `tests/test_models.py`
**Estimated Time**: roughly 4.5-5 hours
**Dependencies**: `starpunk/utils.py`, `starpunk/database.py`
## Implementation Order
1. Module docstring, imports, and constants
2. Note model (most complex)
3. Session model
4. Token model
5. AuthState model (simplest)
6. Tests for all models
## Model Checklist
### Note Model (10 items)
- [ ] Dataclass structure with all fields
- [ ] `from_row(row: dict, data_dir: Path) -> Note` class method
- [ ] `content` property (lazy loading from file)
- [ ] `html` property (markdown rendering + caching)
- [ ] `title` property (extract from content)
- [ ] `excerpt` property (first 200 chars)
- [ ] `permalink` property (URL path)
- [ ] `is_published` property (alias)
- [ ] `to_dict(include_content, include_html) -> dict` method
- [ ] `verify_integrity() -> bool` method
### Session Model (6 items)
- [ ] Dataclass structure with all fields
- [ ] `from_row(row: dict) -> Session` class method
- [ ] `is_expired` property
- [ ] `is_valid() -> bool` method
- [ ] `with_updated_last_used() -> Session` method
- [ ] `to_dict() -> dict` method
### Token Model (6 items)
- [ ] Dataclass structure with all fields
- [ ] `from_row(row: dict) -> Token` class method
- [ ] `scopes` property (list of scope strings)
- [ ] `has_scope(required_scope: str) -> bool` method
- [ ] `is_valid(required_scope: Optional[str]) -> bool` method
- [ ] `to_dict() -> dict` method
### AuthState Model (5 items)
- [ ] Dataclass structure with all fields
- [ ] `from_row(row: dict) -> AuthState` class method
- [ ] `is_expired` property
- [ ] `is_valid() -> bool` method
- [ ] `to_dict() -> dict` method
**Total**: 4 models, 27 checklist items
## Constants Required
```python
# Session configuration
DEFAULT_SESSION_EXPIRY_DAYS = 30
SESSION_EXTENSION_ON_USE = True
# Auth state configuration
DEFAULT_AUTH_STATE_EXPIRY_MINUTES = 5
# Token configuration
DEFAULT_TOKEN_EXPIRY_DAYS = 90
# Markdown rendering
MARKDOWN_EXTENSIONS = ['extra', 'codehilite', 'nl2br']
# Content limits
MAX_TITLE_LENGTH = 200
EXCERPT_LENGTH = 200
```
## Key Design Patterns
### Frozen Dataclasses
```python
@dataclass(frozen=True)
class Note:
    # Core fields
    id: int
    slug: str
    # ... more fields

    # Internal fields (not from database)
    _data_dir: Path = field(repr=False, compare=False)
    _cached_content: Optional[str] = field(
        default=None,
        repr=False,
        compare=False,
        init=False
    )
```
### Lazy Loading Pattern
```python
@property
def content(self) -> str:
    """Lazy-load content from file"""
    if self._cached_content is None:
        # Read from file
        file_path = self._data_dir / self.file_path
        content = read_note_file(file_path)
        # Cache it (use object.__setattr__ for frozen dataclass)
        object.__setattr__(self, '_cached_content', content)
    return self._cached_content
```
### from_row Pattern
```python
@classmethod
def from_row(cls, row: dict, data_dir: Optional[Path] = None) -> 'Note':
    """Create instance from database row"""
    # Copy sqlite3.Row (or dict) into a plain dict; both expose keys()
    if hasattr(row, 'keys'):
        data = {key: row[key] for key in row.keys()}
    else:
        data = row
    # Convert timestamps if needed
    if isinstance(data['created_at'], str):
        data['created_at'] = datetime.fromisoformat(data['created_at'])
    return cls(
        id=data['id'],
        slug=data['slug'],
        # ... more fields
        _data_dir=data_dir
    )
```
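To see why the row-to-dict step matters, here is a minimal sketch against an in-memory SQLite database. The trimmed-down `notes` table and the `row_to_dict` helper name are assumptions made for illustration; they are not the project's actual schema or API.

```python
import sqlite3
from datetime import datetime


def row_to_dict(row) -> dict:
    """Copy a sqlite3.Row (or a plain dict) into a new, mutable dict."""
    return {key: row[key] for key in row.keys()}


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become addressable by column name
conn.execute(
    "CREATE TABLE notes (id INTEGER PRIMARY KEY, slug TEXT, created_at TEXT)"
)
conn.execute(
    "INSERT INTO notes (slug, created_at) VALUES (?, ?)",
    ("my-note", "2024-11-18T14:30:00"),
)
row = conn.execute(
    "SELECT * FROM notes WHERE slug = ?", ("my-note",)
).fetchone()

data = row_to_dict(row)
# A TEXT column comes back as str, so convert it before building the model
if isinstance(data["created_at"], str):
    data["created_at"] = datetime.fromisoformat(data["created_at"])
conn.close()
```

Copying into a plain dict first keeps the conversion step independent of whether the caller passed a `sqlite3.Row` or a dict.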
### Immutable Update Pattern
```python
def with_updated_last_used(self) -> 'Session':
    """Create new session with updated timestamp"""
    from dataclasses import replace
    # Naive UTC timestamp; note that datetime.utcnow() is deprecated since Python 3.12
    return replace(self, last_used_at=datetime.utcnow())
```
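The following sketch shows what the pattern buys you: `dataclasses.replace` returns a new instance and leaves the original untouched, while direct assignment on a frozen instance is rejected. `DemoSession` is a hypothetical stand-in with only two fields, not the real `Session` model.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class DemoSession:  # hypothetical stand-in for Session
    id: int
    last_used_at: Optional[datetime] = None


original = DemoSession(id=1)
# replace() builds a new instance; every other field is copied over
updated = replace(original, last_used_at=datetime(2024, 11, 18, 14, 30))

try:
    original.last_used_at = datetime(2024, 1, 1)  # rejected on a frozen dataclass
    mutated = True
except FrozenInstanceError:
    mutated = False
```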
## Test Coverage Requirements
- Minimum 90% code coverage
- Test all model creation (from_row)
- Test all properties and methods
- Test lazy loading behavior
- Test caching behavior
- Test edge cases (empty content, expired sessions, etc.)
- Test error cases (file not found, invalid data)
## Example Test Structure
```python
class TestNoteModel:
    def test_from_row(self): pass
    def test_content_lazy_loading(self, tmp_path): pass
    def test_content_caching(self, tmp_path): pass
    def test_html_rendering(self, tmp_path): pass
    def test_html_caching(self, tmp_path): pass
    def test_title_extraction(self, tmp_path): pass
    def test_title_fallback_to_slug(self): pass
    def test_excerpt_generation(self, tmp_path): pass
    def test_permalink(self): pass
    def test_to_dict_basic(self): pass
    def test_to_dict_with_content(self, tmp_path): pass
    def test_verify_integrity_success(self, tmp_path): pass
    def test_verify_integrity_failure(self, tmp_path): pass


class TestSessionModel:
    def test_from_row(self): pass
    def test_is_expired_false(self): pass
    def test_is_expired_true(self): pass
    def test_is_valid_active(self): pass
    def test_is_valid_expired(self): pass
    def test_with_updated_last_used(self): pass
    def test_to_dict(self): pass


class TestTokenModel:
    def test_from_row(self): pass
    def test_scopes_property(self): pass
    def test_scopes_empty(self): pass
    def test_has_scope_true(self): pass
    def test_has_scope_false(self): pass
    def test_is_expired_never(self): pass
    def test_is_expired_yes(self): pass
    def test_is_valid(self): pass
    def test_is_valid_with_scope(self): pass
    def test_to_dict(self): pass


class TestAuthStateModel:
    def test_from_row(self): pass
    def test_is_expired_false(self): pass
    def test_is_expired_true(self): pass
    def test_is_valid(self): pass
    def test_to_dict(self): pass
```
## Common Pitfalls to Avoid
1. **Don't modify frozen dataclasses directly** → Use `object.__setattr__()` for caching
2. **Don't load content in `__init__`** → Use lazy loading properties
3. **Don't forget to cache expensive operations** → HTML rendering should be cached
4. **Don't validate paths in models** → Models trust caller has validated
5. **Don't put business logic in models** → Models are data only
6. **Don't forget datetime conversion** → Database may return strings
7. **Don't expose sensitive data in to_dict()** → Exclude tokens, passwords
8. **Don't forget to test with tmp_path** → Use pytest tmp_path fixture
## Security Checklist
- [ ] Session token excluded from to_dict()
- [ ] Token value excluded from to_dict()
- [ ] File paths not directly exposed (use properties)
- [ ] Expiry checked before using sessions/tokens/states
- [ ] No SQL built in models (models never run queries; callers use parameterized statements)
- [ ] File-reading errors propagate to the caller (never silently swallowed)
- [ ] Datetime comparisons use UTC
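
As an illustration of the first two checklist items, a `to_dict()` can list the safe fields explicitly so that a secret is never serialized by accident. `SafeSession` below is a hypothetical, trimmed-down class; hiding the token from `repr()` as well is an extra precaution assumed here, not something the checklist requires.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class SafeSession:  # hypothetical sketch, not the real Session model
    id: int
    session_token: str = field(repr=False)  # assumption: also keep it out of repr()
    me: str
    created_at: datetime
    expires_at: datetime

    def to_dict(self) -> dict:
        """Serialize only non-sensitive fields; session_token is left out."""
        return {
            "id": self.id,
            "me": self.me,
            "created_at": self.created_at.isoformat(),
            "expires_at": self.expires_at.isoformat(),
        }


session = SafeSession(
    id=1,
    session_token="secret-token",
    me="https://alice.example.com",
    created_at=datetime(2024, 11, 18, 14, 30),
    expires_at=datetime(2024, 12, 18, 14, 30),
)
payload = session.to_dict()
```

Building the dict by hand, rather than calling `dataclasses.asdict()` and deleting keys, means a newly added sensitive field stays private by default.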
## Performance Targets
- Model creation (from_row): < 1ms
- Content loading (first access): < 5ms
- HTML rendering (first access): < 10ms
- Cached property access: < 0.1ms
- to_dict() serialization: < 1ms
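
One possible way to check numbers like these is the standard-library `timeit` module. The `average_ms` helper below is a hypothetical name, and the lambda is a placeholder for whatever call is being measured (for example a `from_row` call).

```python
import timeit
from typing import Callable


def average_ms(func: Callable[[], object], repeat: int = 1000) -> float:
    """Return the mean wall-clock time of func() in milliseconds."""
    total_seconds = timeit.timeit(func, number=repeat)
    return total_seconds / repeat * 1000.0


# Placeholder workload; swap in e.g. lambda: Note.from_row(row, data_dir)
elapsed = average_ms(lambda: {"id": 1, "slug": "my-note"}, repeat=100)
```

Averages from a single machine are only indicative, so treat the targets as rough budgets rather than hard assertions in the test suite.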
## Module Structure Template
```python
"""
Data models for StarPunk

This module provides data model classes that wrap database rows and provide
clean interfaces for working with notes, sessions, tokens, and authentication
state. All models are immutable and use dataclasses.
"""

# Standard library
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional

# Third-party
import markdown

# Local
from starpunk.utils import read_note_file, calculate_content_hash

# Constants
DEFAULT_SESSION_EXPIRY_DAYS = 30
# ... more constants


# Models
@dataclass(frozen=True)
class Note:
    """Represents a note/post"""
    # Fields here
    pass


@dataclass(frozen=True)
class Session:
    """Represents an authenticated session"""
    # Fields here
    pass


@dataclass(frozen=True)
class Token:
    """Represents a Micropub access token"""
    # Fields here
    pass


@dataclass(frozen=True)
class AuthState:
    """Represents an OAuth state token"""
    # Fields here
    pass
```
## Quick Algorithm Reference
### Title Extraction Algorithm
```
1. Get content (lazy load if needed)
2. Split on newlines
3. Take first non-empty line
4. Strip markdown heading syntax (# , ## , etc.)
5. Limit to MAX_TITLE_LENGTH
6. If empty, use slug as fallback
```
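The steps above can be sketched as a small standalone function. The name `extract_title` and the heading regex are assumptions for illustration; the real `title` property would call the same logic with `self.content` and `self.slug`.

```python
import re

MAX_TITLE_LENGTH = 200  # same value as the constant listed earlier


def extract_title(content: str, slug: str) -> str:
    """Return the first non-empty line without heading markers, else the slug."""
    for line in content.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines until real text appears
        # Remove a leading '#', '##', ... only when followed by a space or line end
        title = re.sub(r"^#{1,6}(?:\s+|$)", "", stripped).strip()
        return title[:MAX_TITLE_LENGTH] if title else slug
    return slug  # content was empty or whitespace only
```

Requiring whitespace after the `#` run keeps a first line such as `#tag line` intact instead of treating it as a heading.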
### Excerpt Generation Algorithm
```
1. Get content (lazy load if needed)
2. Remove markdown syntax (simple regex)
3. Take first EXCERPT_LENGTH characters
4. Truncate to word boundary
5. Add ellipsis if truncated
```
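A rough sketch of these steps follows. The regular expressions are deliberately crude assumptions (they also strip underscores inside identifiers), and `make_excerpt` is a hypothetical name rather than the project's API.

```python
import re

EXCERPT_LENGTH = 200  # same value as the constant listed earlier


def make_excerpt(content: str, length: int = EXCERPT_LENGTH) -> str:
    """Strip simple markdown, then truncate at a word boundary."""
    # Links and images: keep only the label text
    text = re.sub(r"!?\[([^\]]*)\]\([^)]*\)", r"\1", content)
    # Heading markers at the start of a line
    text = re.sub(r"^#{1,6}\s+", "", text, flags=re.MULTILINE)
    # Emphasis and inline-code markers
    text = re.sub(r"[*_`]", "", text)
    # Collapse newlines and repeated spaces into single spaces
    text = " ".join(text.split())
    if len(text) <= length:
        return text  # nothing was cut, so no ellipsis
    cut = text[:length]
    # Back up to the previous space only if the cut landed inside a word
    if not text[length].isspace() and " " in cut:
        cut = cut[: cut.rfind(" ")]
    return cut.rstrip() + "..."
```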
### Lazy Loading Algorithm
```
1. Check if cached value is None
2. If None:
   a. Perform expensive operation (file read, HTML render)
   b. Store result in cache using object.__setattr__()
3. Return cached value
```
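To make the caching behavior observable, the sketch below counts how often the expensive step runs. `CachedValue` and `expensive_load` are hypothetical names; the loader stands in for a file read or an HTML render.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(frozen=True)
class CachedValue:  # hypothetical demo class
    loader: Callable[[], str]
    _cached: Optional[str] = field(
        default=None, init=False, repr=False, compare=False
    )

    @property
    def value(self) -> str:
        if self._cached is None:
            # Bypass the frozen restriction for the cache slot only
            object.__setattr__(self, "_cached", self.loader())
        return self._cached


calls: List[int] = []


def expensive_load() -> str:
    calls.append(1)  # record each invocation
    return "<p>rendered</p>"


item = CachedValue(loader=expensive_load)
first = item.value   # triggers the loader
second = item.value  # served from the cache
```

Checking `is None` rather than truthiness matters: an empty string is a valid cached result and should not trigger a reload.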
### Session Validation Algorithm
```
1. Check if session is expired
2. Check if session_token is not empty
3. Check if 'me' URL is valid (basic check)
4. Return True only if all checks pass
```
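A minimal sketch of these checks, assuming timestamps are stored as naive UTC datetimes and that the "basic check" on `me` means an http(s) URL with a host. `SessionSketch` and `_utc_now` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse


def _utc_now() -> datetime:
    """Naive UTC 'now' (assumption: the database stores naive UTC values)."""
    return datetime.now(timezone.utc).replace(tzinfo=None)


@dataclass(frozen=True)
class SessionSketch:  # hypothetical, trimmed-down Session
    session_token: str
    me: str
    expires_at: datetime

    @property
    def is_expired(self) -> bool:
        return _utc_now() >= self.expires_at

    def is_valid(self) -> bool:
        if self.is_expired:
            return False
        if not self.session_token:
            return False
        parsed = urlparse(self.me)
        # Basic URL check: http(s) scheme and a non-empty host
        return parsed.scheme in ("http", "https") and bool(parsed.netloc)


future = _utc_now() + timedelta(days=30)
past = _utc_now() - timedelta(seconds=1)
active = SessionSketch("tok", "https://alice.example.com", future)
expired = SessionSketch("tok", "https://alice.example.com", past)
```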
### Token Scope Checking Algorithm
```
1. Parse scope string into list (split on whitespace)
2. Check if required_scope in list
3. Return boolean
```
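In code, the scope check might look like the sketch below. `TokenSketch` is a hypothetical class that carries only the `scope` column; treating a missing scope as "no scopes" is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TokenSketch:  # hypothetical, scope handling only
    scope: Optional[str] = None

    @property
    def scopes(self) -> List[str]:
        # str.split() with no argument splits on any run of whitespace
        return (self.scope or "").split()

    def has_scope(self, required_scope: str) -> bool:
        # Exact membership, so 'creat' does not match 'create'
        return required_scope in self.scopes


token = TokenSketch(scope="create update")
```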
## Database Row Format Reference
### Note Row
```python
{
    'id': 1,
    'slug': 'my-note',
    'file_path': 'notes/2024/11/my-note.md',
    'published': 1,  # or True
    'created_at': '2024-11-18T14:30:00',  # or a datetime object
    'updated_at': '2024-11-18T14:30:00',  # or a datetime object
    'content_hash': 'abc123...'
}
```
### Session Row
```python
{
    'id': 1,
    'session_token': 'xyz789...',
    'me': 'https://alice.example.com',
    'created_at': datetime(...),
    'expires_at': datetime(...),
    'last_used_at': datetime(...)  # or None
}
```
### Token Row
```python
{
    'token': 'abc123...',
    'me': 'https://alice.example.com',
    'client_id': 'https://quill.p3k.io',
    'scope': 'create update',
    'created_at': datetime(...),
    'expires_at': datetime(...)  # or None
}
```
### AuthState Row
```python
{
    'state': 'random123...',
    'created_at': datetime(...),
    'expires_at': datetime(...)
}
```
## Verification Checklist
Before marking Phase 1.2 complete:
- [ ] All 4 models implemented
- [ ] All models are frozen dataclasses
- [ ] All models have from_row() class method
- [ ] All models have to_dict() method
- [ ] Note model lazy-loads content
- [ ] Note model renders HTML with caching
- [ ] Session model validates expiry
- [ ] Token model validates scopes
- [ ] All properties have type hints
- [ ] All methods have docstrings with examples
- [ ] Test file created with at least 90% coverage
- [ ] All tests pass
- [ ] Code formatted with Black
- [ ] Code passes flake8
- [ ] No security issues
- [ ] Integration with utils.py works
## Usage Quick Examples
### Note Model
```python
# Create from database
row = db.execute("SELECT * FROM notes WHERE slug = ?", (slug,)).fetchone()
note = Note.from_row(row, data_dir=Path("data"))
# Access metadata (fast)
print(note.slug, note.published)
# Lazy load content (slow first time, cached after)
content = note.content
# Render HTML (slow first time, cached after)
html = note.html
# Extract metadata
title = note.title
permalink = note.permalink
# Serialize for JSON/templates
data = note.to_dict(include_content=True)
```
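The integrity check is not shown above, so here is a hedged sketch of the idea: recompute the hash of the file content and compare it with the stored `content_hash`. SHA-256 over UTF-8 text is an assumption, and `content_hash_of` is a hypothetical stand-in for `calculate_content_hash` in `starpunk/utils.py`, whose actual algorithm is not documented here.

```python
import hashlib


def content_hash_of(content: str) -> str:
    """Hypothetical stand-in for calculate_content_hash (assumes SHA-256)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def verify_integrity(content: str, stored_hash: str) -> bool:
    """Return True when the content still matches the hash saved in the database."""
    return content_hash_of(content) == stored_hash


stored = content_hash_of("# Test")
unchanged = verify_integrity("# Test", stored)
tampered = verify_integrity("# Tampered", stored)
```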
### Session Model
```python
# Create from database
row = db.execute("SELECT * FROM sessions WHERE session_token = ?", (token,)).fetchone()
session = Session.from_row(row)
# Validate
if session.is_valid():
    # Update last used
    updated = session.with_updated_last_used()
    # Save to database
    db.execute("UPDATE sessions SET last_used_at = ? WHERE id = ?",
               (updated.last_used_at, updated.id))
```
### Token Model
```python
# Create from database
row = db.execute("SELECT * FROM tokens WHERE token = ?", (token,)).fetchone()
token_obj = Token.from_row(row)
# Validate with required scope
if token_obj.is_valid(required_scope='create'):
    # Allow request
    pass
```
### AuthState Model
```python
# Create from database
row = db.execute("SELECT * FROM auth_state WHERE state = ?", (state,)).fetchone()
auth_state = AuthState.from_row(row)
# Validate
if auth_state.is_valid():
    # Delete (single-use)
    db.execute("DELETE FROM auth_state WHERE state = ?", (state,))
```
## Next Steps After Implementation
Once `starpunk/models.py` is complete:
1. Move to Phase 2.1: Notes Management (`starpunk/notes.py`)
2. Notes module will use Note model extensively
3. Integration tests will verify models work with database
## References
- Full design: `docs/design/phase-1.2-data-models.md`
- Database schema: `starpunk/database.py`
- Utilities: `starpunk/utils.py`
- Python dataclasses: https://docs.python.org/3/library/dataclasses.html
## Quick Command Reference
```bash
# Run tests
pytest tests/test_models.py -v
# Run tests with coverage
pytest tests/test_models.py --cov=starpunk.models --cov-report=term-missing
# Format code
black starpunk/models.py tests/test_models.py
# Lint code
flake8 starpunk/models.py tests/test_models.py
# Type check (optional)
mypy starpunk/models.py
```
## Estimated Time Breakdown
- Module docstring and imports: 10 minutes
- Constants: 5 minutes
- Note model: 80 minutes
  - Basic structure: 15 minutes
  - from_row: 10 minutes
  - Lazy loading properties: 20 minutes
  - HTML rendering: 15 minutes
  - Metadata extraction: 15 minutes
  - Other methods: 5 minutes
- Session model: 30 minutes
- Token model: 30 minutes
- AuthState model: 20 minutes
- Tests (all models): 90-120 minutes
- Documentation review: 15 minutes
**Total**: roughly 4.5-5 hours (the line items above sum to 280-310 minutes)
## Implementation Tips
### Frozen Dataclass Caching
Since dataclasses with `frozen=True` are immutable, use this pattern for caching:
```python
@property
def content(self) -> str:
    if self._cached_content is None:
        content = read_note_file(self._data_dir / self.file_path)
        # Use object.__setattr__ to bypass frozen restriction
        object.__setattr__(self, '_cached_content', content)
    return self._cached_content
```
### Datetime Handling
Database may return strings or datetime objects:
```python
@classmethod
def from_row(cls, row: dict) -> 'Note':
    created_at = row['created_at']
    if isinstance(created_at, str):
        # Map a trailing 'Z' to an explicit UTC offset before parsing
        created_at = datetime.fromisoformat(created_at.replace('Z', '+00:00'))
    # ... rest of method
```
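Because every model repeats this conversion, it may be worth factoring into one helper. The sketch below is an assumption about how that could look; `parse_timestamp` is a hypothetical name, not an existing function in `starpunk/utils.py`.

```python
from datetime import datetime
from typing import Optional, Union


def parse_timestamp(value: Union[str, datetime, None]) -> Optional[datetime]:
    """Accept a datetime, an ISO 8601 string, or None (e.g. a NULL column)."""
    if value is None or isinstance(value, datetime):
        return value
    # Map a trailing 'Z' to an explicit UTC offset before parsing
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


naive = parse_timestamp("2024-11-18T14:30:00")
aware = parse_timestamp("2024-11-18T14:30:00Z")
spaced = parse_timestamp("2024-11-18 14:30:00")  # space separator also parses
```

Note that a string with `Z` yields a timezone-aware value while one without yields a naive value, and Python raises `TypeError` when the two kinds are compared, so the stored format should be consistent.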
### Testing with tmp_path
Use pytest's tmp_path fixture for file operations:
```python
def test_content_loading(tmp_path):
    # Create test file
    note_file = tmp_path / 'notes' / '2024' / '11' / 'test.md'
    note_file.parent.mkdir(parents=True)
    note_file.write_text('# Test')
    # Create note with tmp_path as data_dir
    note = Note(
        id=1,
        slug='test',
        file_path='notes/2024/11/test.md',
        published=True,
        created_at=datetime.utcnow(),
        updated_at=datetime.utcnow(),
        _data_dir=tmp_path
    )
    assert '# Test' in note.content
```
### Markdown Extension Configuration
```python
import markdown

html = markdown.markdown(
    content,
    extensions=['extra', 'codehilite', 'nl2br'],
    extension_configs={
        'codehilite': {'css_class': 'highlight'}
    }
)
```