StarPunk/docs/reports/indieauth-removal-questions.md

# IndieAuth Removal - Questions for Architect

**Date**: 2025-11-24
**Developer**: Fullstack Developer Agent
**Document**: Pre-Implementation Questions

## Status: BLOCKED - Awaiting Architectural Clarification

I have thoroughly reviewed the removal plan and identified several architectural questions that need answers before implementation can begin safely.

---

## CRITICAL QUESTIONS (Must answer before implementing)

### Q1: External Token Endpoint Response Format

**What I see in the plan** (ADR-050 lines 156-191):
```python
response = httpx.get(
    token_endpoint,
    headers={'Authorization': f'Bearer {bearer_token}'}
)
data = response.json()
# Uses: data.get('me'), data.get('scope')
```

**What I see in current code** (starpunk/tokens.py:116-164):
```python
def verify_token(token: str) -> Optional[Dict[str, Any]]:
    return {
        'me': row['me'],
        'client_id': row['client_id'],
        'scope': row['scope']
    }
```

**Questions**:
1. What is the EXACT response format from tokens.indieauth.com/token?
2. Does it include `client_id`? (current code uses this)
3. What fields can we rely on?
4. What status codes indicate invalid token vs server error?

**Request**: Provide actual example response from tokens.indieauth.com or point to specification.

**Why this blocks**: Phase 4 implementation depends on knowing exact response format.

---

### Q2: HTML Discovery Headers Strategy

**What the plan shows** (simplified-auth-architecture.md lines 207-210):
```html
<link rel="authorization_endpoint" href="https://indieauth.com/auth">
<link rel="token_endpoint" href="https://tokens.indieauth.com/token">
```

**My confusion**:
- These headers tell Micropub CLIENTS where to get tokens
- We're putting them on OUR pages (starpunk instance)
- But shouldn't they point to the USER's chosen provider?
- IndieAuth spec says these come from the user's DOMAIN, not from StarPunk

**Example**:
- User: alice.com (ADMIN_ME)
- StarPunk: starpunk.alice.com
- Client (Quill) looks at alice.com for discovery headers
- Quill should see alice's chosen provider, not ours

**Questions**:
1. Should these headers be on StarPunk pages at all?
2. Or should users add them to their own domain?
3. Are we confusing "where StarPunk verifies" with "where clients authenticate"?

**Request**: Clarify the relationship between:
- StarPunk's token verification (internal, uses tokens.indieauth.com)
- Client's token acquisition (should use user's domain discovery)

**Why this blocks**: We might be implementing discovery headers incorrectly, which would break IndieAuth flow.

---

### Q3: Migration 002 Handling Strategy

**The plan mentions** (indieauth-removal-phases.md line 209):
```bash
mv migrations/002_secure_tokens_and_authorization_codes.sql migrations/archive/
```

**Questions**:
1. Should we keep 002 in migrations/ and add 003 that drops tables?
2. Should we delete 002 entirely?
3. Should we archive to a different directory?
4. What about fresh installs - do they need 002 at all?

**Three approaches**:

**Option A: Keep 002, Add 003**
- Pro: Clear history, both migrations run in order
- Con: Creates then immediately drops tables (wasteful)
- Use case: Existing installations upgrade smoothly

**Option B: Delete 002, Renumber Everything**
- Pro: Clean, no dead migrations
- Con: Breaking change for existing installations
- Use case: Fresh installs don't have dead code

**Option C: Archive 002, Add 003**
- Pro: Git history preserved, clean migrations/
- Con: Migration numbers have gaps
- Use case: Documentation without execution

**Request**: Which approach should we use and why?

**Why this blocks**: Phase 3 depends on knowing how to handle migration files.

---

### Q4: Error Handling Strategy

**Current plan** (indieauth-removal-plan.md lines 169-173):
```python
if response.status_code != 200:
    return None
```

This treats ALL failures identically:
- Token invalid (401 from provider) → return None
- tokens.indieauth.com down (connection error) → return None
- Rate limited (429 from provider) → return None
- Timeout (no response) → return None

**Questions**:
1. Should we differentiate between "invalid token" and "service unavailable"?
2. Should we fail closed (deny) or fail open (allow) on timeout?
3. Should we return different error messages to users?

**Proposed enhancement**:
```python
try:
    response = httpx.get(endpoint, timeout=5.0)
    if response.status_code == 401:
        return None  # Invalid token
    elif response.status_code != 200:
        logger.error(f"Token endpoint returned {response.status_code}")
        return None  # Service error, deny access
except httpx.TimeoutException:
    logger.error("Token verification timeout")
    return None  # Network issue, deny access
```

**Request**: Define error handling policy - what happens for each error type?

**Why this blocks**: Affects user experience and security posture.

---

### Q5: Token Cache Revocation Delay

**Proposed caching** (indieauth-removal-phases.md lines 266-280):
```python
# Cache for 5 minutes
_token_cache[token_hash] = (data, time() + 300)
```

**The problem**:
1. User revokes token at tokens.indieauth.com
2. StarPunk cache still has it for up to 5 minutes
3. Token continues to work for 5 minutes after revocation

**Questions**:
1. Is this acceptable for security?
2. Should we document this limitation?
3. Should we implement cache invalidation somehow?
4. Should cache TTL be shorter (1 minute)?

**Trade-off**:
- Longer TTL = better performance, worse security
- Shorter TTL = worse performance, better security
- No cache = worst performance, best security

**Request**: Confirm 5-minute window is acceptable or specify different TTL.

**Why this blocks**: Security/performance trade-off needs architectural decision.

---

## IMPORTANT QUESTIONS (Should answer before implementing)

### Q6: Cache Cleanup Implementation

**Current plan** (indieauth-removal-phases.md lines 266-280):
```python
_token_cache = {}
```

**Problem**: No cleanup mechanism - expired entries accumulate forever.

**Questions**:
1. Should we implement LRU cache eviction?
2. Should we implement TTL-based cleanup?
3. Should we just document the limitation?
4. Should we recommend Redis for production?

**Recommendation**: Add simple cleanup:
```python
def verify_token(token):
    # Clean expired entries every 100 requests
    if len(_token_cache) % 100 == 0:
        now = time()
        _token_cache = {k: v for k, v in _token_cache.items() if v[1] > now}
```

**Request**: Approve cleanup approach or specify alternative.

---

### Q7: Integration Testing Strategy

**Plan shows only mocked tests** (indieauth-removal-phases.md lines 332-348):
```python
@patch('starpunk.micropub.httpx.get')
def test_external_token_verification(mock_get):
    mock_response.status_code = 200
```

**Questions**:
1. Should we have integration tests with real tokens.indieauth.com?
2. How do we get test tokens for CI?
3. Should CI test against real external service?

**Recommendation**: Two-tier testing:
- Unit tests: Mock external calls (fast, always pass)
- Integration tests: Real tokens.indieauth.com (slow, conditional)

**Request**: Define testing strategy for external dependencies.

---

### Q8: Rollback Procedure Detail

**Plan mentions** (ADR-050 lines 224-240):
```bash
git revert HEAD~5..HEAD
```

**Problems**:
1. Assumes exactly 5 commits
2. Plan mentions PostgreSQL but we use SQLite
3. No phase-specific rollback

**Request**: Create specific rollback for each phase:

**Phase 1 rollback**:
```bash
git revert <commit-hash>
# No database changes, just code
```

**Phase 3 rollback**:
```bash
cp data/starpunk.db.backup data/starpunk.db
git revert <commit-hash>
```

**Full rollback**:
```bash
git revert <phase-5-commit>...<phase-1-commit>
cp data/starpunk.db.backup data/starpunk.db
```

---

### Q9: TOKEN_ENDPOINT Configuration

**Plan shows** (indieauth-removal-plan.md line 181):
```python
TOKEN_ENDPOINT = os.getenv('TOKEN_ENDPOINT', 'https://tokens.indieauth.com/token')
```

**Questions**:
1. Should this be configurable or hardcoded?
2. Is there a use case for different token endpoints?
3. Should we support per-user endpoints (discovery)?

**Recommendation**: Hardcode for V1, make configurable later if needed.

**Request**: Confirm configuration approach.

---

### Q10: Schema Version Table

**Plan shows** (indieauth-removal-plan.md lines 246-248):
```sql
UPDATE schema_version SET version = 3 WHERE id = 1;
```

**Question**: Does this table exist? I don't see it in current migrations.

**Request**: Clarify if this is needed or remove from migration 003.

---

## NICE TO HAVE ANSWERS

### Q11: Multi-Worker Cache Coherence

With multiple gunicorn workers, each has separate in-memory cache:
- Worker 1: Verifies token, caches it
- Worker 2: Gets request with same token, cache miss, verifies again

**Question**: Should we document this limitation or implement shared cache (Redis)?

### Q12: Request Coalescing

If multiple concurrent requests use same token:
- All hit cache miss
- All make external API call
- All cache separately

**Question**: Should we implement request coalescing (only one verification per token)?

### Q13: Configurable Cache TTL

**Question**: Should cache TTL be configurable via environment variable?
```python
CACHE_TTL = int(os.getenv('TOKEN_CACHE_TTL', '300'))
```

---

## Summary

**Status**: Ready to review, not ready to implement

**Blocking questions**: 5 critical architectural decisions
**Important questions**: 5 implementation details
**Nice-to-have questions**: 3 optimization considerations

**My assessment**: The plan is solid and well-thought-out. These questions are about clarifying implementation details and edge cases, not fundamental flaws. Once we have answers to the critical questions, I'm confident we can implement successfully.

**Next steps**:
1. Architect reviews and answers questions
2. I implement based on clarified architecture
3. We proceed through phases with clear acceptance criteria

**Estimated implementation time after clarification**: 2-3 days per plan