Phil Skentelbery a3bac86647 feat: Complete IndieAuth server removal (Phases 2-4)

Completed all remaining phases of ADR-030 IndieAuth provider removal.
StarPunk no longer acts as an authorization server - all IndieAuth
operations delegated to external providers.

Phase 2 - Remove Token Issuance:
- Deleted /auth/token endpoint
- Removed token_endpoint() function from routes/auth.py
- Deleted tests/test_routes_token.py

Phase 3 - Remove Token Storage:
- Deleted starpunk/tokens.py module entirely
- Created migration 004 to drop tokens and authorization_codes tables
- Deleted tests/test_tokens.py
- Removed all internal token CRUD operations

Phase 4 - External Token Verification:
- Created starpunk/auth_external.py module
- Implemented verify_external_token() for external IndieAuth providers
- Updated Micropub endpoint to use external verification
- Added TOKEN_ENDPOINT configuration
- Updated all Micropub tests to mock external verification
- HTTP timeout protection (5s) for external requests

Additional Changes:
- Created migration 003 to remove code_verifier from auth_state
- Fixed 5 migration tests that referenced obsolete code_verifier column
- Updated 11 Micropub tests for external verification
- Fixed test fixture and app context issues
- All 501 tests passing

Breaking Changes:
- Micropub clients must use external IndieAuth providers
- TOKEN_ENDPOINT configuration now required
- Existing internal tokens invalid (tables dropped)

Migration Impact:
- Simpler codebase: -500 lines of code
- Fewer database tables: -2 tables (tokens, authorization_codes)
- More secure: External providers handle token security
- More maintainable: Less authentication code to maintain

Standards Compliance:
- W3C IndieAuth specification
- OAuth 2.0 Bearer token authentication
- IndieWeb principle: delegate to external services

Related:
- ADR-030: IndieAuth Provider Removal Strategy
- ADR-050: Remove Custom IndieAuth Server
- Migration 003: Remove code_verifier from auth_state
- Migration 004: Drop tokens and authorization_codes tables

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-24 17:23:46 -07:00

IndieAuth Removal Implementation Analysis

Date: 2025-11-24
Developer: Fullstack Developer Agent
Status: Pre-Implementation Review

Executive Summary

I have thoroughly reviewed the architect's plan to remove the custom IndieAuth authorization server from StarPunk. This document presents my understanding, identifies concerns, and lists questions that need clarification before implementation begins.

What I Understand

Current Architecture

The system currently implements BOTH roles:

Authorization Server (to be removed):
- /auth/authorization endpoint with consent UI
- /auth/token endpoint for token issuance
- starpunk/tokens.py module (~413 lines)
- PKCE implementation in starpunk/auth.py
- Two database tables: authorization_codes and tokens
- Migration 002 that creates these tables
Resource Server (to be kept and modified):
- /micropub endpoint
- Admin authentication via IndieLogin.com
- Session management
- Token verification (currently local, will become external)

Proposed Changes

Remove ~500+ lines of authorization server code
Delete 2 database tables
Replace local token verification with external API calls
Add token caching (5-minute TTL) for performance
Update HTML discovery headers
Bump version from 0.4.0 → 0.5.0

Implementation Phases

The plan breaks the work into 5 phases over 3 days:

Remove authorization endpoint (Day 1)
Remove token issuance (Day 1)
Database schema simplification (Day 2)
External token verification (Day 2)
Documentation and discovery (Day 3)

Critical Questions for the Architect

1. Admin Authentication Clarification

Question: How exactly does admin authentication work after removal?

Context: I see two authentication flows in the current code:

Admin login: Uses IndieLogin.com → creates session cookie
Micropub auth: Uses local tokens → will use external verification

The plan says "admin login still works" but I need to confirm:

Does admin login continue using IndieLogin.com ONLY for session creation?
The admin never needs Micropub tokens for the web UI, correct?
Sessions are completely separate from Micropub tokens?

Why this matters: I need to ensure Phase 1-2 don't break admin access.

2. Token Verification Implementation Details

Question: What exactly should the external token verification return?

Current local implementation (starpunk/tokens.py:116-164):

def verify_token(token: str) -> Optional[Dict[str, Any]]:
    # Returns: {me, client_id, scope}
    # Updates last_used_at timestamp
    ...

Proposed external implementation (ADR-050 lines 156-191):

def verify_token(bearer_token: str) -> Optional[Dict[str, Any]]:
    response = httpx.get(
        token_endpoint,
        headers={'Authorization': f'Bearer {bearer_token}'}
    )
    # Returns response.json()

Concerns:

Does tokens.indieauth.com return the same fields (me, client_id, scope)?
What if the endpoint returns different field names?
How do we handle token endpoint errors vs invalid tokens?
Should we distinguish between "token invalid" and "endpoint unreachable"?

Request: Provide exact expected response format from tokens.indieauth.com or document what fields we should expect.
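
Until the exact format is confirmed, one way to contain the uncertainty is to normalize whatever the endpoint returns into a fixed internal shape. The following is a minimal sketch, assuming the response is a JSON object with the same me, client_id and scope fields as the local implementation; TokenInfo and parse_token_response are hypothetical names, not part of the plan.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class TokenInfo:
    """Internal view of a verified token (hypothetical type)."""
    me: str
    client_id: Optional[str]
    scope: str


def parse_token_response(payload: Mapping[str, Any]) -> Optional[TokenInfo]:
    """Return TokenInfo if the payload carries a usable 'me', else None.

    Assumption: field names match the local implementation
    (me, client_id, scope). Missing or non-string optional fields
    are normalized rather than rejected.
    """
    me = payload.get("me")
    if not isinstance(me, str) or not me:
        return None  # without an identity the token is unusable
    client_id = payload.get("client_id")
    scope = payload.get("scope", "")
    return TokenInfo(
        me=me,
        client_id=client_id if isinstance(client_id, str) else None,
        scope=scope if isinstance(scope, str) else "",
    )
```

If the external provider turns out to use different field names, only this one function would need to change.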

3. Scope Validation Strategy

Question: Where does scope validation happen after removal?

Current flow:

Client requests scope during authorization
We validate scope → only "create" supported
We store validated scope in authorization code
We issue token with validated scope
Micropub endpoint checks token has "create" scope

After removal:

External provider issues tokens with scopes
What if external provider issues a token with unsupported scopes?
Should we validate scope is "create" in our verify_token()?
Or trust the external provider completely?

From ADR-050 lines 180-185:

# Check scope
if 'create' not in data.get('scope', ''):
    return None

This suggests we validate, but I want to confirm this is the right approach.
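
One detail worth raising if we do validate: the check above is a substring test on the raw scope string, so a scope such as "recreate" would also pass. OAuth 2.0 scopes are space-delimited, so a stricter check would split first. A minimal sketch, with has_scope as a hypothetical helper name:

```python
from typing import Any, Mapping


def has_scope(token_data: Mapping[str, Any], required: str = "create") -> bool:
    """True only if `required` is one of the space-separated scopes."""
    scope = token_data.get("scope") or ""
    if not isinstance(scope, str):
        return False  # unexpected type: treat as no scopes granted
    return required in scope.split()
```

With this helper, {"scope": "create update"} is accepted while {"scope": "recreate"} and a missing scope field are both rejected.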

4. Migration Backwards Compatibility

Question: What happens to existing StarPunk installations?

Scenario 1: Fresh install after 0.5.0

No problem - migration 002 never runs
But wait... other code might expect migration 002 to exist?

Scenario 2: Existing 0.4.0 installation upgrading to 0.5.0

Has migration 002 already run
Has tokens and authorization_codes tables
May have active tokens in database

The plan says (indieauth-removal-phases.md lines 168-189):

-- 003_remove_indieauth_tables.sql
DROP TABLE IF EXISTS tokens CASCADE;
DROP TABLE IF EXISTS authorization_codes CASCADE;

Concerns:

Should we archive migration 002 or delete it?
If we delete it, fresh installs will have a gap in the migration numbering
If we archive it, where? The plan shows /migrations/archive/
Do we need a "down migration" for rollback?

Request: Clarify migration strategy:

Keep 002 but add 003 that drops tables? (staged approach)
Delete 002 and renumber everything? (breaking approach)
Archive 002 to different directory? (git history approach)
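
For the staged approach, the drop step could look like the sketch below. It assumes the migration runs against SQLite through Python's sqlite3 module; the function names are hypothetical. Note that SQLite's DROP TABLE grammar has no CASCADE clause, so the statements quoted above would need that keyword removed.

```python
import sqlite3

# Tables created by migration 002 that the authorization server used.
OBSOLETE_TABLES = ("tokens", "authorization_codes")


def drop_indieauth_tables(conn: sqlite3.Connection) -> None:
    """Drop the authorization-server tables; safe to run more than once."""
    with conn:  # commits on success, rolls back if a statement fails
        for table in OBSOLETE_TABLES:
            # Names come from the fixed tuple above, never from user input.
            conn.execute(f"DROP TABLE IF EXISTS {table}")


def table_names(conn: sqlite3.Connection) -> set:
    """Return the names of all tables in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {row[0] for row in rows}
```

Because of IF EXISTS, the same function works on a fresh install (tables never created) and on an upgraded 0.4.0 install (tables present).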

5. Token Caching Security

Question: Is in-memory token caching secure?

Proposed cache (indieauth-removal-phases.md lines 266-280):

import hashlib
from time import time

_token_cache = {}  # {token_hash: (data, expiry)}

def cache_token(token: str, data: dict, ttl: int = 300):
    token_hash = hashlib.sha256(token.encode()).hexdigest()
    _token_cache[token_hash] = (data, time() + ttl)

Concerns:

Cache invalidation: If a token is revoked externally, we'll continue accepting it for up to 5 minutes
Memory growth: No cache cleanup of expired entries - they just accumulate
Multi-process: If running with multiple workers (gunicorn/uwsgi), each process has separate cache
Token exposure: Are we caching the full token or just the hash?

Questions:

Is 5-minute window for revocation acceptable?
Should we implement cache cleanup (LRU or TTL-based)?
Should we document that caching makes revocation non-immediate?
For production, should we recommend Redis instead?

The plan shows we cache the hash, not the token, which is good. But should we document the revocation delay?
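
To make the memory-growth concern concrete, here is a minimal sketch of a bounded cache that keeps the hash-only keys from the plan but purges expired entries and caps the entry count. The class name, the max_entries default and the injectable clock are my assumptions, not part of the plan. It is still per-process, so the multi-worker and revocation-delay concerns remain.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TokenCache:
    """In-memory TTL cache keyed by SHA-256 of the token (hypothetical)."""

    def __init__(self, ttl: float = 300.0, max_entries: int = 1000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._ttl = ttl
        self._max_entries = max_entries
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[Dict[str, Any], float]] = {}

    @staticmethod
    def _key(token: str) -> str:
        # Only the hash is stored, never the raw token.
        return hashlib.sha256(token.encode()).hexdigest()

    def _purge_expired(self) -> None:
        now = self._clock()
        expired = [k for k, (_, exp) in self._entries.items() if exp <= now]
        for key in expired:
            del self._entries[key]

    def put(self, token: str, data: Dict[str, Any]) -> None:
        self._purge_expired()
        key = self._key(token)
        self._entries.pop(key, None)  # re-inserting moves the key to the end
        while len(self._entries) >= self._max_entries:
            # dicts preserve insertion order, so the first key is the oldest
            del self._entries[next(iter(self._entries))]
        self._entries[key] = (data, self._clock() + self._ttl)

    def get(self, token: str) -> Optional[Dict[str, Any]]:
        key = self._key(token)
        entry = self._entries.get(key)
        if entry is None:
            return None
        data, expiry = entry
        if expiry <= self._clock():
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return data

    def __len__(self) -> int:
        return len(self._entries)
```

Eviction here is oldest-inserted-first rather than true LRU, which keeps the sketch short; a real LRU would also reorder entries on get.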

6. Error Handling and User Experience

Question: How should we handle external endpoint failures?

Scenarios:

tokens.indieauth.com is down (network error)
tokens.indieauth.com returns 500 (server error)
tokens.indieauth.com returns 429 (rate limit)
Token is invalid (returns 401/404)
Request times out (> 5 seconds)

Current plan (indieauth-removal-plan.md lines 169-173):

if response.status_code != 200:
    return None

This treats all failures the same: the user gets a "forbidden" error whether the token is invalid or the verification service is down.

Questions:

Should we differentiate between "invalid token" and "verification service down"?
Should we fail open (allow request) or fail closed (deny request) on timeout?
Should we log different error types differently?
Should we have a fallback mechanism?

Recommendation: Return different error messages:

401/404 from endpoint → "Invalid or expired token"
Network/timeout error → "Authentication service temporarily unavailable"
This gives users better feedback
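
The recommendation above could be sketched as a small classification step between the HTTP call and the Micropub response. This assumes a fail-closed policy, treats a missing status code (None) as a network error or timeout, and maps the two failure kinds to 401 and 503 toward the client. Those status codes, the treatment of 400/403 as token problems, and all names are my assumptions.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    VALID = "valid"
    INVALID_TOKEN = "invalid_token"
    SERVICE_UNAVAILABLE = "service_unavailable"


def classify(status_code: Optional[int]) -> Outcome:
    """Classify the token endpoint's reply; None means no reply at all."""
    if status_code is None:
        return Outcome.SERVICE_UNAVAILABLE  # network error or timeout
    if status_code == 200:
        return Outcome.VALID
    if status_code in (400, 401, 403, 404):
        # Assumption: these 4xx codes mean the token itself was rejected.
        return Outcome.INVALID_TOKEN
    return Outcome.SERVICE_UNAVAILABLE  # 429, 5xx, anything unexpected


# (HTTP status, message) returned to the Micropub client on failure.
FAILURE_RESPONSES = {
    Outcome.INVALID_TOKEN: (401, "Invalid or expired token"),
    Outcome.SERVICE_UNAVAILABLE: (
        503, "Authentication service temporarily unavailable"),
}
```

Every non-VALID outcome still denies the request, so this changes only the feedback and logging, not the security posture.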

7. Configuration Changes

Question: Should TOKEN_ENDPOINT be configurable or hardcoded?

Current plan:

TOKEN_ENDPOINT = os.getenv('TOKEN_ENDPOINT', 'https://tokens.indieauth.com/token')

Questions:

Is there ever a reason to use a different token endpoint?
Should we support per-user token endpoints (discovery from user's domain)?
Or should we hardcode tokens.indieauth.com and simplify?

From the HTML discovery (simplified-auth-architecture.md lines 193-211):

<link rel="token_endpoint" href="{{ config.TOKEN_ENDPOINT }}">

This advertises OUR token endpoint to clients. But we're using an external one. Should this link point to:

tokens.indieauth.com (external provider)?
Or should we remove this link entirely since we're not issuing tokens?

This seems like a spec compliance issue that needs clarification.

8. Testing Strategy

Question: How do we test external token verification?

Proposed test (indieauth-removal-phases.md lines 332-348):

@patch('starpunk.micropub.httpx.get')
def test_external_token_verification(mock_get):
    mock_response = mock_get.return_value
    mock_response.status_code = 200
    mock_response.json.return_value = {
        'me': 'https://example.com',
        'scope': 'create update'
    }

Concerns:

All tests will be mocked - we never test real integration
If tokens.indieauth.com changes response format, we won't know
We're mocking at the wrong level (httpx) - should mock at verify_token level?

Questions:

Should we have integration tests with real tokens.indieauth.com?
Should we test in CI with actual test tokens?
How do we get test tokens for CI? Manual process?
Should we implement a "test mode" that uses mock verification?

Recommendation: Create integration test suite that:

Uses real tokens.indieauth.com in CI
Requires CI environment variable with test token
Skips integration tests in local development
Keeps unit tests mocked as planned

9. Rollback Procedure

Question: What's the actual rollback procedure?

The plan mentions (ADR-050 lines 224-240):

git revert HEAD~5..HEAD
pg_dump restoration

Concerns:

This assumes PostgreSQL but StarPunk uses SQLite
HEAD~5 is fragile - depends on exactly 5 commits
No clear step-by-step rollback instructions
What if we're in the middle of Phase 3?

Questions:

Should we create backup before starting?
Should each phase be a separate commit for easier rollback?
How do we handle database rollback with SQLite?
Should we test the rollback procedure before starting?

Request: Create clear rollback procedure for each phase.
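
For the database part of a rollback on SQLite, a pre-migration backup could be taken with the standard library's online backup API instead of pg_dump. A minimal sketch, assuming the database is a single SQLite file; the function name and any file paths are hypothetical.

```python
import sqlite3
from pathlib import Path


def backup_database(db_path: Path, backup_path: Path) -> None:
    """Copy a SQLite database file using sqlite3's online backup API."""
    src = sqlite3.connect(str(db_path))
    dst = sqlite3.connect(str(backup_path))
    try:
        src.backup(dst)  # copies all pages from src into dst
    finally:
        dst.close()
        src.close()
```

Restoring would then amount to stopping the application and putting the backup file back in place, which is worth rehearsing on a test database before Phase 3.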

10. Performance Impact

Question: What's the expected performance impact?

Current: Local token verification

Database query: ~1-5ms
No network calls

Proposed: External verification

HTTP request to tokens.indieauth.com: 200-500ms
Cached requests: <1ms (cache hit)

Concerns:

First request to Micropub will be 200-500ms slower
If cache is cold, every request is 200-500ms slower
What if user makes batch requests (multiple posts)?
Does this make the UI feel slow?

Questions:

Is 200-500ms acceptable for Micropub clients?
Should we pre-warm the cache somehow?
Should cache TTL be configurable?
Should we implement request coalescing (multiple concurrent verifications for same token)?

Note: The plan mentions 90% cache hit rate, but this assumes:

Clients reuse tokens across requests
Multiple requests within 5-minute window
Single-process deployment

With multiple gunicorn workers, cache hit rate will be lower.
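
To see how sensitive the average is to the hit rate, the expected verification latency can be estimated as a weighted mean of the hit and miss costs. A small sketch, using the figures quoted above (about 1 ms for a hit, 200-500 ms for a miss); the 350 ms midpoint used in the note below is my assumption.

```python
def expected_latency_ms(hit_rate: float, hit_ms: float, miss_ms: float) -> float:
    """Average verification latency for a given cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms
```

At a 90% hit rate with a 350 ms miss cost this gives roughly 36 ms on average; if multiple workers pull the effective hit rate down to 50%, the average rises to roughly 176 ms.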

11. Database Schema Question

Question: Why does migration 003 update schema_version?

From indieauth-removal-plan.md lines 246-248:

UPDATE schema_version SET version = 3 WHERE id = 1;

But I don't see a schema_version table in the current migrations.

Questions:

Does this table exist?
Is this part of a migration tracking system?
Should migration 003 check for this table first?
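
If the answer is that the table may or may not exist, the migration runner could check before updating it. A minimal sketch for SQLite, with table_exists as a hypothetical helper:

```python
import sqlite3


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Return True if a table with this exact name exists in the database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None
```

The UPDATE on schema_version would then run only when table_exists(conn, "schema_version") is true.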

12. IndieAuth Discovery Links

Question: What should the HTML discovery headers be?

Current (implied by removal):

<link rel="authorization_endpoint" href="/auth/authorization">
<link rel="token_endpoint" href="/auth/token">

Proposed (simplified-auth-architecture.md lines 207-210):

<link rel="authorization_endpoint" href="https://indieauth.com/auth">
<link rel="token_endpoint" href="https://tokens.indieauth.com/token">
<link rel="micropub" href="https://starpunk.example.com/micropub">

Questions:

Should these be in base.html (every page) or just the homepage?
Are we advertising that WE use indieauth.com, or that CLIENTS should?
Shouldn't these come from the user's own domain (ADMIN_ME)?
What if the user wants to use a different provider?

My understanding from IndieAuth spec:

These links tell clients WHERE to authenticate
They should point to the provider the USER wants to use
Not the provider StarPunk uses internally

This seems like it might be architecturally wrong. Need clarification.
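
If the answer is that endpoints should come from the user's own domain, discovery could start from the rel links on that page. The following is a rough sketch using only the standard library; it assumes the links are plain HTML link elements with absolute URLs, and all names are hypothetical. A fuller implementation would likely also need to consider HTTP Link headers and relative URLs.

```python
from html.parser import HTMLParser
from typing import Dict


class LinkRelParser(HTMLParser):
    """Collect the first href seen for each rel value on <link> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.endpoints: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel may hold several space-separated values
        for rel in (attr.get("rel") or "").split():
            self.endpoints.setdefault(rel, href)


def discover_endpoints(html: str) -> Dict[str, str]:
    """Return a {rel: href} mapping for the <link> elements in `html`."""
    parser = LinkRelParser()
    parser.feed(html)
    return parser.endpoints
```

Fed the proposed markup above, this returns the indieauth.com and tokens.indieauth.com URLs under authorization_endpoint and token_endpoint, which is what a client would see regardless of what StarPunk uses internally.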

Risks Identified

High-Risk Areas

Breaking Admin Access (Phase 1-2)
- Risk: Accidentally remove code needed for admin login
- Mitigation: Test admin login after each commit
- Severity: Critical (blocks all access)
Data Loss (Phase 3)
- Risk: Drop tables with no backup
- Mitigation: Backup database before migration
- Severity: High (no recovery path)
External Dependency (Phase 4)
- Risk: tokens.indieauth.com becomes required for operation
- Mitigation: Good error handling, caching
- Severity: High (service becomes unusable)
Token Format Mismatch (Phase 4)
- Risk: External endpoint returns different format than expected
- Mitigation: Thorough testing, error handling
- Severity: High (all Micropub requests fail)

Medium-Risk Areas

Cache Memory Leak (Phase 4)
- Risk: Token cache grows unbounded
- Mitigation: Implement cache cleanup
- Severity: Medium (performance degradation)
Multi-Worker Cache Misses (Phase 4)
- Risk: Poor cache hit rate with multiple processes
- Mitigation: Document limitation, consider Redis
- Severity: Medium (performance impact)
Migration Continuity (Phase 3)
- Risk: Migration numbering confusion
- Mitigation: Clear documentation
- Severity: Low (documentation issue)

Recommendations

Before Starting Implementation

Create Integration Test Suite
- Get test token from tokens.indieauth.com
- Write tests that verify actual response format
- Ensure we handle all error cases
Document Rollback Procedure
- Create step-by-step rollback for each phase
- Test rollback procedure before starting
- Create database backup script
Clarify Architecture Questions
- Resolve HTML discovery header confusion
- Confirm token verification response format
- Define error handling strategy
Implement Cache Cleanup
- Add LRU or TTL-based cache eviction
- Add cache size limit
- Add monitoring/logging

During Implementation

One Phase at a Time
- Complete each phase fully before moving to next
- Test thoroughly after each phase
- Create checkpoint commits for rollback
Comprehensive Testing
- Test admin login after Phase 1-2
- Test database migration on test database first
- Test external verification with real tokens
Monitor Performance
- Log token verification times
- Monitor cache hit rates
- Check for memory leaks

After Implementation

Production Migration Guide
- Document exact upgrade steps
- Include backup procedures
- Provide user communication template
Performance Monitoring
- Track external API latency
- Monitor cache effectiveness
- Alert on verification failures
User Documentation
- Update README with new setup instructions
- Create troubleshooting guide
- Document rollback procedure

Questions Summary

Here are all my questions organized by priority:

Must Answer Before Implementation

What is the exact response format from tokens.indieauth.com?
How should HTML discovery headers work (user's domain vs our provider)?
What's the migration strategy (keep 002, delete 002, or archive)?
How should we differentiate between token invalid vs service down?
Is 5-minute revocation delay acceptable?

Should Answer Before Implementation

Should we implement cache cleanup or just document the issue?
Should we have integration tests with real tokens?
What's the detailed rollback procedure for each phase?
Should TOKEN_ENDPOINT be configurable or hardcoded?
Does schema_version table exist?

Nice to Answer

Should we support multiple providers?
Should we implement request coalescing for concurrent verifications?
Should cache TTL be configurable?

My Recommendation to Proceed

I recommend we get answers to the "Must Answer" questions before implementing. The plan is solid overall, but these architectural details will affect how we implement Phase 4 (external verification), which is the core of this change.

Once we have clarity on:

External endpoint response format
HTML discovery strategy
Migration approach
Error handling strategy

...then I can implement confidently following the phased approach.

The plan is well-structured and thoughtfully designed. I appreciate the clear separation of phases and the detailed acceptance criteria. My questions are primarily about clarifying implementation details and edge cases.

Ready to implement: No
Blocking issues: 5 architectural questions
Estimated time after clarification: 2-3 days per plan
