# IndieAuth Endpoint Discovery Implementation Analysis

**Date**: 2025-11-24
**Developer**: StarPunk Fullstack Developer
**Status**: Ready for Architect Review
**Target Version**: 1.0.0-rc.5

---

## Executive Summary

I have reviewed the architect's corrected IndieAuth endpoint discovery design and the W3C IndieAuth specification. The design is fundamentally sound and correctly implements the IndieAuth specification. However, I have **critical questions** about implementation details, particularly around the "chicken-and-egg" problem of determining which endpoint to verify a token with when we don't know the user's identity beforehand.

**Overall Assessment**: The design is architecturally correct, but needs clarification on practical implementation details before coding can begin.

---

## What I Understand

### 1. The Core Problem Fixed

The architect correctly identified that **hardcoding `TOKEN_ENDPOINT=https://tokens.indieauth.com/token` is fundamentally wrong**. This violates IndieAuth's core principle of user sovereignty.

**Correct Approach**:

- Store only `ADMIN_ME=https://admin.example.com/` in configuration
- Discover endpoints dynamically from the user's profile URL at runtime
- Each user can use their own IndieAuth provider

### 2. Endpoint Discovery Flow

Per W3C IndieAuth Section 4.2, I understand the discovery process:

```
1. Fetch user's profile URL (e.g., https://admin.example.com/)
2. Check in priority order:
   a. HTTP Link headers (highest priority)
   b. HTML <link> elements (document order)
   c. IndieAuth metadata endpoint (optional)
3. Parse rel="authorization_endpoint" and rel="token_endpoint"
4. Resolve relative URLs against profile URL base
5. Cache discovered endpoints (with TTL)
```

**Example Discovery**:

```http
GET https://admin.example.com/ HTTP/1.1

HTTP/1.1 200 OK
Link: <https://auth.example.com/token>; rel="token_endpoint"
Content-Type: text/html
```

### 3. Token Verification Flow

Per W3C IndieAuth Section 6, I understand token verification:

```
1. Receive Bearer token in Authorization header
2. Make GET request to token endpoint with Bearer token
3. Token endpoint returns: {me, client_id, scope}
4. Validate 'me' matches expected identity
5. Check required scopes present
```

**Example Verification**:

```
GET https://auth.example.com/token HTTP/1.1
Authorization: Bearer xyz123
Accept: application/json

HTTP/1.1 200 OK
Content-Type: application/json

{
  "me": "https://admin.example.com/",
  "client_id": "https://quill.p3k.io/",
  "scope": "create update delete"
}
```

### 4. Security Considerations

I understand the security model from the architect's docs:

- **HTTPS Required**: Profile URLs and endpoints MUST use HTTPS in production
- **Redirect Limits**: Maximum 5 redirects to prevent loops
- **Cache Integrity**: Validate endpoints before caching
- **URL Validation**: Ensure discovered URLs are well-formed
- **Token Hashing**: Hash tokens before caching (SHA-256)

### 5. Implementation Components

I understand these modules need to be created:

1. **`endpoint_discovery.py`**: Discover endpoints from profile URLs
   - HTTP Link header parsing
   - HTML link element extraction
   - URL resolution (relative to absolute)
   - Error handling
2. **Updated `auth_external.py`**: Token verification with discovery
   - Integrate endpoint discovery
   - Cache discovered endpoints
   - Verify tokens with discovered endpoints
   - Validate responses
3. **`endpoint_cache.py`** (or part of auth_external): Caching layer
   - Endpoint caching (TTL: 3600s)
   - Token verification caching (TTL: 300s)
   - Cache invalidation

### 6. Current Broken Code

From `starpunk/auth_external.py` line 49:

```python
token_endpoint = current_app.config.get("TOKEN_ENDPOINT")
```

This hardcoded approach is the problem we're fixing.

---

## Critical Questions for the Architect

### Question 1: The "Which Endpoint?" Problem ⚠️

**The Problem**: When Micropub receives a token, we need to verify it. But **which endpoint do we use to verify it**?

The W3C spec says:

> "GET request to the token endpoint containing an HTTP Authorization header with the Bearer Token according to [RFC6750]"

But it doesn't say **how we know which token endpoint to use** when we receive a token from an unknown source.

**Current Micropub Flow**:

```python
# micropub.py line 74
token_info = verify_external_token(token)
```

The token is an opaque string like `"abc123xyz"`. We have no idea:

- Which user it belongs to
- Which provider issued it
- Which endpoint to verify it with

**ADR-030-CORRECTED suggests (lines 204-258)**:

```
4. Option A: If we have cached token info, use cached 'me' URL
5. Option B: Try verification with last known endpoint for similar tokens
6. Option C: Require 'me' parameter in Micropub request
```

**My Questions**:

**1a)** Which option should I implement? The ADR presents three options but doesn't specify which one.

**1b)** For **Option A** (cached token): How does the first request work? We need to verify a token to cache its 'me' URL, but we need the 'me' URL to know which endpoint to verify with. This is circular.

**1c)** For **Option B** (last known endpoint): How do we handle the first token ever received? What is the "last known endpoint" when the cache is empty?

**1d)** For **Option C** (require 'me' parameter): Does this violate the Micropub spec? The W3C Micropub specification doesn't include a 'me' parameter in requests. Is this a StarPunk-specific extension?

**1e)** **Proposed Solution** (awaiting architect approval):

Since StarPunk is a **single-user CMS**, we KNOW the only valid tokens are for `ADMIN_ME`.
Therefore:

```python
from typing import Any, Dict, Optional

import httpx
from flask import current_app

# discover_endpoints, normalize_url and TokenVerificationError are assumed
# to come from the new discovery module / auth_external.py.


def verify_external_token(token: str) -> Optional[Dict[str, Any]]:
    """Verify token for the admin user"""
    admin_me = current_app.config.get("ADMIN_ME")

    # Discover endpoints from ADMIN_ME
    endpoints = discover_endpoints(admin_me)
    token_endpoint = endpoints['token_endpoint']

    # Verify token with discovered endpoint
    response = httpx.get(
        token_endpoint,
        headers={'Authorization': f'Bearer {token}'}
    )
    token_info = response.json()

    # Validate token belongs to admin
    if normalize_url(token_info['me']) != normalize_url(admin_me):
        raise TokenVerificationError("Token not for admin user")

    return token_info
```

**Is this the correct approach?** This assumes:

- StarPunk only accepts tokens for `ADMIN_ME`
- We always discover from `ADMIN_ME` profile URL
- Multi-user support is explicitly out of scope for V1

Please confirm this is correct or provide the proper approach.

---

### Question 2: Caching Strategy Details

**ADR-030-CORRECTED suggests** (lines 131-160):

- Endpoint cache TTL: 3600s (1 hour)
- Token verification cache TTL: 300s (5 minutes)

**My Questions**:

**2a)** **Cache Key for Endpoints**: Should the cache key be the profile URL (`admin_me`) or should we maintain a global cache?

For single-user StarPunk, we only have one profile URL (`ADMIN_ME`), so a simple cache like:

```python
self.cached_endpoints = None
self.cached_until = 0
```

would suffice. Is this acceptable, or should I implement a full `profile_url -> endpoints` dict for future multi-user support?

**2b)** **Cache Key for Tokens**: The migration guide (line 259) suggests hashing tokens:

```python
token_hash = hashlib.sha256(token.encode()).hexdigest()
```

But if tokens are opaque and unpredictable, why hash them? Is this:

- To prevent tokens appearing in logs/debug output?
- To prevent tokens being extracted from memory dumps?
- Because cache keys should be fixed-length?

If it's for security, should I also:

- Use a constant-time comparison for token hash lookups?
- Add HMAC with a secret key instead of plain SHA-256?

**2c)** **Cache Invalidation**: When should I clear the cache?

- On application startup? (cache is in-memory, so yes?)
- On configuration changes? (how do I detect these?)
- On token verification failures? (what if it's a network issue, not a provider change?)
- Manual admin endpoint `/admin/clear-cache`? (should I implement this?)

**2d)** **Cache Storage**: The ADR shows in-memory caching. Should I:

- Use a simple dict with tuples: `cache[key] = (value, expiry)`
- Use `functools.lru_cache` decorator?
- Use `cachetools` library for TTL support?
- Implement custom `EndpointCache` class as shown in ADR?

For V1 simplicity, I propose **custom class with simple dict**, but please confirm.

---

### Question 3: HTML Parsing Implementation

**From `docs/migration/fix-hardcoded-endpoints.md`** lines 139-159:

```python
from bs4 import BeautifulSoup

def _extract_from_html(self, html: str, base_url: str) -> Dict[str, str]:
    soup = BeautifulSoup(html, 'html.parser')

    auth_link = soup.find('link', rel='authorization_endpoint')
    if auth_link and auth_link.get('href'):
        endpoints['authorization_endpoint'] = urljoin(base_url, auth_link['href'])
```

**My Questions**:

**3a)** **Dependency**: Do we want to add BeautifulSoup4 as a dependency? Current dependencies (from quick check):

- Flask
- httpx
- Other core libs

BeautifulSoup4 is a new dependency. Alternatives:

- Use Python's built-in `html.parser` (more fragile)
- Use regex (bad for HTML, but endpoints are simple)
- Use `lxml` (faster, but C extension dependency)

**Recommendation**: Add BeautifulSoup4 with html.parser backend (pure Python). Confirm?

**3b)** **HTML Validation**: Should I validate HTML before parsing?

- Malformed HTML could cause parsing errors
- Should I catch and handle `ParserError`?
- What if there's no `<head>` section?
- What if `<link>` elements are in `<body>` (technically invalid but might exist)?

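To make the "built-in `html.parser`" alternative from 3a concrete, here is a minimal sketch using only the standard library. The names `LinkRelParser`, `extract_endpoints_from_html` and `WANTED_RELS` are hypothetical and do not come from the ADR or the migration guide. It assumes the first matching `<link>` in document order should win and that `rel` is a space-separated, case-insensitive token list.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical constant: the two rel values StarPunk cares about.
WANTED_RELS = ("authorization_endpoint", "token_endpoint")


class LinkRelParser(HTMLParser):
    """Collect the first <link> href for each wanted rel, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.found: Dict[str, str] = {}

    def handle_starttag(
        self, tag: str, attrs: List[Tuple[str, Optional[str]]]
    ) -> None:
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "link":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        # rel is a space-separated token list; compare case-insensitively.
        for rel in (attr_map.get("rel") or "").lower().split():
            if rel in WANTED_RELS and rel not in self.found:
                self.found[rel] = href


def extract_endpoints_from_html(html: str, base_url: str) -> Dict[str, str]:
    """Return {rel: absolute URL} for the wanted rels found in the HTML."""
    parser = LinkRelParser()
    parser.feed(html)
    parser.close()
    return {rel: urljoin(base_url, href) for rel, href in parser.found.items()}
```

This sketch does not care whether the `<link>` sits in `<head>` or `<body>`, and a document with no matching elements simply yields an empty dict, which the caller can treat as "no endpoints found". Whether that leniency is wanted is exactly the 3b question.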
**3c)** **Case Sensitivity**: HTML `rel` attributes are case-insensitive per spec. Should I:

```python
soup.find('link', rel='token_endpoint')  # Exact match
# vs
soup.find('link', rel=lambda x: x.lower() == 'token_endpoint' if x else False)
```

The `html.parser` backend lowercases attribute *names*, but as far as I can tell `find()` compares attribute *values* case-sensitively, so the first form could miss something like `rel="Token_Endpoint"`. Please confirm which form you want.

---

### Question 4: HTTP Link Header Parsing

**From `docs/migration/fix-hardcoded-endpoints.md`** lines 126-136:

```python
def _parse_link_header(self, header: str, base_url: str) -> Dict[str, str]:
    pattern = r'<([^>]+)>;\s*rel="([^"]+)"'
    matches = re.findall(pattern, header)
```

**My Questions**:

**4a)** **Regex Robustness**: This regex assumes:

- Double quotes around rel value
- Semicolon separator
- No spaces in weird places

But HTTP Link header format (RFC 8288) is more complex:

```
Link: <url>; rel="value"; param="other"
Link: <url>; rel=value (unquoted value, allowed per spec)
Link: <url>;rel="value" (no space after semicolon)
```

Should I:

- Use a more robust regex?
- Use a proper Link header parser library (e.g., `httpx` has built-in parsing)?
- Stick with simple regex and document limitations?

**Recommendation**: Use httpx's built-in Link header parsing (`Response.links`) if it fits our needs, otherwise simple regex. Confirm?

**4b)** **Multiple Headers**: RFC 8288 allows multiple Link headers:

```
Link: <auth-url>; rel="authorization_endpoint"
Link: <token-url>; rel="token_endpoint"
```

Or comma-separated in single header:

```
Link: <auth-url>; rel="authorization_endpoint", <token-url>; rel="token_endpoint"
```

My regex with `re.findall()` should handle both. Confirm this is correct?

**4c)** **Priority Order**: ADR says "HTTP Link headers take precedence over HTML".
But what if:

- Link header has `authorization_endpoint` but not `token_endpoint`
- HTML has both

Should I:

```python
# Option A: Once we find in Link header, stop looking
if 'token_endpoint' in link_header_endpoints:
    return link_header_endpoints
else:
    check_html()

# Option B: Merge Link header and HTML, Link header wins for conflicts
endpoints = html_endpoints.copy()
endpoints.update(link_header_endpoints)  # Link header overwrites
```

The W3C spec says "first HTTP Link header takes precedence", which suggests **Option B** (merge and overwrite). Confirm?

---

### Question 5: URL Resolution and Validation

**From ADR-030-CORRECTED** line 217:

```python
from urllib.parse import urljoin

endpoints['token_endpoint'] = urljoin(profile_url, href)
```

**My Questions**:

**5a)** **URL Validation**: Should I validate discovered URLs? Checks:

- Must be absolute after resolution
- Must use HTTPS (in production)
- Must be valid URL format
- Hostname must be valid
- No localhost/127.0.0.1 in production (allow in dev?)

Example validation:

```python
def validate_endpoint_url(url: str, is_production: bool) -> bool:
    parsed = urlparse(url)

    if is_production and parsed.scheme != 'https':
        raise DiscoveryError("HTTPS required in production")

    if is_production and parsed.hostname in ['localhost', '127.0.0.1', '::1']:
        raise DiscoveryError("localhost not allowed in production")

    if not parsed.scheme or not parsed.netloc:
        raise DiscoveryError("Invalid URL format")

    return True
```

Is this overkill, or necessary? What validation do you want?

**5b)** **URL Normalization**: Should I normalize URLs before comparing?

```python
def normalize_url(url: str) -> str:
    # Add trailing slash?
    # Convert to lowercase?
    # Remove default ports?
    # Sort query params?
    ...
```

The current code does:

```python
# auth_external.py line 96
token_me = token_info["me"].rstrip("/")
expected_me = admin_me.rstrip("/")
```

Should endpoint URLs also be normalized? Or left as-is?

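To give the 5b discussion something concrete to react to, here is one possible `normalize_url`, sketched with `urllib.parse` only. The rules are my assumptions, not decisions from the ADR: lowercase the scheme and host, drop default ports, treat an empty path as `/`, drop the fragment, and leave the query string untouched.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports that can be dropped without changing the URL's meaning.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Normalize a URL for comparison (hypothetical rules, see lead-in)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if not scheme or not host:
        raise ValueError(f"not an absolute URL: {url!r}")
    # urlsplit strips the brackets from IPv6 literals; put them back.
    if ":" in host:
        host = f"[{host}]"
    # Keep the port only when it is not the scheme's default.
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    # An empty path and "/" identify the same resource.
    path = parts.path or "/"
    # Any userinfo and the fragment are dropped; the query is kept as-is.
    return urlunsplit((scheme, netloc, path, parts.query, ""))
```

Unlike the current `rstrip("/")` comparison, this treats `https://admin.example.com/notes` and `https://admin.example.com/notes/` as different URLs; only the bare-host case is unified. If the architect prefers the looser behaviour, that is a one-line change.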
**5c)** **Relative URL Edge Cases**: What should happen with these?

```html
<link rel="token_endpoint" href="/auth/token">
Result: https://admin.example.com/auth/token

<link rel="token_endpoint" href="//other-domain.com/token">
Result: https://other-domain.com/token (if profile was HTTPS)

<link rel="token_endpoint" href="other-domain.com/token">
Result: https://admin.example.com/other-domain.com/token (broken!)
```

Python's `urljoin()` handles the first two correctly. The third is ambiguous. Should I:

- Reject URLs without `://` or leading `/`?
- Try to detect and fix common mistakes?
- Document expected format and let it fail?

---

### Question 6: Error Handling and Retry Logic

**My Questions**:

**6a)** **Discovery Failures**: When endpoint discovery fails, what should happen? Scenarios:

1. Profile URL unreachable (DNS failure, network timeout)
2. Profile URL returns 404/500
3. Profile HTML malformed (parsing fails)
4. No endpoints found in profile
5. Endpoints found but invalid URLs

For each scenario, should I:

- Return error immediately?
- Retry with backoff?
- Use cached endpoints if available (even if expired)?
- Fail open (allow access) or fail closed (deny access)?

**Recommendation**: Fail closed (deny access), use cached endpoints if available, no retries for discovery (but retries for token verification?). Confirm?

**6b)** **Token Verification Failures**: When token verification fails, what should happen? Scenarios:

1. Token endpoint unreachable (timeout)
2. Token endpoint returns 400/401/403 (token invalid)
3. Token endpoint returns 500 (server error)
4. Token response missing required fields
5. Token 'me' doesn't match expected

For scenarios 1 and 3 (network/server errors), should I:

- Retry with backoff?
- Use cached token info if available?
- Fail immediately?

**Recommendation**: Retry up to 3 times with exponential backoff for network and server errors (scenarios 1 and 3). For invalid tokens (scenarios 2, 4, 5), fail immediately. Confirm?

**6c)** **Timeout Configuration**: What timeouts should I use?
Suggested:

- Profile URL fetch: 5s (discovery is cached, so can be slow)
- Token verification: 3s (happens on every request, must be fast)
- Cache lookup: <1ms (in-memory)

Are these acceptable? Should they be configurable?

---

### Question 7: Testing Strategy

**My Questions**:

**7a)** **Mock vs Real**: Should tests:

- Mock all HTTP requests (faster, isolated)
- Hit real IndieAuth providers (slow, integration test)
- Both (unit tests mock, integration tests real)?

**Recommendation**: Unit tests mock everything, add one integration test for real IndieAuth.com. Confirm?

**7b)** **Test Fixtures**: Should I create test fixtures like:

```python
# tests/fixtures/profiles.py
PROFILE_WITH_LINK_HEADERS = {
    'url': 'https://user.example.com/',
    'headers': {
        'Link': '<https://auth.example.com/token>; rel="token_endpoint"'
    },
    'expected': {'token_endpoint': 'https://auth.example.com/token'}
}

PROFILE_WITH_HTML_LINKS = {
    'url': 'https://user.example.com/',
    'html': '<link rel="token_endpoint" href="https://auth.example.com/token">',
    'expected': {'token_endpoint': 'https://auth.example.com/token'}
}

# ... more fixtures
```

Or inline test data in test functions? Fixtures would be reusable across tests.

**7c)** **Test Coverage**: What coverage % is acceptable? Current test suite has 501 passing tests. I should aim for:

- 100% coverage of new endpoint discovery code?
- Edge cases covered (malformed HTML, network errors, etc.)?
- Integration tests for full flow?

---

### Question 8: Performance Implications

**My Questions**:

**8a)** **First Request Latency**: Without cached endpoints, first Micropub request will:

1. Fetch profile URL (HTTP GET): ~100-500ms
2. Parse HTML/headers: ~10-50ms
3. Verify token with endpoint: ~100-300ms
4. Total: ~210-850ms

Is this acceptable? User will notice delay on first post. Should I:

- Pre-warm cache on application startup?
- Show "Authenticating..." message to user?
- Accept the delay (only happens once per TTL)?

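The "once per TTL" behaviour above depends on the cache from Question 2. As a reference point, here is a minimal sketch of the "custom class with simple dict" option from 2d, combined with the SHA-256 token hashing from 2b. The names `TTLCache` and `token_cache_key` are hypothetical, and the injectable `clock` parameter is my addition so expiry can be tested without sleeping.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache: key -> (value, expiry timestamp)."""

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller re-discovers or re-verifies.
            del self._items[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (value, self._clock() + self._ttl)

    def clear(self) -> None:
        self._items.clear()


def token_cache_key(token: str) -> str:
    """Hash the token so the raw bearer value is never stored as a key."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


# TTLs taken from ADR-030-CORRECTED as quoted in Question 2.
endpoint_cache = TTLCache(ttl_seconds=3600)  # endpoints: 1 hour
token_cache = TTLCache(ttl_seconds=300)      # token verification: 5 minutes
```

Pre-warming would then amount to running discovery for `ADMIN_ME` at startup and calling `endpoint_cache.set(...)` with the result. The sketch has no locking, which matches the "accept duplicates" position I take in 8c.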
**8b)** **Cache Hit Rate**: With TTL of 3600s for endpoints and 300s for tokens:

- Endpoints discovered once per hour
- Each token re-verified at most once every 5 minutes

For active user posting frequently:

- First post: up to ~850ms (discovery + verification)
- Posts within 5 min: <1ms (cached token)
- Posts after 5 min but within 1 hour: ~150ms (cached endpoint, verify token)
- Posts after 1 hour: up to ~850ms again

Is this acceptable? Or should I increase token cache TTL?

**8c)** **Concurrent Requests**: If two Micropub requests arrive simultaneously with uncached token:

- Both will trigger endpoint discovery
- Race condition in cache update

Should I:

- Add locking around cache updates?
- Accept duplicate discoveries (harmless, just wasteful)?
- Use thread-safe cache implementation?

**Recommendation**: For V1 single-user CMS with low traffic, accept duplicates. Add locking in V2+ if needed.

---

### Question 9: Configuration and Deployment

**My Questions**:

**9a)** **Configuration Changes**: Current config has:

```ini
# .env (WRONG - to be removed)
TOKEN_ENDPOINT=https://tokens.indieauth.com/token

# .env (CORRECT - to be kept)
ADMIN_ME=https://admin.example.com/
```

Should I:

- Remove `TOKEN_ENDPOINT` from config.py immediately?
- Add deprecation warning if `TOKEN_ENDPOINT` is set?
- Provide migration instructions in CHANGELOG?

**9b)** **Backward Compatibility**: RC.4 was just released with `TOKEN_ENDPOINT` configuration. RC.5 will remove it. Should I:

- Provide migration script?
- Automatic migration (detect and convert)?
- Just document breaking change in CHANGELOG?

Since we're in RC phase, breaking changes are acceptable, but users might be testing. Recommendation?

**9c)** **Health Check**: Should the `/health` endpoint also check:

- Endpoint discovery working (fetch ADMIN_ME profile)?
- Token endpoint reachable?

Or is this too expensive for health checks?

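If the answer to 9a is "add a deprecation warning", the check could be as small as the sketch below. The function name `check_deprecated_token_endpoint` is hypothetical, and I am assuming the config is available as a plain mapping at startup (as Flask's `app.config` is).

```python
import warnings
from typing import Any, Mapping


def check_deprecated_token_endpoint(config: Mapping[str, Any]) -> bool:
    """Warn if the removed TOKEN_ENDPOINT setting is still present.

    Returns True when a warning was emitted. The configured value is
    ignored; endpoints are expected to come from discovery on ADMIN_ME.
    """
    if not config.get("TOKEN_ENDPOINT"):
        return False
    warnings.warn(
        "TOKEN_ENDPOINT is deprecated and ignored; endpoints are now "
        "discovered from the ADMIN_ME profile URL. Remove it from .env.",
        DeprecationWarning,
        stacklevel=2,
    )
    return True
```

One caveat: Python's default warning filters hide `DeprecationWarning` unless it is triggered from `__main__`, so a log message at warning level may be the more visible choice for people testing RC.4 configurations.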

---

### Question 10: Development and Testing Workflow

**My Questions**:

**10a)** **Local Development**: Developers typically use `http://localhost:5000` for SITE_URL. But IndieAuth requires HTTPS. How should developers test? Options:

1. Allow HTTP in development mode (detect DEV_MODE=true)
2. Require ngrok/localhost.run for HTTPS tunneling
3. Use mock endpoints in dev mode
4. Accept that IndieAuth won't work locally without setup

Current `auth_external.py` doesn't have HTTPS check. Should I add it with dev mode exception?

**10b)** **Testing with Real Providers**: To test against real IndieAuth providers, I need:

- A real profile URL with IndieAuth links
- Valid tokens from that provider

Should I:

- Create test profile for integration tests?
- Document how developers can test?
- Skip real provider tests in CI (only run locally)?

---

## Implementation Readiness Assessment

### What's Clear and Ready to Implement

- ✅ **HTTP Link Header Parsing**: Clear algorithm, standard format
- ✅ **HTML Link Element Extraction**: Clear approach with BeautifulSoup4
- ✅ **URL Resolution**: Standard `urljoin()` from urllib.parse
- ✅ **Basic Caching**: In-memory dict with TTL expiry
- ✅ **Token Verification HTTP Request**: Standard GET with Bearer token
- ✅ **Response Validation**: Check for required fields (me, client_id, scope)

### What Needs Architect Clarification

⚠️ **Critical (blocks implementation)**:

- Q1: Which endpoint to verify tokens with (the "chicken-and-egg" problem)
- Q2a: Cache structure for single-user vs future multi-user
- Q3a: Add BeautifulSoup4 dependency?

⚠️ **Important (affects quality)**:

- Q5a: URL validation requirements
- Q6a: Error handling strategy (fail open vs closed)
- Q6b: Retry logic for network failures
- Q9a: Remove TOKEN_ENDPOINT config or deprecate?

⚠️ **Nice to have (can implement sensibly)**:

- Q2c: Cache invalidation triggers
- Q7a: Test strategy (mock vs real)
- Q8a: First request latency acceptable?

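Although Link header parsing is listed as ready to implement, Q4a and Q4c still leave room for interpretation, so here is a sketch of what I would build absent other guidance. It is regex-based rather than a full RFC 8288 parser, accepts quoted and unquoted `rel` values, and merges with HTML results using Option B from 4c. The names `parse_link_header` and `merge_endpoints` are hypothetical.

```python
import re
from typing import Dict
from urllib.parse import urljoin

# Hypothetical constant: the two rel values StarPunk cares about.
WANTED_RELS = ("authorization_endpoint", "token_endpoint")

# "<target>" followed by its parameters, up to the next "<" or end of header.
_LINK_VALUE = re.compile(r'<([^>]*)>([^<]*)')
# rel="a b" or rel=a, with optional whitespace around ";" and "=".
_REL_PARAM = re.compile(
    r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,"]+))', re.IGNORECASE
)


def parse_link_header(header: str, base_url: str) -> Dict[str, str]:
    """Extract wanted rels from a (possibly comma-joined) Link header."""
    endpoints: Dict[str, str] = {}
    for target, params in _LINK_VALUE.findall(header):
        match = _REL_PARAM.search(params)
        if not match:
            continue
        rels = (match.group(1) or match.group(2) or "").lower().split()
        for rel in rels:
            # First occurrence wins; relative targets resolve against base_url.
            if rel in WANTED_RELS and rel not in endpoints:
                endpoints[rel] = urljoin(base_url, target)
    return endpoints


def merge_endpoints(
    link_header: Dict[str, str], html: Dict[str, str]
) -> Dict[str, str]:
    """Option B from 4c: start from HTML, let Link header values overwrite."""
    merged = dict(html)
    merged.update(link_header)
    return merged
```

Known limitations of this sketch: a quoted parameter value that itself contains `<` or `; rel=` would confuse it. If that matters, a proper parser is the better route.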

---

## Proposed Implementation Plan

Once questions are answered, here's my implementation approach:

### Phase 1: Core Discovery (Days 1-2)

1. Create `endpoint_discovery.py` module
   - `EndpointDiscovery` class
   - HTTP Link header parsing
   - HTML link element extraction
   - URL resolution and validation
   - Error handling
2. Unit tests for discovery
   - Test Link header parsing
   - Test HTML parsing
   - Test URL resolution
   - Test error cases

### Phase 2: Token Verification Update (Day 3)

1. Update `auth_external.py`
   - Integrate endpoint discovery
   - Add caching layer
   - Update `verify_external_token()`
   - Remove hardcoded TOKEN_ENDPOINT usage
2. Unit tests for updated verification
   - Test with discovered endpoints
   - Test caching behavior
   - Test error handling

### Phase 3: Integration and Testing (Day 4)

1. Integration tests
   - Full Micropub request flow
   - Cache behavior across requests
   - Error scenarios
2. Update existing tests
   - Fix any broken tests
   - Update mocks to use discovery

### Phase 4: Configuration and Documentation (Day 5)

1. Update configuration
   - Remove TOKEN_ENDPOINT from config.py
   - Add deprecation warning if still set
   - Update .env.example
2. Update documentation
   - CHANGELOG entry for rc.5
   - Migration guide if needed
   - API documentation

### Phase 5: Manual Testing and Refinement (Day 6)

1. Test with real IndieAuth provider
2. Performance testing (cache effectiveness)
3. Error handling verification
4. Final refinements

**Estimated Total Time**: 5-7 days

---

## Dependencies to Add

Based on migration guide, I'll need to add:

```toml
# pyproject.toml or requirements.txt
beautifulsoup4>=4.12.0  # HTML parsing for link extraction
```

`httpx` is already a dependency (used in current auth_external.py).


---

## Risks and Concerns

### Risk 1: Breaking Change Timing

- **Issue**: RC.4 just shipped with TOKEN_ENDPOINT config
- **Impact**: Users testing RC.4 will need to reconfigure for RC.5
- **Mitigation**: Clear migration notes in CHANGELOG, consider grace period

### Risk 2: Performance Degradation

- **Issue**: First request will be slower (800ms vs <100ms cached)
- **Impact**: User experience on first post after restart/cache expiry
- **Mitigation**: Document expected behavior, consider pre-warming cache

### Risk 3: External Dependency

- **Issue**: StarPunk now depends on external profile URL availability
- **Impact**: If profile URL is down, Micropub stops working
- **Mitigation**: Cache endpoints for longer TTL, fail gracefully with clear errors

### Risk 4: Testing Complexity

- **Issue**: More moving parts to test (HTTP, HTML parsing, caching)
- **Impact**: More test code, more mocking, more edge cases
- **Mitigation**: Good test fixtures, clear test organization

---

## Recommended Next Steps

1. **Architect reviews this report** and answers questions
2. **I create test fixtures** based on ADR examples
3. **I implement Phase 1** (core discovery) with tests
4. **Checkpoint review** - verify discovery working correctly
5. **I implement Phase 2** (integration with token verification)
6. **Checkpoint review** - verify end-to-end flow
7. **I implement Phase 3-5** (tests, config, docs)
8. **Final review** before merge

---

## Questions Summary (Quick Reference)

**Critical** (must answer before coding):

1. Q1: Which endpoint to verify tokens with? Proposed: Use ADMIN_ME profile for single-user StarPunk
2. Q2a: Cache structure for single-user vs multi-user?
3. Q3a: Add BeautifulSoup4 dependency?

**Important** (affects implementation quality):

4. Q5a: URL validation requirements?
5. Q6a: Error handling strategy (fail open/closed)?
6. Q6b: Retry logic for network failures?
7. Q9a: Remove or deprecate TOKEN_ENDPOINT config?

**Can implement sensibly** (but prefer guidance):

8. Q2c: Cache invalidation triggers?
9. Q7a: Test strategy (mock vs real)?
10. Q8a: First request latency acceptable?

---

## Conclusion

The architect's corrected design is sound and properly implements IndieAuth endpoint discovery per the W3C specification. The primary blocker is clarifying the "which endpoint?" question for token verification in a single-user CMS context.

My proposed solution (always use ADMIN_ME profile for endpoint discovery) seems correct for StarPunk's single-user model, but I need architect confirmation before proceeding.

Once questions are answered, I'm ready to implement with high confidence. The code will be clean, tested, and follow the specifications exactly.

**Status**: ⏸️ **Waiting for Architect Review**

---

**Document Version**: 1.0
**Created**: 2025-11-24
**Author**: StarPunk Fullstack Developer
**Next Review**: After architect responds to questions