# IndieAuth Endpoint Discovery: Definitive Implementation Answers

**Date**: 2025-11-24
**Architect**: StarPunk Software Architect
**Status**: APPROVED FOR IMPLEMENTATION
**Target Version**: 1.0.0-rc.5

---

## Executive Summary

These are definitive answers to the developer's 10 questions about IndieAuth endpoint discovery implementation. The developer should implement exactly as specified here.

---

## CRITICAL ANSWERS (Blocking Implementation)

### Answer 1: The "Which Endpoint?" Problem ✅

**DEFINITIVE ANSWER**: For StarPunk V1 (single-user CMS), ALWAYS use ADMIN_ME for endpoint discovery.

Your proposed solution is **100% CORRECT**:

```python
from typing import Any, Dict, Optional

import httpx
from flask import current_app  # assumes current_app is Flask's application proxy

# discover_endpoints, normalize_url, and TokenVerificationError are defined elsewhere

def verify_external_token(token: str) -> Optional[Dict[str, Any]]:
    """Verify token for the admin user"""
    admin_me = current_app.config.get("ADMIN_ME")

    # ALWAYS discover endpoints from ADMIN_ME profile
    endpoints = discover_endpoints(admin_me)
    token_endpoint = endpoints['token_endpoint']

    # Verify token with discovered endpoint
    response = httpx.get(
        token_endpoint,
        headers={'Authorization': f'Bearer {token}'}
    )

    token_info = response.json()

    # Validate token belongs to admin
    if normalize_url(token_info['me']) != normalize_url(admin_me):
        raise TokenVerificationError("Token not for admin user")

    return token_info
```

**Rationale**:
- StarPunk V1 is explicitly single-user
- Only the admin (ADMIN_ME) can post to the CMS
- Any token not belonging to ADMIN_ME is invalid by definition
- This eliminates the chicken-and-egg problem completely

**Important**: Document this single-user assumption clearly in the code comments. When V2 adds multi-user support, this will need revisiting.

### Answer 2a: Cache Structure ✅

**DEFINITIVE ANSWER**: Use a SIMPLE cache for V1 single-user.
```python
class EndpointCache:
    def __init__(self):
        # Simple cache for single-user V1
        self.endpoints = None
        self.endpoints_expire = 0
        self.token_cache = {}  # token_hash -> (info, expiry)
```

**Rationale**:
- We only have one user (ADMIN_ME) in V1
- No need for profile_url -> endpoints mapping
- Simplest solution that works
- Easy to upgrade to dict-based for V2 multi-user

### Answer 3a: BeautifulSoup4 Dependency ✅

**DEFINITIVE ANSWER**: YES, add BeautifulSoup4 as a dependency.

```toml
# pyproject.toml
[project]
dependencies = [
    "beautifulsoup4>=4.12.0",
]
```

**Rationale**:
- Industry standard for HTML parsing
- More robust than regex or using the built-in parser directly
- Pure Python (with html.parser backend)
- Well-maintained and documented
- Worth the dependency for correctness

---

## IMPORTANT ANSWERS (Affects Quality)

### Answer 2b: Token Hashing ✅

**DEFINITIVE ANSWER**: YES, hash tokens with SHA-256.

```python
import hashlib

token_hash = hashlib.sha256(token.encode()).hexdigest()
```

**Rationale**:
- Prevents tokens appearing in logs
- Fixed-length cache keys
- Security best practice
- NO need for HMAC (we're not signing, just hashing)
- NO need for constant-time comparison (cache lookup, not authentication)

### Answer 2c: Cache Invalidation ✅

**DEFINITIVE ANSWER**: Clear cache on:
1. **Application startup** (cache is in-memory)
2. **TTL expiry** (automatic)
3. **NOT on failures** (could be transient network issues)
4. **NO manual endpoint needed** for V1

### Answer 2d: Cache Storage ✅

**DEFINITIVE ANSWER**: Custom EndpointCache class with simple dict.
```python
import time

class EndpointCache:
    """Simple in-memory cache with TTL support"""

    def __init__(self):
        self.endpoints = None
        self.endpoints_expire = 0
        self.token_cache = {}

    def get_endpoints(self, ignore_expiry=False):
        # ignore_expiry=True returns stale endpoints (grace period, see Answer 6a)
        if ignore_expiry or time.time() < self.endpoints_expire:
            return self.endpoints
        return None

    def set_endpoints(self, endpoints, ttl=3600):
        self.endpoints = endpoints
        self.endpoints_expire = time.time() + ttl
```

**Rationale**:
- Simple and explicit
- No external dependencies
- Easy to test
- Clear TTL handling

### Answer 3b: HTML Validation ✅

**DEFINITIVE ANSWER**: Handle malformed HTML gracefully.

```python
try:
    soup = BeautifulSoup(html, 'html.parser')
    # Look for links in both head and body (be liberal)
    for link in soup.find_all('link', rel=True):
        ...  # Process...
except Exception as e:
    logger.warning(f"HTML parsing failed: {e}")
    return {}  # Return empty, don't crash
```

### Answer 3c: Case Sensitivity ✅

**DEFINITIVE ANSWER**: BeautifulSoup handles this correctly by default. No special handling needed.

### Answer 4a: Link Header Parsing ✅

**DEFINITIVE ANSWER**: Use simple regex, document limitations.

```python
import re
from typing import Dict

def _parse_link_header(self, header: str) -> Dict[str, str]:
    """Parse Link header (basic RFC 8288 support)

    Note: Only supports quoted rel values, single Link headers
    """
    pattern = r'<([^>]+)>;\s*rel="([^"]+)"'
    matches = re.findall(pattern, header)
    # ... process matches
```

**Rationale**:
- Simple implementation for V1
- Document limitations clearly
- Can upgrade if needed later
- Avoids additional dependencies

### Answer 4b: Multiple Headers ✅

**DEFINITIVE ANSWER**: Your regex with re.findall() is correct. It handles both cases.

### Answer 4c: Priority Order ✅

**DEFINITIVE ANSWER**: Option B - Merge with Link header overwriting HTML.
```python
endpoints = {}
# First get from HTML
endpoints.update(html_endpoints)
# Then overwrite with Link headers (higher priority)
endpoints.update(link_header_endpoints)
```

### Answer 5a: URL Validation ✅

**DEFINITIVE ANSWER**: Validate with these checks:

```python
from urllib.parse import urlparse

def validate_endpoint_url(url: str) -> bool:
    parsed = urlparse(url)

    # Must be absolute
    if not parsed.scheme or not parsed.netloc:
        raise DiscoveryError("Invalid URL format")

    # HTTPS required in production
    if not current_app.debug and parsed.scheme != 'https':
        raise DiscoveryError("HTTPS required in production")

    # Allow localhost only in debug mode
    if not current_app.debug and parsed.hostname in ['localhost', '127.0.0.1', '::1']:
        raise DiscoveryError("Localhost not allowed in production")

    return True
```

### Answer 5b: URL Normalization ✅

**DEFINITIVE ANSWER**: Normalize only for comparison, not storage.

```python
def normalize_url(url: str) -> str:
    """Normalize URL for comparison only"""
    # Note: lower() also lowercases the path, which may be case-sensitive on the server
    return url.rstrip("/").lower()
```

Store endpoints as discovered; normalize only when comparing.

### Answer 5c: Relative URL Edge Cases ✅

**DEFINITIVE ANSWER**: Let urljoin() handle it, document behavior.

Python's urljoin() handles the first two cases correctly. For the third (broken) case, let it fail naturally. Don't try to be clever.

### Answer 6a: Discovery Failures ✅

**DEFINITIVE ANSWER**: Fail closed with grace period.

```python
def discover_endpoints(self, profile_url: str) -> Dict[str, str]:
    try:
        # Try discovery
        endpoints = self._fetch_and_parse(profile_url)
        self.cache.set_endpoints(endpoints)
        return endpoints
    except Exception as e:
        # Check cache even if expired (grace period)
        cached = self.cache.get_endpoints(ignore_expiry=True)
        if cached:
            logger.warning(f"Using expired cache due to discovery failure: {e}")
            return cached
        # No cache, must fail
        raise DiscoveryError(f"Endpoint discovery failed: {e}") from e
```

### Answer 6b: Token Verification Failures ✅

**DEFINITIVE ANSWER**: Retry ONLY for network errors and 5xx server responses.
```python
import time

import httpx

def verify_with_retries(endpoint: str, token: str, max_retries: int = 3):
    for attempt in range(max_retries):
        try:
            response = httpx.get(...)
            if response.status_code in [500, 502, 503, 504]:
                # Server error, retry
                if attempt < max_retries - 1:
                    time.sleep(2 ** attempt)  # Exponential backoff
                    continue
            # For 400/401/403, fail immediately (no retry)
            return response
        except (httpx.TimeoutException, httpx.NetworkError):
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)
                continue
            raise
```

### Answer 6c: Timeout Configuration ✅

**DEFINITIVE ANSWER**: Use these timeouts:

```python
DISCOVERY_TIMEOUT = 5.0  # Profile fetch (cached, so can be slower)
VERIFICATION_TIMEOUT = 3.0  # Token verification (every request)
```

Not configurable in V1. Hardcode with constants.

---

## OTHER ANSWERS

### Answer 7a: Test Strategy ✅

**DEFINITIVE ANSWER**: Unit tests use mocks; add ONE integration test against the real IndieAuth.com.

### Answer 7b: Test Fixtures ✅

**DEFINITIVE ANSWER**: YES, create reusable fixtures.

```python
# tests/fixtures/indieauth_profiles.py
PROFILES = {
    'link_header': {...},
    'html_links': {...},
    'both': {...},
    # etc.
}
```

### Answer 7c: Test Coverage ✅

**DEFINITIVE ANSWER**:
- 90%+ coverage for new code
- All edge cases tested
- One real integration test

### Answer 8a: First Request Latency ✅

**DEFINITIVE ANSWER**: Accept the delay. Do NOT pre-warm cache.

**Rationale**:
- Happens only when the endpoint cache is cold (at most once per hour, plus after a restart)
- Pre-warming adds complexity
- User can wait 850ms for first post

### Answer 8b: Cache TTLs ✅

**DEFINITIVE ANSWER**: Keep as specified:
- Endpoints: 3600s (1 hour)
- Token verifications: 300s (5 minutes)

These are good defaults.

### Answer 8c: Concurrent Requests ✅

**DEFINITIVE ANSWER**: Accept duplicate discoveries for V1.

No locking needed for single-user low-traffic V1.

### Answer 9a: Configuration Changes ✅

**DEFINITIVE ANSWER**: Remove TOKEN_ENDPOINT immediately with deprecation warning.
```python
# config.py
import os

if 'TOKEN_ENDPOINT' in os.environ:
    logger.warning(
        "TOKEN_ENDPOINT is deprecated and ignored. "
        "Remove it from your configuration. "
        "Endpoints are now discovered from ADMIN_ME profile."
    )
```

### Answer 9b: Backward Compatibility ✅

**DEFINITIVE ANSWER**: Document breaking change in CHANGELOG. No migration script.

We're in the RC phase, so breaking changes are acceptable.

### Answer 9c: Health Check ✅

**DEFINITIVE ANSWER**: NO endpoint discovery in health check.

Too expensive. Health check should be fast.

### Answer 10a: Local Development ✅

**DEFINITIVE ANSWER**: Allow HTTP in debug mode.

```python
# Allow HTTP in development (debug mode); require HTTPS in production
if not current_app.debug and parsed.scheme != 'https':
    raise SecurityError("HTTPS required")
```

### Answer 10b: Testing with Real Providers ✅

**DEFINITIVE ANSWER**: Document test setup, skip in CI.

```python
import os

import pytest

@pytest.mark.skipif(
    not os.environ.get('TEST_REAL_INDIEAUTH'),
    reason="Set TEST_REAL_INDIEAUTH=1 to run real provider tests"
)
def test_real_indieauth():
    # Test with real IndieAuth.com
    ...
```

---

## Implementation Go/No-Go Decision

### ✅ APPROVED FOR IMPLEMENTATION

You have all the information needed to implement endpoint discovery correctly. Proceed with your Phase 1-5 plan.

### Implementation Priorities

1. **FIRST**: Implement Question 1 solution (ADMIN_ME discovery)
2. **SECOND**: Add BeautifulSoup4 dependency
3. **THIRD**: Create EndpointCache class
4. **THEN**: Follow your phased implementation plan

### Key Implementation Notes

1. **Always use ADMIN_ME** for endpoint discovery in V1
2. **Fail closed** on security errors
3. **Be liberal** in what you accept (HTML parsing)
4. **Be strict** in what you validate (URLs, tokens)
5. **Document** single-user assumptions clearly
6. **Test** edge cases thoroughly

---

## Summary for Quick Reference

| Question | Answer | Implementation |
|----------|--------|----------------|
| Q1: Which endpoint? | Always use ADMIN_ME | `discover_endpoints(admin_me)` |
| Q2a: Cache structure? | Simple for single-user | `self.endpoints = None` |
| Q3a: Add BeautifulSoup4? | YES | Add to dependencies |
| Q5a: URL validation? | HTTPS in prod, localhost in dev | Check with `current_app.debug` |
| Q6a: Error handling? | Fail closed with cache grace | Try cache on failure |
| Q6b: Retry logic? | Only for network errors and 5xx responses | 3 retries with backoff |
| Q9a: Remove TOKEN_ENDPOINT? | Yes with warning | Deprecation message |

---

**This document provides definitive answers. Implement as specified. No further architectural review needed before coding.**

**Document Version**: 1.0
**Status**: FINAL
**Next Step**: Begin implementation immediately
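
---

### Illustrative Sketch: Token Verification Cache

Answers 2a, 2b, and 8b describe a token cache keyed by SHA-256 hash with a 300-second TTL, but none of the snippets above show its get/set logic. The following is a minimal sketch of one way it might look, not the approved implementation. The `TokenCache` class, its method names, and the injectable `clock` parameter are hypothetical and introduced only for illustration; in the approved design the same dict lives on `EndpointCache.token_cache`.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Optional, Tuple

TOKEN_TTL = 300  # seconds, matching the token verification TTL in Answer 8b


class TokenCache:
    """Hypothetical helper: token_hash -> (info, expiry), as in Answer 2a."""

    def __init__(self, ttl: float = TOKEN_TTL,
                 clock: Callable[[], float] = time.time):
        self._ttl = ttl
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[Dict[str, Any], float]] = {}

    @staticmethod
    def _key(token: str) -> str:
        # Hash the token so the raw value never appears as a cache key
        return hashlib.sha256(token.encode()).hexdigest()

    def get(self, token: str) -> Optional[Dict[str, Any]]:
        """Return cached token info, or None if missing or expired."""
        key = self._key(token)
        entry = self._entries.get(key)
        if entry is None:
            return None
        info, expiry = entry
        if self._clock() >= expiry:
            # Expired: drop the entry so the caller re-verifies the token
            del self._entries[key]
            return None
        return info

    def set(self, token: str, info: Dict[str, Any]) -> None:
        """Store verification info under the token's hash with a TTL."""
        self._entries[self._key(token)] = (info, self._clock() + self._ttl)
```

A caller would check `get(token)` first and only contact the token endpoint on a miss, storing the successful result with `set(token, token_info)`. Failed verifications are deliberately not cached in this sketch.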