StarPunk/docs/architecture/endpoint-discovery-answers.md

# IndieAuth Endpoint Discovery: Definitive Implementation Answers

**Date**: 2025-11-24
**Architect**: StarPunk Software Architect
**Status**: APPROVED FOR IMPLEMENTATION
**Target Version**: 1.0.0-rc.5

---

## Executive Summary

These are definitive answers to the developer's 10 questions about IndieAuth endpoint discovery implementation. The developer should implement exactly as specified here.

---

## CRITICAL ANSWERS (Blocking Implementation)

### Answer 1: The "Which Endpoint?" Problem ✅

**DEFINITIVE ANSWER**: For StarPunk V1 (single-user CMS), ALWAYS use ADMIN_ME for endpoint discovery.

Your proposed solution is **100% CORRECT**:

```python
def verify_external_token(token: str) -> Optional[Dict[str, Any]]:
    """Verify token for the admin user"""
    admin_me = current_app.config.get("ADMIN_ME")

    # ALWAYS discover endpoints from ADMIN_ME profile
    endpoints = discover_endpoints(admin_me)
    token_endpoint = endpoints['token_endpoint']

    # Verify token with discovered endpoint
    response = httpx.get(
        token_endpoint,
        headers={'Authorization': f'Bearer {token}'}
    )

    token_info = response.json()

    # Validate token belongs to admin
    if normalize_url(token_info['me']) != normalize_url(admin_me):
        raise TokenVerificationError("Token not for admin user")

    return token_info
```

**Rationale**:
- StarPunk V1 is explicitly single-user
- Only the admin (ADMIN_ME) can post to the CMS
- Any token not belonging to ADMIN_ME is invalid by definition
- This eliminates the chicken-and-egg problem completely

**Important**: Document this single-user assumption clearly in the code comments. When V2 adds multi-user support, this will need revisiting.

### Answer 2a: Cache Structure ✅

**DEFINITIVE ANSWER**: Use a SIMPLE cache for V1 single-user.

```python
class EndpointCache:
    def __init__(self):
        # Simple cache for single-user V1
        self.endpoints = None
        self.endpoints_expire = 0
        self.token_cache = {}  # token_hash -> (info, expiry)
```

**Rationale**:
- We only have one user (ADMIN_ME) in V1
- No need for profile_url -> endpoints mapping
- Simplest solution that works
- Easy to upgrade to dict-based for V2 multi-user

### Answer 3a: BeautifulSoup4 Dependency ✅

**DEFINITIVE ANSWER**: YES, add BeautifulSoup4 as a dependency.

```toml
# pyproject.toml
[project.dependencies]
beautifulsoup4 = ">=4.12.0"
```

**Rationale**:
- Industry standard for HTML parsing
- More robust than regex or built-in parser
- Pure Python (with html.parser backend)
- Well-maintained and documented
- Worth the dependency for correctness

---

## IMPORTANT ANSWERS (Affects Quality)

### Answer 2b: Token Hashing ✅

**DEFINITIVE ANSWER**: YES, hash tokens with SHA-256.

```python
token_hash = hashlib.sha256(token.encode()).hexdigest()
```

**Rationale**:
- Prevents tokens appearing in logs
- Fixed-length cache keys
- Security best practice
- NO need for HMAC (we're not signing, just hashing)
- NO need for constant-time comparison (cache lookup, not authentication)

### Answer 2c: Cache Invalidation ✅

**DEFINITIVE ANSWER**: Clear cache on:
1. **Application startup** (cache is in-memory)
2. **TTL expiry** (automatic)
3. **NOT on failures** (could be transient network issues)
4. **NO manual endpoint needed** for V1

### Answer 2d: Cache Storage ✅

**DEFINITIVE ANSWER**: Custom EndpointCache class with simple dict.

```python
class EndpointCache:
    """Simple in-memory cache with TTL support"""

    def __init__(self):
        self.endpoints = None
        self.endpoints_expire = 0
        self.token_cache = {}

    def get_endpoints(self):
        if time.time() < self.endpoints_expire:
            return self.endpoints
        return None

    def set_endpoints(self, endpoints, ttl=3600):
        self.endpoints = endpoints
        self.endpoints_expire = time.time() + ttl
```

**Rationale**:
- Simple and explicit
- No external dependencies
- Easy to test
- Clear TTL handling

### Answer 3b: HTML Validation ✅

**DEFINITIVE ANSWER**: Handle malformed HTML gracefully.

```python
try:
    soup = BeautifulSoup(html, 'html.parser')
    # Look for links in both head and body (be liberal)
    for link in soup.find_all('link', rel=True):
        # Process...
except Exception as e:
    logger.warning(f"HTML parsing failed: {e}")
    return {}  # Return empty, don't crash
```

### Answer 3c: Case Sensitivity ✅

**DEFINITIVE ANSWER**: BeautifulSoup handles this correctly by default. No special handling needed.

### Answer 4a: Link Header Parsing ✅

**DEFINITIVE ANSWER**: Use simple regex, document limitations.

```python
def _parse_link_header(self, header: str) -> Dict[str, str]:
    """Parse Link header (basic RFC 8288 support)

    Note: Only supports quoted rel values, single Link headers
    """
    pattern = r'<([^>]+)>;\s*rel="([^"]+)"'
    matches = re.findall(pattern, header)
    # ... process matches
```

**Rationale**:
- Simple implementation for V1
- Document limitations clearly
- Can upgrade if needed later
- Avoids additional dependencies

### Answer 4b: Multiple Headers ✅

**DEFINITIVE ANSWER**: Your regex with re.findall() is correct. It handles both cases.

### Answer 4c: Priority Order ✅

**DEFINITIVE ANSWER**: Option B - Merge with Link header overwriting HTML.

```python
endpoints = {}
# First get from HTML
endpoints.update(html_endpoints)
# Then overwrite with Link headers (higher priority)
endpoints.update(link_header_endpoints)
```

### Answer 5a: URL Validation ✅

**DEFINITIVE ANSWER**: Validate with these checks:

```python
def validate_endpoint_url(url: str) -> bool:
    parsed = urlparse(url)

    # Must be absolute
    if not parsed.scheme or not parsed.netloc:
        raise DiscoveryError("Invalid URL format")

    # HTTPS required in production
    if not current_app.debug and parsed.scheme != 'https':
        raise DiscoveryError("HTTPS required in production")

    # Allow localhost only in debug mode
    if not current_app.debug and parsed.hostname in ['localhost', '127.0.0.1', '::1']:
        raise DiscoveryError("Localhost not allowed in production")

    return True
```

### Answer 5b: URL Normalization ✅

**DEFINITIVE ANSWER**: Normalize only for comparison, not storage.

```python
def normalize_url(url: str) -> str:
    """Normalize URL for comparison only"""
    return url.rstrip("/").lower()
```

Store endpoints as discovered, normalize only when comparing.

### Answer 5c: Relative URL Edge Cases ✅

**DEFINITIVE ANSWER**: Let urljoin() handle it, document behavior.

Python's urljoin() handles first two cases correctly. For the third (broken) case, let it fail naturally. Don't try to be clever.

### Answer 6a: Discovery Failures ✅

**DEFINITIVE ANSWER**: Fail closed with grace period.

```python
def discover_endpoints(profile_url: str) -> Dict[str, str]:
    try:
        # Try discovery
        endpoints = self._fetch_and_parse(profile_url)
        self.cache.set_endpoints(endpoints)
        return endpoints
    except Exception as e:
        # Check cache even if expired (grace period)
        cached = self.cache.get_endpoints(ignore_expiry=True)
        if cached:
            logger.warning(f"Using expired cache due to discovery failure: {e}")
            return cached
        # No cache, must fail
        raise DiscoveryError(f"Endpoint discovery failed: {e}")
```

### Answer 6b: Token Verification Failures ✅

**DEFINITIVE ANSWER**: Retry ONLY for network errors.

```python
def verify_with_retries(endpoint: str, token: str, max_retries: int = 3):
    for attempt in range(max_retries):
        try:
            response = httpx.get(...)
            if response.status_code in [500, 502, 503, 504]:
                # Server error, retry
                if attempt < max_retries - 1:
                    time.sleep(2 ** attempt)  # Exponential backoff
                    continue
            return response
        except (httpx.TimeoutException, httpx.NetworkError):
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)
                continue
            raise

    # For 400/401/403, fail immediately (no retry)
```

### Answer 6c: Timeout Configuration ✅

**DEFINITIVE ANSWER**: Use these timeouts:

```python
DISCOVERY_TIMEOUT = 5.0  # Profile fetch (cached, so can be slower)
VERIFICATION_TIMEOUT = 3.0  # Token verification (every request)
```

Not configurable in V1. Hardcode with constants.

---

## OTHER ANSWERS

### Answer 7a: Test Strategy ✅

**DEFINITIVE ANSWER**: Unit tests mock, ONE integration test with real IndieAuth.com.

### Answer 7b: Test Fixtures ✅

**DEFINITIVE ANSWER**: YES, create reusable fixtures.

```python
# tests/fixtures/indieauth_profiles.py
PROFILES = {
    'link_header': {...},
    'html_links': {...},
    'both': {...},
    # etc.
}
```

### Answer 7c: Test Coverage ✅

**DEFINITIVE ANSWER**:
- 90%+ coverage for new code
- All edge cases tested
- One real integration test

### Answer 8a: First Request Latency ✅

**DEFINITIVE ANSWER**: Accept the delay. Do NOT pre-warm cache.

**Rationale**:
- Only happens once per hour
- Pre-warming adds complexity
- User can wait 850ms for first post

### Answer 8b: Cache TTLs ✅

**DEFINITIVE ANSWER**: Keep as specified:
- Endpoints: 3600s (1 hour)
- Token verifications: 300s (5 minutes)

These are good defaults.

### Answer 8c: Concurrent Requests ✅

**DEFINITIVE ANSWER**: Accept duplicate discoveries for V1.

No locking needed for single-user low-traffic V1.

### Answer 9a: Configuration Changes ✅

**DEFINITIVE ANSWER**: Remove TOKEN_ENDPOINT immediately with deprecation warning.

```python
# config.py
if 'TOKEN_ENDPOINT' in os.environ:
    logger.warning(
        "TOKEN_ENDPOINT is deprecated and ignored. "
        "Remove it from your configuration. "
        "Endpoints are now discovered from ADMIN_ME profile."
    )
```

### Answer 9b: Backward Compatibility ✅

**DEFINITIVE ANSWER**: Document breaking change in CHANGELOG. No migration script.

We're in RC phase, breaking changes are acceptable.

### Answer 9c: Health Check ✅

**DEFINITIVE ANSWER**: NO endpoint discovery in health check.

Too expensive. Health check should be fast.

### Answer 10a: Local Development ✅

**DEFINITIVE ANSWER**: Allow HTTP in debug mode.

```python
if current_app.debug:
    # Allow HTTP in development
    pass
else:
    # Require HTTPS in production
    if parsed.scheme != 'https':
        raise SecurityError("HTTPS required")
```

### Answer 10b: Testing with Real Providers ✅

**DEFINITIVE ANSWER**: Document test setup, skip in CI.

```python
@pytest.mark.skipif(
    not os.environ.get('TEST_REAL_INDIEAUTH'),
    reason="Set TEST_REAL_INDIEAUTH=1 to run real provider tests"
)
def test_real_indieauth():
    # Test with real IndieAuth.com
```

---

## Implementation Go/No-Go Decision

### ✅ APPROVED FOR IMPLEMENTATION

You have all the information needed to implement endpoint discovery correctly. Proceed with your Phase 1-5 plan.

### Implementation Priorities

1. **FIRST**: Implement Question 1 solution (ADMIN_ME discovery)
2. **SECOND**: Add BeautifulSoup4 dependency
3. **THIRD**: Create EndpointCache class
4. **THEN**: Follow your phased implementation plan

### Key Implementation Notes

1. **Always use ADMIN_ME** for endpoint discovery in V1
2. **Fail closed** on security errors
3. **Be liberal** in what you accept (HTML parsing)
4. **Be strict** in what you validate (URLs, tokens)
5. **Document** single-user assumptions clearly
6. **Test** edge cases thoroughly

---

## Summary for Quick Reference

| Question | Answer | Implementation |
|----------|--------|----------------|
| Q1: Which endpoint? | Always use ADMIN_ME | `discover_endpoints(admin_me)` |
| Q2a: Cache structure? | Simple for single-user | `self.endpoints = None` |
| Q3a: Add BeautifulSoup4? | YES | Add to dependencies |
| Q5a: URL validation? | HTTPS in prod, localhost in dev | Check with `current_app.debug` |
| Q6a: Error handling? | Fail closed with cache grace | Try cache on failure |
| Q6b: Retry logic? | Only for network errors | 3 retries with backoff |
| Q9a: Remove TOKEN_ENDPOINT? | Yes with warning | Deprecation message |

---

**This document provides definitive answers. Implement as specified. No further architectural review needed before coding.**

**Document Version**: 1.0
**Status**: FINAL
**Next Step**: Begin implementation immediately