# Client ID Validation Compliance Analysis ## W3C IndieAuth Specification Requirements (Section 3.2) According to the W3C IndieAuth specification, client identifiers MUST meet the following requirements: 1. **Scheme**: MUST have either `https` or `http` scheme 2. **Path**: MUST contain a path component 3. **Path Segments**: MUST NOT contain single-dot or double-dot path segments 4. **Query String**: MAY contain a query string component 5. **Fragment**: MUST NOT contain a fragment component 6. **User Info**: MUST NOT contain username or password component 7. **Port**: MAY contain a port 8. **Hostname**: - MUST be domain names or a loopback interface - MUST NOT be IPv4 or IPv6 addresses except for 127.0.0.1 or [::1] ## Current Gondulf Implementation Analysis ### What We Currently Validate ✅ 1. **Scheme Validation** (PARTIAL) - ✅ We require HTTPS scheme in `normalize_client_id()` - ❌ We reject HTTP completely, even for localhost (spec allows HTTP) - Location: `/src/gondulf/utils/validation.py` lines 48-49 2. **Query String Preservation** - ✅ We preserve query strings correctly - Location: `/src/gondulf/utils/validation.py` lines 58-59 3. **Fragment Preservation** (BUT NOT VALIDATED) - ⚠️ We preserve fragments but don't reject them - Location: `/src/gondulf/utils/validation.py` lines 60-61 ### What We DON'T Validate ❌ 1. **Path Component Requirement** - ❌ We don't verify that a path component exists - Spec requires at least "/" as a path 2. **Path Segment Validation** - ❌ We don't check for single-dot (.) or double-dot (..) path segments - These should be rejected per spec 3. **Fragment Component** - ❌ We don't reject URLs with fragments (we actually preserve them) - Spec says MUST NOT contain fragment 4. **User Info Component** - ❌ We don't check for or reject username:password in URLs - Spec says MUST NOT contain username or password 5. **Hostname/IP Validation** - ❌ We don't validate that hostnames are domain names - ❌ We don't reject IPv4/IPv6 addresses (except 127.0.0.1 and [::1]) - ❌ We don't have special handling for localhost/127.0.0.1/[::1] 6. **Port Support** - ✅ We preserve ports correctly - ⚠️ But we don't allow HTTP for localhost with ports ## Reference Implementation (IndieLogin) Comparison From examining `/home/phil/Projects/indielogin.com/app/Authenticate.php`: 1. Uses `\p3k\url\is_url()` for basic URL validation (line 49) 2. Checks that client_id contains a dot OR is localhost (line 51) 3. Validates redirect_uri domain matches client_id domain (lines 75-95) 4. Does NOT appear to validate all spec requirements either ## Non-Compliance Issues Found ### Critical Issues 1. **HTTP localhost support missing** ```python # Current code rejects all HTTP if parsed.scheme != 'https': raise ValueError("client_id must use https scheme") ``` Should allow HTTP for localhost/127.0.0.1/[::1] 2. **Fragment rejection missing** ```python # Current code preserves fragments instead of rejecting them if parsed.fragment: normalized += f"#{parsed.fragment}" ``` 3. **Path component not validated** - No check that path exists (at minimum "/") 4. **IP address validation missing** - Should reject IPs except 127.0.0.1 and [::1] ### Security Implications 1. **Fragment acceptance** - Could lead to confusion about actual client_id 2. **User info not rejected** - Could expose credentials in logs 3. **IP addresses allowed** - Could bypass domain validation 4. **Path traversal** - Accepting ../.. could be security issue ## Recommended Fixes ### Priority 1: Add proper client_id validation function Create a new `validate_client_id()` function that checks ALL spec requirements: ```python def validate_client_id(client_id: str) -> tuple[bool, str]: """ Validate client_id per W3C IndieAuth spec Section 3.2. Returns: (is_valid, error_message) """ try: parsed = urlparse(client_id) # 1. Check scheme if parsed.scheme not in ['https', 'http']: return False, "client_id must use https or http scheme" # 2. For HTTP, only allow localhost/loopback if parsed.scheme == 'http': if parsed.hostname not in ['localhost', '127.0.0.1', '[::1]', '::1']: return False, "HTTP scheme only allowed for localhost/127.0.0.1/[::1]" # 3. Must have path component if not parsed.path: return False, "client_id must contain a path component" # 4. Check for . or .. segments path_segments = parsed.path.split('/') if '.' in path_segments or '..' in path_segments: return False, "client_id must not contain . or .. path segments" # 5. Must NOT have fragment if parsed.fragment: return False, "client_id must not contain a fragment" # 6. Must NOT have user info if parsed.username or parsed.password: return False, "client_id must not contain username or password" # 7. Check hostname (no raw IPs except loopback) if parsed.hostname: # Check if it's an IP address import ipaddress try: ip = ipaddress.ip_address(parsed.hostname) # Only allow loopback IPs if not ip.is_loopback: return False, f"client_id must not use IP address {parsed.hostname}" except ValueError: # Not an IP, that's good (it's a domain) pass return True, "" except Exception as e: return False, f"Invalid URL: {e}" ``` ### Priority 2: Update normalize_client_id() Should call validate_client_id() first, then normalize. ### Priority 3: Add comprehensive tests Test all validation rules with both positive and negative cases. ## Conclusion **Current Status: NOT COMPLIANT** ❌ Gondulf's client_id validation is currently **not compliant** with W3C IndieAuth specification Section 3.2. We have several missing validations that need to be implemented before v1.0.0 release. ### Required Actions Before v1.0.0 1. Implement complete `validate_client_id()` function 2. Update `normalize_client_id()` to validate first 3. Add support for HTTP on localhost/127.0.0.1/[::1] 4. Add tests for all validation rules 5. Update authorization endpoint to use new validation This is a **BLOCKING ISSUE** for v1.0.0 release as it affects spec compliance.