Client ID Validation Compliance
Purpose
This design addresses critical non-compliance issues in Gondulf's client_id validation that violate the W3C IndieAuth specification Section 3.2. These issues must be fixed before v1.0.0 release to ensure any compliant IndieAuth client can successfully authenticate.
CLARIFICATIONS (2025-11-24)
Based on Developer questions, the following clarifications have been added:
- IPv6 Bracket Handling: Python's urlparse returns hostname WITHOUT brackets for IPv6 addresses. The brackets appear only in netloc. Therefore, the check should be against '::1' without brackets.
- Normalization of IPv6 with Port: When reconstructing URLs with IPv6 addresses and ports, brackets MUST be added back (e.g., [::1]:8080).
- Empty Path Normalization: Confirmed - https://example.com should normalize to https://example.com/ (with trailing slash).
- Validation Rule Ordering: Implementation should follow the logical flow shown in the example implementation (lines 87-138), not the numbered list order. The try/except for URL parsing serves as the "Basic URL Structure" check.
- Endpoint Updates: These are SEPARATE tasks and should NOT be implemented as part of the validation.py update task.
- Test File Location: Tests should go in the existing /home/phil/Projects/Gondulf/tests/unit/test_validation.py file.
- Import Location: The ipaddress import should be at module level (Python convention), not inside the function.
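The bracket behaviour described in the first two clarifications can be checked directly against the standard library. The following is a minimal sketch; the helper name `rebuild_netloc` is hypothetical and is not part of Gondulf.

```python
from typing import Optional
from urllib.parse import urlparse


def rebuild_netloc(hostname: str, port: Optional[int]) -> str:
    """Re-add the IPv6 brackets that urlparse strips from `hostname`.

    Hypothetical helper for illustration only.
    """
    # A ':' can only appear in a bracket-free hostname if it is an IPv6 literal.
    host = f"[{hostname}]" if ":" in hostname else hostname
    return host if port is None else f"{host}:{port}"


parsed = urlparse("http://[::1]:8080/app")
print(parsed.hostname)  # ::1          (no brackets)
print(parsed.netloc)    # [::1]:8080   (brackets kept)
print(rebuild_netloc(parsed.hostname, parsed.port))  # [::1]:8080
```

Comparing `parsed.hostname` against `'::1'` therefore works without any bracket stripping, while any rebuilt URL has to put the brackets back.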
Specification References
- Primary: W3C IndieAuth Section 3.2 - Client Identifier
- OAuth 2.0: RFC 6749 Section 2.2
- Reference Implementation: IndieLogin.com /app/Authenticate.php
Design Overview
Replace the current incomplete normalize_client_id() function with two distinct functions:
- validate_client_id() - Validates client_id against all specification requirements
- normalize_client_id() - Normalizes a valid client_id to canonical form
This separation ensures clear validation logic and proper error reporting while maintaining backward compatibility with existing code that expects normalization.
Component Details
New Function: validate_client_id()
Location: /home/phil/Projects/Gondulf/src/gondulf/utils/validation.py
Purpose: Validate a client_id URL against all W3C IndieAuth specification requirements.
Function Signature:
def validate_client_id(client_id: str) -> tuple[bool, str]:
    """
    Validate client_id against W3C IndieAuth specification Section 3.2.

    Args:
        client_id: The client identifier URL to validate

    Returns:
        Tuple of (is_valid, error_message)
        - is_valid: True if client_id is valid, False otherwise
        - error_message: Empty string if valid, specific error message if invalid
    """
Validation Rules (in order):
1. Basic URL Structure
   - Must be a parseable URL with urlparse()
   - Error: "client_id must be a valid URL"
2. Scheme Validation
   - Must be 'https' OR 'http'
   - Error: "client_id must use https or http scheme"
3. HTTP Scheme Restriction
   - If scheme is 'http', hostname MUST be one of: 'localhost', '127.0.0.1', '::1' (note: hostname from urlparse has no brackets)
   - Error: "client_id with http scheme is only allowed for localhost, 127.0.0.1, or [::1]"
4. Fragment Rejection
   - Must NOT contain a fragment component (# part)
   - Error: "client_id must not contain a fragment (#)"
5. User Info Rejection
   - Must NOT contain username or password components
   - Error: "client_id must not contain username or password"
6. IP Address Validation
   - Check if hostname is an IP address using ipaddress.ip_address()
   - If it's an IP:
     - Must be loopback (127.0.0.1 or ::1)
     - Error: "client_id must not use IP address (except 127.0.0.1 or [::1])"
   - If not an IP (ValueError), it's a domain name (valid)
7. Path Component Requirement
   - The canonical form must have a path (at minimum "/")
   - An empty path is still accepted by validation (it will be normalized to "/" later)
8. Path Segment Validation
   - Split path by '/' and check segments
   - Must NOT contain single dot ('.') as a complete segment
   - Must NOT contain double dot ('..') as a complete segment
   - Note: dots inside a longer segment (for example '.well-known' or 'file..txt') are allowed; only standalone '.' or '..' segments are rejected
   - Error: "client_id must not contain single-dot (.) or double-dot (..) path segments"
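The path segment rule is easy to get subtly wrong, so it may help to see it in isolation. This is a minimal sketch of the check, assuming a standalone helper named `has_dot_segments` (a hypothetical name, not an existing Gondulf function):

```python
def has_dot_segments(path: str) -> bool:
    """Return True if any '/'-delimited segment is exactly '.' or '..'.

    Hypothetical helper for illustration; segments that merely contain
    dots (such as '.well-known') do not count.
    """
    return any(segment in (".", "..") for segment in path.split("/"))


print(has_dot_segments("/app/../admin"))        # True
print(has_dot_segments("/./invalid"))           # True
print(has_dot_segments("/.well-known/client"))  # False
print(has_dot_segments("/files/v1..2"))         # False
print(has_dot_segments(""))                     # False
```

Comparing whole segments, rather than searching the path for the substring `..`, avoids rejecting legitimate paths that happen to contain consecutive dots.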
Implementation:
import ipaddress  # At module level with other imports
from urllib.parse import urlparse


def validate_client_id(client_id: str) -> tuple[bool, str]:
    """
    Validate client_id against W3C IndieAuth specification Section 3.2.

    Args:
        client_id: The client identifier URL to validate

    Returns:
        Tuple of (is_valid, error_message)
    """
    try:
        parsed = urlparse(client_id)

        # 1. Check scheme
        if parsed.scheme not in ['https', 'http']:
            return False, "client_id must use https or http scheme"

        # 2. HTTP only for localhost/loopback
        if parsed.scheme == 'http':
            # Note: parsed.hostname returns '::1' without brackets for IPv6
            if parsed.hostname not in ['localhost', '127.0.0.1', '::1']:
                return False, "client_id with http scheme is only allowed for localhost, 127.0.0.1, or [::1]"

        # 3. No fragments allowed
        if parsed.fragment:
            return False, "client_id must not contain a fragment (#)"

        # 4. No username/password allowed
        if parsed.username or parsed.password:
            return False, "client_id must not contain username or password"

        # 5. Check for non-loopback IP addresses
        if parsed.hostname:
            try:
                # parsed.hostname already has no brackets for IPv6
                ip = ipaddress.ip_address(parsed.hostname)
                if not ip.is_loopback:
                    return False, "client_id must not use IP address (except 127.0.0.1 or [::1])"
            except ValueError:
                # Not an IP address, it's a domain (valid)
                pass

        # 6. Check for . or .. path segments
        if parsed.path:
            segments = parsed.path.split('/')
            for segment in segments:
                if segment == '.' or segment == '..':
                    return False, "client_id must not contain single-dot (.) or double-dot (..) path segments"

        return True, ""

    except Exception as e:
        # urlparse raised (for example on a malformed IPv6 literal)
        return False, f"client_id must be a valid URL: {e}"
Updated Function: normalize_client_id()
Purpose: Normalize a valid client_id to canonical form. Must validate first.
Function Signature:
def normalize_client_id(client_id: str) -> str:
    """
    Normalize client_id URL to canonical form per IndieAuth spec.

    Normalization rules:
    - Validate against specification first
    - Convert hostname to lowercase
    - Remove default ports (80 for http, 443 for https)
    - Ensure path exists (default to "/" if empty)
    - Preserve query string if present
    - Never include fragments (already validated out)

    Args:
        client_id: Client ID URL to normalize

    Returns:
        Normalized client_id

    Raises:
        ValueError: If client_id is not valid per specification
    """
Normalization Rules:
1. Validation First
   - Call validate_client_id()
   - If invalid, raise ValueError with the error message
2. Hostname Normalization
   - Convert hostname to lowercase
   - Re-add brackets around IPv6 addresses when rebuilding the URL (hostname from urlparse has none)
3. Port Normalization
   - Remove port 80 for http URLs
   - Remove port 443 for https URLs
   - Preserve any other ports
4. Path Normalization
   - If path is empty, set to "/"
   - Do NOT remove trailing slashes (spec doesn't require this)
   - Do NOT normalize . or .. (already validated out)
5. Component Assembly
   - Reconstruct URL with normalized components
   - Include query string if present
   - Never include fragment (already validated out)
Implementation:
def normalize_client_id(client_id: str) -> str:
    """
    Normalize client_id URL to canonical form per IndieAuth spec.

    Args:
        client_id: Client ID URL to normalize

    Returns:
        Normalized client_id

    Raises:
        ValueError: If client_id is not valid per specification
    """
    # First validate
    is_valid, error = validate_client_id(client_id)
    if not is_valid:
        raise ValueError(error)

    parsed = urlparse(client_id)

    # Normalize hostname to lowercase
    hostname = parsed.hostname.lower() if parsed.hostname else ''

    # IPv6 literals need brackets in the rebuilt URL (hostname has none)
    is_ipv6 = ':' in hostname  # Simple check since hostname has no brackets
    host = f"[{hostname}]" if is_ipv6 else hostname

    # Handle port normalization.
    # parsed.port raises ValueError for a malformed port, which matches
    # the documented Raises behaviour.
    port = parsed.port
    is_default_port = (parsed.scheme == 'http' and port == 80) or \
                      (parsed.scheme == 'https' and port == 443)
    if port is None or is_default_port:
        # No port, or default port: omit it
        netloc = host
    else:
        # Non-default port, include it ("is None" so that port 0 is not dropped)
        netloc = f"{host}:{port}"

    # Ensure path exists
    path = parsed.path if parsed.path else '/'

    # Reconstruct URL
    normalized = f"{parsed.scheme}://{netloc}{path}"

    # Add query if present
    if parsed.query:
        normalized += f"?{parsed.query}"

    # Never add fragment (validated out)
    return normalized
Authorization Endpoint Updates (SEPARATE TASK)
NOTE: This is a SEPARATE task and should NOT be implemented as part of the validation.py update task.
Location: /home/phil/Projects/Gondulf/src/gondulf/endpoints/authorization.py
When this separate task is implemented, update the authorization endpoint to use the new validation:
# In the authorize() function, when validating client_id:

# Validate and normalize client_id
is_valid, error = validate_client_id(client_id)
if not is_valid:
    # Return error to client
    return authorization_error_response(
        redirect_uri=redirect_uri,
        error="invalid_request",
        error_description=f"Invalid client_id: {error}",
        state=state
    )

# Normalize for consistent storage/comparison
try:
    normalized_client_id = normalize_client_id(client_id)
except ValueError as e:
    # This shouldn't happen if validate_client_id passed, but handle it
    return authorization_error_response(
        redirect_uri=redirect_uri,
        error="invalid_request",
        error_description=str(e),
        state=state
    )
Token Endpoint Updates (SEPARATE TASK)
NOTE: This is a SEPARATE task and should NOT be implemented as part of the validation.py update task.
Location: /home/phil/Projects/Gondulf/src/gondulf/endpoints/token.py
When this separate task is implemented, update token endpoint validation similarly:
# In the token() function, when validating client_id:

# Validate and normalize client_id
is_valid, error = validate_client_id(client_id)
if not is_valid:
    return JSONResponse(
        status_code=400,
        content={
            "error": "invalid_client",
            "error_description": f"Invalid client_id: {error}"
        }
    )

# Normalize for comparison with stored value
normalized_client_id = normalize_client_id(client_id)
Data Models
No database schema changes required. The validation happens at the API layer before storage.
API Contracts
Error Responses
When client_id validation fails, return appropriate OAuth 2.0 error responses:
Authorization Endpoint (if redirect_uri is valid):
HTTP/1.1 302 Found
Location: {redirect_uri}?error=invalid_request&error_description=Invalid+client_id%3A+{specific_error}&state={state}
Authorization Endpoint (if redirect_uri is also invalid):
HTTP/1.1 400 Bad Request
Content-Type: text/html

<html>
  <body>
    <h1>Invalid Request</h1>
    <p>Invalid client_id: {specific_error}</p>
  </body>
</html>
Token Endpoint:
HTTP/1.1 400 Bad Request
Content-Type: application/json

{
  "error": "invalid_client",
  "error_description": "Invalid client_id: {specific_error}"
}
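To illustrate how the authorization endpoint's redirect could be assembled, here is a minimal sketch using only the standard library. The function name `build_error_redirect` and the `https://app.example/callback` URI are assumptions made for this example; they are not taken from Gondulf's code, which uses its own `authorization_error_response()` helper.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_error_redirect(redirect_uri: str, error: str,
                         description: str, state: Optional[str] = None) -> str:
    """Append OAuth 2.0 error parameters to redirect_uri (hypothetical helper).

    Existing query parameters on redirect_uri are preserved.
    """
    parts = urlsplit(redirect_uri)
    params = parse_qsl(parts.query, keep_blank_values=True)
    params.append(("error", error))
    params.append(("error_description", description))
    if state is not None:
        params.append(("state", state))
    # urlencode percent-encodes ':' as %3A and turns spaces into '+'
    return urlunsplit(parts._replace(query=urlencode(params)))


location = build_error_redirect(
    "https://app.example/callback",
    "invalid_request",
    "Invalid client_id: bad",
    state="xyz",
)
print(location)
# https://app.example/callback?error=invalid_request&error_description=Invalid+client_id%3A+bad&state=xyz
```

Building the query with `urlencode` rather than string concatenation keeps characters such as `#` and `(` in the error message from corrupting the redirect URL.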
Error Handling
Validation Error Messages
Each validation rule has a specific, user-friendly error message:
| Validation Rule | Error Message |
|---|---|
| Invalid URL | "client_id must be a valid URL: {parse_error}" |
| Wrong scheme | "client_id must use https or http scheme" |
| HTTP not localhost | "client_id with http scheme is only allowed for localhost, 127.0.0.1, or [::1]" |
| Has fragment | "client_id must not contain a fragment (#)" |
| Has credentials | "client_id must not contain username or password" |
| Non-loopback IP | "client_id must not use IP address (except 127.0.0.1 or [::1])" |
| Path traversal | "client_id must not contain single-dot (.) or double-dot (..) path segments" |
Exception Handling
- validate_client_id() never raises exceptions, returns (False, error_message)
- normalize_client_id() raises ValueError if validation fails
- URL parsing exceptions are caught and converted to validation errors
Security Considerations
Fragment Rejection
Fragments in client_ids could cause confusion about the actual client identity. By rejecting them, we ensure clear client identification.
Credential Rejection
Username/password in URLs could leak into logs or be displayed to users. Rejecting them prevents credential exposure.
IP Address Restriction
Allowing arbitrary IP addresses could bypass domain-based security controls. Only loopback addresses are permitted for local development.
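The distinction between domains, loopback addresses, and other IP literals can be sketched with the standard `ipaddress` module. The helper name `classify_host` is hypothetical and exists only for this illustration.

```python
import ipaddress


def classify_host(hostname: str) -> str:
    """Classify a bracket-free hostname (hypothetical helper)."""
    try:
        ip = ipaddress.ip_address(hostname)
    except ValueError:
        # Not parseable as an IPv4/IPv6 literal, so treat it as a domain name
        return "domain"
    return "loopback-ip" if ip.is_loopback else "non-loopback-ip"


for host in ["127.0.0.1", "::1", "127.0.0.2", "192.168.1.1",
             "2001:db8::1", "example.com", "localhost"]:
    print(host, "->", classify_host(host))
```

One consequence worth noting: `is_loopback` is true for the whole 127.0.0.0/8 range, so a check based on it accepts addresses such as 127.0.0.2 in addition to 127.0.0.1. Whether that is acceptable, or whether the check should compare against the two literal addresses only, is a design decision this sketch does not settle.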
Path Traversal Prevention
Single-dot and double-dot segments could potentially be used for path traversal attacks or cause confusion about the client's identity.
HTTP Localhost Support
HTTP is only allowed for localhost/loopback addresses to support local development while maintaining security in production.
Testing Strategy
Unit Tests Required
Create comprehensive tests in /home/phil/Projects/Gondulf/tests/unit/test_validation.py:
Valid Client IDs
valid_client_ids = [
    "https://example.com",
    "https://example.com/",
    "https://example.com/app",
    "https://example.com/app/client",
    "https://example.com?foo=bar",
    "https://example.com/app?foo=bar&baz=qux",
    "https://sub.example.com",
    "https://example.com:8080",
    "https://example.com:8080/app",
    "http://localhost",
    "http://localhost:3000",
    "http://127.0.0.1",
    "http://127.0.0.1:8080",
    "http://[::1]",
    "http://[::1]:8080",
]
Invalid Client IDs
# Each pair is (client_id, substring expected in the error message)
invalid_client_ids = [
    ("ftp://example.com", "must use https or http scheme"),
    ("https://example.com#fragment", "must not contain a fragment"),
    ("https://user:pass@example.com", "must not contain username or password"),
    ("https://example.com/./invalid", "must not contain single-dot"),
    # The message reads "single-dot (.) or double-dot (..)", so match the tail
    ("https://example.com/../invalid", "double-dot (..) path segments"),
    ("http://example.com", "http scheme is only allowed for localhost"),
    ("https://192.168.1.1", "must not use IP address"),
    ("https://10.0.0.1", "must not use IP address"),
    ("https://[2001:db8::1]", "must not use IP address"),
    # urlparse() does not raise for these; the empty scheme fails the scheme check
    ("not-a-url", "must use https or http scheme"),
    ("", "must use https or http scheme"),
]
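The pairs above read as (client_id, expected substring of the error message). One way such a table could be driven is sketched below. Both `find_mismatches` and `stand_in_validator` are hypothetical names: the stand-in covers a single rule so that the example runs on its own, and the real tests would pass `validate_client_id` instead.

```python
from typing import Callable, Iterable, List, Tuple

Validator = Callable[[str], Tuple[bool, str]]


def find_mismatches(validate: Validator,
                    cases: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (client_id, actual_message) for every case that does not fail
    with the expected substring in its error message."""
    mismatches = []
    for client_id, expected in cases:
        is_valid, message = validate(client_id)
        if is_valid or expected not in message:
            mismatches.append((client_id, message))
    return mismatches


def stand_in_validator(client_id: str) -> Tuple[bool, str]:
    """Hypothetical stand-in that implements only the fragment rule."""
    if "#" in client_id:
        return False, "client_id must not contain a fragment (#)"
    return True, ""


cases = [("https://example.com#fragment", "must not contain a fragment")]
print(find_mismatches(stand_in_validator, cases))  # []
```

Collecting all mismatches before asserting, rather than stopping at the first one, makes a failing run report every rule that drifted from its documented message.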
Normalization Tests
normalization_cases = [
    ("HTTPS://EXAMPLE.COM", "https://example.com/"),
    ("https://example.com", "https://example.com/"),
    ("https://example.com:443", "https://example.com/"),
    ("http://localhost:80", "http://localhost/"),
    ("https://EXAMPLE.COM:443/app", "https://example.com/app"),
    ("https://Example.Com/APP", "https://example.com/APP"),  # Path case preserved
    ("https://example.com?foo=bar", "https://example.com/?foo=bar"),
]
Integration Tests
- Test authorization endpoint with various client_ids
- Test token endpoint with various client_ids
- Test that normalized client_ids match correctly between endpoints
- Test error responses for invalid client_ids
Security Tests
- Test that fragments are always rejected
- Test that credentials are always rejected
- Test that non-loopback IPs are rejected
- Test that path traversal segments are rejected
- Test that HTTP is only allowed for localhost
Acceptance Criteria
- ✅ All valid client_ids per W3C specification are accepted
- ✅ All invalid client_ids per W3C specification are rejected with specific error messages
- ✅ HTTP scheme is accepted for localhost, 127.0.0.1, and [::1]
- ✅ HTTPS scheme is accepted for all valid domain names
- ✅ Fragments are always rejected
- ✅ Username/password components are always rejected
- ✅ Non-loopback IP addresses are rejected
- ✅ Single-dot and double-dot path segments are rejected
- ✅ Hostnames are normalized to lowercase
- ✅ Default ports (80 for HTTP, 443 for HTTPS) are removed
- ✅ Empty paths are normalized to "/"
- ✅ Query strings are preserved
- ✅ Authorization endpoint uses new validation
- ✅ Token endpoint uses new validation
- ✅ All tests pass with 100% coverage of validation logic
- ✅ Error messages are specific and helpful
Implementation Order
Current Task (validation.py update):
- Implement validate_client_id() function in validation.py
- Update normalize_client_id() to use validation in validation.py
- Write comprehensive unit tests in tests/unit/test_validation.py
Separate Future Tasks:
- Update authorization endpoint (SEPARATE TASK)
- Update token endpoint (SEPARATE TASK)
- Write integration tests (SEPARATE TASK)
- Test with real IndieAuth clients (SEPARATE TASK)
Migration Notes
- No database migration needed
- Existing stored client_ids remain valid (they were normalized on storage)
- New validation is stricter but backward compatible with valid client_ids