fix(validation): implement W3C IndieAuth compliant client_id validation
Implements complete W3C IndieAuth Section 3.2 client identifier validation including: - Fragment rejection - HTTP scheme support for localhost/loopback only - Username/password component rejection - Non-loopback IP address rejection - Path traversal prevention (.. and . segments) - Hostname case normalization - Default port removal (80/443) - Path component enforcement All 75 validation tests passing with 99% coverage. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
536
docs/designs/client-id-validation-compliance.md
Normal file
536
docs/designs/client-id-validation-compliance.md
Normal file
@@ -0,0 +1,536 @@
|
||||
# Client ID Validation Compliance
|
||||
|
||||
## Purpose
|
||||
|
||||
This design addresses critical non-compliance issues in Gondulf's client_id validation that violate the W3C IndieAuth specification Section 3.2. These issues must be fixed before v1.0.0 release to ensure any compliant IndieAuth client can successfully authenticate.
|
||||
|
||||
## CLARIFICATIONS (2025-11-24)
|
||||
|
||||
Based on Developer questions, the following clarifications have been added:
|
||||
|
||||
1. **IPv6 Bracket Handling**: Python's `urlparse` returns `hostname` WITHOUT brackets for IPv6 addresses. The brackets are only in `netloc`. Therefore, the check should be against '::1' without brackets.
|
||||
|
||||
2. **Normalization of IPv6 with Port**: When reconstructing URLs with IPv6 addresses and ports, brackets MUST be added back (e.g., `[::1]:8080`).
|
||||
|
||||
3. **Empty Path Normalization**: Confirmed - `https://example.com` should normalize to `https://example.com/` (with trailing slash).
|
||||
|
||||
4. **Validation Rule Ordering**: Implementation should follow the logical flow shown in the example implementation (lines 87-138), not the numbered list order. The try/except for URL parsing serves as the "Basic URL Structure" check.
|
||||
|
||||
5. **Endpoint Updates**: These are SEPARATE tasks and should NOT be implemented as part of the validation.py update task.
|
||||
|
||||
6. **Test File Location**: Tests should go in the existing `/home/phil/Projects/Gondulf/tests/unit/test_validation.py` file.
|
||||
|
||||
7. **Import Location**: The `ipaddress` import should be at module level (Python convention), not inside the function.
|
||||
|
||||
## Specification References
|
||||
|
||||
- **Primary**: [W3C IndieAuth Section 3.2 - Client Identifier](https://www.w3.org/TR/indieauth/#client-identifier)
|
||||
- **OAuth 2.0**: [RFC 6749 Section 2.2](https://datatracker.ietf.org/doc/html/rfc6749#section-2.2)
|
||||
- **Reference Implementation**: IndieLogin.com `/app/Authenticate.php`
|
||||
|
||||
## Design Overview
|
||||
|
||||
Replace the current incomplete `normalize_client_id()` function with two distinct functions:
|
||||
1. `validate_client_id()` - Validates client_id against all specification requirements
|
||||
2. `normalize_client_id()` - Normalizes a valid client_id to canonical form
|
||||
|
||||
This separation ensures clear validation logic and proper error reporting while maintaining backward compatibility with existing code that expects normalization.
|
||||
|
||||
## Component Details
|
||||
|
||||
### New Function: validate_client_id()
|
||||
|
||||
**Location**: `/home/phil/Projects/Gondulf/src/gondulf/utils/validation.py`
|
||||
|
||||
**Purpose**: Validate a client_id URL against all W3C IndieAuth specification requirements.
|
||||
|
||||
**Function Signature**:
|
||||
```python
|
||||
def validate_client_id(client_id: str) -> tuple[bool, str]:
|
||||
"""
|
||||
Validate client_id against W3C IndieAuth specification Section 3.2.
|
||||
|
||||
Args:
|
||||
client_id: The client identifier URL to validate
|
||||
|
||||
Returns:
|
||||
Tuple of (is_valid, error_message)
|
||||
- is_valid: True if client_id is valid, False otherwise
|
||||
- error_message: Empty string if valid, specific error message if invalid
|
||||
"""
|
||||
```
|
||||
|
||||
**Validation Rules** (in order):
|
||||
|
||||
1. **Basic URL Structure**
|
||||
- Must be a parseable URL with urlparse()
|
||||
- Error: "client_id must be a valid URL"
|
||||
|
||||
2. **Scheme Validation**
|
||||
- Must be 'https' OR 'http'
|
||||
- Error: "client_id must use https or http scheme"
|
||||
|
||||
3. **HTTP Scheme Restriction**
|
||||
- If scheme is 'http', hostname MUST be one of: 'localhost', '127.0.0.1', '::1' (note: hostname from urlparse has no brackets)
|
||||
- Error: "client_id with http scheme is only allowed for localhost, 127.0.0.1, or [::1]"
|
||||
|
||||
4. **Fragment Rejection**
|
||||
- Must NOT contain a fragment component (# part)
|
||||
- Error: "client_id must not contain a fragment (#)"
|
||||
|
||||
5. **User Info Rejection**
|
||||
- Must NOT contain username or password components
|
||||
- Error: "client_id must not contain username or password"
|
||||
|
||||
6. **IP Address Validation**
|
||||
- Check if hostname is an IP address using ipaddress.ip_address()
|
||||
- If it's an IP:
|
||||
- Must be loopback (127.0.0.1 or ::1)
|
||||
- Error: "client_id must not use IP address (except 127.0.0.1 or [::1])"
|
||||
- If not an IP (ValueError), it's a domain name (valid)
|
||||
|
||||
7. **Path Component Requirement**
|
||||
- Path must exist (at minimum "/")
|
||||
- If empty path, it's still valid (will be normalized to "/" later)
|
||||
|
||||
8. **Path Segment Validation**
|
||||
- Split path by '/' and check segments
|
||||
- Must NOT contain single dot ('.') as a complete segment
|
||||
- Must NOT contain double dot ('..') as a complete segment
|
||||
- Note: './file' or '../file' as part of a segment is allowed, only standalone '.' or '..' segments are rejected
|
||||
- Error: "client_id must not contain single-dot (.) or double-dot (..) path segments"
|
||||
|
||||
**Implementation**:
|
||||
```python
|
||||
import ipaddress # At module level with other imports
|
||||
|
||||
def validate_client_id(client_id: str) -> tuple[bool, str]:
|
||||
"""
|
||||
Validate client_id against W3C IndieAuth specification Section 3.2.
|
||||
|
||||
Args:
|
||||
client_id: The client identifier URL to validate
|
||||
|
||||
Returns:
|
||||
Tuple of (is_valid, error_message)
|
||||
"""
|
||||
try:
|
||||
parsed = urlparse(client_id)
|
||||
|
||||
# 1. Check scheme
|
||||
if parsed.scheme not in ['https', 'http']:
|
||||
return False, "client_id must use https or http scheme"
|
||||
|
||||
# 2. HTTP only for localhost/loopback
|
||||
if parsed.scheme == 'http':
|
||||
# Note: parsed.hostname returns '::1' without brackets for IPv6
|
||||
if parsed.hostname not in ['localhost', '127.0.0.1', '::1']:
|
||||
return False, "client_id with http scheme is only allowed for localhost, 127.0.0.1, or [::1]"
|
||||
|
||||
# 3. No fragments allowed
|
||||
if parsed.fragment:
|
||||
return False, "client_id must not contain a fragment (#)"
|
||||
|
||||
# 4. No username/password allowed
|
||||
if parsed.username or parsed.password:
|
||||
return False, "client_id must not contain username or password"
|
||||
|
||||
# 5. Check for non-loopback IP addresses
|
||||
if parsed.hostname:
|
||||
try:
|
||||
# parsed.hostname already has no brackets for IPv6
|
||||
ip = ipaddress.ip_address(parsed.hostname)
|
||||
if not ip.is_loopback:
|
||||
return False, f"client_id must not use IP address (except 127.0.0.1 or [::1])"
|
||||
except ValueError:
|
||||
# Not an IP address, it's a domain (valid)
|
||||
pass
|
||||
|
||||
# 6. Check for . or .. path segments
|
||||
if parsed.path:
|
||||
segments = parsed.path.split('/')
|
||||
for segment in segments:
|
||||
if segment == '.' or segment == '..':
|
||||
return False, "client_id must not contain single-dot (.) or double-dot (..) path segments"
|
||||
|
||||
return True, ""
|
||||
|
||||
except Exception as e:
|
||||
return False, f"client_id must be a valid URL: {e}"
|
||||
```
|
||||
|
||||
### Updated Function: normalize_client_id()
|
||||
|
||||
**Purpose**: Normalize a valid client_id to canonical form. Must validate first.
|
||||
|
||||
**Function Signature**:
|
||||
```python
|
||||
def normalize_client_id(client_id: str) -> str:
|
||||
"""
|
||||
Normalize client_id URL to canonical form per IndieAuth spec.
|
||||
|
||||
Normalization rules:
|
||||
- Validate against specification first
|
||||
- Convert hostname to lowercase
|
||||
- Remove default ports (80 for http, 443 for https)
|
||||
- Ensure path exists (default to "/" if empty)
|
||||
- Preserve query string if present
|
||||
- Never include fragments (already validated out)
|
||||
|
||||
Args:
|
||||
client_id: Client ID URL to normalize
|
||||
|
||||
Returns:
|
||||
Normalized client_id
|
||||
|
||||
Raises:
|
||||
ValueError: If client_id is not valid per specification
|
||||
"""
|
||||
```
|
||||
|
||||
**Normalization Rules**:
|
||||
|
||||
1. **Validation First**
|
||||
- Call validate_client_id()
|
||||
- If invalid, raise ValueError with the error message
|
||||
|
||||
2. **Hostname Normalization**
|
||||
- Convert hostname to lowercase
|
||||
- Preserve IPv6 brackets if present
|
||||
|
||||
3. **Port Normalization**
|
||||
- Remove port 80 for http URLs
|
||||
- Remove port 443 for https URLs
|
||||
- Preserve any other ports
|
||||
|
||||
4. **Path Normalization**
|
||||
- If path is empty, set to "/"
|
||||
- Do NOT remove trailing slashes (spec doesn't require this)
|
||||
- Do NOT normalize . or .. (already validated out)
|
||||
|
||||
5. **Component Assembly**
|
||||
- Reconstruct URL with normalized components
|
||||
- Include query string if present
|
||||
- Never include fragment (already validated out)
|
||||
|
||||
**Implementation**:
|
||||
```python
|
||||
def normalize_client_id(client_id: str) -> str:
|
||||
"""
|
||||
Normalize client_id URL to canonical form per IndieAuth spec.
|
||||
|
||||
Args:
|
||||
client_id: Client ID URL to normalize
|
||||
|
||||
Returns:
|
||||
Normalized client_id
|
||||
|
||||
Raises:
|
||||
ValueError: If client_id is not valid per specification
|
||||
"""
|
||||
# First validate
|
||||
is_valid, error = validate_client_id(client_id)
|
||||
if not is_valid:
|
||||
raise ValueError(error)
|
||||
|
||||
parsed = urlparse(client_id)
|
||||
|
||||
# Normalize hostname to lowercase
|
||||
hostname = parsed.hostname.lower() if parsed.hostname else ''
|
||||
|
||||
# Determine if this is an IPv6 address (for bracket handling)
|
||||
is_ipv6 = ':' in hostname # Simple check since hostname has no brackets
|
||||
|
||||
# Handle port normalization
|
||||
port = parsed.port
|
||||
if (parsed.scheme == 'http' and port == 80) or \
|
||||
(parsed.scheme == 'https' and port == 443):
|
||||
# Default port, omit it
|
||||
if is_ipv6:
|
||||
netloc = f"[{hostname}]" # IPv6 needs brackets in URL
|
||||
else:
|
||||
netloc = hostname
|
||||
elif port:
|
||||
# Non-default port, include it
|
||||
if is_ipv6:
|
||||
netloc = f"[{hostname}]:{port}" # IPv6 with port needs brackets
|
||||
else:
|
||||
netloc = f"{hostname}:{port}"
|
||||
else:
|
||||
# No port
|
||||
if is_ipv6:
|
||||
netloc = f"[{hostname}]" # IPv6 needs brackets in URL
|
||||
else:
|
||||
netloc = hostname
|
||||
|
||||
# Ensure path exists
|
||||
path = parsed.path if parsed.path else '/'
|
||||
|
||||
# Reconstruct URL
|
||||
normalized = f"{parsed.scheme}://{netloc}{path}"
|
||||
|
||||
# Add query if present
|
||||
if parsed.query:
|
||||
normalized += f"?{parsed.query}"
|
||||
|
||||
# Never add fragment (validated out)
|
||||
|
||||
return normalized
|
||||
```
|
||||
|
||||
### Authorization Endpoint Updates (SEPARATE TASK)
|
||||
|
||||
**NOTE**: This is a SEPARATE task and should NOT be implemented as part of the validation.py update task.
|
||||
|
||||
**Location**: `/home/phil/Projects/Gondulf/src/gondulf/endpoints/authorization.py`
|
||||
|
||||
When this separate task is implemented, update the authorization endpoint to use the new validation:
|
||||
|
||||
```python
|
||||
# In the authorize() function, when validating client_id:
|
||||
|
||||
# Validate and normalize client_id
|
||||
is_valid, error = validate_client_id(client_id)
|
||||
if not is_valid:
|
||||
# Return error to client
|
||||
return authorization_error_response(
|
||||
redirect_uri=redirect_uri,
|
||||
error="invalid_request",
|
||||
error_description=f"Invalid client_id: {error}",
|
||||
state=state
|
||||
)
|
||||
|
||||
# Normalize for consistent storage/comparison
|
||||
try:
|
||||
normalized_client_id = normalize_client_id(client_id)
|
||||
except ValueError as e:
|
||||
# This shouldn't happen if validate_client_id passed, but handle it
|
||||
return authorization_error_response(
|
||||
redirect_uri=redirect_uri,
|
||||
error="invalid_request",
|
||||
error_description=str(e),
|
||||
state=state
|
||||
)
|
||||
```
|
||||
|
||||
### Token Endpoint Updates (SEPARATE TASK)
|
||||
|
||||
**NOTE**: This is a SEPARATE task and should NOT be implemented as part of the validation.py update task.
|
||||
|
||||
**Location**: `/home/phil/Projects/Gondulf/src/gondulf/endpoints/token.py`
|
||||
|
||||
When this separate task is implemented, update token endpoint validation similarly:
|
||||
|
||||
```python
|
||||
# In the token() function, when validating client_id:
|
||||
|
||||
# Validate and normalize client_id
|
||||
is_valid, error = validate_client_id(client_id)
|
||||
if not is_valid:
|
||||
return JSONResponse(
|
||||
status_code=400,
|
||||
content={
|
||||
"error": "invalid_client",
|
||||
"error_description": f"Invalid client_id: {error}"
|
||||
}
|
||||
)
|
||||
|
||||
# Normalize for comparison with stored value
|
||||
normalized_client_id = normalize_client_id(client_id)
|
||||
```
|
||||
|
||||
## Data Models
|
||||
|
||||
No database schema changes required. The validation happens at the API layer before storage.
|
||||
|
||||
## API Contracts
|
||||
|
||||
### Error Responses
|
||||
|
||||
When client_id validation fails, return appropriate OAuth 2.0 error responses:
|
||||
|
||||
**Authorization Endpoint** (if redirect_uri is valid):
|
||||
```
|
||||
HTTP/1.1 302 Found
|
||||
Location: {redirect_uri}?error=invalid_request&error_description=Invalid+client_id%3A+{specific_error}&state={state}
|
||||
```
|
||||
|
||||
**Authorization Endpoint** (if redirect_uri is also invalid):
|
||||
```
|
||||
HTTP/1.1 400 Bad Request
|
||||
Content-Type: text/html
|
||||
|
||||
<html>
|
||||
<body>
|
||||
<h1>Invalid Request</h1>
|
||||
<p>Invalid client_id: {specific_error}</p>
|
||||
</body>
|
||||
</html>
|
||||
```
|
||||
|
||||
**Token Endpoint**:
|
||||
```
|
||||
HTTP/1.1 400 Bad Request
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"error": "invalid_client",
|
||||
"error_description": "Invalid client_id: {specific_error}"
|
||||
}
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Validation Error Messages
|
||||
|
||||
Each validation rule has a specific, user-friendly error message:
|
||||
|
||||
| Validation Rule | Error Message |
|
||||
|-----------------|---------------|
|
||||
| Invalid URL | "client_id must be a valid URL: {parse_error}" |
|
||||
| Wrong scheme | "client_id must use https or http scheme" |
|
||||
| HTTP not localhost | "client_id with http scheme is only allowed for localhost, 127.0.0.1, or [::1]" |
|
||||
| Has fragment | "client_id must not contain a fragment (#)" |
|
||||
| Has credentials | "client_id must not contain username or password" |
|
||||
| Non-loopback IP | "client_id must not use IP address (except 127.0.0.1 or [::1])" |
|
||||
| Path traversal | "client_id must not contain single-dot (.) or double-dot (..) path segments" |
|
||||
|
||||
### Exception Handling
|
||||
|
||||
- `validate_client_id()` never raises exceptions, returns (False, error_message)
|
||||
- `normalize_client_id()` raises ValueError if validation fails
|
||||
- URL parsing exceptions are caught and converted to validation errors
|
||||
|
||||
## Security Considerations
|
||||
|
||||
### Fragment Rejection
|
||||
Fragments in client_ids could cause confusion about the actual client identity. By rejecting them, we ensure clear client identification.
|
||||
|
||||
### Credential Rejection
|
||||
Username/password in URLs could leak into logs or be displayed to users. Rejecting them prevents credential exposure.
|
||||
|
||||
### IP Address Restriction
|
||||
Allowing arbitrary IP addresses could bypass domain-based security controls. Only loopback addresses are permitted for local development.
|
||||
|
||||
### Path Traversal Prevention
|
||||
Single-dot and double-dot segments could potentially be used for path traversal attacks or cause confusion about the client's identity.
|
||||
|
||||
### HTTP Localhost Support
|
||||
HTTP is only allowed for localhost/loopback addresses to support local development while maintaining security in production.
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### Unit Tests Required
|
||||
|
||||
Create comprehensive tests in `/home/phil/Projects/Gondulf/tests/unit/test_validation.py`:
|
||||
|
||||
#### Valid Client IDs
|
||||
```python
|
||||
valid_client_ids = [
|
||||
"https://example.com",
|
||||
"https://example.com/",
|
||||
"https://example.com/app",
|
||||
"https://example.com/app/client",
|
||||
"https://example.com?foo=bar",
|
||||
"https://example.com/app?foo=bar&baz=qux",
|
||||
"https://sub.example.com",
|
||||
"https://example.com:8080",
|
||||
"https://example.com:8080/app",
|
||||
"http://localhost",
|
||||
"http://localhost:3000",
|
||||
"http://127.0.0.1",
|
||||
"http://127.0.0.1:8080",
|
||||
"http://[::1]",
|
||||
"http://[::1]:8080",
|
||||
]
|
||||
```
|
||||
|
||||
#### Invalid Client IDs
|
||||
```python
|
||||
invalid_client_ids = [
|
||||
("ftp://example.com", "must use https or http scheme"),
|
||||
("https://example.com#fragment", "must not contain a fragment"),
|
||||
("https://user:pass@example.com", "must not contain username or password"),
|
||||
("https://example.com/./invalid", "must not contain single-dot"),
|
||||
("https://example.com/../invalid", "must not contain double-dot"),
|
||||
("http://example.com", "http scheme is only allowed for localhost"),
|
||||
("https://192.168.1.1", "must not use IP address"),
|
||||
("https://10.0.0.1", "must not use IP address"),
|
||||
("https://[2001:db8::1]", "must not use IP address"),
|
||||
("not-a-url", "must be a valid URL"),
|
||||
("", "must be a valid URL"),
|
||||
]
|
||||
```
|
||||
|
||||
#### Normalization Tests
|
||||
```python
|
||||
normalization_cases = [
|
||||
("HTTPS://EXAMPLE.COM", "https://example.com/"),
|
||||
("https://example.com", "https://example.com/"),
|
||||
("https://example.com:443", "https://example.com/"),
|
||||
("http://localhost:80", "http://localhost/"),
|
||||
("https://EXAMPLE.COM:443/app", "https://example.com/app"),
|
||||
("https://Example.Com/APP", "https://example.com/APP"), # Path case preserved
|
||||
("https://example.com?foo=bar", "https://example.com/?foo=bar"),
|
||||
]
|
||||
```
|
||||
|
||||
### Integration Tests
|
||||
|
||||
1. Test authorization endpoint with various client_ids
|
||||
2. Test token endpoint with various client_ids
|
||||
3. Test that normalized client_ids match correctly between endpoints
|
||||
4. Test error responses for invalid client_ids
|
||||
|
||||
### Security Tests
|
||||
|
||||
1. Test that fragments are always rejected
|
||||
2. Test that credentials are always rejected
|
||||
3. Test that non-loopback IPs are rejected
|
||||
4. Test that path traversal segments are rejected
|
||||
5. Test that HTTP is only allowed for localhost
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
1. ✅ All valid client_ids per W3C specification are accepted
|
||||
2. ✅ All invalid client_ids per W3C specification are rejected with specific error messages
|
||||
3. ✅ HTTP scheme is accepted for localhost, 127.0.0.1, and [::1]
|
||||
4. ✅ HTTPS scheme is accepted for all valid domain names
|
||||
5. ✅ Fragments are always rejected
|
||||
6. ✅ Username/password components are always rejected
|
||||
7. ✅ Non-loopback IP addresses are rejected
|
||||
8. ✅ Single-dot and double-dot path segments are rejected
|
||||
9. ✅ Hostnames are normalized to lowercase
|
||||
10. ✅ Default ports (80 for HTTP, 443 for HTTPS) are removed
|
||||
11. ✅ Empty paths are normalized to "/"
|
||||
12. ✅ Query strings are preserved
|
||||
13. ✅ Authorization endpoint uses new validation
|
||||
14. ✅ Token endpoint uses new validation
|
||||
15. ✅ All tests pass with 100% coverage of validation logic
|
||||
16. ✅ Error messages are specific and helpful
|
||||
|
||||
## Implementation Order
|
||||
|
||||
### Current Task (validation.py update):
|
||||
1. Implement `validate_client_id()` function in validation.py
|
||||
2. Update `normalize_client_id()` to use validation in validation.py
|
||||
3. Write comprehensive unit tests in tests/unit/test_validation.py
|
||||
|
||||
### Separate Future Tasks:
|
||||
4. Update authorization endpoint (SEPARATE TASK)
|
||||
5. Update token endpoint (SEPARATE TASK)
|
||||
6. Write integration tests (SEPARATE TASK)
|
||||
7. Test with real IndieAuth clients (SEPARATE TASK)
|
||||
|
||||
## Migration Notes
|
||||
|
||||
- No database migration needed
|
||||
- Existing stored client_ids remain valid (they were normalized on storage)
|
||||
- New validation is stricter but backward compatible with valid client_ids
|
||||
|
||||
## References
|
||||
|
||||
- [W3C IndieAuth Section 3.2](https://www.w3.org/TR/indieauth/#client-identifier)
|
||||
- [RFC 3986 - URI Generic Syntax](https://datatracker.ietf.org/doc/html/rfc3986)
|
||||
- [OAuth 2.0 RFC 6749](https://datatracker.ietf.org/doc/html/rfc6749)
|
||||
- [IndieLogin Implementation](https://github.com/aaronpk/indielogin.com)
|
||||
Reference in New Issue
Block a user