Implements complete W3C IndieAuth Section 3.2 client identifier validation including: - Fragment rejection - HTTP scheme support for localhost/loopback only - Username/password component rejection - Non-loopback IP address rejection - Path traversal prevention (.. and . segments) - Hostname case normalization - Default port removal (80/443) - Path component enforcement All 75 validation tests passing with 99% coverage. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
536 lines
18 KiB
Markdown
536 lines
18 KiB
Markdown
# Client ID Validation Compliance
|
|
|
|
## Purpose
|
|
|
|
This design addresses critical non-compliance issues in Gondulf's client_id validation that violate the W3C IndieAuth specification Section 3.2. These issues must be fixed before v1.0.0 release to ensure any compliant IndieAuth client can successfully authenticate.
|
|
|
|
## CLARIFICATIONS (2025-11-24)
|
|
|
|
Based on Developer questions, the following clarifications have been added:
|
|
|
|
1. **IPv6 Bracket Handling**: Python's `urlparse` returns `hostname` WITHOUT brackets for IPv6 addresses. The brackets are only in `netloc`. Therefore, the check should be against '::1' without brackets.
|
|
|
|
2. **Normalization of IPv6 with Port**: When reconstructing URLs with IPv6 addresses and ports, brackets MUST be added back (e.g., `[::1]:8080`).
|
|
|
|
3. **Empty Path Normalization**: Confirmed - `https://example.com` should normalize to `https://example.com/` (with trailing slash).
|
|
|
|
4. **Validation Rule Ordering**: Implementation should follow the logical flow shown in the example implementation (lines 87-138), not the numbered list order. The try/except for URL parsing serves as the "Basic URL Structure" check.
|
|
|
|
5. **Endpoint Updates**: These are SEPARATE tasks and should NOT be implemented as part of the validation.py update task.
|
|
|
|
6. **Test File Location**: Tests should go in the existing `/home/phil/Projects/Gondulf/tests/unit/test_validation.py` file.
|
|
|
|
7. **Import Location**: The `ipaddress` import should be at module level (Python convention), not inside the function.
|
|
|
|
## Specification References
|
|
|
|
- **Primary**: [W3C IndieAuth Section 3.2 - Client Identifier](https://www.w3.org/TR/indieauth/#client-identifier)
|
|
- **OAuth 2.0**: [RFC 6749 Section 2.2](https://datatracker.ietf.org/doc/html/rfc6749#section-2.2)
|
|
- **Reference Implementation**: IndieLogin.com `/app/Authenticate.php`
|
|
|
|
## Design Overview
|
|
|
|
Replace the current incomplete `normalize_client_id()` function with two distinct functions:
|
|
1. `validate_client_id()` - Validates client_id against all specification requirements
|
|
2. `normalize_client_id()` - Normalizes a valid client_id to canonical form
|
|
|
|
This separation ensures clear validation logic and proper error reporting while maintaining backward compatibility with existing code that expects normalization.
|
|
|
|
## Component Details
|
|
|
|
### New Function: validate_client_id()
|
|
|
|
**Location**: `/home/phil/Projects/Gondulf/src/gondulf/utils/validation.py`
|
|
|
|
**Purpose**: Validate a client_id URL against all W3C IndieAuth specification requirements.
|
|
|
|
**Function Signature**:
|
|
```python
|
|
def validate_client_id(client_id: str) -> tuple[bool, str]:
|
|
"""
|
|
Validate client_id against W3C IndieAuth specification Section 3.2.
|
|
|
|
Args:
|
|
client_id: The client identifier URL to validate
|
|
|
|
Returns:
|
|
Tuple of (is_valid, error_message)
|
|
- is_valid: True if client_id is valid, False otherwise
|
|
- error_message: Empty string if valid, specific error message if invalid
|
|
"""
|
|
```
|
|
|
|
**Validation Rules** (in order):
|
|
|
|
1. **Basic URL Structure**
|
|
- Must be a parseable URL with urlparse()
|
|
- Error: "client_id must be a valid URL"
|
|
|
|
2. **Scheme Validation**
|
|
- Must be 'https' OR 'http'
|
|
- Error: "client_id must use https or http scheme"
|
|
|
|
3. **HTTP Scheme Restriction**
|
|
- If scheme is 'http', hostname MUST be one of: 'localhost', '127.0.0.1', '::1' (note: hostname from urlparse has no brackets)
|
|
- Error: "client_id with http scheme is only allowed for localhost, 127.0.0.1, or [::1]"
|
|
|
|
4. **Fragment Rejection**
|
|
- Must NOT contain a fragment component (# part)
|
|
- Error: "client_id must not contain a fragment (#)"
|
|
|
|
5. **User Info Rejection**
|
|
- Must NOT contain username or password components
|
|
- Error: "client_id must not contain username or password"
|
|
|
|
6. **IP Address Validation**
|
|
- Check if hostname is an IP address using ipaddress.ip_address()
|
|
- If it's an IP:
|
|
- Must be loopback (127.0.0.1 or ::1)
|
|
- Error: "client_id must not use IP address (except 127.0.0.1 or [::1])"
|
|
- If not an IP (ValueError), it's a domain name (valid)
|
|
|
|
7. **Path Component Requirement**
|
|
- Path must exist (at minimum "/")
|
|
- If empty path, it's still valid (will be normalized to "/" later)
|
|
|
|
8. **Path Segment Validation**
|
|
- Split path by '/' and check segments
|
|
- Must NOT contain single dot ('.') as a complete segment
|
|
- Must NOT contain double dot ('..') as a complete segment
|
|
- Note: './file' or '../file' as part of a segment is allowed, only standalone '.' or '..' segments are rejected
|
|
- Error: "client_id must not contain single-dot (.) or double-dot (..) path segments"
|
|
|
|
**Implementation**:
|
|
```python
|
|
import ipaddress # At module level with other imports
|
|
|
|
def validate_client_id(client_id: str) -> tuple[bool, str]:
|
|
"""
|
|
Validate client_id against W3C IndieAuth specification Section 3.2.
|
|
|
|
Args:
|
|
client_id: The client identifier URL to validate
|
|
|
|
Returns:
|
|
Tuple of (is_valid, error_message)
|
|
"""
|
|
try:
|
|
parsed = urlparse(client_id)
|
|
|
|
# 1. Check scheme
|
|
if parsed.scheme not in ['https', 'http']:
|
|
return False, "client_id must use https or http scheme"
|
|
|
|
# 2. HTTP only for localhost/loopback
|
|
if parsed.scheme == 'http':
|
|
# Note: parsed.hostname returns '::1' without brackets for IPv6
|
|
if parsed.hostname not in ['localhost', '127.0.0.1', '::1']:
|
|
return False, "client_id with http scheme is only allowed for localhost, 127.0.0.1, or [::1]"
|
|
|
|
# 3. No fragments allowed
|
|
if parsed.fragment:
|
|
return False, "client_id must not contain a fragment (#)"
|
|
|
|
# 4. No username/password allowed
|
|
if parsed.username or parsed.password:
|
|
return False, "client_id must not contain username or password"
|
|
|
|
# 5. Check for non-loopback IP addresses
|
|
if parsed.hostname:
|
|
try:
|
|
# parsed.hostname already has no brackets for IPv6
|
|
ip = ipaddress.ip_address(parsed.hostname)
|
|
if not ip.is_loopback:
|
|
return False, f"client_id must not use IP address (except 127.0.0.1 or [::1])"
|
|
except ValueError:
|
|
# Not an IP address, it's a domain (valid)
|
|
pass
|
|
|
|
# 6. Check for . or .. path segments
|
|
if parsed.path:
|
|
segments = parsed.path.split('/')
|
|
for segment in segments:
|
|
if segment == '.' or segment == '..':
|
|
return False, "client_id must not contain single-dot (.) or double-dot (..) path segments"
|
|
|
|
return True, ""
|
|
|
|
except Exception as e:
|
|
return False, f"client_id must be a valid URL: {e}"
|
|
```
|
|
|
|
### Updated Function: normalize_client_id()
|
|
|
|
**Purpose**: Normalize a valid client_id to canonical form. Must validate first.
|
|
|
|
**Function Signature**:
|
|
```python
|
|
def normalize_client_id(client_id: str) -> str:
|
|
"""
|
|
Normalize client_id URL to canonical form per IndieAuth spec.
|
|
|
|
Normalization rules:
|
|
- Validate against specification first
|
|
- Convert hostname to lowercase
|
|
- Remove default ports (80 for http, 443 for https)
|
|
- Ensure path exists (default to "/" if empty)
|
|
- Preserve query string if present
|
|
- Never include fragments (already validated out)
|
|
|
|
Args:
|
|
client_id: Client ID URL to normalize
|
|
|
|
Returns:
|
|
Normalized client_id
|
|
|
|
Raises:
|
|
ValueError: If client_id is not valid per specification
|
|
"""
|
|
```
|
|
|
|
**Normalization Rules**:
|
|
|
|
1. **Validation First**
|
|
- Call validate_client_id()
|
|
- If invalid, raise ValueError with the error message
|
|
|
|
2. **Hostname Normalization**
|
|
- Convert hostname to lowercase
|
|
- Preserve IPv6 brackets if present
|
|
|
|
3. **Port Normalization**
|
|
- Remove port 80 for http URLs
|
|
- Remove port 443 for https URLs
|
|
- Preserve any other ports
|
|
|
|
4. **Path Normalization**
|
|
- If path is empty, set to "/"
|
|
- Do NOT remove trailing slashes (spec doesn't require this)
|
|
- Do NOT normalize . or .. (already validated out)
|
|
|
|
5. **Component Assembly**
|
|
- Reconstruct URL with normalized components
|
|
- Include query string if present
|
|
- Never include fragment (already validated out)
|
|
|
|
**Implementation**:
|
|
```python
|
|
def normalize_client_id(client_id: str) -> str:
|
|
"""
|
|
Normalize client_id URL to canonical form per IndieAuth spec.
|
|
|
|
Args:
|
|
client_id: Client ID URL to normalize
|
|
|
|
Returns:
|
|
Normalized client_id
|
|
|
|
Raises:
|
|
ValueError: If client_id is not valid per specification
|
|
"""
|
|
# First validate
|
|
is_valid, error = validate_client_id(client_id)
|
|
if not is_valid:
|
|
raise ValueError(error)
|
|
|
|
parsed = urlparse(client_id)
|
|
|
|
# Normalize hostname to lowercase
|
|
hostname = parsed.hostname.lower() if parsed.hostname else ''
|
|
|
|
# Determine if this is an IPv6 address (for bracket handling)
|
|
is_ipv6 = ':' in hostname # Simple check since hostname has no brackets
|
|
|
|
# Handle port normalization
|
|
port = parsed.port
|
|
if (parsed.scheme == 'http' and port == 80) or \
|
|
(parsed.scheme == 'https' and port == 443):
|
|
# Default port, omit it
|
|
if is_ipv6:
|
|
netloc = f"[{hostname}]" # IPv6 needs brackets in URL
|
|
else:
|
|
netloc = hostname
|
|
elif port:
|
|
# Non-default port, include it
|
|
if is_ipv6:
|
|
netloc = f"[{hostname}]:{port}" # IPv6 with port needs brackets
|
|
else:
|
|
netloc = f"{hostname}:{port}"
|
|
else:
|
|
# No port
|
|
if is_ipv6:
|
|
netloc = f"[{hostname}]" # IPv6 needs brackets in URL
|
|
else:
|
|
netloc = hostname
|
|
|
|
# Ensure path exists
|
|
path = parsed.path if parsed.path else '/'
|
|
|
|
# Reconstruct URL
|
|
normalized = f"{parsed.scheme}://{netloc}{path}"
|
|
|
|
# Add query if present
|
|
if parsed.query:
|
|
normalized += f"?{parsed.query}"
|
|
|
|
# Never add fragment (validated out)
|
|
|
|
return normalized
|
|
```
|
|
|
|
### Authorization Endpoint Updates (SEPARATE TASK)
|
|
|
|
**NOTE**: This is a SEPARATE task and should NOT be implemented as part of the validation.py update task.
|
|
|
|
**Location**: `/home/phil/Projects/Gondulf/src/gondulf/endpoints/authorization.py`
|
|
|
|
When this separate task is implemented, update the authorization endpoint to use the new validation:
|
|
|
|
```python
|
|
# In the authorize() function, when validating client_id:
|
|
|
|
# Validate and normalize client_id
|
|
is_valid, error = validate_client_id(client_id)
|
|
if not is_valid:
|
|
# Return error to client
|
|
return authorization_error_response(
|
|
redirect_uri=redirect_uri,
|
|
error="invalid_request",
|
|
error_description=f"Invalid client_id: {error}",
|
|
state=state
|
|
)
|
|
|
|
# Normalize for consistent storage/comparison
|
|
try:
|
|
normalized_client_id = normalize_client_id(client_id)
|
|
except ValueError as e:
|
|
# This shouldn't happen if validate_client_id passed, but handle it
|
|
return authorization_error_response(
|
|
redirect_uri=redirect_uri,
|
|
error="invalid_request",
|
|
error_description=str(e),
|
|
state=state
|
|
)
|
|
```
|
|
|
|
### Token Endpoint Updates (SEPARATE TASK)
|
|
|
|
**NOTE**: This is a SEPARATE task and should NOT be implemented as part of the validation.py update task.
|
|
|
|
**Location**: `/home/phil/Projects/Gondulf/src/gondulf/endpoints/token.py`
|
|
|
|
When this separate task is implemented, update token endpoint validation similarly:
|
|
|
|
```python
|
|
# In the token() function, when validating client_id:
|
|
|
|
# Validate and normalize client_id
|
|
is_valid, error = validate_client_id(client_id)
|
|
if not is_valid:
|
|
return JSONResponse(
|
|
status_code=400,
|
|
content={
|
|
"error": "invalid_client",
|
|
"error_description": f"Invalid client_id: {error}"
|
|
}
|
|
)
|
|
|
|
# Normalize for comparison with stored value
|
|
normalized_client_id = normalize_client_id(client_id)
|
|
```
|
|
|
|
## Data Models
|
|
|
|
No database schema changes required. The validation happens at the API layer before storage.
|
|
|
|
## API Contracts
|
|
|
|
### Error Responses
|
|
|
|
When client_id validation fails, return appropriate OAuth 2.0 error responses:
|
|
|
|
**Authorization Endpoint** (if redirect_uri is valid):
|
|
```
|
|
HTTP/1.1 302 Found
|
|
Location: {redirect_uri}?error=invalid_request&error_description=Invalid+client_id%3A+{specific_error}&state={state}
|
|
```
|
|
|
|
**Authorization Endpoint** (if redirect_uri is also invalid):
|
|
```
|
|
HTTP/1.1 400 Bad Request
|
|
Content-Type: text/html
|
|
|
|
<html>
|
|
<body>
|
|
<h1>Invalid Request</h1>
|
|
<p>Invalid client_id: {specific_error}</p>
|
|
</body>
|
|
</html>
|
|
```
|
|
|
|
**Token Endpoint**:
|
|
```
|
|
HTTP/1.1 400 Bad Request
|
|
Content-Type: application/json
|
|
|
|
{
|
|
"error": "invalid_client",
|
|
"error_description": "Invalid client_id: {specific_error}"
|
|
}
|
|
```
|
|
|
|
## Error Handling
|
|
|
|
### Validation Error Messages
|
|
|
|
Each validation rule has a specific, user-friendly error message:
|
|
|
|
| Validation Rule | Error Message |
|
|
|-----------------|---------------|
|
|
| Invalid URL | "client_id must be a valid URL: {parse_error}" |
|
|
| Wrong scheme | "client_id must use https or http scheme" |
|
|
| HTTP not localhost | "client_id with http scheme is only allowed for localhost, 127.0.0.1, or [::1]" |
|
|
| Has fragment | "client_id must not contain a fragment (#)" |
|
|
| Has credentials | "client_id must not contain username or password" |
|
|
| Non-loopback IP | "client_id must not use IP address (except 127.0.0.1 or [::1])" |
|
|
| Path traversal | "client_id must not contain single-dot (.) or double-dot (..) path segments" |
|
|
|
|
### Exception Handling
|
|
|
|
- `validate_client_id()` never raises exceptions, returns (False, error_message)
|
|
- `normalize_client_id()` raises ValueError if validation fails
|
|
- URL parsing exceptions are caught and converted to validation errors
|
|
|
|
## Security Considerations
|
|
|
|
### Fragment Rejection
|
|
Fragments in client_ids could cause confusion about the actual client identity. By rejecting them, we ensure clear client identification.
|
|
|
|
### Credential Rejection
|
|
Username/password in URLs could leak into logs or be displayed to users. Rejecting them prevents credential exposure.
|
|
|
|
### IP Address Restriction
|
|
Allowing arbitrary IP addresses could bypass domain-based security controls. Only loopback addresses are permitted for local development.
|
|
|
|
### Path Traversal Prevention
|
|
Single-dot and double-dot segments could potentially be used for path traversal attacks or cause confusion about the client's identity.
|
|
|
|
### HTTP Localhost Support
|
|
HTTP is only allowed for localhost/loopback addresses to support local development while maintaining security in production.
|
|
|
|
## Testing Strategy
|
|
|
|
### Unit Tests Required
|
|
|
|
Create comprehensive tests in `/home/phil/Projects/Gondulf/tests/unit/test_validation.py`:
|
|
|
|
#### Valid Client IDs
|
|
```python
|
|
valid_client_ids = [
|
|
"https://example.com",
|
|
"https://example.com/",
|
|
"https://example.com/app",
|
|
"https://example.com/app/client",
|
|
"https://example.com?foo=bar",
|
|
"https://example.com/app?foo=bar&baz=qux",
|
|
"https://sub.example.com",
|
|
"https://example.com:8080",
|
|
"https://example.com:8080/app",
|
|
"http://localhost",
|
|
"http://localhost:3000",
|
|
"http://127.0.0.1",
|
|
"http://127.0.0.1:8080",
|
|
"http://[::1]",
|
|
"http://[::1]:8080",
|
|
]
|
|
```
|
|
|
|
#### Invalid Client IDs
|
|
```python
|
|
invalid_client_ids = [
|
|
("ftp://example.com", "must use https or http scheme"),
|
|
("https://example.com#fragment", "must not contain a fragment"),
|
|
("https://user:pass@example.com", "must not contain username or password"),
|
|
("https://example.com/./invalid", "must not contain single-dot"),
|
|
("https://example.com/../invalid", "must not contain double-dot"),
|
|
("http://example.com", "http scheme is only allowed for localhost"),
|
|
("https://192.168.1.1", "must not use IP address"),
|
|
("https://10.0.0.1", "must not use IP address"),
|
|
("https://[2001:db8::1]", "must not use IP address"),
|
|
("not-a-url", "must be a valid URL"),
|
|
("", "must be a valid URL"),
|
|
]
|
|
```
|
|
|
|
#### Normalization Tests
|
|
```python
|
|
normalization_cases = [
|
|
("HTTPS://EXAMPLE.COM", "https://example.com/"),
|
|
("https://example.com", "https://example.com/"),
|
|
("https://example.com:443", "https://example.com/"),
|
|
("http://localhost:80", "http://localhost/"),
|
|
("https://EXAMPLE.COM:443/app", "https://example.com/app"),
|
|
("https://Example.Com/APP", "https://example.com/APP"), # Path case preserved
|
|
("https://example.com?foo=bar", "https://example.com/?foo=bar"),
|
|
]
|
|
```
|
|
|
|
### Integration Tests
|
|
|
|
1. Test authorization endpoint with various client_ids
|
|
2. Test token endpoint with various client_ids
|
|
3. Test that normalized client_ids match correctly between endpoints
|
|
4. Test error responses for invalid client_ids
|
|
|
|
### Security Tests
|
|
|
|
1. Test that fragments are always rejected
|
|
2. Test that credentials are always rejected
|
|
3. Test that non-loopback IPs are rejected
|
|
4. Test that path traversal segments are rejected
|
|
5. Test that HTTP is only allowed for localhost
|
|
|
|
## Acceptance Criteria
|
|
|
|
1. ✅ All valid client_ids per W3C specification are accepted
|
|
2. ✅ All invalid client_ids per W3C specification are rejected with specific error messages
|
|
3. ✅ HTTP scheme is accepted for localhost, 127.0.0.1, and [::1]
|
|
4. ✅ HTTPS scheme is accepted for all valid domain names
|
|
5. ✅ Fragments are always rejected
|
|
6. ✅ Username/password components are always rejected
|
|
7. ✅ Non-loopback IP addresses are rejected
|
|
8. ✅ Single-dot and double-dot path segments are rejected
|
|
9. ✅ Hostnames are normalized to lowercase
|
|
10. ✅ Default ports (80 for HTTP, 443 for HTTPS) are removed
|
|
11. ✅ Empty paths are normalized to "/"
|
|
12. ✅ Query strings are preserved
|
|
13. ✅ Authorization endpoint uses new validation
|
|
14. ✅ Token endpoint uses new validation
|
|
15. ✅ All tests pass with 100% coverage of validation logic
|
|
16. ✅ Error messages are specific and helpful
|
|
|
|
## Implementation Order
|
|
|
|
### Current Task (validation.py update):
|
|
1. Implement `validate_client_id()` function in validation.py
|
|
2. Update `normalize_client_id()` to use validation in validation.py
|
|
3. Write comprehensive unit tests in tests/unit/test_validation.py
|
|
|
|
### Separate Future Tasks:
|
|
4. Update authorization endpoint (SEPARATE TASK)
|
|
5. Update token endpoint (SEPARATE TASK)
|
|
6. Write integration tests (SEPARATE TASK)
|
|
7. Test with real IndieAuth clients (SEPARATE TASK)
|
|
|
|
## Migration Notes
|
|
|
|
- No database migration needed
|
|
- Existing stored client_ids remain valid (they were normalized on storage)
|
|
- New validation is stricter but backward compatible with valid client_ids
|
|
|
|
## References
|
|
|
|
- [W3C IndieAuth Section 3.2](https://www.w3.org/TR/indieauth/#client-identifier)
|
|
- [RFC 3986 - URI Generic Syntax](https://datatracker.ietf.org/doc/html/rfc3986)
|
|
- [OAuth 2.0 RFC 6749](https://datatracker.ietf.org/doc/html/rfc6749)
|
|
- [IndieLogin Implementation](https://github.com/aaronpk/indielogin.com) |