docs: add Phase 2 domain verification design and clarifications

Add comprehensive Phase 2 documentation:
- Complete design document for two-factor domain verification
- Implementation guide with code examples
- ADR for implementation decisions (ADR-0004)
- ADR for rel="me" email discovery (ADR-008)
- Phase 1 impact assessment
- All 23 clarification questions answered
- Updated architecture docs (indieauth-protocol, security)
- Updated ADR-005 with rel="me" approach
- Updated backlog with technical debt items

Design ready for Phase 2 implementation.

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
516
docs/decisions/ADR-008-rel-me-email-discovery.md
Normal file
@@ -0,0 +1,516 @@
# ADR-008: rel="me" Email Discovery Pattern

Date: 2025-11-20

## Status

Accepted

## Context

Gondulf's authentication flow requires email verification as part of two-factor domain verification (see ADR-005). This raises the question: how do we obtain the user's email address?

### Email Acquisition Methods Evaluated

**1. User-Provided Email Input**
- User manually enters their email address
- Server validates that the email domain matches the identity domain
- Simple UX pattern (familiar from many sites)

**2. DNS TXT Record**
- Email address stored in DNS: `_email.example.com` TXT `user@example.com`
- Server queries DNS to discover email
- Requires DNS configuration

**3. rel="me" Link Discovery (IndieWeb Standard)**
- User publishes email on their site: `<link rel="me" href="mailto:user@example.com">`
- Server fetches site and parses HTML for rel="me" links
- Follows IndieWeb standards for identity claims

**4. WebFinger Protocol**
- Server queries `/.well-known/webfinger?resource={domain}`
- Standard protocol for identity discovery
- Requires additional endpoint implementation

### Requirements

From the user requirements and the IndieAuth ecosystem:
- **Security**: Prevent social engineering and email spoofing
- **Simplicity**: Keep v1.0.0 implementation straightforward
- **Standards**: Align with IndieWeb/IndieAuth community practices
- **Self-Documenting**: Users should understand what they're publishing

### IndieWeb Context

The IndieWeb community uses `rel="me"` as a standard way to assert identity relationships:
- Users publish rel="me" links on their homepage to various profiles (GitHub, Twitter, email, etc.)
- Other tools can discover these relationships by parsing the page
- Well-established pattern in the IndieWeb ecosystem
- Reference implementation: https://thesatelliteoflove.com

## Decision

**Gondulf v1.0.0 will discover email addresses from rel="me" links published on the user's homepage, following the IndieWeb standard.**

### Implementation Approach

1. **Fetch User's Homepage**
   - When user initiates authentication with domain (e.g., `https://example.com`)
   - Server fetches the homepage over HTTPS
   - Timeout: 10 seconds
   - Follow redirects (max 5)
   - Verify SSL certificate

2. **Parse HTML for rel="me" Links**
   - Use BeautifulSoup for robust HTML parsing (handles malformed HTML)
   - Search for `<link rel="me" href="mailto:...">` tags
   - Also check `<a rel="me" href="mailto:...">` tags
   - Extract first matching mailto: link
   - Case-insensitive rel attribute matching

3. **Validate Email Format**
   - Basic RFC 5322 format validation
   - Length checks (max 254 characters per RFC 5321)
   - Format: `user@domain.tld`

4. **Use Discovered Email**
   - Send verification code to discovered email
   - Display partially masked email to user: `u***@example.com`
   - User cannot modify email (discovered automatically)

5. **Error Handling**
   - If no rel="me" link found: Display setup instructions
   - If multiple mailto: links: Use first one
   - If site unreachable: Display error with retry option
   - If SSL verification fails: Reject (security)
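
The masked display in step 4 could be produced by a small helper along these lines. This is a minimal sketch, not Gondulf's confirmed implementation: the function name `mask_email` and the choice to keep only the first character of the local part are assumptions inferred from the `u***@example.com` example.

```python
def mask_email(email: str) -> str:
    """Mask the local part of an address, keeping its first character.

    Hypothetical helper (not from the ADR); assumes `email` has already
    passed format validation and contains exactly one "@".
    """
    local, _, domain = email.partition("@")
    if not local or not domain:
        raise ValueError("expected an address of the form local@domain")
    # Always emit three asterisks so the mask does not reveal the local part's length.
    return f"{local[0]}***@{domain}"


print(mask_email("user@example.com"))  # u***@example.com
```

A fixed-width mask is used here so that the displayed value does not reveal how long the local part is.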

### Example HTML

User adds this to their homepage:

```html
<!DOCTYPE html>
<html>
<head>
  <title>Phil Skents</title>
  <!-- rel="me" link for email -->
  <link rel="me" href="mailto:phil@example.com">

  <!-- Other rel="me" links (optional) -->
  <link rel="me" href="https://github.com/philskents">
  <link rel="me" href="https://twitter.com/philskents">
</head>
<body>
  <h1>Phil Skents</h1>
  <p>This is my personal website.</p>
</body>
</html>
```

Or visible link:

```html
<a rel="me" href="mailto:phil@example.com">Email me</a>
```
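
To illustrate how both forms of markup can be recognized, here is a minimal sketch that uses only Python's standard-library `html.parser` rather than BeautifulSoup (a deliberate substitution to keep the example dependency-free). The names `RelMeMailtoParser` and `find_rel_me_mailto` are hypothetical and do not come from the ADR.

```python
from html.parser import HTMLParser
from typing import List, Optional


class RelMeMailtoParser(HTMLParser):
    """Collect mailto: hrefs from <link> and <a> tags carrying rel="me"."""

    def __init__(self) -> None:
        super().__init__()
        self.mailto_hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag not in ("link", "a"):
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel is a space-separated token list, compared case-insensitively.
        rel_tokens = attr_map.get("rel", "").lower().split()
        href = attr_map.get("href", "").strip()
        if "me" in rel_tokens and href.lower().startswith("mailto:"):
            self.mailto_hrefs.append(href)


def find_rel_me_mailto(html: str) -> Optional[str]:
    """Return the first rel="me" mailto: href in document order, or None."""
    parser = RelMeMailtoParser()
    parser.feed(html)
    parser.close()
    return parser.mailto_hrefs[0] if parser.mailto_hrefs else None
```

Note that this sketch returns the first match in document order, which is one possible reading of "first matching mailto: link"; checking all `<link>` tags before any `<a>` tags is another.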

## Rationale

### Follows IndieWeb Standards

**IndieWeb Alignment**:
- rel="me" is the standard way to assert identity in the IndieWeb
- Users familiar with IndieAuth likely already have rel="me" configured
- Interoperability with other IndieWeb tools
- Well-documented pattern: https://indieweb.org/rel-me

**Community Expectations**:
- IndieAuth ecosystem uses rel="me" extensively
- Users understand the pattern
- Existing tutorials and documentation available
- Aligns with decentralized identity principles

### Security Benefits

**Prevents Social Engineering**:
- User cannot claim arbitrary email addresses
- Email must be published on the user's own site
- Attacker cannot trick user into entering wrong email
- Self-attested identity (user declares on their domain)

**Reduces Attack Surface**:
- No user input field for email (no typos, no XSS)
- No email enumeration via guessing
- Email discovery transparent and auditable
- User controls what email is published

**Transparency**:
- User explicitly publishes email on their site
- Public declaration of email relationship
- User aware they're making email public
- No hidden or implicit email collection

### Implementation Simplicity

**Established Libraries**:
- BeautifulSoup: Robust HTML parsing (handles malformed HTML)
- requests: HTTP client (widely used, well-tested)
- No custom protocols or complex parsing
- Python standard library (`re`) for email validation

**Error Handling**:
- Clear error messages with setup instructions
- Graceful degradation (site unavailable, etc.)
- Standard HTTP status codes
- No complex state management

**Testing**:
- Easy to mock HTTP responses
- Straightforward unit tests
- BeautifulSoup handles edge cases (malformed HTML)
- No external service dependencies
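
The point about mocking HTTP responses can be illustrated without any network access. The sketch below rests on assumptions: `fetch_homepage` is a hypothetical stand-in built on the standard-library `urllib.request` (the ADR itself uses requests), and the test swaps `urlopen` for a canned response via `unittest.mock`.

```python
import urllib.request
from unittest import mock


def fetch_homepage(domain: str, timeout: float = 10.0) -> str:
    """Hypothetical fetcher: GET https://<domain>/ and return the body as text."""
    with urllib.request.urlopen(f"https://{domain}/", timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def make_fake_response(body: bytes) -> mock.MagicMock:
    """Build a stand-in for the object returned by urlopen()."""
    fake = mock.MagicMock()
    fake.read.return_value = body
    fake.__enter__.return_value = fake  # support `with urlopen(...) as response`
    return fake


def test_fetch_homepage_uses_canned_response() -> None:
    body = b'<link rel="me" href="mailto:phil@example.com">'
    with mock.patch(
        "urllib.request.urlopen", return_value=make_fake_response(body)
    ) as fake_urlopen:
        html = fetch_homepage("example.com")
    # The canned body is returned and the real network is never touched.
    assert 'href="mailto:phil@example.com"' in html
    fake_urlopen.assert_called_once_with("https://example.com/", timeout=10.0)


test_fetch_homepage_uses_canned_response()
```

The same pattern applies to a requests-based fetcher by patching the call that performs the GET instead of `urlopen`.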

### User Experience

**Self-Documenting**:
- User adds one HTML tag to their site
- Clear relationship between domain and email
- User understands what they're publishing
- No hidden configuration

**Familiar Pattern**:
- Similar to verifying site ownership (Google Search Console, etc.)
- Adding meta tags is common web practice
- Many users already have rel="me" for other purposes
- Works with static sites (no backend required)

**Setup Time**:
- ~1 minute to add link tag
- No waiting (unlike DNS propagation)
- Immediate verification possible
- Can be combined with other rel="me" links

## Consequences

### Positive Consequences

1. **IndieWeb Standard Compliance**:
   - Follows established rel="me" pattern
   - Interoperability with IndieWeb tools
   - Community-vetted approach
   - Well-documented standard

2. **Enhanced Security**:
   - No user-provided email input (prevents social engineering)
   - Email explicitly published by user
   - Transparent and auditable
   - Reduces phishing risk

3. **Implementation Simplicity**:
   - Established libraries (BeautifulSoup, requests)
   - No complex protocols
   - Easy to test and maintain
   - Handles malformed HTML gracefully

4. **User Control**:
   - User explicitly declares email on their site
   - Can change email by updating HTML
   - No hidden email collection
   - User aware of public email

5. **Flexibility**:
   - Works with static sites (no backend needed)
   - Can use any email provider
   - Email can be at a different domain (e.g., Gmail)
   - Supports multiple rel="me" links

### Negative Consequences

1. **Public Email Requirement**:
   - User must publish email publicly on their site
   - Not suitable for users who want private email
   - Email harvesters can discover address
   - Spam risk (mitigated: users can use spam filters)

2. **HTML Parsing Complexity**:
   - Must handle various HTML formats
   - Malformed HTML can cause issues (mitigated: BeautifulSoup)
   - Case sensitivity considerations
   - Multiple possible HTML structures

3. **Website Dependency**:
   - User's site must be available during authentication
   - Site downtime blocks authentication
   - No fallback if site unreachable
   - Requires HTTPS (not all sites have valid certificates)

4. **Discovery Failures**:
   - User may not have rel="me" configured
   - Link may be in wrong format
   - Email may be invalid format
   - Clear error messages required

5. **Privacy Considerations**:
   - Email addresses visible to anyone
   - Cannot use email verification without public disclosure
   - Users must accept public email
   - May deter privacy-conscious users

### Mitigation Strategies

**For Public Email Concern**:
- Document clearly that email will be public
- Suggest using dedicated email for IndieAuth
- Recommend spam filtering
- Note: Email is user's choice (they publish it)

**For HTML Parsing**:
```python
from bs4 import BeautifulSoup

html_content = '<link REL="Me" href="mailto:user@example.com">'  # sample input

# BeautifulSoup handles malformed HTML gracefully
soup = BeautifulSoup(html_content, 'html.parser')

def has_rel_me(tag):
    """Case-insensitive match; rel is normally parsed as a list of tokens."""
    rel = tag.get('rel') or []
    if isinstance(rel, str):
        rel = rel.split()
    return any(token.lower() == 'me' for token in rel)

# <link> tags first, then <a> tags
me_links = [t for name in ('link', 'a') for t in soup.find_all(name) if has_rel_me(t)]

# Multiple link formats supported
# <link rel="me" href="mailto:user@example.com">
# <a rel="me" href="mailto:user@example.com">Email</a>
```

**For Website Dependency**:
- Clear error messages with retry option
- Suggest checking site availability
- Timeout limits (10 seconds)
- Log errors for debugging

**For Discovery Failures**:
```markdown
Error: No rel="me" email link found

Please add this to your homepage:
<link rel="me" href="mailto:your-email@example.com">

See: https://indieweb.org/rel-me for more information.
```

## Implementation

### Email Discovery Service

```python
import logging
import re
from typing import Optional

import requests
from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)


class RelMeEmailDiscovery:
    """
    Discover email addresses from rel="me" links on user's homepage.
    """

    def discover_email(self, domain: str) -> Optional[str]:
        """
        Fetch domain homepage and discover email from rel="me" link.

        Args:
            domain: User's domain (e.g., "example.com")

        Returns:
            Email address or None if not found
        """
        url = f"https://{domain}"

        try:
            # Fetch homepage with safety limits. requests.get() does not
            # accept max_redirects; the limit is configured on a Session.
            with requests.Session() as session:
                session.max_redirects = 5
                response = session.get(
                    url,
                    timeout=10,
                    allow_redirects=True,
                    verify=True  # Verify SSL certificate
                )
            response.raise_for_status()

            # Parse HTML (handles malformed HTML)
            soup = BeautifulSoup(response.content, 'html.parser')

            # Find all rel="me" links
            # Both <link> and <a> tags supported (<link> tags checked first)
            me_links = [
                tag
                for name in ('link', 'a')
                for tag in soup.find_all(name)
                if self._has_rel_me(tag)
            ]

            # Look for mailto: links
            for link in me_links:
                href = link.get('href', '').strip()
                if href.lower().startswith('mailto:'):
                    # Strip only the leading scheme, not later occurrences
                    email = href[len('mailto:'):].strip()

                    # Validate email format
                    if self._validate_email_format(email):
                        logger.info(f"Discovered email via rel='me' for {domain}")
                        return email

            logger.warning(f"No rel='me' mailto: link found on {domain}")
            return None

        except requests.exceptions.SSLError as e:
            logger.error(f"SSL verification failed for {domain}: {e}")
            return None
        except requests.exceptions.Timeout:
            logger.error(f"Timeout fetching {domain}")
            return None
        except requests.exceptions.HTTPError as e:
            logger.error(f"HTTP error fetching {domain}: {e}")
            return None
        except Exception as e:
            logger.error(f"Failed to discover email for {domain}: {e}")
            return None

    @staticmethod
    def _has_rel_me(tag) -> bool:
        """Return True if the tag's rel attribute contains "me" (any case)."""
        rel = tag.get('rel') or []
        if isinstance(rel, str):
            rel = rel.split()
        return any(token.lower() == 'me' for token in rel)

    def _validate_email_format(self, email: str) -> bool:
        """
        Validate email address format.

        Args:
            email: Email address to validate

        Returns:
            True if valid format, False otherwise
        """
        # Basic RFC 5322 format check
        email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
        if not re.match(email_regex, email):
            return False

        # Length check (RFC 5321)
        if len(email) > 254:
            return False

        # Must have exactly one @
        if email.count('@') != 1:
            return False

        return True
```
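
A `mailto:` href may carry more than a bare address, for example a `?subject=` query, percent-encoded characters, or several comma-separated recipients. The following is a hedged sketch of one way to normalize such values with `urllib.parse` before format validation. The helper name `address_from_mailto` and the policy of taking the first recipient are assumptions, not part of the ADR.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit


def address_from_mailto(href: str) -> Optional[str]:
    """Extract the first recipient address from a mailto: URI, or None."""
    parts = urlsplit(href.strip())
    if parts.scheme != "mailto":  # urlsplit lowercases the scheme
        return None
    # parts.path holds the recipient list; any ?subject=... lands in parts.query.
    first_recipient = parts.path.split(",")[0]
    # Decode after splitting so an encoded comma stays inside the address.
    address = unquote(first_recipient).strip()
    return address or None
```

The returned value would still need to pass the format and length checks described above before a verification code is sent to it.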

### Error Messages

```python
# Placeholders such as {domain} and {email} are filled in with str.format().

# DNS TXT found, but no rel="me" link
NO_REL_ME_MESSAGE = """
Domain verified via DNS, but no email found on your site.

Please add this to your homepage:
<link rel="me" href="mailto:your-email@example.com">

This allows us to discover your email address automatically.

Learn more: https://indieweb.org/rel-me
"""

# Site unreachable
SITE_UNREACHABLE_MESSAGE = """
Could not fetch your site at https://{domain}

Please check:
- Site is accessible via HTTPS
- SSL certificate is valid
- No firewall blocking requests

Try again once your site is accessible.
"""

# Invalid email format in rel="me"
INVALID_EMAIL_MESSAGE = """
Found rel="me" link, but email format is invalid: {email}

Please check your rel="me" link uses valid email format:
<link rel="me" href="mailto:valid-email@example.com">
"""
```
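
Because each failure calls for a different message, a caller needs some way to tell the failures apart and to fill in the placeholders. The sketch below shows one possible shape for that, under stated assumptions: the `DiscoveryFailure` enum, the `TEMPLATES` mapping, and `render_error` are hypothetical names, and the templates are abbreviated to their first line.

```python
from enum import Enum


class DiscoveryFailure(Enum):
    """Hypothetical failure reasons, one per message in this section."""
    NO_REL_ME = "no_rel_me"
    SITE_UNREACHABLE = "site_unreachable"
    INVALID_EMAIL = "invalid_email"


# Abbreviated templates for illustration; {domain} and {email} are placeholders.
TEMPLATES = {
    DiscoveryFailure.NO_REL_ME: "Domain verified via DNS, but no email found on your site.",
    DiscoveryFailure.SITE_UNREACHABLE: "Could not fetch your site at https://{domain}",
    DiscoveryFailure.INVALID_EMAIL: 'Found rel="me" link, but email format is invalid: {email}',
}


def render_error(failure: DiscoveryFailure, **context: str) -> str:
    """Fill the template for `failure`; raises KeyError if a placeholder is missing."""
    return TEMPLATES[failure].format(**context)


print(render_error(DiscoveryFailure.SITE_UNREACHABLE, domain="example.com"))
# Could not fetch your site at https://example.com
```

If these messages are rendered into an HTML page, the `{email}` value originates from the fetched site and should be escaped before display.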

## Alternatives Considered

### Alternative 1: User-Provided Email Input

**Pros**:
- Simpler implementation (no HTTP fetch, no parsing)
- Works even if site is down
- User can use private email (not public)
- Immediate (no HTTP round-trip)

**Cons**:
- Social engineering risk (attacker tricks user into entering wrong email)
- Typo risk (user enters incorrect email)
- No self-attestation (email not on user's site)
- Not aligned with IndieWeb standards

**Rejected**: Security risks outweigh simplicity benefits. rel="me" provides self-attestation and prevents social engineering.

---

### Alternative 2: DNS TXT Record for Email

**Pros**:
- Stronger proof of domain control (DNS)
- No website dependency
- Machine-readable format
- Fast lookups (DNS cache)

**Cons**:
- Requires DNS configuration (more complex than HTML)
- DNS propagation delays (can be hours)
- Not user-friendly for non-technical users
- Not standard IndieWeb practice

**Rejected**: DNS configuration is more complex than adding an HTML tag. rel="me" is more aligned with IndieWeb standards.

---

### Alternative 3: WebFinger Protocol

**Pros**:
- Standard protocol (RFC 7033)
- Machine-readable format (JSON)
- Supports multiple identities
- Well-defined spec

**Cons**:
- Requires server-side endpoint (not for static sites)
- More complex implementation
- Not common in IndieWeb ecosystem
- Overkill for email discovery

**Rejected**: Too complex for v1.0.0 MVP. Doesn't work with static sites. rel="me" is simpler and more aligned with IndieWeb.

---

### Alternative 4: Well-Known URI

**Pros**:
- Builds on the standard `/.well-known/` URI mechanism (e.g., `/.well-known/email`)
- Simple file-based implementation
- No HTML parsing required
- Fast lookups

**Cons**:
- Not an established standard for email discovery
- Requires server configuration
- Not aligned with IndieWeb practices
- Duplicate effort (rel="me" already exists)

**Rejected**: Not standard practice. rel="me" is already established in the IndieWeb ecosystem.

## References

- IndieWeb rel="me": https://indieweb.org/rel-me
- Example Implementation: https://thesatelliteoflove.com (Phil Skents' identity page)
- HTML Link Relations (W3C): https://www.w3.org/TR/html5/links.html#linkTypes
- BeautifulSoup Documentation: https://www.crummy.com/software/BeautifulSoup/
- RFC 5322 (Email Format): https://datatracker.ietf.org/doc/html/rfc5322
- RFC 5321 (SMTP): https://datatracker.ietf.org/doc/html/rfc5321
- WebFinger (RFC 7033): https://datatracker.ietf.org/doc/html/rfc7033 (alternative considered)

## Decision History

- 2025-11-20: Proposed (Architect)
- 2025-11-20: Accepted (Architect)
- Related to ADR-005 (Two-Factor Domain Verification)