ADR-008: rel="me" Email Discovery Pattern
Date: 2025-11-20
Status
Accepted
Context
Gondulf's authentication flow requires email verification as part of two-factor domain verification (see ADR-005). This raises the question: How do we obtain the user's email address?
Email Acquisition Methods Evaluated
1. User-Provided Email Input
- User manually enters their email address
- Server validates email domain matches identity domain
- Simple UX pattern (familiar from many sites)
2. DNS TXT Record
- Email address stored in DNS: _email.example.com TXT user@example.com
- Server queries DNS to discover email
- Requires DNS configuration
3. rel="me" Link Discovery (IndieWeb Standard)
- User publishes email on their site: <link rel="me" href="mailto:user@example.com">
- Server fetches site and parses HTML for rel="me" links
- Follows IndieWeb standards for identity claims
4. WebFinger Protocol
- Server queries /.well-known/webfinger?resource={domain}
- Standard protocol for identity discovery
- Requires additional endpoint implementation
Requirements
Drawn from the user requirements and the IndieAuth ecosystem:
- Security: Prevent social engineering and email spoofing
- Simplicity: Keep v1.0.0 implementation straightforward
- Standards: Align with IndieWeb/IndieAuth community practices
- Self-Documenting: Users should understand what they're publishing
IndieWeb Context
The IndieWeb community uses rel="me" as a standard way to assert identity relationships:
- Users publish rel="me" links on their homepage to various profiles (GitHub, Twitter, email, etc.)
- Other tools can discover these relationships by parsing the page
- Well-established pattern in the IndieWeb ecosystem
- Reference implementation: https://thesatelliteoflove.com
Decision
Gondulf v1.0.0 will discover email addresses from rel="me" links published on the user's homepage, following the IndieWeb standard.
Implementation Approach
1. Fetch User's Homepage
- When user initiates authentication with domain (e.g., https://example.com)
- Server fetches the homepage over HTTPS
- Timeout: 10 seconds
- Follow redirects (max 5)
- Verify SSL certificate
2. Parse HTML for rel="me" Links
- Use BeautifulSoup for robust HTML parsing (handles malformed HTML)
- Search for <link rel="me" href="mailto:..."> tags
- Also check <a rel="me" href="mailto:..."> tags
- Extract first matching mailto: link
- Case-insensitive rel attribute matching
3. Validate Email Format
- Basic RFC 5322 format validation
- Length checks (max 254 characters per RFC 5321)
- Format: user@domain.tld
4. Use Discovered Email
- Send verification code to discovered email
- Display partially masked email to user: u***@example.com
- User cannot modify email (discovered automatically)
5. Error Handling
- If no rel="me" link found: Display setup instructions
- If multiple mailto: links: Use first one
- If site unreachable: Display error with retry option
- If SSL verification fails: Reject (security)
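The masked display in the "Use Discovered Email" step could be produced along these lines. This is a minimal sketch, assuming the mask keeps only the first character of the local part, as in the u***@example.com example; the function name mask_email is hypothetical and not part of Gondulf.

```python
def mask_email(email: str) -> str:
    """Return a partially masked address such as u***@example.com."""
    local, _, domain = email.partition("@")
    if not local or not domain:
        # Assumption: callers only pass addresses that already passed validation
        raise ValueError("expected an address of the form user@domain.tld")
    # Keep the first character of the local part and the full domain
    return f"{local[0]}***@{domain}"


print(mask_email("user@example.com"))  # u***@example.com
```

Keeping the domain visible lets the user recognize which mailbox to check without revealing the full address on screen.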
Example HTML
User adds this to their homepage:
<!DOCTYPE html>
<html>
<head>
  <title>Phil Skents</title>
  <!-- rel="me" link for email -->
  <link rel="me" href="mailto:phil@example.com">
  <!-- Other rel="me" links (optional) -->
  <link rel="me" href="https://github.com/philskents">
  <link rel="me" href="https://twitter.com/philskents">
</head>
<body>
  <h1>Phil Skents</h1>
  <p>This is my personal website.</p>
</body>
</html>
Or visible link:
<a rel="me" href="mailto:phil@example.com">Email me</a>
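To illustrate how markup like the examples above might be scanned, here is a dependency-free sketch that uses the standard library's html.parser as a stand-in for BeautifulSoup (the parser this ADR actually selects). The names RelMeMailtoParser and first_rel_me_email are hypothetical. The sketch returns the first match in document order and treats rel tokens case-insensitively.

```python
from html.parser import HTMLParser
from typing import List, Optional


class RelMeMailtoParser(HTMLParser):
    """Collect mailto: targets from <link> and <a> tags carrying rel="me"."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.emails: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us
        if tag not in ("link", "a"):
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel may hold several space-separated tokens, e.g. "me nofollow"
        rel_tokens = attr_map.get("rel", "").lower().split()
        href = attr_map.get("href", "").strip()
        if "me" in rel_tokens and href.lower().startswith("mailto:"):
            self.emails.append(href[len("mailto:"):])


def first_rel_me_email(html: str) -> Optional[str]:
    """Return the first rel="me" mailto address in document order, or None."""
    parser = RelMeMailtoParser()
    parser.feed(html)
    return parser.emails[0] if parser.emails else None


sample = """
<head>
  <link rel="me" href="https://github.com/philskents">
  <LINK REL="Me" href="mailto:phil@example.com">
</head>
<body><a rel="me nofollow" href="mailto:other@example.com">Email me</a></body>
"""
print(first_rel_me_email(sample))  # phil@example.com
```

The extracted value would still need the format validation described in the implementation approach before any verification code is sent.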
Rationale
Follows IndieWeb Standards
IndieWeb Alignment:
- rel="me" is the standard way to assert identity in IndieWeb
- Users familiar with IndieAuth likely already have rel="me" configured
- Interoperability with other IndieWeb tools
- Well-documented pattern: https://indieweb.org/rel-me
Community Expectations:
- IndieAuth ecosystem uses rel="me" extensively
- Users understand the pattern
- Existing tutorials and documentation available
- Aligns with decentralized identity principles
Security Benefits
Prevents Social Engineering:
- User cannot claim arbitrary email addresses
- Email must be published on the user's own site
- Attacker cannot trick user into entering wrong email
- Self-attested identity (user declares on their domain)
Reduces Attack Surface:
- No user input field for email (no typos, no XSS)
- No email enumeration via guessing
- Email discovery transparent and auditable
- User controls what email is published
Transparency:
- User explicitly publishes email on their site
- Public declaration of email relationship
- User aware they're making email public
- No hidden or implicit email collection
Implementation Simplicity
Standard Libraries:
- BeautifulSoup: Robust HTML parsing (handles malformed HTML)
- requests: HTTP client (widely used, well-tested)
- No custom protocols or complex parsing
- Python standard library for email validation
Error Handling:
- Clear error messages with setup instructions
- Graceful degradation (site unavailable, etc.)
- Standard HTTP status codes
- No complex state management
Testing:
- Easy to mock HTTP responses
- Straightforward unit tests
- BeautifulSoup handles edge cases (malformed HTML)
- No external service dependencies
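As one illustration of how the HTTP fetch could be mocked in a unit test, the sketch below patches the network call with unittest.mock. It uses urllib.request from the standard library rather than requests so that it stays self-contained; fetch_homepage and the test function are hypothetical and not part of Gondulf.

```python
import urllib.request
from unittest import mock


def fetch_homepage(domain: str, timeout: float = 10.0) -> bytes:
    """Fetch https://{domain} and return the raw response body."""
    with urllib.request.urlopen(f"https://{domain}", timeout=timeout) as response:
        return response.read()


def test_fetch_uses_https_and_timeout() -> None:
    # Stand-in for the HTTP response; supports the context-manager protocol
    fake_response = mock.MagicMock()
    fake_response.__enter__.return_value.read.return_value = b"<html></html>"

    # Replace the real network call so the test needs no external service
    with mock.patch("urllib.request.urlopen", return_value=fake_response) as urlopen:
        body = fetch_homepage("example.com")

    urlopen.assert_called_once_with("https://example.com", timeout=10.0)
    assert body == b"<html></html>"
```

The same pattern should carry over to a requests-based fetch by patching the call the service makes instead of urlopen.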
User Experience
Self-Documenting:
- User adds one HTML tag to their site
- Clear relationship between domain and email
- User understands what they're publishing
- No hidden configuration
Familiar Pattern:
- Similar to verifying site ownership (Google Search Console, etc.)
- Adding meta tags is common web practice
- Many users already have rel="me" for other purposes
- Works with static sites (no backend required)
Setup Time:
- ~1 minute to add link tag
- No waiting (unlike DNS propagation)
- Immediate verification possible
- Can be combined with other rel="me" links
Consequences
Positive Consequences
1. IndieWeb Standard Compliance:
- Follows established rel="me" pattern
- Interoperability with IndieWeb tools
- Community-vetted approach
- Well-documented standard
2. Enhanced Security:
- No user-provided email input (prevents social engineering)
- Email explicitly published by user
- Transparent and auditable
- Reduces phishing risk
3. Implementation Simplicity:
- Standard libraries (BeautifulSoup, requests)
- No complex protocols
- Easy to test and maintain
- Handles malformed HTML gracefully
4. User Control:
- User explicitly declares email on their site
- Can change email by updating HTML
- No hidden email collection
- User aware of public email
5. Flexibility:
- Works with static sites (no backend needed)
- Can use any email provider
- Email can be at different domain (e.g., Gmail)
- Supports multiple rel="me" links
Negative Consequences
1. Public Email Requirement:
- User must publish email publicly on their site
- Not suitable for users who want private email
- Email harvesters can discover address
- Spam risk (mitigated: users can use spam filters)
2. HTML Parsing Complexity:
- Must handle various HTML formats
- Malformed HTML can cause issues (mitigated: BeautifulSoup)
- Case sensitivity considerations
- Multiple possible HTML structures
3. Website Dependency:
- User's site must be available during authentication
- Site downtime blocks authentication
- No fallback if site unreachable
- Requires HTTPS (not all sites have valid certificates)
4. Discovery Failures:
- User may not have rel="me" configured
- Link may be in wrong format
- Email may be invalid format
- Clear error messages required
5. Privacy Considerations:
- Email addresses visible to anyone
- Cannot use email verification without public disclosure
- Users must accept public email
- May deter privacy-conscious users
Mitigation Strategies
For Public Email Concern:
- Document clearly that email will be public
- Suggest using dedicated email for IndieAuth
- Recommend spam filtering
- Note: Email is user's choice (they publish it)
For HTML Parsing:
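from bs4 import BeautifulSoup

# BeautifulSoup handles malformed HTML gracefully
soup = BeautifulSoup(html_content, 'html.parser')

def has_rel_me(tag):
    # rel is multi-valued; compare each token case-insensitively
    # (a plain rel='me' filter would be case-sensitive)
    rel = tag.get('rel') or []
    if isinstance(rel, str):
        rel = rel.split()
    return any(token.lower() == 'me' for token in rel)

me_links = [tag for tag in soup.find_all('link') + soup.find_all('a') if has_rel_me(tag)]
# Multiple link formats supported
# <link rel="me" href="mailto:user@example.com">
# <a rel="me" href="mailto:user@example.com">Email</a>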
For Website Dependency:
- Clear error messages with retry option
- Suggest checking site availability
- Timeout limits (10 seconds)
- Log errors for debugging
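To make the "clear error messages with retry option" point concrete, the following sketch maps fetch failures onto coarse categories that a UI could translate into messages. It is an illustration only: the category strings and the function name classify_fetch_error are hypothetical, and it works with standard-library exception types (urllib, ssl, socket) rather than the requests exceptions a real implementation would catch.

```python
import socket
import ssl
import urllib.error


def classify_fetch_error(exc: Exception) -> str:
    """Map a fetch failure to a coarse category for user-facing messages."""
    # URLError wraps the underlying cause in .reason;
    # for HTTPError, .reason is just the status message string
    reason = getattr(exc, "reason", None)
    causes = (exc, reason)
    if any(isinstance(cause, ssl.SSLError) for cause in causes):
        return "ssl_error"  # reject: no insecure fallback is offered
    if any(isinstance(cause, (socket.timeout, TimeoutError)) for cause in causes):
        return "timeout"  # retry option makes sense here
    if isinstance(exc, urllib.error.HTTPError):
        return "http_error"
    return "unreachable"


print(classify_fetch_error(urllib.error.URLError(socket.timeout("timed out"))))  # timeout
```

Keeping the category separate from the message text means the log line and the user-facing wording can differ without duplicating the detection logic.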
For Discovery Failures:
Error: No rel="me" email link found
Please add this to your homepage:
<link rel="me" href="mailto:your-email@example.com">
See: https://indieweb.org/rel-me for more information.
Implementation
Email Discovery Service
import logging
import re
from typing import Optional

import requests
from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)


class RelMeEmailDiscovery:
    """
    Discover email addresses from rel="me" links on user's homepage.
    """

    def discover_email(self, domain: str) -> Optional[str]:
        """
        Fetch domain homepage and discover email from rel="me" link.

        Args:
            domain: User's domain (e.g., "example.com")

        Returns:
            Email address or None if not found
        """
        url = f"https://{domain}"
        try:
            # Fetch homepage with safety limits. The redirect cap is set on
            # the Session; requests.get() has no max_redirects argument.
            with requests.Session() as session:
                session.max_redirects = 5
                response = session.get(
                    url,
                    timeout=10,
                    allow_redirects=True,
                    verify=True,  # Verify SSL certificate
                )
            response.raise_for_status()

            # Parse HTML (handles malformed HTML)
            soup = BeautifulSoup(response.content, 'html.parser')

            # Find all rel="me" links
            # Both <link> and <a> tags supported
            candidates = soup.find_all('link') + soup.find_all('a')
            me_links = [tag for tag in candidates if self._has_rel_me(tag)]

            # Look for mailto: links
            for link in me_links:
                href = (link.get('href') or '').strip()
                if href.lower().startswith('mailto:'):
                    # Drop the scheme and any ?subject=... parameters
                    email = href[len('mailto:'):].split('?', 1)[0].strip()
                    # Validate email format
                    if self._validate_email_format(email):
                        logger.info(f"Discovered email via rel='me' for {domain}")
                        return email

            logger.warning(f"No rel='me' mailto: link found on {domain}")
            return None
        except requests.exceptions.SSLError as e:
            logger.error(f"SSL verification failed for {domain}: {e}")
            return None
        except requests.exceptions.Timeout:
            logger.error(f"Timeout fetching {domain}")
            return None
        except requests.exceptions.HTTPError as e:
            logger.error(f"HTTP error fetching {domain}: {e}")
            return None
        except Exception as e:
            # Includes TooManyRedirects and connection failures
            logger.error(f"Failed to discover email for {domain}: {e}")
            return None

    @staticmethod
    def _has_rel_me(tag) -> bool:
        """Return True if the tag's rel attribute contains "me" (case-insensitive)."""
        rel = tag.get('rel') or []
        if isinstance(rel, str):
            rel = rel.split()
        return any(token.lower() == 'me' for token in rel)

    def _validate_email_format(self, email: str) -> bool:
        """
        Validate email address format.

        Args:
            email: Email address to validate

        Returns:
            True if valid format, False otherwise
        """
        # Simplified format check (approximates RFC 5322, not a full implementation)
        email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
        if not re.match(email_regex, email):
            return False
        # Length check (RFC 5321)
        if len(email) > 254:
            return False
        # Must have exactly one @
        if email.count('@') != 1:
            return False
        return True
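One edge case worth considering when extracting the address: a mailto: href may carry query parameters (for example ?subject=...), percent-encoding, or an upper-case scheme. The sketch below shows one way to normalize such values with urllib.parse. Whether Gondulf needs to accept these forms is an assumption, and email_from_mailto is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit


def email_from_mailto(href: str) -> Optional[str]:
    """Extract the first address from a mailto: URI, or None if not a mailto."""
    parts = urlsplit(href.strip())
    # urlsplit lowercases the scheme, so MAILTO: is accepted as well
    if parts.scheme != "mailto":
        return None
    # parts.path holds the address list; ?subject=... lands in parts.query
    address = unquote(parts.path).split(",")[0].strip()
    return address or None


print(email_from_mailto("mailto:user@example.com?subject=Hello"))  # user@example.com
```

The result is only a candidate address; it would still go through the same format validation before use.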
Error Messages
# DNS TXT found, but no rel="me" link
NO_EMAIL_FOUND_MESSAGE = """
Domain verified via DNS, but no email found on your site.
Please add this to your homepage:
<link rel="me" href="mailto:your-email@example.com">
This allows us to discover your email address automatically.
Learn more: https://indieweb.org/rel-me
"""
# Site unreachable (fill in with .format(domain=...))
SITE_UNREACHABLE_MESSAGE = """
Could not fetch your site at https://{domain}
Please check:
- Site is accessible via HTTPS
- SSL certificate is valid
- No firewall blocking requests
Try again once your site is accessible.
"""
# Invalid email format in rel="me" (fill in with .format(email=...))
INVALID_EMAIL_MESSAGE = """
Found rel="me" link, but email format is invalid: {email}
Please check your rel="me" link uses valid email format:
<link rel="me" href="mailto:valid-email@example.com">
"""
Alternatives Considered
Alternative 1: User-Provided Email Input
Pros:
- Simpler implementation (no HTTP fetch, no parsing)
- Works even if site is down
- User can use private email (not public)
- Immediate (no HTTP round-trip)
Cons:
- Social engineering risk (attacker tricks user into entering wrong email)
- Typo risk (user enters incorrect email)
- No self-attestation (email not on user's site)
- Not aligned with IndieWeb standards
Rejected: Security risks outweigh simplicity benefits. rel="me" provides self-attestation and prevents social engineering.
Alternative 2: DNS TXT Record for Email
Pros:
- Stronger proof of domain control (DNS)
- No website dependency
- Machine-readable format
- Fast lookups (DNS cache)
Cons:
- Requires DNS configuration (more complex than HTML)
- DNS propagation delays (can be hours)
- Not user-friendly for non-technical users
- Not standard IndieWeb practice
Rejected: DNS configuration is more complex than adding HTML tag. rel="me" is more aligned with IndieWeb standards.
Alternative 3: WebFinger Protocol
Pros:
- Standard protocol (RFC 7033)
- Machine-readable format (JSON)
- Supports multiple identities
- Well-defined spec
Cons:
- Requires server-side endpoint (not for static sites)
- More complex implementation
- Not common in IndieWeb ecosystem
- Overkill for email discovery
Rejected: Too complex for v1.0.0 MVP. Doesn't work with static sites. rel="me" is simpler and more aligned with IndieWeb.
Alternative 4: Well-Known URI
Pros:
- Standard approach (/.well-known/email)
- Simple file-based implementation
- No HTML parsing required
- Fast lookups
Cons:
- Not an established standard for email
- Requires server configuration
- Not aligned with IndieWeb practices
- Duplicate effort (rel="me" already exists)
Rejected: Not standard practice. rel="me" is already established in IndieWeb ecosystem.
References
- IndieWeb rel="me": https://indieweb.org/rel-me
- Example Implementation: https://thesatelliteoflove.com (Phil Skents' identity page)
- HTML Link Relations (W3C): https://www.w3.org/TR/html5/links.html#linkTypes
- BeautifulSoup Documentation: https://www.crummy.com/software/BeautifulSoup/
- RFC 5322 (Email Format): https://datatracker.ietf.org/doc/html/rfc5322
- RFC 5321 (SMTP): https://datatracker.ietf.org/doc/html/rfc5321
- WebFinger (RFC 7033): https://datatracker.ietf.org/doc/html/rfc7033 (alternative considered)
Decision History
- 2025-11-20: Proposed (Architect)
- 2025-11-20: Accepted (Architect)
- Related to ADR-005 (Two-Factor Domain Verification)