Gondulf/docs/decisions/ADR-008-rel-me-email-discovery.md
Phil Skentelbery 6f06aebf40 docs: add Phase 2 domain verification design and clarifications
Add comprehensive Phase 2 documentation:
- Complete design document for two-factor domain verification
- Implementation guide with code examples
- ADR for implementation decisions (ADR-0004)
- ADR for rel="me" email discovery (ADR-008)
- Phase 1 impact assessment
- All 23 clarification questions answered
- Updated architecture docs (indieauth-protocol, security)
- Updated ADR-005 with rel="me" approach
- Updated backlog with technical debt items

Design ready for Phase 2 implementation.

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 13:05:09 -07:00

ADR-008: rel="me" Email Discovery Pattern

Date: 2025-11-20

Status

Accepted

Context

Gondulf's authentication flow requires email verification as part of two-factor domain verification (see ADR-005). This raises the question: How do we obtain the user's email address?

Email Acquisition Methods Evaluated

1. User-Provided Email Input

  • User manually enters their email address
  • Server validates email domain matches identity domain
  • Simple UX pattern (familiar from many sites)

2. DNS TXT Record

  • Email address stored in DNS: _email.example.com TXT user@example.com
  • Server queries DNS to discover email
  • Requires DNS configuration

3. rel="me" Link Discovery (IndieWeb Standard)

  • User publishes email on their site: <link rel="me" href="mailto:user@example.com">
  • Server fetches site and parses HTML for rel="me" links
  • Follows IndieWeb standards for identity claims

4. WebFinger Protocol

  • Server queries /.well-known/webfinger?resource={domain}
  • Standard protocol for identity discovery
  • Requires additional endpoint implementation

Requirements

Derived from the user requirements and the IndieAuth ecosystem:

  • Security: Prevent social engineering and email spoofing
  • Simplicity: Keep v1.0.0 implementation straightforward
  • Standards: Align with IndieWeb/IndieAuth community practices
  • Self-Documenting: Users should understand what they're publishing

IndieWeb Context

The IndieWeb community uses rel="me" as a standard way to assert identity relationships:

  • Users publish rel="me" links on their homepage to various profiles (GitHub, Twitter, email, etc.)
  • Other tools can discover these relationships by parsing the page
  • Well-established pattern in the IndieWeb ecosystem
  • Reference implementation: https://thesatelliteoflove.com

Decision

Gondulf v1.0.0 will discover email addresses from rel="me" links published on the user's homepage, following the IndieWeb standard.

Implementation Approach

  1. Fetch User's Homepage

    • When user initiates authentication with domain (e.g., https://example.com)
    • Server fetches the homepage over HTTPS
    • Timeout: 10 seconds
    • Follow redirects (max 5)
    • Verify SSL certificate
  2. Parse HTML for rel="me" Links

    • Use BeautifulSoup for robust HTML parsing (handles malformed HTML)
    • Search for <link rel="me" href="mailto:..."> tags
    • Also check <a rel="me" href="mailto:..."> tags
    • Extract first matching mailto: link
    • Case-insensitive rel attribute matching
  3. Validate Email Format

    • Basic RFC 5322 format validation
    • Length checks (max 254 characters per RFC 5321)
    • Format: user@domain.tld
  4. Use Discovered Email

    • Send verification code to discovered email
    • Display partially masked email to user: u***@example.com
    • User cannot modify email (discovered automatically)
  5. Error Handling

    • If no rel="me" link found: Display setup instructions
    • If multiple mailto: links: Use first one
    • If site unreachable: Display error with retry option
    • If SSL verification fails: Reject (security)
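The masking in step 4 could look roughly like the sketch below. `mask_email` is a hypothetical helper name, and the fixed run of three asterisks is an assumption; this ADR only specifies the `u***@example.com` output shape.

```python
def mask_email(email: str) -> str:
    """Mask the local part of an address, keeping only its first character."""
    local, sep, domain = email.partition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    # Always emit three asterisks so the masked form does not leak
    # the length of the local part (an assumption, see lead-in).
    return f"{local[0]}***@{domain}"


print(mask_email("user@example.com"))  # u***@example.com
```

Showing only the first character lets the user recognise the address without the page echoing it in full.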

Example HTML

User adds this to their homepage:

<!DOCTYPE html>
<html>
<head>
    <title>Phil Skents</title>
    <!-- rel="me" link for email -->
    <link rel="me" href="mailto:phil@example.com">

    <!-- Other rel="me" links (optional) -->
    <link rel="me" href="https://github.com/philskents">
    <link rel="me" href="https://twitter.com/philskents">
</head>
<body>
    <h1>Phil Skents</h1>
    <p>This is my personal website.</p>
</body>
</html>

Or, as a visible link in the page body:

<a rel="me" href="mailto:phil@example.com">Email me</a>
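To illustrate what discovery extracts from markup like this, here is a minimal sketch that uses only the standard library's `html.parser` as a stand-in for BeautifulSoup (the library this ADR actually selects). The names `RelMeMailtoParser` and `first_rel_me_email` are hypothetical. Unlike the service code later in this document, the sketch returns links in document order.

```python
from html.parser import HTMLParser
from typing import List, Optional


class RelMeMailtoParser(HTMLParser):
    """Collect mailto: targets from <link rel="me"> and <a rel="me"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.emails: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        # HTMLParser lowercases tag and attribute names; values are left as-is.
        attr_map = {name: (value or "") for name, value in attrs}
        # rel is a space-separated list of tokens, compared case-insensitively.
        rel_values = attr_map.get("rel", "").lower().split()
        href = attr_map.get("href", "")
        if "me" in rel_values and href.lower().startswith("mailto:"):
            self.emails.append(href[len("mailto:"):].strip())


def first_rel_me_email(html: str) -> Optional[str]:
    """Return the first rel="me" mailto: address in the page, or None."""
    parser = RelMeMailtoParser()
    parser.feed(html)
    return parser.emails[0] if parser.emails else None


page = """
<head>
    <link rel="me" href="mailto:phil@example.com">
    <link rel="me" href="https://github.com/philskents">
</head>
"""
print(first_rel_me_email(page))  # phil@example.com
```

The non-mailto `rel="me"` link is ignored, which is why publishing other profile links alongside the email link does not interfere with discovery.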

Rationale

Follows IndieWeb Standards

IndieWeb Alignment:

  • rel="me" is the standard way to assert identity in IndieWeb
  • Users familiar with IndieAuth likely already have rel="me" configured
  • Interoperability with other IndieWeb tools
  • Well-documented pattern: https://indieweb.org/rel-me

Community Expectations:

  • IndieAuth ecosystem uses rel="me" extensively
  • Users understand the pattern
  • Existing tutorials and documentation available
  • Aligns with decentralized identity principles

Security Benefits

Prevents Social Engineering:

  • User cannot claim arbitrary email addresses
  • Email must be published on the user's own site
  • Attacker cannot trick user into entering wrong email
  • Self-attested identity (user declares on their domain)

Reduces Attack Surface:

  • No user input field for email (no typos, no XSS)
  • No email enumeration via guessing
  • Email discovery transparent and auditable
  • User controls what email is published

Transparency:

  • User explicitly publishes email on their site
  • Public declaration of email relationship
  • User aware they're making email public
  • No hidden or implicit email collection

Implementation Simplicity

Standard Libraries:

  • BeautifulSoup: Robust HTML parsing (handles malformed HTML)
  • requests: HTTP client (widely used, well-tested)
  • No custom protocols or complex parsing
  • Python standard library for email validation

Error Handling:

  • Clear error messages with setup instructions
  • Graceful degradation (site unavailable, etc.)
  • Standard HTTP status codes
  • No complex state management

Testing:

  • Easy to mock HTTP responses
  • Straightforward unit tests
  • BeautifulSoup handles edge cases (malformed HTML)
  • No external service dependencies

User Experience

Self-Documenting:

  • User adds one HTML tag to their site
  • Clear relationship between domain and email
  • User understands what they're publishing
  • No hidden configuration

Familiar Pattern:

  • Similar to verifying site ownership (Google Search Console, etc.)
  • Adding meta tags is common web practice
  • Many users already have rel="me" for other purposes
  • Works with static sites (no backend required)

Setup Time:

  • ~1 minute to add link tag
  • No waiting (unlike DNS propagation)
  • Immediate verification possible
  • Can be combined with other rel="me" links

Consequences

Positive Consequences

  1. IndieWeb Standard Compliance:

    • Follows established rel="me" pattern
    • Interoperability with IndieWeb tools
    • Community-vetted approach
    • Well-documented standard
  2. Enhanced Security:

    • No user-provided email input (prevents social engineering)
    • Email explicitly published by user
    • Transparent and auditable
    • Reduces phishing risk
  3. Implementation Simplicity:

    • Standard libraries (BeautifulSoup, requests)
    • No complex protocols
    • Easy to test and maintain
    • Handles malformed HTML gracefully
  4. User Control:

    • User explicitly declares email on their site
    • Can change email by updating HTML
    • No hidden email collection
    • User aware of public email
  5. Flexibility:

    • Works with static sites (no backend needed)
    • Can use any email provider
    • Email can be at different domain (e.g., Gmail)
    • Supports multiple rel="me" links

Negative Consequences

  1. Public Email Requirement:

    • User must publish email publicly on their site
    • Not suitable for users who want private email
    • Email harvesters can discover address
    • Spam risk (mitigated: users can use spam filters)
  2. HTML Parsing Complexity:

    • Must handle various HTML formats
    • Malformed HTML can cause issues (mitigated: BeautifulSoup)
    • Case sensitivity considerations
    • Multiple possible HTML structures
  3. Website Dependency:

    • User's site must be available during authentication
    • Site downtime blocks authentication
    • No fallback if site unreachable
    • Requires HTTPS (not all sites have valid certificates)
  4. Discovery Failures:

    • User may not have rel="me" configured
    • Link may be in wrong format
    • Email may be invalid format
    • Clear error messages required
  5. Privacy Considerations:

    • Email addresses visible to anyone
    • Cannot use email verification without public disclosure
    • Users must accept public email
    • May deter privacy-conscious users

Mitigation Strategies

For Public Email Concern:

  • Document clearly that email will be public
  • Suggest using dedicated email for IndieAuth
  • Recommend spam filtering
  • Note: Email is user's choice (they publish it)

For HTML Parsing:

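from bs4 import BeautifulSoup

# BeautifulSoup handles malformed HTML gracefully
soup = BeautifulSoup(html_content, 'html.parser')

# rel is a multi-valued attribute; compare each value case-insensitively
# (a plain rel='me' filter would not match rel="ME")
def has_rel_me(tag):
    return 'me' in [value.lower() for value in (tag.get('rel') or [])]

me_links = [
    tag for tag in soup.find_all('link') + soup.find_all('a')
    if has_rel_me(tag)
]

# Multiple link formats supported
# <link rel="me" href="mailto:user@example.com">
# <a rel="me" href="mailto:user@example.com">Email</a>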

For Website Dependency:

  • Clear error messages with retry option
  • Suggest checking site availability
  • Timeout limits (10 seconds)
  • Log errors for debugging

For Discovery Failures:

Error: No rel="me" email link found

Please add this to your homepage:
<link rel="me" href="mailto:your-email@example.com">

See: https://indieweb.org/rel-me for more information.

Implementation

Email Discovery Service

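import logging
import re
from typing import Optional

import requests
from bs4 import BeautifulSoup

# Module-level logger used by the discovery service below
logger = logging.getLogger(__name__)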

class RelMeEmailDiscovery:
    """
    Discover email addresses from rel="me" links on user's homepage.
    """

    def discover_email(self, domain: str) -> Optional[str]:
        """
        Fetch domain homepage and discover email from rel="me" link.

        Args:
            domain: User's domain (e.g., "example.com")

        Returns:
            Email address or None if not found
        """
        url = f"https://{domain}"

        try:
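            # Fetch homepage with safety limits.
            # requests.get() accepts no max_redirects argument; the
            # redirect cap is configured on a Session instead.
            with requests.Session() as session:
                session.max_redirects = 5
                response = session.get(
                    url,
                    timeout=10,
                    allow_redirects=True,
                    verify=True  # Verify SSL certificate
                )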
            response.raise_for_status()

            # Parse HTML (handles malformed HTML)
            soup = BeautifulSoup(response.content, 'html.parser')

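            # Find all rel="me" links
            # Both <link> and <a> tags supported; rel values are
            # compared case-insensitively
            me_links = [
                tag for tag in soup.find_all('link') + soup.find_all('a')
                if 'me' in [v.lower() for v in (tag.get('rel') or [])]
            ]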

            # Look for mailto: links
            for link in me_links:
                href = link.get('href', '')
                if href.lower().startswith('mailto:'):
                    # Strip only the leading scheme and any ?subject=... query
                    email = href[len('mailto:'):].split('?', 1)[0].strip()

                    # Validate email format
                    if self._validate_email_format(email):
                        logger.info(f"Discovered email via rel='me' for {domain}")
                        return email

            logger.warning(f"No rel='me' mailto: link found on {domain}")
            return None

        except requests.exceptions.SSLError as e:
            logger.error(f"SSL verification failed for {domain}: {e}")
            return None
        except requests.exceptions.Timeout:
            logger.error(f"Timeout fetching {domain}")
            return None
        except requests.exceptions.HTTPError as e:
            logger.error(f"HTTP error fetching {domain}: {e}")
            return None
        except Exception as e:
            logger.error(f"Failed to discover email for {domain}: {e}")
            return None

    def _validate_email_format(self, email: str) -> bool:
        """
        Validate email address format.

        Args:
            email: Email address to validate

        Returns:
            True if valid format, False otherwise
        """
        # Simplified format check (not a full RFC 5322 parser)
        email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
        if not re.fullmatch(email_regex, email):
            return False

        # Length check (RFC 5321)
        if len(email) > 254:
            return False

        # Must have exactly one @
        if email.count('@') != 1:
            return False

        return True
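To make the behaviour of the format check concrete, the sketch below applies the same pattern and length limit to a few sample addresses. `validate_email_format` here is a standalone copy written for illustration, not the service method itself, and the sample addresses are made up.

```python
import re

# Same simplified pattern as the service code; not a full RFC 5322 grammar.
EMAIL_REGEX = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
MAX_EMAIL_LENGTH = 254


def validate_email_format(email: str) -> bool:
    """Return True if the address passes the length and pattern checks."""
    if len(email) > MAX_EMAIL_LENGTH:
        return False
    return EMAIL_REGEX.fullmatch(email) is not None


samples = [
    "phil@example.com",                 # accepted
    "first.last+tag@mail.example.org",  # accepted
    "no-at-sign.example.com",           # rejected: no @
    "a@b@example.com",                  # rejected: more than one @
    "user@localhost",                   # rejected: no dot in the domain
    '"john doe"@example.com',           # rejected, although RFC 5322 allows it
    "x" * 250 + "@example.com",         # rejected: longer than 254 characters
]
for address in samples:
    print(f"{address[:40]!r}: {validate_email_format(address)}")
```

The quoted-local-part case shows the trade-off: the pattern rejects some addresses that are technically valid in exchange for a check that is easy to read and test.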

Error Messages

# DNS TXT found, but no rel="me" link
NO_REL_ME_MESSAGE = """
Domain verified via DNS, but no email found on your site.

Please add this to your homepage:
<link rel="me" href="mailto:your-email@example.com">

This allows us to discover your email address automatically.

Learn more: https://indieweb.org/rel-me
"""

# Site unreachable ({domain} is filled in with str.format)
SITE_UNREACHABLE_MESSAGE = """
Could not fetch your site at https://{domain}

Please check:
- Site is accessible via HTTPS
- SSL certificate is valid
- No firewall blocking requests

Try again once your site is accessible.
"""

# Invalid email format in rel="me" ({email} is filled in with str.format)
INVALID_EMAIL_MESSAGE = """
Found rel="me" link, but email format is invalid: {email}

Please check your rel="me" link uses valid email format:
<link rel="me" href="mailto:valid-email@example.com">
"""
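One way to wire templates like these to failure cases is sketched below. The `DiscoveryFailure` enum, the `render_error` helper, and the abbreviated template texts are all assumptions made for illustration; this ADR does not define how the messages are selected.

```python
from enum import Enum


class DiscoveryFailure(Enum):
    """Hypothetical failure categories matching the three messages above."""
    NO_REL_ME = "no_rel_me"
    SITE_UNREACHABLE = "site_unreachable"
    INVALID_EMAIL = "invalid_email"


# Abbreviated templates; placeholders are filled in with str.format.
TEMPLATES = {
    DiscoveryFailure.NO_REL_ME:
        "Domain verified via DNS, but no email found on your site.",
    DiscoveryFailure.SITE_UNREACHABLE:
        "Could not fetch your site at https://{domain}",
    DiscoveryFailure.INVALID_EMAIL:
        'Found rel="me" link, but email format is invalid: {email}',
}


def render_error(failure: DiscoveryFailure, **context: str) -> str:
    """Fill the template for a failure with the given context values."""
    return TEMPLATES[failure].format(**context)


print(render_error(DiscoveryFailure.SITE_UNREACHABLE, domain="example.com"))
# Could not fetch your site at https://example.com
```

Keeping the templates in one mapping means every failure path produces a user-facing message, and a missing placeholder value surfaces as a `KeyError` during testing rather than as a literal `{domain}` on the page.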

Alternatives Considered

Alternative 1: User-Provided Email Input

Pros:

  • Simpler implementation (no HTTP fetch, no parsing)
  • Works even if site is down
  • User can use private email (not public)
  • Immediate (no HTTP round-trip)

Cons:

  • Social engineering risk (attacker tricks user into entering wrong email)
  • Typo risk (user enters incorrect email)
  • No self-attestation (email not on user's site)
  • Not aligned with IndieWeb standards

Rejected: Security risks outweigh simplicity benefits. rel="me" provides self-attestation and prevents social engineering.


Alternative 2: DNS TXT Record for Email

Pros:

  • Stronger proof of domain control (DNS)
  • No website dependency
  • Machine-readable format
  • Fast lookups (DNS cache)

Cons:

  • Requires DNS configuration (more complex than HTML)
  • DNS propagation delays (can be hours)
  • Not user-friendly for non-technical users
  • Not standard IndieWeb practice

Rejected: DNS configuration is more complex than adding HTML tag. rel="me" is more aligned with IndieWeb standards.


Alternative 3: WebFinger Protocol

Pros:

  • Standard protocol (RFC 7033)
  • Machine-readable format (JSON)
  • Supports multiple identities
  • Well-defined spec

Cons:

  • Requires server-side endpoint (not for static sites)
  • More complex implementation
  • Not common in IndieWeb ecosystem
  • Overkill for email discovery

Rejected: Too complex for v1.0.0 MVP. Doesn't work with static sites. rel="me" is simpler and more aligned with IndieWeb.


Alternative 4: Well-Known URI

Pros:

  • Follows the well-known URI convention (e.g. /.well-known/email)
  • Simple file-based implementation
  • No HTML parsing required
  • Fast lookups

Cons:

  • Not an established standard for email
  • Requires server configuration
  • Not aligned with IndieWeb practices
  • Duplicate effort (rel="me" already exists)

Rejected: Not standard practice. rel="me" is already established in IndieWeb ecosystem.

Decision History

  • 2025-11-20: Proposed (Architect)
  • 2025-11-20: Accepted (Architect)
  • Related to ADR-005 (Two-Factor Domain Verification)