Gondulf/docs/designs/phase-4a-clarifications.md
Phil Skentelbery 115e733604 feat(phase-4a): complete Phase 3 implementation and gap analysis
Merges Phase 4a work including:

Implementation:
- Metadata discovery endpoint (/api/.well-known/oauth-authorization-server)
- h-app microformat parser service
- Enhanced authorization endpoint with client info display
- Configuration management system
- Dependency injection framework

Documentation:
- Comprehensive gap analysis for v1.0.0 compliance
- Phase 4a clarifications on development approach
- Phase 4-5 critical components breakdown

Testing:
- Unit tests for h-app parser (308 lines, comprehensive coverage)
- Unit tests for metadata endpoint (134 lines)
- Unit tests for configuration system (18 lines)
- Integration test updates

All tests passing with high coverage. Ready for Phase 4b security hardening.
2025-11-20 17:16:11 -07:00


Phase 4a Implementation Clarifications

Architect: Claude (Architect Agent)
Date: 2025-11-20
Status: Clarification Response
Related Design: /docs/designs/phase-4-5-critical-components.md

Purpose

This document answers the Developer's clarification questions before Phase 4a implementation begins. Each answer gives explicit guidance, rationale, and implementation details so the Developer can implement with confidence and without having to make architectural decisions.


Question 1: Implementation Priority for Phase 4a

Question: Should Phase 4a implement ONLY Components 1 and 2 (Metadata Endpoint + h-app Parser), or also include additional components from the full design?

Answer

Implement only Components 1 and 2 with Component 3 integration.

Specifically:

  1. Component 1: Metadata endpoint (/.well-known/oauth-authorization-server)
  2. Component 2: h-app parser service (HAppParser class)
  3. Component 3 Integration: Update authorization endpoint to USE the h-app parser

Do NOT implement:

  • Component 4 (Security hardening) - This is Phase 4b
  • Component 5 (Rate limiting improvements) - This is Phase 4b
  • Component 6 (Deployment documentation) - This is Phase 5a
  • Component 7 (End-to-end testing) - This is Phase 5b

Rationale

Phase 4a completes the remaining Phase 3 functionality. The design document groups all remaining work together, but the implementation plan (lines 3001-3010) clearly breaks it down:

Phase 4a: Complete Phase 3 (Estimated: 2-3 days)
Tasks:
1. Implement metadata endpoint (0.5 day)
2. Implement h-app parser service (1 day)
3. Integrate h-app with authorization endpoint (0.5 day)

Integration with the authorization endpoint is essential because the h-app parser has no value without being used. However, you are NOT implementing new security features or rate limiting improvements.

Implementation Scope

Files to create:

  • /src/gondulf/routers/metadata.py - Metadata endpoint
  • /src/gondulf/services/happ_parser.py - h-app parser service
  • /tests/unit/routers/test_metadata.py - Metadata endpoint tests
  • /tests/unit/services/test_happ_parser.py - Parser tests

Files to modify:

  • /src/gondulf/config.py - Add BASE_URL configuration
  • /src/gondulf/dependencies.py - Add h-app parser dependency
  • /src/gondulf/routers/authorization.py - Integrate h-app parser
  • /src/gondulf/templates/authorize.html - Display client metadata
  • /pyproject.toml - Add mf2py dependency
  • /src/gondulf/main.py - Register metadata router

Acceptance criteria:

  • Metadata endpoint returns correct JSON per RFC 8414
  • h-app parser successfully extracts name, logo, URL from h-app markup
  • Authorization endpoint displays client metadata when available
  • All tests pass with 80%+ coverage (supporting components)
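As a reference point for the RFC 8414 criterion, here is a minimal sketch of the JSON document the metadata endpoint could return, built from the BASE_URL setting described under Question 2. The function name build_metadata and the /authorize and /token paths are assumptions for illustration. Use the routes the existing routers actually mount, and extend the field list to match the design document.

```python
def build_metadata(base_url: str) -> dict[str, object]:
    """Return an RFC 8414 authorization server metadata document.

    Hypothetical helper: the /authorize and /token paths are assumed,
    not taken from the Gondulf routers.
    """
    # Config already strips trailing slashes; repeat defensively so the
    # issuer never ends in "/" (clients compare it character for character).
    issuer = base_url.rstrip("/")
    return {
        "issuer": issuer,
        "authorization_endpoint": f"{issuer}/authorize",  # assumed path
        "token_endpoint": f"{issuer}/token",  # assumed path
        "response_types_supported": ["code"],
        "grant_types_supported": ["authorization_code"],
    }
```

Deriving every URL from one issuer value is what makes the issuer check meaningful. RFC 8414 requires the returned issuer to be identical to the issuer identifier the client used to locate the metadata, so a single source of truth avoids drift between fields.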

Question 2: Configuration BASE_URL Requirement

Question: Should GONDULF_BASE_URL be added to the existing Config class? Should it be required, or optional with a default? If optional, what default should development use?

Answer

Add BASE_URL to Config class as REQUIRED with no default.

Implementation Details

Add to /src/gondulf/config.py:

class Config:
    """Application configuration loaded from environment variables."""

    # Required settings - no defaults
    SECRET_KEY: str
    BASE_URL: str  # <-- ADD THIS (after SECRET_KEY, before DATABASE_URL)

    # Database
    DATABASE_URL: str

    # ... rest of existing config ...

In the Config.load() method, add validation AFTER SECRET_KEY validation:

@classmethod
def load(cls) -> None:
    """
    Load and validate configuration from environment variables.

    Raises:
        ConfigurationError: If required settings are missing or invalid
    """
    # Required - SECRET_KEY must exist and be sufficiently long
    secret_key = os.getenv("GONDULF_SECRET_KEY")
    if not secret_key:
        raise ConfigurationError(
            "GONDULF_SECRET_KEY is required. Generate with: "
            "python -c \"import secrets; print(secrets.token_urlsafe(32))\""
        )
    if len(secret_key) < 32:
        raise ConfigurationError(
            "GONDULF_SECRET_KEY must be at least 32 characters for security"
        )
    cls.SECRET_KEY = secret_key

    # Required - BASE_URL must exist for OAuth metadata
    base_url = os.getenv("GONDULF_BASE_URL")
    if not base_url:
        raise ConfigurationError(
            "GONDULF_BASE_URL is required for OAuth 2.0 metadata endpoint. "
            "Examples: https://auth.example.com or http://localhost:8000 (development only)"
        )
    # Normalize: remove trailing slash if present
    cls.BASE_URL = base_url.rstrip("/")

    # Database - with sensible default
    cls.DATABASE_URL = os.getenv(
        "GONDULF_DATABASE_URL", "sqlite:///./data/gondulf.db"
    )

    # ... rest of existing load() method ...

Add validation to Config.validate() method:

@classmethod
def validate(cls) -> None:
    """
    Validate configuration after loading.

    Performs additional validation beyond initial loading.
    """
    import warnings
    from urllib.parse import urlparse

    # Validate BASE_URL has an http:// or https:// scheme and a host
    parsed = urlparse(cls.BASE_URL)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ConfigurationError(
            "GONDULF_BASE_URL must start with http:// or https:// and include a host"
        )

    # Warn if using http:// for anything other than localhost. Compare the
    # parsed hostname rather than a substring, so that a URL such as
    # http://localhost.example.com does not suppress the warning.
    if parsed.scheme == "http" and parsed.hostname != "localhost":
        warnings.warn(
            "GONDULF_BASE_URL uses http:// for non-localhost domain. "
            "HTTPS is required for production IndieAuth servers.",
            UserWarning
        )

    # ... rest of existing validate() method ...

Rationale

Why REQUIRED with no default:

  1. No sensible default exists: Unlike DATABASE_URL (SQLite is fine for development), BASE_URL must match the actual deployment URL
  2. Critical for OAuth metadata: RFC 8414 requires an accurate issuer value, and a wrong value breaks client discovery
  3. Security implications: Mismatched BASE_URL could enable token fixation attacks
  4. Explicit over implicit: Better to fail fast with clear error than run with wrong configuration

Why not http://localhost:8000 as default:

  • Default port conflicts with other services (many devs run multiple projects)
  • Default BASE_URL won't match actual deployment (production uses https://auth.example.com)
  • Explicit configuration forces developer awareness of this critical setting
  • Clear error message guides developers to set it correctly

Development usage: Developers add to .env file:

GONDULF_BASE_URL=http://localhost:8000

Production usage:

GONDULF_BASE_URL=https://auth.example.com

Testing Considerations

Update configuration tests to verify:

  1. Missing GONDULF_BASE_URL raises ConfigurationError
  2. BASE_URL with trailing slash is normalized (stripped)
  3. BASE_URL without http:// or https:// raises error
  4. BASE_URL with http:// and non-localhost generates warning
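The four test cases above can be exercised against a standalone restatement of the BASE_URL rules. This is a sketch, not the Config code. check_base_url is a hypothetical helper, ConfigurationError is a local stand-in for Gondulf's exception, and the localhost test compares the parsed hostname instead of searching the whole URL for the substring "localhost".

```python
from __future__ import annotations

import warnings
from urllib.parse import urlparse


class ConfigurationError(Exception):
    """Local stand-in for gondulf's ConfigurationError (assumed Exception subclass)."""


def check_base_url(raw: str | None) -> str:
    """Hypothetical helper applying the BASE_URL rules from load() and validate()."""
    # Rule 1: required, no default
    if not raw:
        raise ConfigurationError("GONDULF_BASE_URL is required")
    # Rule 2: normalize by stripping any trailing slash
    base_url = raw.rstrip("/")
    # Rule 3: must be an http:// or https:// URL with a host
    parsed = urlparse(base_url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ConfigurationError("GONDULF_BASE_URL must start with http:// or https://")
    # Rule 4: plain http is only expected for localhost development
    if parsed.scheme == "http" and parsed.hostname != "localhost":
        warnings.warn(
            "GONDULF_BASE_URL uses http:// for non-localhost domain.", UserWarning
        )
    return base_url
```

In pytest, the warning case maps naturally onto pytest.warns(UserWarning) and the error cases onto pytest.raises(ConfigurationError).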

Question 3: Dependency Installation

Question: Should mf2py be added to pyproject.toml dependencies? What version constraint?

Answer

Add mf2py>=2.0.0 to the main dependencies list.

Implementation Details

Modify /pyproject.toml, add to the dependencies array:

dependencies = [
    "fastapi>=0.104.0",
    "uvicorn[standard]>=0.24.0",
    "sqlalchemy>=2.0.0",
    "pydantic>=2.0.0",
    "pydantic-settings>=2.0.0",
    "python-multipart>=0.0.6",
    "python-dotenv>=1.0.0",
    "dnspython>=2.4.0",
    "aiosmtplib>=3.0.0",
    "beautifulsoup4>=4.12.0",
    "jinja2>=3.1.0",
    "mf2py>=2.0.0",  # <-- ADD THIS
]

After modifying pyproject.toml, run:

pip install -e .

Or, if you use a different package manager:

uv pip install -e .  # if using uv
poetry install       # if using poetry

Rationale

Why mf2py:

  • Official Python library for microformats2 parsing
  • Actively maintained by the microformats community
  • Used by reference IndieAuth implementations
  • Handles edge cases in h-* markup parsing

Why >=2.0.0 version constraint:

  • Version 2.0.0+ is stable and actively maintained
  • Uses >= to allow bug fixes and improvements
  • The 2.x major version provides API stability (note that >= alone does not exclude a future 3.x; add <3 if that guarantee matters)
  • Similar to other dependencies in project (not pinning to exact versions)

Why main dependencies (not dev or test):

  • h-app parsing is core functionality, not development tooling
  • Metadata endpoint requires this at runtime
  • Authorization endpoint uses this for every client display
  • Production deployments need this library

Testing Impact

The mf2py library is well-tested by its maintainers. Your tests should:

  • Mock mf2py responses in unit tests (test YOUR code, not mf2py)
  • Use real mf2py in integration tests (verify correct usage)

Example unit test approach:

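from gondulf.services.happ_parser import HAppParser


def test_happ_parser_extracts_name(mocker):  # mocker fixture comes from pytest-mock
    # Mock mf2py.parse to return a known structure
    mocker.patch("mf2py.parse", return_value={
        "items": [{
            "type": ["h-app"],
            "properties": {
                "name": ["Example App"]
            }
        }]
    })
    mock_fetcher = mocker.Mock()  # the fetcher is not used when parsing HTML directly

    parser = HAppParser(html_fetcher=mock_fetcher)
    # Assumes HAppParser exposes a synchronous parse(html=...) helper
    metadata = parser.parse(html="<div>...</div>")

    assert metadata.name == "Example App"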

Question 4: Template Updates

Question: Should the Developer review the existing template first, or does the design snippet provide the complete changes?

Answer

Review existing template first, then apply design changes as additions to existing structure.

Implementation Approach

Step 1: Read current /src/gondulf/templates/authorize.html completely

Step 2: Identify the location where client information is displayed

  • Look for sections showing client_id to user
  • Find the consent form area

Step 3: Add client metadata display ABOVE the consent buttons

The design provides the HTML snippet to add:

{% if client_metadata %}
<div class="client-metadata">
    {% if client_metadata.logo %}
    <img src="{{ client_metadata.logo }}" alt="{{ client_metadata.name or 'Client' }} logo" class="client-logo">
    {% endif %}
    <h2>{{ client_metadata.name or client_id }}</h2>
    {% if client_metadata.url %}
    <p><a href="{{ client_metadata.url }}" target="_blank" rel="noopener noreferrer">{{ client_metadata.url }}</a></p>
    {% endif %}
</div>
{% else %}
<div class="client-info">
    <h2>{{ client_id }}</h2>
</div>
{% endif %}

Step 4: Ensure this renders in a logical place

  • Should appear where user sees "Application X wants to authenticate you"
  • Should be BEFORE approve/deny buttons
  • Should use existing CSS classes or add minimal new styles

Step 5: Verify the authorization route passes client_metadata to template

Rationale

Why review first:

  1. Template has existing structure you must preserve
  2. Existing CSS classes should be reused if possible
  3. Existing Jinja2 blocks/inheritance must be maintained
  4. User experience should remain consistent

Why design snippet is not complete:

  • Design shows WHAT to add, not WHERE in existing template
  • Design doesn't show full template context
  • You need to see existing structure to place additions correctly
  • CSS integration depends on existing styles

What NOT to change:

  • Don't remove existing functionality
  • Don't change form structure (submit buttons, hidden fields)
  • Don't modify error handling sections
  • Don't alter base template inheritance

What TO add:

  • Client metadata display section (provided in design)
  • Any necessary CSS classes (if existing ones don't suffice)
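  • The template expects a client_metadata variable exposing name, logo, and url. Jinja2's dot syntax works with either a dict or an object, so the ClientMetadata dataclass can be passed directly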

Testing Impact

After template changes:

  1. Test with client that HAS h-app metadata (should show name, logo, url)
  2. Test with client that LACKS h-app metadata (should show client_id)
  3. Test with partial metadata (name but no logo) - should handle gracefully
  4. Verify no HTML injection vulnerabilities (Jinja2 auto-escapes, but verify)
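On the injection check in item 4: Jinja2 auto-escaping neutralizes markup in the name, but it does not stop a client page from publishing a javascript: URL that would land in the href. A minimal sketch of a filter the parser could apply to logo and url before they reach the template follows. safe_http_url is a hypothetical helper name, not existing Gondulf code.

```python
from __future__ import annotations

from urllib.parse import urlparse


def safe_http_url(value: str | None) -> str | None:
    """Return value if it is an absolute http(s) URL with a host, else None.

    Hypothetical helper: escaping stops markup injection, but only a scheme
    check stops href="javascript:..." links from untrusted client pages.
    """
    if not value:
        return None
    candidate = value.strip()
    parsed = urlparse(candidate)  # urlparse lowercases the scheme
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return candidate
    return None
```

Dropping a rejected value (rather than raising) fits the template's existing fallbacks: a missing logo is simply not rendered, and a missing name falls back to client_id.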

Question 5: Integration with Existing Code

Question: Should developer verify HTMLFetcher, authorization endpoint, dependencies.py exist before starting? Create missing infrastructure if needed? Follow existing patterns?

Answer

All infrastructure exists. Verify existence, then follow existing patterns exactly.

Verification Steps

Before implementing, run these checks:

Check 1: Verify HTMLFetcher exists

ls -la /home/phil/Projects/Gondulf/src/gondulf/services/html_fetcher.py

Expected: File exists (CONFIRMED - I verified this)

Check 2: Verify authorization endpoint exists

ls -la /home/phil/Projects/Gondulf/src/gondulf/routers/authorization.py

Expected: File exists (CONFIRMED - I verified this)

Check 3: Verify dependencies.py exists and has html_fetcher dependency

grep -n "get_html_fetcher" /home/phil/Projects/Gondulf/src/gondulf/dependencies.py

Expected: Function exists at line ~62 (CONFIRMED - I verified this)

All checks should pass. If any fail, STOP and request clarification before proceeding.

Implementation Patterns to Follow

Pattern 1: Service Creation

Look at existing services for structure:

  • /src/gondulf/services/relme_parser.py - Similar parser service
  • /src/gondulf/services/domain_verification.py - Complex service with dependencies

Your HAppParser should follow this pattern:

"""h-app microformat parser for client metadata extraction."""
import logging
from dataclasses import dataclass

import mf2py

from gondulf.services.html_fetcher import HTMLFetcherService

logger = logging.getLogger("gondulf.happ_parser")


@dataclass
class ClientMetadata:
    """Client metadata extracted from h-app markup."""
    name: str | None = None
    logo: str | None = None
    url: str | None = None


class HAppParser:
    """Parse h-app microformat data from client HTML."""

    def __init__(self, html_fetcher: HTMLFetcherService):
        """Initialize parser with HTML fetcher dependency."""
        self.html_fetcher = html_fetcher

    async def fetch_and_parse(self, client_id: str) -> ClientMetadata:
        """Fetch client_id URL and parse h-app metadata."""
        # Implementation here
        pass
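A minimal sketch of the parsing half of fetch_and_parse, assuming the fetched HTML has already been passed through mf2py.parse(doc=html, url=client_id). Passing url lets mf2py resolve relative logo paths. extract_client_metadata, _find_happ, and _first_value are hypothetical names. The sketch also accepts the older h-x-app type, which some clients still publish, and handles a u-logo on an img with alt text, which can parse as a {"value": ..., "alt": ...} dict rather than a plain string.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ClientMetadata:
    """Client metadata extracted from h-app markup (mirrors the class above)."""
    name: str | None = None
    logo: str | None = None
    url: str | None = None


def _first_value(props: dict[str, Any], key: str) -> str | None:
    """Return the first value of an mf2 property as a string, or None."""
    values = props.get(key)
    if not values:
        return None
    first = values[0]
    # Image and nested-microformat values arrive as dicts with a "value" key
    if isinstance(first, dict):
        first = first.get("value")
    return first if isinstance(first, str) else None


def _find_happ(items: list[dict[str, Any]]) -> dict[str, Any] | None:
    """Depth-first search for the first h-app (or legacy h-x-app) item."""
    for item in items:
        if {"h-app", "h-x-app"} & set(item.get("type", [])):
            return item
        found = _find_happ(item.get("children", []))
        if found is not None:
            return found
    return None


def extract_client_metadata(parsed: dict[str, Any]) -> ClientMetadata | None:
    """Build ClientMetadata from an mf2py.parse() result, or None if no h-app."""
    happ = _find_happ(parsed.get("items", []))
    if happ is None:
        return None
    props = happ.get("properties", {})
    return ClientMetadata(
        name=_first_value(props, "name"),
        logo=_first_value(props, "logo"),
        url=_first_value(props, "url"),
    )
```

Returning None when no h-app is found, rather than an empty ClientMetadata, keeps the template's {% else %} branch reachable: a dataclass instance is truthy in Jinja2 even when every field is None. If fetch_and_parse keeps its ClientMetadata return annotation, it would need to become ClientMetadata | None to match.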

Pattern 2: Dependency Injection

Add to /src/gondulf/dependencies.py following existing pattern:

@lru_cache
def get_happ_parser() -> HAppParser:
    """Get singleton h-app parser service."""
    return HAppParser(html_fetcher=get_html_fetcher())

Place this in the "Phase 2 Services" section (after get_html_fetcher, before get_relme_parser). Alternatively, if no "Phase 3 Services" section exists yet, create one after the Phase 3 TokenService dependency and place it there.
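The @lru_cache pattern gives one shared parser per process. The sketch below uses hypothetical stand-in classes and assumes get_html_fetcher is also @lru_cache-decorated, as the existing pattern suggests. It shows the two behaviours worth knowing: repeated calls return the same instance, and cache_clear() (a real functools method) is how a test can force a fresh one.

```python
from functools import lru_cache


class HTMLFetcherService:
    """Hypothetical stand-in for gondulf.services.html_fetcher.HTMLFetcherService."""


class HAppParser:
    """Hypothetical stand-in holding only the injected dependency."""

    def __init__(self, html_fetcher: HTMLFetcherService):
        self.html_fetcher = html_fetcher


@lru_cache
def get_html_fetcher() -> HTMLFetcherService:
    """Singleton fetcher (assumed to be cached like other dependencies)."""
    return HTMLFetcherService()


@lru_cache
def get_happ_parser() -> HAppParser:
    """Singleton parser that shares the singleton fetcher."""
    return HAppParser(html_fetcher=get_html_fetcher())
```

Clearing get_happ_parser's cache does not clear get_html_fetcher's, so a rebuilt parser still shares the original fetcher. Tests that need a fresh fetcher must clear both caches, or use FastAPI's app.dependency_overrides instead.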

Pattern 3: Router Integration

Look at how authorization.py uses dependencies:

from gondulf.dependencies import get_database, get_verification_service

Add your dependency:

from gondulf.dependencies import get_database, get_verification_service, get_happ_parser

Use in route handler:

async def authorize_get(
    request: Request,
    # ... existing parameters ...
    database: Database = Depends(get_database),
    happ_parser: HAppParser = Depends(get_happ_parser),  # ADD THIS
) -> HTMLResponse:
    ...  # existing handler body
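Because authentication must keep working without client metadata, the handler can wrap the parser call so that a slow or broken client site only costs the metadata display, not the login. A minimal sketch, with load_client_metadata and MetadataSource as hypothetical names. The broad except Exception is for illustration only; per the error-handling pattern listed under Rationale, narrow it to the exceptions the HTML fetcher actually raises.

```python
from __future__ import annotations

import asyncio
import logging
from typing import Any, Protocol

logger = logging.getLogger("gondulf.authorization")


class MetadataSource(Protocol):
    """Anything with HAppParser.fetch_and_parse's signature."""

    async def fetch_and_parse(self, client_id: str) -> Any: ...


async def load_client_metadata(parser: MetadataSource, client_id: str) -> Any | None:
    """Fetch h-app metadata for display; never let a failure block authorization."""
    try:
        return await parser.fetch_and_parse(client_id)
    except Exception as exc:  # illustration only: narrow to the fetcher's errors
        logger.warning(f"Could not load h-app metadata for {client_id}: {exc}")
        return None


if __name__ == "__main__":
    class _Unreachable:
        """Fake parser whose client site never responds."""

        async def fetch_and_parse(self, client_id: str) -> Any:
            raise TimeoutError("client site did not respond")

    # Logs a warning and falls back to None instead of raising
    print(asyncio.run(load_client_metadata(_Unreachable(), "https://app.example/")))
```

Inside authorize_get, the call would look like client_metadata = await load_client_metadata(happ_parser, client_id), with client_metadata then passed into the template context as Step 5 of Question 4 requires.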

Pattern 4: Logging

Every service has module-level logger:

import logging

logger = logging.getLogger("gondulf.happ_parser")

# In methods:
logger.info(f"Fetching h-app metadata from {client_id}")
logger.warning(f"No h-app markup found at {client_id}")
logger.error(f"Failed to parse h-app: {error}")

Rationale

Why verify first:

  • Confirms your environment matches expected state
  • Identifies any setup issues before implementation
  • Quick sanity check (30 seconds)

Why NOT create missing infrastructure:

  • All infrastructure already exists (I verified)
  • If something is missing, it indicates environment problem
  • Creating infrastructure would be architectural decision (my job, not yours)

Why follow existing patterns:

  • Consistency across codebase
  • Patterns already reviewed and approved
  • Makes code review easier
  • Maintains project conventions

What patterns to follow:

  1. Service structure: Class with dependencies injected via __init__
  2. Async methods: Use async def for I/O operations
  3. Type hints: All parameters and returns have type hints
  4. Docstrings: Every public method has docstring
  5. Error handling: Use try/except with specific exceptions, log errors
  6. Dataclasses: Use @dataclass for data structures (see ClientMetadata)

Question 6: Testing Coverage Target

Question: Should new components meet 95% threshold (critical auth flow)? Or is 80%+ acceptable (supporting components)?

Answer

Target 80%+ coverage for Phase 4a components (supporting functionality).

Specific Targets

Metadata endpoint: 80%+ coverage

  • Simple, static endpoint with no complex logic
  • Critical for discovery but not authentication flow itself
  • Most code is configuration formatting

h-app parser: 80%+ coverage

  • Supporting component, not critical authentication path
  • Handles client metadata display (nice-to-have)
  • Complex edge cases (malformed HTML) can be partially covered

Authorization endpoint modifications: Maintain existing coverage

  • Authorization endpoint is already implemented and tested
  • Your changes add h-app integration but don't modify critical auth logic
  • Ensure new code paths (with/without client metadata) are tested

Rationale

Why 80% not 95%:

Per /docs/standards/testing.md:

  • Critical paths (auth, token, security): 95% coverage
  • Overall: 80% code coverage minimum
  • New code: 90% coverage required

Phase 4a components are:

  1. Metadata endpoint: Discovery mechanism, not authentication
  2. h-app parser: UI enhancement, not security-critical
  3. Authorization integration: Minor enhancement to existing flow

None of these are critical authentication or token flow components. They enhance the user experience and enable client discovery, but authentication works without them.

Critical paths requiring 95%:

  • Authorization code generation and validation
  • Token generation and validation
  • PKCE verification (when implemented)
  • Redirect URI validation
  • Code exchange flow

Supporting paths requiring 80%:

  • Domain verification (Phase 2) - user verification, not auth flow
  • Client metadata fetching (Phase 4a) - UI enhancement
  • Rate limiting - security enhancement but not core auth
  • Email sending - notification mechanism

When to exceed 80%:

Aim higher if:

  • Test coverage naturally reaches 90%+ (not forcing it)
  • Component has security implications (metadata endpoint URL generation)
  • Complex edge cases are easy to test (malformed h-app markup)

When 80% is sufficient:

Accept 80% if:

  • Remaining untested code is error handling for unlikely scenarios
  • Remaining code is logging statements
  • Remaining code is input validation already covered by integration tests

Testing Approach

Metadata endpoint tests (tests/unit/routers/test_metadata.py):

def test_metadata_returns_correct_issuer(): ...
def test_metadata_returns_authorization_endpoint(): ...
def test_metadata_returns_token_endpoint(): ...
def test_metadata_cache_control_header(): ...
def test_metadata_content_type_json(): ...

h-app parser tests (tests/unit/services/test_happ_parser.py):

def test_parse_extracts_app_name(): ...
def test_parse_extracts_logo_url(): ...
def test_parse_extracts_app_url(): ...
def test_parse_handles_missing_happ(): ...
def test_parse_handles_partial_metadata(): ...
def test_parse_handles_malformed_html(): ...
def test_fetch_and_parse_calls_html_fetcher(): ...

Authorization integration tests (add to existing tests/integration/test_authorization.py):

def test_authorize_displays_client_metadata_when_available(): ...
def test_authorize_displays_client_id_when_metadata_missing(): ...

Coverage Verification

After implementation, run:

pytest --cov=gondulf.routers.metadata --cov=gondulf.services.happ_parser --cov-report=term-missing

Expected output (illustrative figures; any value of 80% or higher passes):

gondulf/routers/metadata.py      82%
gondulf/services/happ_parser.py  81%

If coverage is below 80%, add tests for uncovered lines. If coverage is above 90% naturally, excellent - but don't force it.


Summary of Answers

| Question | Answer | Key Point |
|---|---|---|
| Q1: Scope | Components 1-3 only (metadata, h-app, integration) | Phase 4a completes Phase 3, not security hardening |
| Q2: BASE_URL | Required config, no default, add to Config class | Critical for OAuth metadata, must be explicit |
| Q3: mf2py | Add mf2py>=2.0.0 to main dependencies | Core functionality, needed at runtime |
| Q4: Templates | Review existing first, add design snippet appropriately | Design shows WHAT to add, you choose WHERE |
| Q5: Infrastructure | All exists, verify then follow existing patterns | Consistency with established codebase patterns |
| Q6: Coverage | 80%+ target (supporting components) | Not critical auth path, standard coverage sufficient |

Next Steps for Developer

  1. Verify infrastructure exists (Question 5 checks)
  2. Install mf2py dependency (pip install -e . after updating pyproject.toml)
  3. Implement in order:
    • Config changes (BASE_URL)
    • Metadata endpoint + tests
    • h-app parser + tests
    • Authorization integration + template updates
    • Integration tests
  4. Run test suite and verify 80%+ coverage
  5. Create implementation report in /docs/reports/2025-11-20-phase-4a.md

Questions Remaining?

If any aspect of these answers is still unclear or ambiguous, ask additional clarification questions BEFORE starting implementation. It is always better to clarify than to make architectural assumptions.


Architect Signature: Design clarifications complete. Developer may proceed with Phase 4a implementation.