feat(core): implement Phase 1 foundation infrastructure

Implements Phase 1 Foundation with all core services:

Core Components:
- Configuration management with GONDULF_ environment variables
- Database layer with SQLAlchemy and migration system
- In-memory code storage with TTL support
- Email service with SMTP and TLS support (STARTTLS + implicit TLS)
- DNS service with TXT record verification
- Structured logging with Python standard logging
- FastAPI application with health check endpoint

Database Schema:
- authorization_codes table for OAuth 2.0 authorization codes
- domains table for domain verification
- migrations table for tracking schema versions
- Simple sequential migration system (001_initial_schema.sql)

Configuration:
- Environment-based configuration with validation
- .env.example template with all GONDULF_ variables
- Fail-fast validation on startup
- Sensible defaults for optional settings

Testing:
- 96 comprehensive tests (77 unit, 5 integration)
- 94.16% code coverage (exceeds 80% requirement)
- All tests passing
- Test coverage includes:
  - Configuration loading and validation
  - Database migrations and health checks
  - In-memory storage with expiration
  - Email service (STARTTLS, implicit TLS, authentication)
  - DNS service (TXT records, domain verification)
  - Health check endpoint integration

Documentation:
- Implementation report with test results
- Phase 1 clarifications document
- ADRs for key decisions (config, database, email, logging)

Technical Details:
- Python 3.10+ with type hints
- SQLite with configurable database URL
- System DNS with public DNS fallback
- Port-based TLS detection (465=SSL, 587=STARTTLS)
- Lazy configuration loading for testability

Exit Criteria Met:
✓ All foundation services implemented
✓ Application starts without errors
✓ Health check endpoint operational
✓ Database migrations working
✓ Test coverage exceeds 80%
✓ All tests passing

Ready for Architect review and Phase 2 development.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 12:21:42 -07:00
parent 7255867fde
commit bebd47955f
39 changed files with 8134 additions and 13 deletions


@@ -0,0 +1,356 @@
# System Architecture Overview
## Project Context
Gondulf is a self-hosted IndieAuth server implementing the W3C IndieAuth specification. It enables users to use their own domain as their identity when authenticating to third-party applications, providing a decentralized alternative to centralized authentication providers.
### Key Differentiators
- **Email-based authentication**: v1.0.0 uses email verification for domain ownership
- **No client pre-registration**: Clients validate themselves through domain ownership verification
- **Simplicity-first**: Minimal complexity, production-ready MVP
- **Single-admin model**: Designed for individual operators, not multi-tenancy
## Technology Stack
### Core Platform
- **Language**: Python 3.10+
- **Web Framework**: FastAPI 0.104+
  - Chosen for: Native async/await, type hints, OAuth 2.0 support, automatic OpenAPI docs
  - See: `/docs/decisions/ADR-001-python-framework-selection.md`
- **ASGI Server**: uvicorn with standard extras
- **Data Validation**: Pydantic 2.0+ (installed as a FastAPI dependency)
### Data Storage
- **Primary Database**: SQLite 3.35+
  - Sufficient for tens of users
  - Simple file-based backups
  - No separate database server required
- **Database Interface**: SQLAlchemy Core (NOT ORM)
  - Direct SQL-like interface without ORM complexity
  - Explicit queries, no hidden behavior
  - Simple schema management
### Session/State Storage (v1.0.0)
- **In-Memory Storage**: Python dictionaries with TTL management
- **Rationale**:
  - No Redis in v1.0.0 per user requirements
  - Authorization codes are short-lived (10 minutes max)
  - Single-process deployment acceptable for MVP
  - Upgrade path: Can add Redis later without code changes if persistence needed
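
A minimal sketch of what "Python dictionaries with TTL management" could look like. The class name `TTLStore` and its methods are hypothetical, not taken from the Gondulf codebase; the clock is injectable only so that expiry can be tested without sleeping.

```python
import time
from typing import Callable, Optional


class TTLStore:
    """Dictionary-backed store whose entries expire after a fixed TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._items: dict[str, tuple[str, float]] = {}

    def put(self, key: str, value: str) -> None:
        """Store value under key; it expires ttl_seconds from now."""
        self._items[key] = (value, self._clock() + self._ttl)

    def pop(self, key: str) -> Optional[str]:
        """Return and remove the value (single use); None if absent or expired."""
        entry = self._items.pop(key, None)
        if entry is None:
            return None
        value, expires_at = entry
        return value if self._clock() < expires_at else None

    def purge_expired(self) -> int:
        """Drop expired entries and return how many were removed."""
        now = self._clock()
        expired = [k for k, (_, exp) in self._items.items() if exp <= now]
        for k in expired:
            del self._items[k]
        return len(expired)
```

Under these assumptions, authorization codes would live in a `TTLStore(600)` and email codes in a `TTLStore(900)`. Because `pop` removes the entry, a code can be redeemed at most once.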
### Development Environment
- **Package Manager**: uv (Astral's Rust-based tool)
  - See: `/docs/decisions/ADR-002-uv-environment-management.md`
  - Direct execution model (no environment activation)
- **Linting**: Ruff + flake8
- **Type Checking**: mypy (strict mode)
- **Formatting**: Black (88 character line length)
- **Testing**: pytest with async, coverage, mocking
## System Architecture
### Component Diagram
```
┌─────────────────────────────────────────────────────────────────┐
│                       Client Application                        │
│                 (Third-party IndieAuth client)                  │
└───────────────────────────┬─────────────────────────────────────┘
                            │ HTTPS
                            │ IndieAuth Protocol
┌─────────────────────────────────────────────────────────────────┐
│                    Gondulf IndieAuth Server                     │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │                     FastAPI Application                     │ │
│ │  ┌──────────────┐   ┌──────────────┐   ┌─────────────────┐  │ │
│ │  │ Authorization│   │    Token     │   │    Metadata     │  │ │
│ │  │   Endpoint   │   │   Endpoint   │   │    Endpoint     │  │ │
│ │  │  /authorize  │   │    /token    │   │  /.well-known   │  │ │
│ │  └──────┬───────┘   └──────┬───────┘   └────────┬────────┘  │ │
│ │         │                  │                    │           │ │
│ │         └──────────────────┼────────────────────┘           │ │
│ │                            │                                │ │
│ │ ┌──────────────────────────▼──────────────────────────────┐ │ │
│ │ │                  Business Logic Layer                   │ │ │
│ │ │  ┌───────────────┐   ┌────────────┐   ┌──────────────┐  │ │ │
│ │ │  │ AuthService   │   │TokenService│   │DomainService │  │ │ │
│ │ │  │ - Auth flow   │   │ - Token    │   │ - Domain     │  │ │ │
│ │ │  │ - Email send  │   │   creation │   │   validation │  │ │ │
│ │ │  │ - Code gen    │   │ - Token    │   │ - TXT record │  │ │ │
│ │ │  │               │   │   verify   │   │   check      │  │ │ │
│ │ │  └───────────────┘   └────────────┘   └──────────────┘  │ │ │
│ │ └──────────────────────────┬──────────────────────────────┘ │ │
│ │                            │                                │ │
│ │ ┌──────────────────────────▼──────────────────────────────┐ │ │
│ │ │                      Storage Layer                      │ │ │
│ │ │   ┌──────────────────┐     ┌────────────────────────┐   │ │ │
│ │ │   │ SQLite Database  │     │ In-Memory Store        │   │ │ │
│ │ │   │ - Tokens         │     │ - Auth codes (10min)   │   │ │ │
│ │ │   │ - Domains        │     │ - Email codes (15min)  │   │ │ │
│ │ │   └──────────────────┘     └────────────────────────┘   │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
└──────────┬──────────────────────────────────────┬───────────────┘
           │ SMTP                                 │ DNS
           ▼                                      ▼
   ┌────────────────┐                    ┌──────────────────┐
   │  Email Server  │                    │   DNS Provider   │
   │   (external)   │                    │    (external)    │
   └────────────────┘                    └──────────────────┘
```
### Component Responsibilities
#### HTTP Endpoints Layer
Handles all HTTP concerns:
- Request validation (Pydantic models)
- Parameter parsing and type coercion
- HTTP response formatting
- Error responses (OAuth 2.0 compliant)
- CORS headers
- Rate limiting (future)
#### Business Logic Layer (Services)
Contains all domain logic, completely independent of HTTP:
**AuthService**:
- Authorization flow orchestration
- Email verification code generation and validation
- Authorization code generation (cryptographically secure)
- User consent management
- PKCE support (future)
**TokenService**:
- Access token generation (JWT or opaque)
- Token validation and introspection
- Token revocation (future)
- Token refresh (future)
**DomainService**:
- Domain ownership validation
- DNS TXT record checking
- Domain normalization
- Security validation (prevent open redirects)
#### Storage Layer
Provides data persistence:
**SQLite Database**:
- Access tokens (long-lived)
- Verified domains
- Audit logs
- Configuration
**In-Memory Store**:
- Authorization codes (TTL: 10 minutes)
- Email verification codes (TTL: 15 minutes)
- Rate limit counters (future)
### Data Flow: Authorization Flow
```
1. Client → /authorize
2. Gondulf validates client_id, redirect_uri, state
3. Gondulf checks domain ownership (TXT record or cached)
4. User enters email address for their domain
5. Gondulf sends verification code to email
6. User enters code
7. Gondulf generates authorization code
8. Gondulf redirects to client with code + state
9. Client → /token with code
10. Gondulf validates code, generates access token
11. Gondulf returns token + me (user's domain)
```
## Deployment Model
### Target Deployment
- **Platform**: Docker container
- **Scale**: Tens of users initially
- **Process Model**: Single uvicorn process (sufficient for MVP)
- **File System**:
  - `/data/gondulf.db` - SQLite database
  - `/data/backups/` - Database backups
  - `/app/` - Application code
### Configuration Management
- **Environment Variables**: All configuration via environment
- **Secrets**: Loaded from environment (SECRET_KEY, SMTP credentials)
- **Config Validation**: Pydantic Settings validates on startup
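
To illustrate fail-fast validation of environment-based configuration, here is a standard-library sketch. It stands in for the Pydantic Settings model named above and is not the project's actual code. The variable names (`GONDULF_SECRET_KEY`, `GONDULF_SMTP_PORT`, `GONDULF_DATABASE_URL`), the 32-character minimum, and the defaults are all assumptions made for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


class ConfigError(ValueError):
    """Raised at startup when the environment is incomplete or invalid."""


@dataclass(frozen=True)
class Settings:
    secret_key: str
    database_url: str
    smtp_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from env, raising ConfigError on the first problem."""
    # Required secret: refuse to start without it (length rule is assumed).
    secret_key = env.get("GONDULF_SECRET_KEY", "")
    if len(secret_key) < 32:
        raise ConfigError("GONDULF_SECRET_KEY must be at least 32 characters")

    # Optional setting with a default, still validated when present.
    raw_port = env.get("GONDULF_SMTP_PORT", "587")
    try:
        smtp_port = int(raw_port)
    except ValueError:
        raise ConfigError(
            f"GONDULF_SMTP_PORT must be an integer, got {raw_port!r}"
        ) from None
    if not 1 <= smtp_port <= 65535:
        raise ConfigError("GONDULF_SMTP_PORT must be between 1 and 65535")

    return Settings(
        secret_key=secret_key,
        database_url=env.get("GONDULF_DATABASE_URL", "sqlite:////data/gondulf.db"),
        smtp_port=smtp_port,
    )
```

Calling `load_settings()` once during application startup means a misconfigured deployment fails immediately with a readable message instead of at the first request.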
### Backup Strategy
Simple file-based SQLite backups:
- Daily automated backups of `gondulf.db`
- Backup rotation (keep last 7 days)
- Simple shell script + cron
- Future: S3/object storage support
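
The strategy above names a shell script plus cron. As an alternative illustration, the same backup-and-rotate idea can be sketched in Python with the standard `sqlite3` online backup API. The function name and the `gondulf-<timestamp>.db` file naming are assumptions for the example.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def backup_database(db_path: Path, backup_dir: Path, keep: int = 7) -> Path:
    """Copy the live database into backup_dir and prune old copies."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    target = backup_dir / f"gondulf-{stamp}.db"

    # Connection.backup() uses SQLite's online backup API, so the copy is
    # consistent even if the server writes to the database during the backup.
    source = sqlite3.connect(db_path)
    dest = sqlite3.connect(target)
    try:
        source.backup(dest)
    finally:
        dest.close()
        source.close()

    # Timestamped names sort chronologically; keep only the newest `keep`.
    backups = sorted(backup_dir.glob("gondulf-*.db"))
    for old in backups[:-keep]:
        old.unlink()
    return target
```

Run once a day (for example from cron), `backup_database(Path("/data/gondulf.db"), Path("/data/backups"))` would retain the last seven daily copies.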
## Security Architecture
### Authentication Method (v1.0.0)
**Email-based verification only**:
- User provides email address for their domain
- Server sends time-limited verification code
- User enters code to prove email access
- No password storage
- No external identity providers in v1.0.0
### Domain Ownership Validation
**Two-tier validation**:
1. **TXT Record (preferred)**:
   - Admin adds TXT record: `_gondulf.example.com` = `verified`
   - Server checks DNS before first use
   - Result cached in database
   - Periodic re-verification (configurable)
2. **Email-based (alternative)**:
   - If no TXT record, fall back to email verification
   - Email must be at the domain being verified (e.g., `admin@example.com`)
   - Less secure but more accessible for users
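
The TXT tier can be sketched as a small pure function. The DNS lookup itself is injected as a callable so the example does not assume any particular resolver library. The function names are hypothetical, as is the convention that the callable raises `LookupError` when the record does not exist.

```python
from typing import Callable, Iterable

# A lookup takes a record name and returns its TXT values (assumed interface).
TxtLookup = Callable[[str], Iterable[str]]


def verification_record_name(domain: str) -> str:
    """Build the TXT record name, e.g. example.com -> _gondulf.example.com."""
    return f"_gondulf.{domain.strip().rstrip('.').lower()}"


def has_verification_record(domain: str, lookup_txt: TxtLookup) -> bool:
    """True if the domain publishes a TXT value equal to `verified`."""
    try:
        values = lookup_txt(verification_record_name(domain))
    except LookupError:
        # No record: the caller can fall back to email verification.
        return False
    # Resolvers often return TXT strings wrapped in double quotes.
    return any(v.strip().strip('"') == "verified" for v in values)
```

When this returns `False`, the flow described above falls back to the email-based tier rather than rejecting the domain outright.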
### Token Security
- **Generation**: Cryptographically secure random tokens (secrets.token_urlsafe)
- **Storage**: Hashed in database (SHA-256)
- **Transmission**: HTTPS only (enforced in production)
- **Expiration**: Configurable (default 1 hour)
- **Validation**: Constant-time comparison (prevent timing attacks)
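
The generation, storage, and validation points above map directly onto the standard library. This is a sketch of one way to combine them; the function names are illustrative, not the project's API.

```python
import hashlib
import hmac
import secrets


def hash_token(token: str) -> str:
    """SHA-256 hex digest; this is the only form that would be stored."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_token() -> tuple[str, str]:
    """Return (token, token_hash); the plaintext goes to the client only."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe encoding
    return token, hash_token(token)


def token_matches(presented: str, stored_hash: str) -> bool:
    """Check a presented token against the stored hash."""
    # compare_digest runs in time independent of where the inputs differ,
    # which avoids leaking information through response timing.
    return hmac.compare_digest(hash_token(presented), stored_hash)
```

Hashing before storage means a leaked database does not expose usable tokens, while lookups stay cheap because the random token already has high entropy.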
### Privacy Principles
**Minimal Data Collection**:
- NEVER store email addresses beyond verification flow
- NEVER log user personal data
- Store only:
  - Domain name (user's identity)
  - Token hashes (security)
  - Timestamps (auditing)
  - Client IDs (protocol requirement)
## Operational Architecture
### Logging Strategy
**Structured logging** with appropriate levels:
- **INFO**: Normal operations (auth success, token issued)
- **WARNING**: Suspicious activity (failed validations, rate limit nearly reached)
- **ERROR**: Failures requiring investigation (email send failed, DNS timeout)
- **CRITICAL**: System failures (database unavailable, config invalid)
**Log fields**:
- Timestamp (ISO 8601)
- Level
- Event type
- Domain (never email)
- Client ID
- Request ID (correlation)
**Privacy**:
- NEVER log email addresses
- NEVER log full tokens (only first 8 chars for correlation)
- NEVER log user-agent or IP in production (GDPR)
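
One way to enforce these rules with Python's standard logging is an allow-list formatter: only named fields are ever serialized, so an email address passed by mistake is dropped instead of logged. The formatter class, the field names, and the `token_prefix` helper are assumptions made for this sketch.

```python
import json
import logging
from datetime import datetime, timezone

# Only these extra fields are emitted; anything else is silently dropped.
ALLOWED_FIELDS = ("event", "domain", "client_id", "request_id", "token_prefix")


def token_prefix(token: str) -> str:
    """First 8 characters of a token, for correlation only."""
    return token[:8]


class JsonFormatter(logging.Formatter):
    """Render log records as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # ISO 8601 timestamp in UTC
            "timestamp": datetime.fromtimestamp(
                record.created, timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        for name in ALLOWED_FIELDS:
            if hasattr(record, name):
                payload[name] = getattr(record, name)
        return json.dumps(payload)
```

A call such as `logger.info("token issued", extra={"domain": "example.com", "token_prefix": token_prefix(token)})` would then produce a structured line containing the domain and an 8-character prefix, and nothing else about the user.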
### Monitoring (Future)
- Health check endpoint: `/health`
- Metrics endpoint: `/metrics` (Prometheus format)
- Key metrics:
  - Authorization requests/min
  - Token generation rate
  - Email delivery success rate
  - Domain validation cache hit rate
  - Error rate by type
## Upgrade Paths
### Future Enhancements (Post v1.0.0)
**Persistence Layer**:
- Add Redis for distributed sessions
- Support PostgreSQL for larger deployments
- No code changes required (SQLAlchemy abstraction)
**Authentication Methods**:
- GitHub/GitLab provider support
- IndieAuth delegation
- WebAuthn for passwordless authentication
- All additive, no breaking changes
**Protocol Features**:
- Token refresh
- Token revocation endpoint
- Scope management (authorization)
- Dynamic client registration
**Operational**:
- Multi-process deployment (gunicorn)
- Horizontal scaling (with Redis)
- Metrics and monitoring
- Admin dashboard
## Constraints and Trade-offs
### Conscious Simplifications (v1.0.0)
1. **No Redis**: In-memory storage acceptable for single-process deployment
   - Trade-off: Lose codes on restart (acceptable for 10-minute TTL)
   - Upgrade path: Add Redis when scaling needed
2. **No client pre-registration**: Domain-based validation sufficient
   - Trade-off: Must validate client_id on every request
   - Mitigation: Cache validation results
3. **Email-only authentication**: Simplest secure method
   - Trade-off: Requires SMTP configuration
   - Upgrade path: Add providers in future releases
4. **SQLite database**: Well suited to small deployments
   - Trade-off: No built-in replication
   - Upgrade path: Migrate to PostgreSQL when needed
5. **Single process**: No distributed coordination needed
   - Trade-off: Limited concurrent capacity
   - Upgrade path: Add Redis + gunicorn when scaling
### Non-Negotiable Requirements
1. **W3C IndieAuth compliance**: Full protocol compliance required
2. **Security best practices**: No shortcuts on security
3. **HTTPS in production**: Required for OAuth 2.0 security
4. **Minimal data collection**: Privacy by design
5. **Comprehensive testing**: 80%+ coverage minimum
## Documentation Structure
### For Developers
- `/docs/architecture/` - This directory
- `/docs/designs/` - Feature-specific designs
- `/docs/decisions/` - Architecture Decision Records
### For Operators
- `README.md` - Installation and usage
- `/docs/operations/` - Deployment guides (future)
- Environment variable reference (future)
### For Protocol Compliance
- `/docs/architecture/indieauth-protocol.md` - Protocol implementation
- `/docs/architecture/security.md` - Security model
- Test suite demonstrating compliance
## Next Steps
See `/docs/roadmap/v1.0.0.md` for the MVP feature set and implementation plan.
Key architectural documents to review:
- `/docs/architecture/indieauth-protocol.md` - Protocol design
- `/docs/architecture/security.md` - Security design
- `/docs/roadmap/backlog.md` - Feature prioritization