StarPunk v1.1.1 "Polish" - Developer Q&A
Date: 2025-11-25
Developer: Developer Agent
Architect: Architect Agent
This document contains the Q&A session between the developer and architect during v1.1.1 design review.
Purpose
The developer reviewed all v1.1.1 design documentation and prepared questions about implementation details, integration points, and edge cases. This document contains the architect's answers to guide implementation.
Critical Questions (Must be answered before implementation)
Q1: Configuration System Integration
Developer Question: The design calls for centralized configuration. I see we have config.py at the root for Flask app config. Should the new starpunk/config.py module replace this, wrap it, or co-exist as a separate configuration layer? How do we avoid breaking existing code that directly imports from config?
Architect Answer: Keep both files with clear separation of concerns. The existing config.py remains for Flask app configuration, while the new starpunk/config.py becomes a configuration helper module that wraps Flask's app.config for runtime access.
Rationale: This maintains backward compatibility, separates Flask-specific config from application logic, and allows gradual migration without breaking changes.
Implementation Guidance:
- Create starpunk/config.py as a helper that uses current_app.config
- Provide methods like get_database_path(), get_upload_folder(), etc.
- Gradually replace direct config access with helper methods
- Document both in the configuration guide
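A minimal sketch of the helper module described above. The config key names DATABASE_PATH and UPLOAD_FOLDER are assumptions, and the optional config argument is a hypothetical addition so the helpers can be exercised without an application context.

```python
"""Sketch of starpunk/config.py: thin accessors over Flask's app.config."""
from pathlib import Path
from typing import Any, Mapping, Optional


def _config(config: Optional[Mapping[str, Any]] = None) -> Mapping[str, Any]:
    """Use an explicit mapping if one is passed, else the active app's config."""
    if config is not None:
        return config
    # Local import: the helpers can be unit-tested without an app context.
    from flask import current_app

    return current_app.config


def get_database_path(config: Optional[Mapping[str, Any]] = None) -> Path:
    return Path(_config(config)["DATABASE_PATH"])  # assumed key name


def get_upload_folder(config: Optional[Mapping[str, Any]] = None) -> Path:
    return Path(_config(config)["UPLOAD_FOLDER"])  # assumed key name
```

Inside a request, a route would call get_database_path() instead of reading current_app.config directly, which is the gradual migration the answer describes.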
Q2: Database Connection Pool Scope
Developer Question: The connection pool will replace the current get_db() context manager used throughout routes. Should it also replace direct sqlite3.connect() calls in migrations and utilities? How do we ensure proper connection lifecycle in Flask's request context?
Architect Answer: The connection pool replaces get_db(), but not the connections used by migrations. The pool replaces all runtime sqlite3.connect() calls, while migrations keep direct connections for isolation. Integrate the pool with Flask's g object for request-scoped connections.
Rationale: Migrations need isolated transactions without pool interference. The pool improves runtime performance while request-scoped connections via g maintain Flask patterns.
Implementation Guidance:
- Implement pool in starpunk/database/pool.py
- Use g.db for request-scoped connections
- Replace get_db() in all route files
- Keep direct connections for migrations only
- Add pool statistics to metrics
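One way the pool could look, sketched under assumptions: a fixed-size queue.Queue of SQLite connections, the pool stored under a hypothetical app.extensions["db_pool"] key, and g.db returned to the pool in a teardown_appcontext hook. The pool size, timeout, and statistics fields are illustrative only.

```python
import queue
import sqlite3
import threading
from contextlib import contextmanager
from typing import Iterator


class ConnectionPool:
    """Fixed-size pool of SQLite connections with basic statistics (sketch)."""

    def __init__(self, database: str, size: int = 5, timeout: float = 10.0):
        self._timeout = timeout
        self._pool: "queue.Queue[sqlite3.Connection]" = queue.Queue(maxsize=size)
        self._lock = threading.Lock()
        self.stats = {"checkouts": 0, "waits": 0}  # "waits" is approximate
        for _ in range(size):
            # check_same_thread=False lets different worker threads reuse a connection;
            # the pool ensures only one thread holds it at a time.
            conn = sqlite3.connect(database, check_same_thread=False)
            conn.row_factory = sqlite3.Row
            self._pool.put(conn)

    def acquire(self) -> sqlite3.Connection:
        with self._lock:
            self.stats["checkouts"] += 1
            if self._pool.empty():
                self.stats["waits"] += 1
        # Blocks until a connection is free; raises queue.Empty after `timeout` seconds.
        return self._pool.get(timeout=self._timeout)

    def release(self, conn: sqlite3.Connection) -> None:
        conn.rollback()  # discard uncommitted work so the next user starts clean
        self._pool.put(conn)

    @contextmanager
    def connection(self) -> Iterator[sqlite3.Connection]:
        conn = self.acquire()
        try:
            yield conn
        finally:
            self.release(conn)


def init_app(app, pool: ConnectionPool) -> None:
    """Register the pool and return g.db to it when each app context ends."""
    from flask import g  # local import: the pool itself does not need Flask

    app.extensions["db_pool"] = pool  # hypothetical extension key

    @app.teardown_appcontext
    def return_connection(exc):
        conn = g.pop("db", None)
        if conn is not None:
            pool.release(conn)


def get_db() -> sqlite3.Connection:
    """Request-scoped connection: acquired on first use, reused for the rest of the request."""
    from flask import current_app, g

    if "db" not in g:
        g.db = current_app.extensions["db_pool"].acquire()
    return g.db
```

If the existing get_db() is used as a context manager, pool.connection() offers that shape; which interface routes should keep is left to the actual codebase.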
Q3: Logging vs. Print Statements Migration
Developer Question: Current code has many print statements for debugging. Should we phase these out gradually or remove all at once? Should we use Python's logging module directly or Flask's app.logger? For CLI commands, should they use logging or click.echo()?
Architect Answer: Phase out print statements immediately in v1.1.1. Remove ALL print statements in this release. Use Flask's app.logger as the base, enhanced with structured logging. CLI commands use click.echo() for user output and logger for diagnostics.
Rationale: A clean break prevents confusion. Flask's logger integrates with the framework, and click.echo() is the proper CLI output method.
Implementation Guidance:
- Set up RotatingFileHandler in app factory
- Configure structured logging with correlation IDs
- Replace all print() with appropriate logging calls
- Use click.echo() for CLI user feedback
- Use logger for CLI diagnostic output
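A possible app-factory logging setup, assuming a plain-text structured format with a correlation_id field (the format string and filter class are illustrative). The size limits follow Q17 below, reading "keep 10 files" as ten files in total.

```python
import logging
from logging.handlers import RotatingFileHandler

LOG_FORMAT = "%(asctime)s %(levelname)s [%(correlation_id)s] %(name)s: %(message)s"


class DefaultCorrelationId(logging.Filter):
    """Give records without a correlation ID a placeholder so formatting never fails."""

    def filter(self, record: logging.LogRecord) -> bool:
        if not hasattr(record, "correlation_id"):
            record.correlation_id = "-"
        return True


def configure_logging(logger: logging.Logger, log_path: str, level: int = logging.INFO) -> RotatingFileHandler:
    """Attach a size-rotated file handler; call once from the app factory."""
    # 10MB per file, 10 files in total: the active log plus 9 rotated backups.
    handler = RotatingFileHandler(log_path, maxBytes=10 * 1024 * 1024, backupCount=9, encoding="utf-8")
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    handler.addFilter(DefaultCorrelationId())
    logger.addHandler(handler)
    logger.setLevel(level)
    return handler
```

In the app this would be called as configure_logging(app.logger, "logs/starpunk.log") (path hypothetical). A debug line such as print(f"Created note {slug}") becomes app.logger.info("Created note %s", slug), while CLI messages meant for the user go through click.echo().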
Q4: Error Handling Middleware Integration
Developer Question: For consistent error handling, should we use Flask's @app.errorhandler decorator or implement custom middleware? How do we ensure Micropub endpoints return spec-compliant error responses while other endpoints return HTML error pages?
Architect Answer: Use Flask's @app.errorhandler for all error handling. Register error handlers in the app factory. Micropub endpoints get specialized error handlers for spec compliance. No decorators on individual routes.
Rationale: Flask's error handler is the idiomatic approach. Centralized error handling reduces code duplication, and Micropub spec requires specific error formats.
Implementation Guidance:
- Create starpunk/errors.py with register_error_handlers(app)
- Check request path to determine response format
- Return JSON for /micropub endpoints
- Return HTML templates for other endpoints
- Log all errors with correlation IDs
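A sketch of register_error_handlers(app) with path-based dispatch. The error.html template and the fallback naming for statuses the Micropub spec does not define are assumptions; Flask and Werkzeug are imported inside the function so the path check can be used on its own.

```python
# Micropub error codes for the HTTP statuses the spec names.
MICROPUB_ERROR_CODES = {400: "invalid_request", 401: "unauthorized", 403: "forbidden"}


def is_micropub_path(path: str) -> bool:
    """True for /micropub itself and any sub-path under it."""
    return path == "/micropub" or path.startswith("/micropub/")


def register_error_handlers(app) -> None:
    """Register one HTTPException handler that picks JSON or HTML by request path."""
    from flask import jsonify, render_template, request
    from werkzeug.exceptions import HTTPException

    @app.errorhandler(HTTPException)
    def handle_http_exception(exc: HTTPException):
        # app.logger carries the correlation ID if the logging filter is installed.
        app.logger.warning("HTTP %s on %s: %s", exc.code, request.path, exc.description)
        if is_micropub_path(request.path):
            # Assumption: statuses the spec does not name fall back to the snake_cased
            # HTTP reason phrase (e.g. 404 -> "not_found").
            error = MICROPUB_ERROR_CODES.get(exc.code) or exc.name.lower().replace(" ", "_")
            return jsonify(error=error, error_description=exc.description), exc.code
        # "error.html" is a hypothetical template receiving the exception.
        return render_template("error.html", error=exc), exc.code
```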
Q5: FTS5 Fallback Search Implementation
Developer Question: If FTS5 isn't available, should fallback search be in the same module or separate? Should it have the same function signature? How do we detect FTS5 support - at startup or runtime?
Architect Answer: Same module, with FTS5 support detected once and cached. Keep both implementations in search.py with the same function signature. Determine support at startup and cache the result for performance.
Rationale: A single module maintains cohesion. Same signature allows transparent switching. Startup detection avoids runtime overhead.
Implementation Guidance:
- Detect FTS5 support at startup using a test table
- Cache the result in a module-level variable
- Keep a module-level reference to the selected implementation function
- Both implementations use identical signatures
- Log which implementation is active
Q6: Performance Monitoring Circular Buffer
Developer Question: For the circular buffer storing performance metrics - in a multi-process deployment (like gunicorn), should each process have its own buffer or should we use shared memory? How do we aggregate metrics across processes?
Architect Answer: Per-process buffer with aggregation endpoint. Each process maintains its own circular buffer. /admin/metrics aggregates across all workers. Use multiprocessing.Manager for shared state if needed.
Rationale: Per-process avoids locking overhead. Aggregation provides complete picture. This is a standard pattern for multi-process Flask apps.
Implementation Guidance:
- Create MetricsBuffer class with deque
- Include process ID in all metrics
- Aggregate in /admin/metrics endpoint
- Consider shared memory for future enhancement
- Default to 1000 entries per buffer
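A sketch of the per-process buffer, assuming each entry records an operation type, a name, and a duration. The Metric dataclass and the aggregate() helper are hypothetical, and how each worker's snapshot reaches /admin/metrics (for example via multiprocessing.Manager) is deliberately left open; snapshots here are plain lists of dicts.

```python
import os
import threading
import time
from collections import deque
from dataclasses import asdict, dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Metric:
    operation: str  # e.g. "database", "http", "render"
    name: str
    duration_ms: float
    pid: int = field(default_factory=os.getpid)
    timestamp: float = field(default_factory=time.time)


class MetricsBuffer:
    """Per-process circular buffer: the oldest entries drop off once maxlen is reached."""

    def __init__(self, maxlen: int = 1000):
        self._entries: "deque[Metric]" = deque(maxlen=maxlen)
        self._lock = threading.Lock()  # threads inside one worker still share the buffer

    def record(self, operation: str, name: str, duration_ms: float) -> None:
        with self._lock:
            self._entries.append(Metric(operation, name, duration_ms))

    def snapshot(self) -> List[dict]:
        with self._lock:
            return [asdict(m) for m in self._entries]


def aggregate(snapshots: Iterable[List[dict]]) -> Dict[str, dict]:
    """Combine worker snapshots into per-operation count, mean duration, and worker count."""
    totals: Dict[str, dict] = {}
    for snapshot in snapshots:
        for m in snapshot:
            t = totals.setdefault(m["operation"], {"count": 0, "total_ms": 0.0, "pids": set()})
            t["count"] += 1
            t["total_ms"] += m["duration_ms"]
            t["pids"].add(m["pid"])
    return {
        op: {"count": t["count"], "mean_ms": t["total_ms"] / t["count"], "workers": len(t["pids"])}
        for op, t in totals.items()
    }
```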
Important Questions
Q7: Session Table Migration
Developer Question: The session management enhancement requires a new database table. Should this be added to an existing migration file or create a new one? What happens to existing sessions during upgrade?
Architect Answer: New migration file 008_add_session_table.sql. A separate migration keeps the history clear. Drop existing sessions (document this in the upgrade guide). Use the RETURNING clause where the SQLite version supports it (3.35.0 and later), guarded by a version check.
Rationale: Clean migration history is important. Sessions are ephemeral and safe to drop. Where available, RETURNING avoids a separate query to read back inserted values.
Implementation Guidance:
- Create new migration file
- Drop table if exists before creation
- Add proper indexes for user_id and expires_at
- Document session reset in upgrade guide
- Test migration rollback procedure
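A sketch of what the migration and the RETURNING version check could look like. The column names are illustrative, and the SQL is held in a Python string only so the example runs end to end; in the project it would live in the .sql file.

```python
import sqlite3

# Hypothetical body of 008_add_session_table.sql; column names are illustrative.
MIGRATION_008 = """
DROP TABLE IF EXISTS sessions;
CREATE TABLE sessions (
    id INTEGER PRIMARY KEY,
    session_token TEXT NOT NULL UNIQUE,
    user_id TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL
);
CREATE INDEX idx_sessions_user_id ON sessions(user_id);
CREATE INDEX idx_sessions_expires_at ON sessions(expires_at);
"""


def apply_migration(conn: sqlite3.Connection) -> None:
    """Run on a direct (non-pooled) connection, as Q2 specifies for migrations."""
    conn.executescript(MIGRATION_008)


def supports_returning() -> bool:
    # RETURNING arrived in SQLite 3.35.0; check the linked library version.
    return sqlite3.sqlite_version_info >= (3, 35, 0)


def create_session(conn: sqlite3.Connection, token: str, user_id: str, expires_at: str) -> int:
    params = (token, user_id, expires_at)
    sql = "INSERT INTO sessions (session_token, user_id, expires_at) VALUES (?, ?, ?)"
    if supports_returning():
        # fetchall() runs the statement to completion before the commit below.
        new_id = conn.execute(sql + " RETURNING id", params).fetchall()[0][0]
    else:
        new_id = conn.execute(sql, params).lastrowid
    conn.commit()
    return new_id
```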
Q8: Unicode Slug Generation
Developer Question: When slug generation from title fails (e.g., all emoji title), what should the fallback be? Should we return an error to the Micropub client or generate a default slug? What pattern for auto-generated slugs?
Architect Answer: Timestamp-based fallback with warning. Use YYYYMMDD-HHMMSS pattern when normalization fails. Log warning with original text for debugging. Return 201 Created to Micropub client (not an error).
Rationale: A second-resolution timestamp makes collisions unlikely. The warning helps identify encoding issues. The Micropub spec doesn't define this as an error condition.
Implementation Guidance:
- Try Unicode normalization first
- Fall back to timestamp if result is empty
- Log warnings for debugging
- Include original text in logs
- Never fail the Micropub request
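A minimal sketch of the fallback, assuming NFKD normalization followed by ASCII transliteration and a UTC timestamp. Note that this particular normalization drops characters with no ASCII base, so titles written entirely in non-Latin scripts also take the timestamp path.

```python
import logging
import re
import unicodedata
from datetime import datetime, timezone
from typing import Optional

logger = logging.getLogger(__name__)


def generate_slug(title: str, now: Optional[datetime] = None) -> str:
    # NFKD splits accented letters into base letter + combining mark; the ASCII
    # encode then drops the marks and anything without an ASCII base (e.g. emoji).
    normalized = unicodedata.normalize("NFKD", title)
    ascii_text = normalized.encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    if slug:
        return slug
    # Nothing usable survived normalization: fall back to YYYYMMDD-HHMMSS.
    now = now or datetime.now(timezone.utc)
    fallback = now.strftime("%Y%m%d-%H%M%S")
    logger.warning("Slug normalization produced nothing for %r; using %s", title, fallback)
    return fallback
```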
Q9: RSS Memory Optimization
Developer Question: The current RSS generator builds the entire feed in memory. For optimization, should we stream the XML directly to the response or use a generator? How do we handle large feeds (1000+ items)?
Architect Answer: Use generator with yield for streaming. Implement as generator function. Use Flask's Response(generate(), mimetype='application/rss+xml'). Stream directly to client.
Rationale: Generators minimize memory footprint. Flask handles streaming automatically. This scales to any feed size.
Implementation Guidance:
- Convert RSS generation to generator function
- Yield XML chunks, not individual characters
- Query notes in batches if needed
- Set appropriate response headers
- Test with large feed counts
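A sketch of the generator approach, assuming each note is a mapping with title, url, and content keys and that notes are read from a hypothetical notes table in batches with fetchmany(). Each item is yielded as one chunk, per the guidance above.

```python
import sqlite3
from typing import Iterable, Iterator, Mapping
from xml.sax.saxutils import escape


def generate_rss(title: str, link: str, description: str,
                 notes: Iterable[Mapping[str, str]]) -> Iterator[str]:
    """Yield an RSS 2.0 document chunk by chunk instead of building it in memory."""
    yield '<?xml version="1.0" encoding="UTF-8"?>\n'
    yield '<rss version="2.0"><channel>'
    yield (f"<title>{escape(title)}</title><link>{escape(link)}</link>"
           f"<description>{escape(description)}</description>")
    for note in notes:
        yield (
            "<item>"
            f"<title>{escape(note['title'])}</title>"
            f"<link>{escape(note['url'])}</link>"
            f'<guid isPermaLink="true">{escape(note["url"])}</guid>'
            f"<description>{escape(note['content'])}</description>"
            "</item>"
        )
    yield "</channel></rss>\n"


def iter_notes(conn: sqlite3.Connection, batch_size: int = 100) -> Iterator[dict]:
    """Fetch notes in batches so only one batch is held in memory at a time."""
    # Hypothetical schema: notes(title, url, content, created_at).
    cursor = conn.execute("SELECT title, url, content FROM notes ORDER BY created_at DESC")
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            return
        for title, url, content in rows:
            yield {"title": title, "url": url, "content": content}
```

In a route, this could be returned as Response(stream_with_context(generate_rss(..., iter_notes(get_db()))), mimetype="application/rss+xml"). Flask's stream_with_context keeps the request context, and with it g.db, alive while the generator runs; without it, the generator executes after the request context has been torn down.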
Q10: Health Check Authentication
Developer Question: Should health check endpoints require authentication? Load balancers need to access them, but detailed health info might be sensitive. How do we balance security with operational needs?
Architect Answer: Basic check public, detailed check requires auth. /health returns 200 OK (no auth, for load balancers). /health?detailed=true requires authentication. Separate /admin/health for full diagnostics (always auth).
Rationale: Load balancers need unauthenticated access. Detailed info could leak sensitive data. This follows industry standard patterns.
Implementation Guidance:
- Basic health: just return 200 if app responds
- Detailed health: check database, disk space, etc.
- Admin health: full diagnostics with metrics
- Use query parameter to trigger detailed mode
- Document endpoints in operations guide
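The three tiers could share one framework-independent function, sketched here under assumptions: the 100 MiB free-space threshold, the response bodies, and the 503 status for a degraded result are illustrative choices, and the authentication check is left to the caller.

```python
import shutil
import sqlite3
from contextlib import closing
from typing import Dict, Tuple

MIN_FREE_BYTES = 100 * 1024 * 1024  # hypothetical threshold: 100 MiB


def check_database(db_path: str) -> bool:
    try:
        # mode=ro fails on a missing file instead of silently creating an empty database.
        with closing(sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)) as conn:
            conn.execute("SELECT 1").fetchone()
        return True
    except sqlite3.Error:
        return False


def check_disk(path: str, min_free: int = MIN_FREE_BYTES) -> bool:
    return shutil.disk_usage(path).free >= min_free


def health_response(detailed: bool, authenticated: bool, db_path: str, data_dir: str) -> Tuple[Dict, int]:
    if not detailed:
        # Basic check for load balancers: the app answered, nothing else is revealed.
        return {"status": "ok"}, 200
    if not authenticated:
        return {"error": "authentication required"}, 401
    checks = {"database": check_database(db_path), "disk": check_disk(data_dir)}
    healthy = all(checks.values())
    return {"status": "ok" if healthy else "degraded", "checks": checks}, (200 if healthy else 503)
```

A /health route would pass request.args.get("detailed") == "true" as the first argument; /admin/health would always require authentication and could add the metrics summary.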
Q11: Request Correlation ID Scope
Developer Question: Should the correlation ID be per-request or per-session? If a request triggers background tasks, should they inherit the correlation ID? What about CLI commands?
Architect Answer: New ID for each HTTP request, inherit in background tasks. Each HTTP request gets a unique ID. Background tasks spawned from requests inherit the parent ID. CLI commands generate their own root ID.
Rationale: This maintains request tracing through async operations. CLI commands are independent operations. It's a standard distributed tracing pattern.
Implementation Guidance:
- Generate UUID for each request
- Store in Flask's g object
- Pass to background tasks as parameter
- Include in all log messages
- Add to response headers
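A framework-independent sketch of the propagation rules. It uses a contextvars.ContextVar as a stand-in for Flask's g so it runs without an app; the function names are hypothetical. Threads do not inherit the caller's context variables by default, so the parent ID is passed explicitly, as the guidance says.

```python
import contextvars
import logging
import threading
import uuid
from typing import Callable, Optional

# Stand-in for g.correlation_id in this sketch.
_correlation_id: "contextvars.ContextVar[str]" = contextvars.ContextVar("correlation_id", default="-")


def start_correlation(parent_id: Optional[str] = None) -> str:
    """Fresh UUID for an HTTP request or CLI command; the parent's ID for a background task."""
    cid = parent_id or uuid.uuid4().hex
    _correlation_id.set(cid)
    return cid


def current_correlation_id() -> str:
    return _correlation_id.get()


class CorrelationIdFilter(logging.Filter):
    """Attach the active correlation ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = _correlation_id.get()
        return True


def run_in_background(func: Callable[[], None]) -> threading.Thread:
    parent_id = current_correlation_id()  # captured in the request's context

    def runner() -> None:
        start_correlation(parent_id)  # the task inherits the parent ID
        func()

    thread = threading.Thread(target=runner, daemon=True)
    thread.start()
    return thread
```

In the app, a before_request hook could set g.correlation_id = start_correlation() and an after_request hook could copy it into a response header (the header name, for example X-Request-ID, is a choice still to be made).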
Q12: Performance Monitoring Sampling
Developer Question: To reduce overhead, should we sample performance metrics (e.g., only track 10% of requests)? Should sampling be configurable? Apply to all metrics or just specific types?
Architect Answer: Configuration-based sampling with operation types. Default 10% sampling rate with different rates per operation type. Applied at collection point, not in slow query log.
Rationale: Reduces overhead in production. Operation-specific rates allow focused monitoring. Slow query log should capture everything for debugging.
Implementation Guidance:
- Define sampling rates in config
- Different rates for database/http/render
- Use random sampling at collection point
- Always log slow queries regardless
- Make rates runtime configurable
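A sketch of per-operation sampling where slow queries bypass sampling. The 10% defaults come from the answer above; the 1000 ms slow-query threshold and the record_query() helper are assumptions.

```python
import logging
import random
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)

DEFAULT_RATES = {"database": 0.1, "http": 0.1, "render": 0.1}
SLOW_QUERY_MS = 1000.0  # hypothetical threshold


class Sampler:
    def __init__(self, rates: Optional[Dict[str, float]] = None,
                 default_rate: float = 0.1, rng: Optional[random.Random] = None):
        self._rates: Dict[str, float] = {}
        self.default_rate = default_rate
        self._rng = rng or random.Random()
        for operation, rate in (rates or DEFAULT_RATES).items():
            self.set_rate(operation, rate)

    def set_rate(self, operation: str, rate: float) -> None:
        """Adjustable at runtime; rejects rates outside [0, 1]."""
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"sample rate for {operation!r} must be between 0 and 1, got {rate}")
        self._rates[operation] = rate

    def should_sample(self, operation: str) -> bool:
        # random() is in [0, 1), so rate 1.0 always samples and rate 0.0 never does.
        return self._rng.random() < self._rates.get(operation, self.default_rate)


def record_query(sampler: Sampler, metrics: List[tuple], sql: str, duration_ms: float) -> bool:
    """Log slow queries unconditionally; store a metric only when sampled."""
    if duration_ms >= SLOW_QUERY_MS:
        logger.warning("Slow query (%.1f ms): %s", duration_ms, sql)
    if sampler.should_sample("database"):
        metrics.append(("database", sql, duration_ms))
        return True
    return False
```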
Q13: Search Highlighting XSS Prevention
Developer Question: When highlighting search terms in results, how do we prevent XSS if the search term contains HTML? Should we use a library like bleach or implement our own escaping?
Architect Answer: Use markupsafe.escape() with a whitelist. Escape with markupsafe.escape(), the escaping library Flask and Jinja2 rely on. Whitelist only <mark> tags for highlighting. Validate the class attribute against a whitelist.
Rationale: markupsafe is Flask's security standard. Whitelist approach is most secure. Prevents class-based XSS attacks.
Implementation Guidance:
- Escape all text first
- Then add safe mark tags
- Use Markup() for safe strings
- Limit to single highlight class
- Test with malicious input
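A sketch of safe highlighting with markupsafe, assuming a single whitelisted class named search-hit (hypothetical). This variant matches the term against the raw text and escapes each segment, rather than matching inside already-escaped text, so a search term cannot match inside an entity such as &amp;.

```python
import re

from markupsafe import Markup, escape

ALLOWED_HIGHLIGHT_CLASSES = frozenset({"search-hit"})  # hypothetical single class


def highlight(text: str, term: str, css_class: str = "search-hit") -> Markup:
    if css_class not in ALLOWED_HIGHLIGHT_CLASSES:
        raise ValueError(f"highlight class {css_class!r} is not whitelisted")
    if not term:
        return escape(text)
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        parts.append(escape(text[last:match.start()]))
        # Markup.format escapes its arguments, so the matched text cannot inject tags.
        parts.append(Markup('<mark class="{}">{}</mark>').format(css_class, match.group(0)))
        last = match.end()
    parts.append(escape(text[last:]))
    # Every part is already Markup, so join() leaves them as they are.
    return Markup("").join(parts)
```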
Q14: Configuration Validation Timing
Developer Question: When should configuration validation run - at startup, on first use, or both? Should invalid config crash the app or fall back to defaults? Should we validate before or after migrations?
Architect Answer: Validate at startup, fail fast with clear errors. Validate immediately after loading config. Invalid config crashes app with descriptive error. Validate both presence and type. Run BEFORE migrations.
Rationale: Fail fast prevents subtle runtime errors. Clear errors help operators fix issues. Type validation catches common mistakes.
Implementation Guidance:
- Create validation schema
- Check required fields exist
- Validate types and ranges
- Provide clear error messages
- Exit with non-zero status on failure
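A minimal validation schema along these lines, with hypothetical keys and ranges. validate_or_exit() would run right after config is loaded and before migrations; passing a string to sys.exit() writes it to stderr and exits with status 1.

```python
import sys
from typing import Any, Dict, List, Mapping

# Hypothetical schema: expected type, whether required, optional inclusive range.
CONFIG_SCHEMA: Dict[str, Dict[str, Any]] = {
    "SITE_URL": {"type": str, "required": True},
    "DATABASE_PATH": {"type": str, "required": True},
    "SESSION_LIFETIME_DAYS": {"type": int, "required": False, "range": (1, 365)},
}


def validate_config(config: Mapping[str, Any], schema: Dict[str, Dict[str, Any]] = CONFIG_SCHEMA) -> List[str]:
    errors = []
    for key, rule in schema.items():
        if key not in config:
            if rule["required"]:
                errors.append(f"{key} is required but not set")
            continue
        value, expected = config[key], rule["type"]
        # bool is a subclass of int, so reject it explicitly for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            errors.append(f"{key} must be {expected.__name__}, got {type(value).__name__}")
            continue
        if "range" in rule:
            low, high = rule["range"]
            if not low <= value <= high:
                errors.append(f"{key}={value} is outside the allowed range {low}..{high}")
    return errors


def validate_or_exit(config: Mapping[str, Any]) -> None:
    errors = validate_config(config)
    if errors:
        sys.exit("Invalid configuration:\n  " + "\n  ".join(errors))
```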
Nice-to-Have Clarifications
Q15: Test Race Condition Fix Priority
Developer Question: Some tests have intermittent failures due to race conditions. Should fixing these block v1.1.1 release, or can we defer to v1.1.2?
Architect Answer: Fix in Phase 2, after core features. Not blocking for v1.1.1 release. Fix after performance monitoring is in place. Add to technical debt backlog.
Rationale: Race conditions are intermittent, not blocking. Focus on user-visible improvements first. Can be addressed in v1.1.2.
Q16: Memory Monitoring Thread
Developer Question: The memory monitoring thread needs to record metrics periodically. How should it handle database unavailability? Should it stop gracefully on shutdown?
Architect Answer: Use threading.Event for graceful shutdown. If the database is unavailable, log a warning instead of crashing, and reconnect automatically when it recovers.
Rationale: Graceful shutdown prevents data corruption. Monitoring shouldn't crash the app. Self-healing improves reliability.
Implementation Guidance:
- Use daemon thread with Event
- Check stop event in loop
- Handle database errors gracefully
- Retry with exponential backoff
- Log issues but don't propagate
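A sketch of the monitoring loop, assuming the memory reading and the database write are injected as callables (both hypothetical). If record() opens a fresh connection on each call, recovery after an outage needs no extra reconnect logic. The interval and backoff ceiling are illustrative.

```python
import logging
import threading
from typing import Callable

logger = logging.getLogger(__name__)


class MemoryMonitor:
    """Daemon thread that samples periodically, backs off on errors, and stops on an Event."""

    def __init__(self, sample: Callable[[], float], record: Callable[[float], None],
                 interval: float = 60.0, max_backoff: float = 600.0):
        self._sample = sample  # e.g. returns current memory use in MB (assumed)
        self._record = record  # e.g. inserts the value into the metrics table (assumed)
        self._interval = interval
        self._max_backoff = max_backoff
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, name="memory-monitor", daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self, timeout: float = 5.0) -> None:
        self._stop.set()  # also wakes the wait() in _run immediately
        self._thread.join(timeout)

    def _run(self) -> None:
        delay = self._interval
        while not self._stop.is_set():
            try:
                self._record(self._sample())
                delay = self._interval  # success resets the backoff
            except Exception as exc:  # monitoring failures are logged, never propagated
                delay = min(delay * 2, self._max_backoff)
                logger.warning("Memory metric not recorded (%s); retrying in %.0fs", exc, delay)
            self._stop.wait(delay)
```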
Q17: Log Rotation Strategy
Developer Question: For log rotation, should we use Python's RotatingFileHandler, Linux logrotate, or a custom solution? What size/count limits are appropriate?
Architect Answer: Use RotatingFileHandler with 10MB files. Python's built-in RotatingFileHandler. 10MB per file, keep 10 files. No compression for simplicity.
Rationale: Built-in solution requires no dependencies. 100MB total is reasonable for small deployment. Compression adds complexity for minimal benefit.
Q18: Error Budget Tracking
Developer Question: How should we track error budgets - as a percentage, count, or rate? Over what time window? Should exceeding budget trigger any automatic actions?
Architect Answer: Simple counter-based tracking. Track in metrics buffer. Display in dashboard as percentage. No auto-alerting in v1.1.1 (future enhancement).
Rationale: Simple to implement and understand. Provides visibility without complexity. Alerting can be added later.
Implementation Guidance:
- Track last 1000 requests
- Calculate success rate
- Display remaining budget
- Log when budget low
- Manual monitoring for now
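A counter-based sketch over the last 1000 requests. The 99% target, the rule that only 5xx responses count as failures, and the 20% low-budget warning level are all assumptions. As written, it logs on every request while the budget is low, so a real version might rate-limit that warning.

```python
import logging
from collections import deque

logger = logging.getLogger(__name__)


class ErrorBudget:
    """Success rate and remaining error budget over the last `window` requests."""

    def __init__(self, window: int = 1000, target_success_rate: float = 0.99,
                 low_budget_warning: float = 0.2):
        self._outcomes: "deque[bool]" = deque(maxlen=window)  # True means success
        self.target = target_success_rate
        self.low_budget_warning = low_budget_warning

    def record(self, status_code: int) -> None:
        self._outcomes.append(status_code < 500)  # assumption: only 5xx spends budget
        remaining = self.remaining_budget()
        if remaining < self.low_budget_warning:
            logger.warning("Error budget low: %.0f%% remaining", remaining * 100)

    def success_rate(self) -> float:
        if not self._outcomes:
            return 1.0
        return sum(self._outcomes) / len(self._outcomes)

    def remaining_budget(self) -> float:
        """Fraction of allowed failures still unused (0.0 means exhausted)."""
        allowed = (1.0 - self.target) * len(self._outcomes)
        failures = len(self._outcomes) - sum(self._outcomes)
        if allowed <= 0:
            return 1.0 if failures == 0 else 0.0
        return max(0.0, 1.0 - failures / allowed)
```

The dashboard would display success_rate() and remaining_budget() as percentages.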
Q19: Dashboard UI Framework
Developer Question: For the admin dashboard, should we use a JavaScript framework (React/Vue), server-side rendering, or a hybrid approach? Any CSS framework preferences?
Architect Answer: Server-side rendering with htmx for updates. No JavaScript framework for simplicity. Use htmx for real-time updates. Chart.js for graphs via CDN. Existing CSS, no new framework.
Rationale: Maintains "works without JavaScript" principle. htmx provides reactivity without complexity. Chart.js is simple and sufficient.
Implementation Guidance:
- Use Jinja2 templates
- Add htmx for auto-refresh
- Include Chart.js from CDN
- Keep existing CSS styles
- Progressive enhancement approach
Q20: Micropub Error Response Format
Developer Question: The Micropub spec defines error responses, but should we add additional debugging info in development mode? How much detail in error_description field?
Architect Answer: Maintain strict Micropub spec compliance. Use spec-defined error format exactly. Add error_description for clarity. Log additional details server-side only.
Rationale: Spec compliance is non-negotiable. error_description is allowed by spec. Server logs provide debugging info.
Implementation Guidance:
- Use exact error codes from spec
- Include helpful error_description
- Never expose internal details
- Log full context server-side
- Keep development/production responses identical
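A sketch of a spec-shaped error builder. The code-to-status table follows the error list in the W3C Micropub Recommendation (which assigns 401 to insufficient_scope); the function name and the internal_detail parameter are hypothetical. Internal details go to the server log and never into the response body, so development and production responses stay identical.

```python
import logging
from typing import Dict, Optional, Tuple

logger = logging.getLogger(__name__)

MICROPUB_ERRORS: Dict[str, int] = {
    "forbidden": 403,
    "unauthorized": 401,
    "insufficient_scope": 401,
    "invalid_request": 400,
}


def micropub_error(error: str, description: str,
                   internal_detail: Optional[str] = None) -> Tuple[Dict[str, str], int]:
    """Return (body, status) for a Micropub error; internal details are logged only."""
    if error not in MICROPUB_ERRORS:
        raise ValueError(f"{error!r} is not a Micropub error code")
    if internal_detail:
        logger.error("Micropub %s: %s | %s", error, description, internal_detail)
    return {"error": error, "error_description": description}, MICROPUB_ERRORS[error]
```

A Flask view would return jsonify(body), status with the pair this function produces.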
Implementation Priorities
The architect recommends implementing v1.1.1 in three phases:
Phase 1: Core Infrastructure (Week 1)
Focus on foundational improvements that other features depend on:
1. Logging system replacement - Remove all print statements
2. Configuration validation - Fail fast on invalid config
3. Database connection pool - Improve performance
4. Error handling middleware - Consistent error responses
Phase 2: Enhancements (Week 2)
Add the user-facing improvements:
5. Session management - Secure session handling
6. Performance monitoring - Track system health
7. Health checks - Enable monitoring
8. Search improvements - Better search experience
Phase 3: Polish (Week 3)
Complete the release with final touches:
9. Admin dashboard - Visualize metrics
10. Memory optimization - RSS streaming
11. Documentation - Update all guides
12. Testing improvements - Fix flaky tests
Additional Architectural Guidance
Configuration Integration Strategy
The developer should implement configuration in layers:
- Keep existing config.py for Flask settings
- Add starpunk/config.py as helper module
- Migrate gradually by replacing direct config access
- Document both systems in configuration guide
Connection Pool Implementation Notes
The pool should be transparent to calling code:
- Same interface as get_db()
- Automatic cleanup on request end
- Connection recycling for performance
- Statistics collection for monitoring
Validation Specifications
Create centralized validation schemas for:
- Configuration values (types, ranges, requirements)
- Micropub requests (required fields, formats)
- Input data (lengths, patterns, encoding)
Migration Ordering
The developer must run migrations in this specific order:
- 008_add_session_table.sql
- 009_add_performance_indexes.sql
- 010_add_metrics_table.sql
Testing Gaps to Address
While not blocking v1.1.1, these should be noted for v1.1.2:
- Connection pool stress tests
- Unicode edge cases
- Memory leak detection
- Error recovery scenarios
Required Documentation
Before release, create these operational guides:
- /docs/operations/upgrade-to-v1.1.1.md - Step-by-step upgrade process
- /docs/operations/troubleshooting.md - Common issues and solutions
- /docs/operations/performance-tuning.md - Optimization guidelines
Final Architectural Notes
These answers prioritize:
- Simplicity over features - Every addition must justify its complexity
- Compatibility over clean breaks - Don't break existing deployments
- Gradual migration over big bang - Incremental improvements reduce risk
- Flask patterns over custom solutions - Use idiomatic Flask approaches
The developer should implement in the phase order specified, testing thoroughly between phases. Any blockers or uncertainties should be escalated immediately for architectural review.
Remember: v1.1.1 is about polish, not new features. Focus on making existing functionality more robust, observable, and maintainable.