Phil Skentelbery f62d3c5382 docs: Add v1.1.1 developer Q&A session

Create developer-qa.md with architect's answers to all 20
implementation questions from the developer's design review.

This is the proper format for Q&A between developer and architect
during design review, not an ADR (which is for architectural
decisions with lasting impact).

Content includes:
- 6 critical questions with answers (config, db pool, logging, etc.)
- 8 important questions (session migration, Unicode, health checks)
- 6 nice-to-have clarifications (testing, monitoring, dashboard)
- Implementation phases (3 weeks)
- Integration guidance

Developer now has clear guidance to proceed with v1.1.1 implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-25 13:43:56 -07:00

StarPunk v1.1.1 "Polish" - Developer Q&A

Date: 2025-11-25
Developer: Developer Agent
Architect: Architect Agent

This document contains the Q&A session between the developer and architect during v1.1.1 design review.

Purpose

The developer reviewed all v1.1.1 design documentation and prepared questions about implementation details, integration points, and edge cases. This document contains the architect's answers to guide implementation.

Critical Questions (Must be answered before implementation)

Q1: Configuration System Integration

Developer Question: The design calls for centralized configuration. I see we have config.py at the root for Flask app config. Should the new starpunk/config.py module replace this, wrap it, or co-exist as a separate configuration layer? How do we avoid breaking existing code that directly imports from config?

Architect Answer: Keep both files with clear separation of concerns. The existing config.py remains for Flask app configuration, while the new starpunk/config.py becomes a configuration helper module that wraps Flask's app.config for runtime access.

Rationale: This maintains backward compatibility, separates Flask-specific config from application logic, and allows gradual migration without breaking changes.

Implementation Guidance:

Create starpunk/config.py as a helper that uses current_app.config
Provide methods like get_database_path(), get_upload_folder(), etc.
Gradually replace direct config access with helper methods
Document both in the configuration guide

Q2: Database Connection Pool Scope

Developer Question: The connection pool will replace the current get_db() context manager used throughout routes. Should it also replace direct sqlite3.connect() calls in migrations and utilities? How do we ensure proper connection lifecycle in Flask's request context?

Architect Answer: Connection pool replaces get_db() but NOT migrations. The pool replaces all runtime sqlite3.connect() calls but migrations must use direct connections for isolation. Integrate the pool with Flask's g object for request-scoped connections.

Rationale: Migrations need isolated transactions without pool interference. The pool improves runtime performance while request-scoped connections via g maintain Flask patterns.

Implementation Guidance:

Implement pool in starpunk/database/pool.py
Use g.db for request-scoped connections
Replace get_db() in all route files
Keep direct connections for migrations only
Add pool statistics to metrics
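
The pool described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the StarPunk implementation: the `ConnectionPool` class, its `connection()` context manager, and the `stats()` fields are hypothetical names, and the pool size and timeout are placeholder values.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool of SQLite connections (hypothetical sketch)."""

    def __init__(self, database: str, size: int = 5):
        self._idle = queue.Queue(maxsize=size)
        self.checkouts = 0  # simple counter that a metrics endpoint could expose
        for _ in range(size):
            # check_same_thread=False allows a connection to be used by a
            # thread other than the one that created it.
            conn = sqlite3.connect(database, check_same_thread=False)
            conn.row_factory = sqlite3.Row
            self._idle.put(conn)

    @contextmanager
    def connection(self, timeout: float = 5.0):
        """Check out a connection, commit on success, roll back on error."""
        conn = self._idle.get(timeout=timeout)  # raises queue.Empty if exhausted
        self.checkouts += 1
        try:
            yield conn
            conn.commit()
        except Exception:
            conn.rollback()
            raise
        finally:
            self._idle.put(conn)  # always return the connection to the pool

    def stats(self) -> dict:
        return {"idle": self._idle.qsize(), "checkouts": self.checkouts}
```

In a Flask app, the checkout would more likely be stored on `g.db` for the duration of the request and returned in a `teardown_appcontext` handler; the context manager here only keeps the sketch independent of Flask.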

Q3: Logging vs. Print Statements Migration

Developer Question: Current code has many print statements for debugging. Should we phase these out gradually or remove all at once? Should we use Python's logging module directly or Flask's app.logger? For CLI commands, should they use logging or click.echo()?

Architect Answer: Remove ALL print statements in v1.1.1 rather than phasing them out gradually. Use Flask's app.logger as the base, enhanced with structured logging. CLI commands use click.echo() for user output and logger for diagnostics.

Rationale: A clean break prevents confusion. Flask's logger integrates with the framework, and click.echo() is the proper CLI output method.

Implementation Guidance:

Set up RotatingFileHandler in app factory
Configure structured logging with correlation IDs
Replace all print() with appropriate logging calls
Use click.echo() for CLI user feedback
Use logger for CLI diagnostic output

Q4: Error Handling Middleware Integration

Developer Question: For consistent error handling, should we use Flask's @app.errorhandler decorator or implement custom middleware? How do we ensure Micropub endpoints return spec-compliant error responses while other endpoints return HTML error pages?

Architect Answer: Use Flask's @app.errorhandler for all error handling. Register error handlers in the app factory. Micropub endpoints get specialized error handlers for spec compliance. No decorators on individual routes.

Rationale: Flask's error handler is the idiomatic approach. Centralized error handling reduces code duplication, and Micropub spec requires specific error formats.

Implementation Guidance:

Create starpunk/errors.py with register_error_handlers(app)
Check request path to determine response format
Return JSON for /micropub endpoints
Return HTML templates for other endpoints
Log all errors with correlation IDs
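
The path-based format choice can be sketched as a plain function that a handler registered with `@app.errorhandler` might call. The function name `build_error_response` and the HTML body layout are assumptions made for illustration; the JSON keys `error` and `error_description` are the ones the Micropub spec uses.

```python
import json
from html import escape


def build_error_response(path: str, status: int, error: str, description: str):
    """Return (body, status, content_type) for an error on the given path."""
    if path.startswith("/micropub"):
        # Micropub clients expect a JSON object with an error code and an
        # optional human-readable description.
        body = json.dumps({"error": error, "error_description": description})
        return body, status, "application/json"
    # Everything else gets an HTML page; a real app would render a template.
    body = f"<h1>{status}</h1><p>{escape(description)}</p>"
    return body, status, "text/html"
```

Keeping the decision in one function means the Micropub and HTML paths cannot drift apart as more handlers are registered.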

Q5: FTS5 Fallback Search Implementation

Developer Question: If FTS5 isn't available, should fallback search be in the same module or separate? Should it have the same function signature? How do we detect FTS5 support - at startup or runtime?

Architect Answer: Same module, startup detection with a cached result. Keep the fallback in the search.py module with the same function signature. Determine FTS5 support once at startup, cache it, and select the implementation from the cached value rather than checking on every call.

Rationale: A single module maintains cohesion. Same signature allows transparent switching. Startup detection avoids runtime overhead.

Implementation Guidance:

Detect FTS5 support at startup using a test table
Cache the result in a module-level variable
Use function pointer to select implementation
Both implementations use identical signatures
Log which implementation is active

Q6: Performance Monitoring Circular Buffer

Developer Question: For the circular buffer storing performance metrics - in a multi-process deployment (like gunicorn), should each process have its own buffer or should we use shared memory? How do we aggregate metrics across processes?

Architect Answer: Per-process buffer with aggregation endpoint. Each process maintains its own circular buffer. /admin/metrics aggregates across all workers. Use multiprocessing.Manager for shared state if needed.

Rationale: Per-process avoids locking overhead. Aggregation provides complete picture. This is a standard pattern for multi-process Flask apps.

Implementation Guidance:

Create MetricsBuffer class with deque
Include process ID in all metrics
Aggregate in /admin/metrics endpoint
Consider shared memory for future enhancement
Default to 1000 entries per buffer
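
A per-process buffer along these lines is small enough to sketch in full. The method names `record()` and `snapshot()` and the entry fields are assumptions; only the `MetricsBuffer` name, the use of `deque`, the process ID, and the 1000-entry default come from the guidance above.

```python
import os
import time
from collections import deque


class MetricsBuffer:
    """Per-process circular buffer of recent metrics."""

    def __init__(self, max_entries: int = 1000):
        # A deque with maxlen silently discards the oldest entry when full.
        self._entries = deque(maxlen=max_entries)

    def record(self, operation: str, duration_ms: float) -> None:
        self._entries.append(
            {
                "pid": os.getpid(),  # lets an aggregator tell workers apart
                "operation": operation,
                "duration_ms": duration_ms,
                "timestamp": time.time(),
            }
        )

    def snapshot(self) -> list:
        """Return a copy of the current entries, oldest first."""
        return list(self._entries)
```

Note that `snapshot()` only sees the calling worker's entries; combining workers still needs some shared channel, which is where the `multiprocessing.Manager` option mentioned above would come in.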

Important Questions

Q7: Session Table Migration

Developer Question: The session management enhancement requires a new database table. Should this be added to an existing migration file or create a new one? What happens to existing sessions during upgrade?

Architect Answer: New migration file 008_add_session_table.sql. This is a separate migration that maintains clarity. Drop existing sessions (document in upgrade guide). Use RETURNING clause with version check where supported.

Rationale: Clean migration history is important. Sessions are ephemeral and safe to drop. RETURNING improves performance where available.

Implementation Guidance:

Create new migration file
Drop table if exists before creation
Add proper indexes for user_id and expires_at
Document session reset in upgrade guide
Test migration rollback procedure

Q8: Unicode Slug Generation

Developer Question: When slug generation from title fails (e.g., all emoji title), what should the fallback be? Should we return an error to the Micropub client or generate a default slug? What pattern for auto-generated slugs?

Architect Answer: Timestamp-based fallback with warning. Use YYYYMMDD-HHMMSS pattern when normalization fails. Log warning with original text for debugging. Return 201 Created to Micropub client (not an error).

Rationale: A timestamp makes collisions unlikely (two notes would have to fall back within the same second). Warning helps identify encoding issues. Micropub spec doesn't define this as an error condition.

Implementation Guidance:

Try Unicode normalization first
Fall back to timestamp if result is empty
Log warnings for debugging
Include original text in logs
Never fail the Micropub request
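
The fallback logic might be sketched as follows. The function name `generate_slug`, the ASCII-folding step, the use of UTC, and the logger name are assumptions; the source only specifies normalization first, then a YYYYMMDD-HHMMSS timestamp with a logged warning.

```python
import logging
import re
import unicodedata
from datetime import datetime, timezone
from typing import Optional

logger = logging.getLogger("starpunk.slug")  # logger name is an assumption


def generate_slug(title: str, now: Optional[datetime] = None) -> str:
    """Build a slug from a title, falling back to a timestamp if empty."""
    # NFKD splits accented characters into base letter + combining mark,
    # so dropping non-ASCII afterwards keeps the base letter.
    normalized = unicodedata.normalize("NFKD", title)
    ascii_text = normalized.encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    if slug:
        return slug
    # Nothing usable survived (for example, an all-emoji title).
    logger.warning("Slug normalization produced nothing for %r; using timestamp", title)
    now = now or datetime.now(timezone.utc)  # UTC is an assumption
    return now.strftime("%Y%m%d-%H%M%S")
```

The `now` parameter exists only so the fallback can be tested deterministically.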

Q9: RSS Memory Optimization

Developer Question: The current RSS generator builds the entire feed in memory. For optimization, should we stream the XML directly to the response or use a generator? How do we handle large feeds (1000+ items)?

Architect Answer: Use generator with yield for streaming. Implement as generator function. Use Flask's Response(generate(), mimetype='application/rss+xml'). Stream directly to client.

Rationale: Generators minimize memory footprint. Flask handles streaming automatically. This scales to any feed size.

Implementation Guidance:

Convert RSS generation to generator function
Yield XML chunks, not individual characters
Query notes in batches if needed
Set appropriate response headers
Test with large feed counts
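
A generator of this shape could be passed to Flask's `Response(...)` as described above. The function name `generate_rss`, its parameters, and the note dictionary keys are hypothetical, and the feed is reduced to a few elements to keep the sketch short.

```python
from xml.sax.saxutils import escape


def generate_rss(title, link, description, notes):
    """Yield an RSS 2.0 document chunk by chunk instead of building it whole."""
    yield '<?xml version="1.0" encoding="UTF-8"?>\n'
    yield '<rss version="2.0"><channel>\n'
    yield (
        f"<title>{escape(title)}</title>"
        f"<link>{escape(link)}</link>"
        f"<description>{escape(description)}</description>\n"
    )
    # notes may itself be a generator that pages through the database,
    # so only one item needs to be in memory at a time.
    for note in notes:
        yield (
            "<item>"
            f"<title>{escape(note['title'])}</title>"
            f"<link>{escape(note['link'])}</link>"
            "</item>\n"
        )
    yield "</channel></rss>\n"
```

Each item is yielded as one chunk, which follows the guidance to yield XML chunks rather than individual characters.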

Q10: Health Check Authentication

Developer Question: Should health check endpoints require authentication? Load balancers need to access them, but detailed health info might be sensitive. How do we balance security with operational needs?

Architect Answer: Basic check public, detailed check requires auth. /health returns 200 OK (no auth, for load balancers). /health?detailed=true requires authentication. Separate /admin/health for full diagnostics (always auth).

Rationale: Load balancers need unauthenticated access. Detailed info could leak sensitive data. This follows industry standard patterns.

Implementation Guidance:

Basic health: just return 200 if app responds
Detailed health: check database, disk space, etc.
Admin health: full diagnostics with metrics
Use query parameter to trigger detailed mode
Document endpoints in operations guide

Q11: Request Correlation ID Scope

Developer Question: Should the correlation ID be per-request or per-session? If a request triggers background tasks, should they inherit the correlation ID? What about CLI commands?

Architect Answer: New ID for each HTTP request, inherit in background tasks. Each HTTP request gets a unique ID. Background tasks spawned from requests inherit the parent ID. CLI commands generate their own root ID.

Rationale: This maintains request tracing through async operations. CLI commands are independent operations. It's a standard distributed tracing pattern.

Implementation Guidance:

Generate UUID for each request
Store in Flask's g object
Pass to background tasks as parameter
Include in all log messages
Add to response headers

Q12: Performance Monitoring Sampling

Developer Question: To reduce overhead, should we sample performance metrics (e.g., only track 10% of requests)? Should sampling be configurable? Apply to all metrics or just specific types?

Architect Answer: Configuration-based sampling with operation types. Default 10% sampling rate with different rates per operation type. Applied at collection point, not in slow query log.

Rationale: Reduces overhead in production. Operation-specific rates allow focused monitoring. Slow query log should capture everything for debugging.

Implementation Guidance:

Define sampling rates in config
Different rates for database/http/render
Use random sampling at collection point
Always log slow queries regardless
Make rates runtime configurable
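
The sampling decision at the collection point might look like this. The rate table, the `should_record` name, and the slow-query threshold are assumptions; only the 10% default and the rule that slow queries are always captured come from the answer above.

```python
import random

# Per-operation sampling rates; the individual values are placeholders.
SAMPLING_RATES = {"database": 0.1, "http": 0.1, "render": 0.1}
DEFAULT_RATE = 0.1  # 10% default from the design answer
SLOW_QUERY_MS = 1000.0  # hypothetical slow-query threshold


def should_record(operation: str, duration_ms: float, rng=random.random) -> bool:
    """Decide whether to keep a metric for this operation."""
    # Slow queries bypass sampling so the slow query log stays complete.
    if operation == "database" and duration_ms >= SLOW_QUERY_MS:
        return True
    return rng() < SAMPLING_RATES.get(operation, DEFAULT_RATE)
```

Because `SAMPLING_RATES` is read on every call, updating the dictionary changes the rates at runtime. The `rng` parameter is there only to make the decision testable.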

Q13: Search Highlighting XSS Prevention

Developer Question: When highlighting search terms in results, how do we prevent XSS if the search term contains HTML? Should we use a library like bleach or implement our own escaping?

Architect Answer: Use markupsafe.escape() with whitelist. Use Flask's standard markupsafe.escape(). Whitelist only <mark> tags for highlighting. Validate class attribute against whitelist.

Rationale: markupsafe is Flask's security standard. Whitelist approach is most secure. Prevents class-based XSS attacks.

Implementation Guidance:

Escape all text first
Then add safe mark tags
Use Markup() for safe strings
Limit to single highlight class
Test with malicious input
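
The escape-then-mark approach can be sketched with the standard library. Here `html.escape` stands in for `markupsafe.escape` so the example has no dependencies, and the function name `highlight` is hypothetical. The text is split on the raw search term before escaping, so a term such as `amp` cannot match inside an entity like `&amp;`.

```python
import re
from html import escape


def highlight(text: str, term: str) -> str:
    """Escape text and wrap case-insensitive matches of term in <mark>."""
    if not term:
        return escape(text)
    # The capturing group makes re.split keep the matches: even indices are
    # the text between matches, odd indices are the matches themselves.
    parts = re.split(f"({re.escape(term)})", text, flags=re.IGNORECASE)
    out = []
    for index, part in enumerate(parts):
        safe = escape(part)  # every piece is escaped, matched or not
        out.append(f"<mark>{safe}</mark>" if index % 2 == 1 else safe)
    return "".join(out)
```

In a Flask template the result would typically be wrapped in `Markup()` so Jinja2 does not escape the `<mark>` tags a second time.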

Q14: Configuration Validation Timing

Developer Question: When should configuration validation run - at startup, on first use, or both? Should invalid config crash the app or fall back to defaults? Should we validate before or after migrations?

Architect Answer: Validate at startup, fail fast with clear errors. Validate immediately after loading config. Invalid config crashes app with descriptive error. Validate both presence and type. Run BEFORE migrations.

Rationale: Fail fast prevents subtle runtime errors. Clear errors help operators fix issues. Type validation catches common mistakes.

Implementation Guidance:

Create validation schema
Check required fields exist
Validate types and ranges
Provide clear error messages
Exit with non-zero status on failure
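
A small schema-driven validator illustrates the fail-fast behaviour. The setting names, ranges, and the `ConfigError` / `validate_config` / `validate_or_exit` names are all hypothetical; the real schema would list StarPunk's actual settings.

```python
import sys


class ConfigError(Exception):
    """Raised when one or more settings are missing or invalid."""


# Hypothetical schema: expected type plus optional inclusive range.
SCHEMA = {
    "DATABASE_PATH": {"type": str},
    "METRICS_BUFFER_SIZE": {"type": int, "min": 1, "max": 100000},
    "METRICS_SAMPLING_RATE": {"type": float, "min": 0.0, "max": 1.0},
}


def validate_config(config: dict) -> None:
    """Check presence, type, and range; report every problem at once."""
    errors = []
    for key, rule in SCHEMA.items():
        if key not in config:
            errors.append(f"{key}: required setting is missing")
            continue
        value, expected = config[key], rule["type"]
        # bool is a subclass of int, so reject it unless bool is expected.
        if (isinstance(value, bool) and expected is not bool) or not isinstance(value, expected):
            errors.append(f"{key}: expected {expected.__name__}, got {type(value).__name__}")
            continue
        if "min" in rule and value < rule["min"]:
            errors.append(f"{key}: {value} is below the minimum {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{key}: {value} is above the maximum {rule['max']}")
    if errors:
        raise ConfigError("\n".join(errors))


def validate_or_exit(config: dict) -> None:
    """Startup entry point: print the errors and exit with a non-zero status."""
    try:
        validate_config(config)
    except ConfigError as exc:
        sys.exit(f"Invalid configuration:\n{exc}")
```

Collecting all errors before raising lets an operator fix the whole configuration in one pass instead of one restart per mistake.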

Nice-to-Have Clarifications

Q15: Test Race Condition Fix Priority

Developer Question: Some tests have intermittent failures due to race conditions. Should fixing these block v1.1.1 release, or can we defer to v1.1.2?

Architect Answer: Fix in Phase 2, after core features. Not blocking for v1.1.1 release. Fix after performance monitoring is in place. Add to technical debt backlog.

Rationale: Race conditions are intermittent, not blocking. Focus on user-visible improvements first. Can be addressed in v1.1.2.

Q16: Memory Monitoring Thread

Developer Question: The memory monitoring thread needs to record metrics periodically. How should it handle database unavailability? Should it stop gracefully on shutdown?

Architect Answer: Use threading.Event for graceful shutdown. Stop gracefully using Event. Log warning if database unavailable, don't crash. Reconnect automatically on database recovery.

Rationale: Graceful shutdown prevents data corruption. Monitoring shouldn't crash the app. Self-healing improves reliability.

Implementation Guidance:

Use daemon thread with Event
Check stop event in loop
Handle database errors gracefully
Retry with exponential backoff
Log issues but don't propagate
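
The thread lifecycle above can be sketched as follows. The `MemoryMonitor` class, its `record` callback, and the interval and backoff values are assumptions; the callback is where the real code would write a metric to the database.

```python
import logging
import threading

logger = logging.getLogger("starpunk.monitor")  # logger name is an assumption


class MemoryMonitor:
    """Calls record() periodically on a daemon thread until stopped."""

    def __init__(self, record, interval: float = 60.0, max_backoff: float = 300.0):
        self._record = record
        self._interval = interval
        self._max_backoff = max_backoff
        self._stop = threading.Event()
        self._thread = threading.Thread(
            target=self._run, name="memory-monitor", daemon=True
        )

    def start(self) -> None:
        self._thread.start()

    def stop(self, timeout: float = 5.0) -> None:
        """Signal the loop to exit and wait for the thread to finish."""
        self._stop.set()
        self._thread.join(timeout)

    def is_alive(self) -> bool:
        return self._thread.is_alive()

    def _run(self) -> None:
        delay = self._interval
        # Event.wait returns True as soon as stop() is called, so shutdown
        # does not have to wait for a full interval to elapse.
        while not self._stop.wait(delay):
            try:
                self._record()
                delay = self._interval  # success resets the backoff
            except Exception as exc:
                delay = min(delay * 2, self._max_backoff)
                logger.warning("Metric recording failed (%s); retrying in %.1fs", exc, delay)
```

Using `Event.wait(delay)` as the sleep is what makes the shutdown prompt; a plain `time.sleep` would delay exit by up to one interval.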

Q17: Log Rotation Strategy

Developer Question: For log rotation, should we use Python's RotatingFileHandler, Linux logrotate, or a custom solution? What size/count limits are appropriate?

Architect Answer: Use RotatingFileHandler with 10MB files. Python's built-in RotatingFileHandler. 10MB per file, keep 10 files. No compression for simplicity.

Rationale: Built-in solution requires no dependencies. 100MB total is reasonable for small deployment. Compression adds complexity for minimal benefit.
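
As a configuration sketch, the handler could be created like this. The `make_file_handler` name and the log format are assumptions. Note that `backupCount` counts rotated files in addition to the active one, so this sketch reads "keep 10 files" as 9 backups plus the active file, which matches the 100MB total mentioned above.

```python
import logging
from logging.handlers import RotatingFileHandler

MAX_BYTES = 10 * 1024 * 1024  # 10 MiB per file
TOTAL_FILES = 10  # active file plus rotated backups


def make_file_handler(path: str) -> RotatingFileHandler:
    """Create a size-based rotating handler without compression."""
    handler = RotatingFileHandler(
        path,
        maxBytes=MAX_BYTES,
        backupCount=TOTAL_FILES - 1,  # backups exclude the active file
        encoding="utf-8",
        delay=True,  # do not open the file until the first record is written
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    return handler
```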

Q18: Error Budget Tracking

Developer Question: How should we track error budgets - as a percentage, count, or rate? Over what time window? Should exceeding budget trigger any automatic actions?

Architect Answer: Simple counter-based tracking. Track in metrics buffer. Display in dashboard as percentage. No auto-alerting in v1.1.1 (future enhancement).

Rationale: Simple to implement and understand. Provides visibility without complexity. Alerting can be added later.

Implementation Guidance:

Track last 1000 requests
Calculate success rate
Display remaining budget
Log when budget low
Manual monitoring for now

Q19: Dashboard UI Framework

Developer Question: For the admin dashboard, should we use a JavaScript framework (React/Vue), server-side rendering, or a hybrid approach? Any CSS framework preferences?

Architect Answer: Server-side rendering with htmx for updates. No JavaScript framework for simplicity. Use htmx for real-time updates. Chart.js for graphs via CDN. Existing CSS, no new framework.

Rationale: Maintains "works without JavaScript" principle. htmx provides reactivity without complexity. Chart.js is simple and sufficient.

Implementation Guidance:

Use Jinja2 templates
Add htmx for auto-refresh
Include Chart.js from CDN
Keep existing CSS styles
Progressive enhancement approach

Q20: Micropub Error Response Format

Developer Question: The Micropub spec defines error responses, but should we add additional debugging info in development mode? How much detail in error_description field?

Architect Answer: Maintain strict Micropub spec compliance. Use spec-defined error format exactly. Add error_description for clarity. Log additional details server-side only.

Rationale: Spec compliance is non-negotiable. error_description is allowed by spec. Server logs provide debugging info.

Implementation Guidance:

Use exact error codes from spec
Include helpful error_description
Never expose internal details
Log full context server-side
Keep development/production responses identical

Implementation Priorities

The architect recommends implementing v1.1.1 in three phases:

Phase 1: Core Infrastructure (Week 1)

Focus on foundational improvements that other features depend on:

Logging system replacement - Remove all print statements
Configuration validation - Fail fast on invalid config
Database connection pool - Improve performance
Error handling middleware - Consistent error responses

Phase 2: Enhancements (Week 2)

Add the user-facing improvements:

5. Session management - Secure session handling
6. Performance monitoring - Track system health
7. Health checks - Enable monitoring
8. Search improvements - Better search experience

Phase 3: Polish (Week 3)

Complete the release with final touches:

9. Admin dashboard - Visualize metrics
10. Memory optimization - RSS streaming
11. Documentation - Update all guides
12. Testing improvements - Fix flaky tests

Additional Architectural Guidance

Configuration Integration Strategy

The developer should implement configuration in layers:

Keep existing config.py for Flask settings
Add starpunk/config.py as helper module
Migrate gradually by replacing direct config access
Document both systems in configuration guide

Connection Pool Implementation Notes

The pool should be transparent to calling code:

Same interface as get_db()
Automatic cleanup on request end
Connection recycling for performance
Statistics collection for monitoring

Validation Specifications

Create centralized validation schemas for:

Configuration values (types, ranges, requirements)
Micropub requests (required fields, formats)
Input data (lengths, patterns, encoding)

Migration Ordering

The developer must run migrations in this specific order:

008_add_session_table.sql
009_add_performance_indexes.sql
010_add_metrics_table.sql

Testing Gaps to Address

While not blocking v1.1.1, these should be noted for v1.1.2:

Connection pool stress tests
Unicode edge cases
Memory leak detection
Error recovery scenarios

Required Documentation

Before release, create these operational guides:

/docs/operations/upgrade-to-v1.1.1.md - Step-by-step upgrade process
/docs/operations/troubleshooting.md - Common issues and solutions
/docs/operations/performance-tuning.md - Optimization guidelines

Final Architectural Notes

These answers prioritize:

Simplicity over features - Every addition must justify its complexity
Compatibility over clean breaks - Don't break existing deployments
Gradual migration over big bang - Incremental improvements reduce risk
Flask patterns over custom solutions - Use idiomatic Flask approaches

The developer should implement in the phase order specified, testing thoroughly between phases. Any blockers or uncertainties should be escalated immediately for architectural review.

Remember: v1.1.1 is about polish, not new features. Focus on making existing functionality more robust, observable, and maintainable.
