# Bug Fixes and Edge Cases Specification

## Overview

This specification details the bug fixes and edge case handling improvements planned for v1.1.1, focusing on test stability, Unicode handling, memory optimization, and session management.

## Bug Fixes

### 1. Migration Race Condition in Tests

#### Problem

10 tests exhibit flaky behavior due to race conditions during database migration execution. Tests occasionally fail when migrations are executed concurrently or when the test database isn't properly initialized.

#### Root Cause

- Concurrent test execution without proper isolation
- Shared database state between tests
- Migration lock not properly acquired
- Test fixtures not waiting for migration completion

#### Solution

```python
# starpunk/testing/fixtures.py
import os
import sqlite3
import tempfile
import threading
import unittest
from contextlib import contextmanager

# Global lock for test database operations
_test_db_lock = threading.Lock()


@contextmanager
def isolated_test_database():
    """Create isolated database for testing"""
    with _test_db_lock:
        # Create unique temp database
        temp_db = tempfile.NamedTemporaryFile(
            suffix='.db',
            delete=False
        )
        db_path = temp_db.name
        temp_db.close()

        try:
            # Initialize database with migrations
            run_migrations_sync(db_path)

            # Yield database for test
            yield db_path
        finally:
            # Cleanup
            try:
                os.unlink(db_path)
            except OSError:
                pass


def run_migrations_sync(db_path: str):
    """Run migrations synchronously with proper locking"""
    conn = sqlite3.connect(db_path)

    # Use exclusive lock during migrations
    conn.execute("BEGIN EXCLUSIVE")

    try:
        migrator = DatabaseMigrator(conn)
        migrator.run_all()
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


# Test base class
class StarPunkTestCase(unittest.TestCase):
    """Base test case with proper database isolation"""

    def setUp(self):
        """Set up test with isolated database"""
        self.db_context = isolated_test_database()
        self.db_path = self.db_context.__enter__()
        self.app = create_app(database=self.db_path)
        self.client = self.app.test_client()

    def tearDown(self):
        """Clean up test database"""
        self.db_context.__exit__(None, None, None)


# Example test with proper isolation
class TestMigrations(StarPunkTestCase):
    def test_migration_idempotency(self):
        """Test that migrations can be run multiple times"""
        # First run happens in setUp

        # Second run should be safe
        run_migrations_sync(self.db_path)

        # Verify database state
        with sqlite3.connect(self.db_path) as conn:
            tables = conn.execute(
                "SELECT name FROM sqlite_master WHERE type='table'"
            ).fetchall()
            self.assertIn(('notes',), tables)
```

#### Test Timing Improvements

```python
# starpunk/testing/wait.py
import time
from typing import Callable


def wait_for_condition(
    condition: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.1
) -> bool:
    """Wait for condition to become true"""
    start = time.time()

    while time.time() - start < timeout:
        if condition():
            return True
        time.sleep(interval)

    return False


# Usage in tests
def test_async_operation(self):
    """Test with proper waiting"""
    self.client.post('/notes', data={'content': 'Test'})

    # Wait for indexing to complete
    success = wait_for_condition(
        lambda: search_index_updated(),
        timeout=2.0
    )
    self.assertTrue(success)
```

### 2. Unicode Edge Cases in Slug Generation

#### Problem

Slug generation fails or produces invalid slugs for certain Unicode inputs, including emoji, RTL text, and combining characters.

#### Current Issues

- Emoji in titles break slug generation
- RTL languages produce confusing slugs
- Combining characters aren't normalized
- Zero-width characters remain in slugs

#### Solution

```python
# starpunk/utils/slugify.py
import random
import re
import string
import unicodedata


def generate_slug(text: str, max_length: int = 50) -> str:
    """Generate URL-safe slug from text with Unicode handling"""

    if not text:
        return generate_random_slug()

    # Normalize Unicode (NFKD = compatibility decomposition);
    # this splits accented characters into base letter + combining mark
    text = unicodedata.normalize('NFKD', text)

    # Drop non-ASCII characters (combining marks, emoji, non-Latin scripts)
    text = text.encode('ascii', 'ignore').decode('ascii')

    # Convert to lowercase
    text = text.lower()

    # Replace runs of spaces and punctuation with hyphens
    text = re.sub(r'[^a-z0-9]+', '-', text)

    # Remove leading/trailing hyphens
    text = text.strip('-')

    # Collapse multiple hyphens (defensive; the substitution above
    # already collapses runs)
    text = re.sub(r'-+', '-', text)

    # Truncate to max length (at a word boundary if possible)
    if len(text) > max_length:
        text = text[:max_length].rsplit('-', 1)[0]

    # If nothing survives (e.g. emoji-only input), generate a random slug
    if not text:
        return generate_random_slug()

    return text


def generate_random_slug() -> str:
    """Generate random slug when text-based generation fails"""
    return 'note-' + ''.join(
        random.choices(string.ascii_lowercase + string.digits, k=8)
    )


# Extended test cases: (input, expected slug)
TEST_CASES = [
    ("Hello World", "hello-world"),
    ("Hello 👋 World", "hello-world"),    # Emoji removed
    ("مرحبا بالعالم", "note-a1b2c3d4"),    # Arabic -> random fallback
    ("Ĥëłłö Ŵöŕłđ", "hello-world"),       # Diacritics removed
    ("Hello\u200bWorld", "helloworld"),   # Zero-width space removed
    ("---Hello---", "hello"),             # Multiple hyphens stripped
    ("123", "123"),                       # Numbers only
    ("!@#$%", "note-x1y2z3a4"),           # Special chars -> random fallback
    ("a" * 100, "a" * 50),                # Truncation
    ("", "note-r4nd0m12"),                # Empty -> random fallback
]


def test_slug_generation():
    """Test slug generation with Unicode edge cases"""
    for input_text, expected in TEST_CASES:
        result = generate_slug(input_text)
        if expected.startswith("note-"):
            # Random slug - just check the format, not the exact value
            assert result.startswith("note-")
            assert len(result) == 13
        else:
            assert result == expected
```
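
For illustration, the expected behavior at a REPL (the random suffix is nondeterministic, so the second value shown is just one possible output):

```python
>>> generate_slug("Hello, World!")
'hello-world'
>>> generate_slug("👋👋👋")  # emoji-only input falls back to a random slug
'note-k3x9d2mq'
```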

### 3. RSS Feed Memory Optimization

#### Problem

RSS feed generation for sites with thousands of notes causes high memory usage and slow response times.

#### Current Issues

- Loading all notes into memory at once
- No pagination or limits
- Inefficient XML building
- No caching of generated feeds

#### Solution

```python
# starpunk/feeds/rss.py
from datetime import datetime, timedelta
from typing import Iterator
import sqlite3


class OptimizedRSSGenerator:
    """Memory-efficient RSS feed generator"""

    def __init__(self, base_url: str, limit: int = 50):
        self.base_url = base_url
        self.limit = limit

    def generate_feed(self) -> str:
        """Generate RSS feed with streaming"""
        # Use a string builder for efficiency
        parts = []
        parts.append(self._generate_header())

        # Stream notes from the database one row at a time
        for note in self._stream_recent_notes():
            parts.append(self._generate_item(note))

        parts.append(self._generate_footer())

        return ''.join(parts)

    def _stream_recent_notes(self) -> Iterator[dict]:
        """Stream notes without loading all into memory"""
        with get_db() as conn:
            # sqlite3 cursors fetch rows lazily, acting like a
            # server-side cursor
            conn.row_factory = sqlite3.Row

            cursor = conn.execute(
                """
                SELECT id, content, slug, created_at, updated_at
                FROM notes
                WHERE published = 1
                ORDER BY created_at DESC
                LIMIT ?
                """,
                (self.limit,)
            )

            # Yield one at a time
            for row in cursor:
                yield dict(row)

    def _generate_item(self, note: dict) -> str:
        """Generate single RSS item efficiently"""
        # Pre-calculate values once
        title = extract_title(note['content'])
        url = f"{self.base_url}/notes/{note['id']}"

        # Use string formatting for efficiency
        return f"""
    <item>
        <title>{escape_xml(title)}</title>
        <link>{url}</link>
        <guid isPermaLink="true">{url}</guid>
        <description>{escape_xml(note['content'][:500])}</description>
        <pubDate>{format_rfc822(note['created_at'])}</pubDate>
    </item>
"""

    # _generate_header() and _generate_footer() emit the channel
    # wrapper and are omitted here for brevity.


# Caching layer
class CachedRSSFeed:
    """RSS feed with time-based caching"""

    def __init__(self):
        self.cache = {}
        self.cache_duration = timedelta(minutes=5)

    def get_feed(self) -> str:
        """Get RSS feed, regenerating only when the cache is stale"""
        now = datetime.now()

        # Check cache
        if 'feed' in self.cache:
            cached_feed, cached_time = self.cache['feed']
            if now - cached_time < self.cache_duration:
                return cached_feed

        # Generate new feed
        generator = OptimizedRSSGenerator(
            base_url=config.BASE_URL,
            limit=config.RSS_ITEM_LIMIT
        )
        feed = generator.generate_feed()

        # Update cache
        self.cache['feed'] = (feed, now)

        return feed

    def invalidate(self):
        """Invalidate cache when notes change"""
        self.cache.clear()


# Memory-efficient XML escaping
def escape_xml(text: str) -> str:
    """Escape XML special characters efficiently"""
    if not text:
        return ""

    # Chained str.replace is faster here than xml.sax.saxutils;
    # the ampersand must be escaped first so it doesn't re-escape entities
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
        .replace("'", "&#39;")
    )
```
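
For completeness, a minimal sketch of wiring the cached feed into the application. The route path `/feed.xml`, the `on_note_changed` hook, and the module-level `cached_feed` are illustrative assumptions, not part of the implementation above:

```python
from flask import Response

# Hypothetical wiring; route path and hook name are illustrative
cached_feed = CachedRSSFeed()

@app.route('/feed.xml')
def rss_feed():
    # Serve from cache; the feed regenerates at most every five minutes
    return Response(cached_feed.get_feed(), mimetype='application/rss+xml')

def on_note_changed(note):
    # Call after any note create/update/delete so the next
    # request rebuilds the feed
    cached_feed.invalidate()
```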

### 4. Session Timeout Handling

#### Problem

Sessions don't properly time out, leading to security issues and accumulation of stale sessions.

#### Current Issues

- No automatic session expiration
- No cleanup of old sessions
- Session extension not working
- No timeout configuration

#### Solution

```python
# starpunk/auth/session_improved.py
from datetime import datetime, timedelta
from typing import Optional
import threading
import time

from flask import g, request


class ImprovedSessionManager:
    """Session manager with proper timeout handling"""

    def __init__(self):
        self.timeout = config.SESSION_TIMEOUT
        self.cleanup_interval = 3600  # 1 hour
        self._start_cleanup_thread()

    def _start_cleanup_thread(self):
        """Start background cleanup thread"""
        def cleanup_loop():
            while True:
                try:
                    self.cleanup_expired_sessions()
                except Exception as e:
                    logger.error(f"Session cleanup error: {e}")
                time.sleep(self.cleanup_interval)

        thread = threading.Thread(target=cleanup_loop)
        thread.daemon = True
        thread.start()

    def create_session(self, user_id: str, remember: bool = False) -> dict:
        """Create session with appropriate timeout"""
        session_id = generate_secure_token()

        # Longer timeout for "remember me"
        if remember:
            timeout = config.SESSION_TIMEOUT_REMEMBER
        else:
            timeout = self.timeout

        expires_at = datetime.now() + timedelta(seconds=timeout)

        with get_db() as conn:
            conn.execute(
                """
                INSERT INTO sessions (
                    id, user_id, expires_at, created_at, last_activity
                )
                VALUES (?, ?, ?, ?, ?)
                """,
                (
                    session_id,
                    user_id,
                    expires_at,
                    datetime.now(),
                    datetime.now()
                )
            )

        logger.info(f"Session created for user {user_id}")

        return {
            'session_id': session_id,
            'expires_at': expires_at.isoformat(),
            'timeout': timeout
        }

    def validate_and_extend(self, session_id: str) -> Optional[str]:
        """Validate session and extend timeout on activity"""
        now = datetime.now()

        with get_db() as conn:
            # Get session if it has not yet expired
            # (get_db() is assumed to set sqlite3.Row as the row factory)
            result = conn.execute(
                """
                SELECT user_id, expires_at, last_activity
                FROM sessions
                WHERE id = ? AND expires_at > ?
                """,
                (session_id, now)
            ).fetchone()

            if not result:
                return None

            user_id = result['user_id']
            last_activity = datetime.fromisoformat(result['last_activity'])

            # Extend the session, throttling writes: only touch the row
            # when the recorded activity is more than five minutes old
            if now - last_activity > timedelta(minutes=5):
                new_expires = now + timedelta(seconds=self.timeout)

                conn.execute(
                    """
                    UPDATE sessions
                    SET expires_at = ?, last_activity = ?
                    WHERE id = ?
                    """,
                    (new_expires, now, session_id)
                )

                logger.debug(f"Session extended for user {user_id}")

        return user_id

    def cleanup_expired_sessions(self):
        """Remove expired sessions from database"""
        with get_db() as conn:
            # RETURNING requires SQLite 3.35 or later
            result = conn.execute(
                """
                DELETE FROM sessions
                WHERE expires_at < ?
                RETURNING id
                """,
                (datetime.now(),)
            )

            deleted_count = len(result.fetchall())

            if deleted_count > 0:
                logger.info(f"Cleaned up {deleted_count} expired sessions")

    def invalidate_session(self, session_id: str):
        """Explicitly invalidate a session"""
        with get_db() as conn:
            conn.execute(
                "DELETE FROM sessions WHERE id = ?",
                (session_id,)
            )

        logger.info(f"Session {session_id} invalidated")

    def get_active_sessions(self, user_id: str) -> list:
        """Get all active sessions for a user"""
        with get_db() as conn:
            result = conn.execute(
                """
                SELECT id, created_at, last_activity, expires_at
                FROM sessions
                WHERE user_id = ? AND expires_at > ?
                ORDER BY last_activity DESC
                """,
                (user_id, datetime.now())
            )

            return [dict(row) for row in result]


session_manager = ImprovedSessionManager()


# Session middleware
@app.before_request
def check_session():
    """Check and extend session on each request"""
    session_id = request.cookies.get('session_id')

    if session_id:
        user_id = session_manager.validate_and_extend(session_id)

        if user_id:
            g.user_id = user_id
            g.authenticated = True
        else:
            # Mark the invalid session cookie for clearing after the request
            g.clear_session = True
            g.authenticated = False
    else:
        g.authenticated = False


@app.after_request
def update_session_cookie(response):
    """Clear the session cookie if it was marked invalid"""
    if getattr(g, 'clear_session', False):
        response.set_cookie(
            'session_id',
            '',
            expires=0,
            secure=config.SESSION_SECURE,
            httponly=True,
            samesite='Lax'
        )

    return response
```
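
The session manager reads its limits from application config. A minimal sketch of the corresponding settings; the names are taken from the code above, but the values here are illustrative defaults, not requirements:

```python
# starpunk/config.py (sketch; values are illustrative)
SESSION_TIMEOUT = 24 * 3600            # Standard session: 24 hours, in seconds
SESSION_TIMEOUT_REMEMBER = 30 * 86400  # "Remember me" session: 30 days
SESSION_SECURE = True                  # Send the session cookie over HTTPS only
```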

## Testing Strategy

### Test Stability Improvements

```python
# starpunk/testing/stability.py
from itertools import cycle
from unittest.mock import patch

import pytest


@pytest.fixture
def stable_test_env():
    """Provide stable test environment"""
    # Freeze time and make "random" choices predictable
    with patch('time.time', return_value=1234567890):
        with patch('random.choice', side_effect=cycle('abcd')):
            with isolated_test_database() as db:
                yield db


def test_with_stability(stable_test_env):
    """Test with predictable environment"""
    # Time and randomness are now deterministic
    pass
```

### Unicode Test Suite

```python
# starpunk/testing/unicode.py
import xml.etree.ElementTree as ET

import pytest

UNICODE_TEST_STRINGS = [
    "Simple ASCII",
    "Émoji 😀🎉🚀",
    "العربية",
    "中文字符",
    "🏳️🌈 flags",
    "Math: ∑∏∫",
    "Ñoño",
    "Combining: é (e + ́)",
]


@pytest.mark.parametrize("text", UNICODE_TEST_STRINGS)
def test_unicode_handling(text):
    """Test Unicode handling throughout system"""
    # Slug generation should always produce something
    slug = generate_slug(text)
    assert slug

    # Note creation should round-trip the content unchanged
    note = create_note(content=text)
    assert note.content == text

    # Search must not crash, whatever the results
    search_notes(text)

    # The RSS feed must remain well-formed XML
    feed = generate_rss_feed()
    ET.fromstring(feed)
```

## Performance Testing

### Memory Usage Tests

```python
def test_rss_memory_usage():
    """Test RSS generation memory usage"""
    import tracemalloc

    # Create many notes
    for i in range(10000):
        create_note(content=f"Note {i}")

    # Measure memory during RSS generation
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()

    feed = generate_rss_feed()

    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    # get_traced_memory() returns (current, peak); compare peak to baseline
    memory_used = (peak - baseline) / 1024 / 1024  # MB

    assert feed
    assert memory_used < 10  # Should use less than 10MB
```
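
### Response Time Tests

The acceptance criteria below also set a 500ms budget for feed responses. A minimal timing-test sketch, assuming a Flask test-client fixture named `client` and a `/feed.xml` route (both illustrative):

```python
import time

def test_rss_response_time(client):
    """Feed endpoint should respond within the 500ms budget"""
    start = time.monotonic()
    response = client.get('/feed.xml')  # route path is an assumption
    elapsed_ms = (time.monotonic() - start) * 1000

    assert response.status_code == 200
    assert elapsed_ms < 500
```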

## Acceptance Criteria

### Race Condition Fixes

1. ✅ All 10 flaky tests pass consistently
2. ✅ Test isolation properly implemented
3. ✅ Migration locks prevent concurrent execution
4. ✅ Test fixtures properly synchronized

### Unicode Handling

1. ✅ Slug generation handles all Unicode input
2. ✅ Never produces invalid or empty slugs
3. ✅ Emoji and special characters handled gracefully
4. ✅ RTL languages don't break the system

### RSS Memory Optimization

1. ✅ Memory usage stays under 10MB for 10,000 notes
2. ✅ Response time under 500ms
3. ✅ Streaming implementation works correctly
4. ✅ Cache invalidated on note changes

### Session Management

1. ✅ Sessions expire after configured timeout
2. ✅ Expired sessions automatically cleaned up
3. ✅ Active sessions properly extended
4. ✅ Session invalidation works correctly

## Risk Mitigation

1. **Test Stability**: Run the test suite 100 times to verify (see the sketch below)
2. **Unicode Compatibility**: Test with real-world data
3. **Memory Leaks**: Monitor long-running instances
4. **Session Security**: Security review of the implementation
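
A minimal sketch of the 100-run stability check from item 1, driving pytest in-process (the `tests/` path and run count are assumptions; a CI loop or pytest-repeat would work equally well):

```python
# scripts/check_stability.py (sketch)
import sys

import pytest

failures = 0
for run in range(100):
    # -q keeps output short; a non-zero exit code means at least one failure.
    # Note: repeated in-process runs share imported modules; a subprocess
    # loop avoids that if module-level state leaks between runs.
    exit_code = pytest.main(["-q", "tests/"])
    if exit_code != 0:
        failures += 1

print(f"{failures}/100 runs failed")
sys.exit(1 if failures else 0)
```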