Monitoring & Health Checks
Proactive monitoring ensures Contract Lucidity delivers consistent performance for your legal team. This guide covers the key metrics to watch, built-in health checks, and integration with enterprise monitoring tools.
Health Check Endpoint
CL exposes a lightweight health check at:
GET /api/health
Response (200 OK):
{
"status": "healthy",
"service": "Contract Lucidity"
}
This endpoint confirms the FastAPI backend is running and responding. It does not verify downstream dependencies (database, Redis, worker). For comprehensive health monitoring, combine this with the component checks described below.
Use /api/health as the health check path for your load balancer or reverse proxy. Configure:
- Interval: 10 seconds
- Timeout: 5 seconds
- Healthy threshold: 2 consecutive successes
- Unhealthy threshold: 3 consecutive failures
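The healthy/unhealthy threshold behaviour above can be sketched as a small state machine. This is an illustrative model of how a load balancer interprets consecutive probe results, not CL code:

```python
class HealthTracker:
    """Models load-balancer health tracking: 2 consecutive successes
    mark the target healthy, 3 consecutive failures mark it unhealthy
    (the thresholds recommended above)."""

    def __init__(self, healthy_threshold=2, unhealthy_threshold=3):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.successes = 0
        self.failures = 0
        self.healthy = True  # assume healthy at start

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return the current health state."""
        if probe_ok:
            self.successes += 1
            self.failures = 0
            if self.successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self.failures += 1
            self.successes = 0
            if self.failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy
```

Note that a single failed probe does not flip the state; only three in a row do, which avoids flapping on transient network blips.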
What to Monitor
Key Metrics and Alert Thresholds
| Metric | How to Check | Warning Threshold | Critical Threshold |
|---|---|---|---|
| Backend response time | HTTP GET /api/health | > 2 seconds | > 5 seconds |
| Backend availability | HTTP GET /api/health | 1 failure | 3 consecutive failures |
| Worker queue depth | Redis LLEN on Celery queue | > 20 pending tasks | > 50 pending tasks |
| Database connections | pg_stat_activity | > 80% of max_connections | > 95% of max_connections |
| Database disk usage | pg_database_size() | > 80% of volume | > 90% of volume |
| Redis memory | redis-cli INFO memory | > 70% of maxmemory | > 90% of maxmemory |
| Document storage disk | df on /data/storage | > 80% capacity | > 90% capacity |
| Pipeline error rate | Failed documents / total | > 5% over 1 hour | > 15% over 1 hour |
| Average pipeline duration | Time from QUEUED to COMPLETE | > 5 minutes (20-page doc) | > 15 minutes (20-page doc) |
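A monitoring script that consumes the metrics above can classify each sample against its warning/critical thresholds with a helper like this (a minimal sketch; the example values are hypothetical):

```python
def severity(value: float, warning: float, critical: float) -> str:
    """Classify a metric sample against warning/critical thresholds.
    Thresholds are exclusive lower bounds, matching the '>' operators
    in the table above."""
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"

# Hypothetical readings checked against the table's thresholds:
queue_state = severity(35, warning=20, critical=50)        # "warning"
error_state = severity(0.18, warning=0.05, critical=0.15)  # "critical"
```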
Checking Individual Components
Backend (cl-backend)
# Health check
curl -s https://contractlucidity.com/api/health | jq .
# Response time (should be < 500ms)
curl -o /dev/null -s -w "%{time_total}\n" https://contractlucidity.com/api/health
Worker (cl-worker)
The Celery worker does not expose an HTTP endpoint. Monitor it via:
# Check if the worker process is running
docker exec cl-worker celery -A app.celery_app inspect ping
# List active tasks
docker exec cl-worker celery -A app.celery_app inspect active
# List reserved (queued) tasks
docker exec cl-worker celery -A app.celery_app inspect reserved
# Check worker stats
docker exec cl-worker celery -A app.celery_app inspect stats
Redis Queue Depth
# Check the number of pending tasks in the default Celery queue
docker exec cl-redis redis-cli LLEN celery
# Check Redis memory usage
docker exec cl-redis redis-cli INFO memory | grep used_memory_human
# Check Redis connectivity
docker exec cl-redis redis-cli ping
# Expected: PONG
A consistently growing queue (especially above 50 tasks) indicates the worker cannot keep up with incoming documents. Common causes:
- Worker concurrency too low -- increase CELERY_CONCURRENCY
- AI provider rate limiting -- upgrade your API tier
- Worker crashed -- check docker logs cl-worker
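Distinguishing a genuine backlog from a momentary spike requires more than one `LLEN` sample. A sketch of that logic, assuming queue-depth samples are collected periodically by a cron job or monitoring agent:

```python
def queue_backlog_alert(samples, critical_depth=50):
    """Given chronological queue-depth samples (e.g. periodic
    `redis-cli LLEN celery` readings), flag a backlog only when the
    queue is monotonically growing AND above the critical depth --
    a brief spike that drains on its own should not page anyone."""
    if len(samples) < 2:
        return False
    growing = all(b >= a for a, b in zip(samples, samples[1:]))
    return growing and samples[-1] > critical_depth
```

For example, readings of 10, 25, 40, 60 over four intervals indicate a worker that has fallen behind, while 60, 40, 20 shows a queue that is already draining.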
PostgreSQL Database
# Check active connections
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT count(*) FROM pg_stat_activity WHERE state = 'active';"
# Check database size
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT pg_size_pretty(pg_database_size('contract_lucidity'));"
# Check for long-running queries (> 60 seconds)
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active' AND now() - pg_stat_activity.query_start > interval '60 seconds';"
# Check replication lag (if using replicas)
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT * FROM pg_stat_replication;"
Docker Container Health
# Overview of all CL containers
docker ps --filter "name=cl-" --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
# Container resource usage
docker stats --no-stream --filter "name=cl-"
# Check container logs (last 100 lines)
docker logs --tail 100 cl-backend
docker logs --tail 100 cl-worker
docker logs --tail 100 cl-frontend
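To act on the `docker ps` output programmatically, a script can flag any CL container whose status line does not start with "Up". This sketch assumes the output has been reduced to tab-separated name/status pairs (e.g. via `--format "{{.Names}}\t{{.Status}}"` with `-a` so exited containers appear):

```python
def unhealthy_containers(ps_lines):
    """Return the names of containers that are not running, given
    'name<TAB>status' lines from `docker ps -a --format ...`.
    Running containers report a status beginning with 'Up'."""
    bad = []
    for line in ps_lines:
        name, status = line.split("\t", 1)
        if not status.startswith("Up"):
            bad.append(name)
    return bad
```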
Pipeline Processing Monitoring
Document Pipeline Stages
Each document progresses through a series of pipeline stages, from QUEUED through extraction, classification, embedding, and analysis to a terminal COMPLETE or FAILED state.
Version-Aware Pipeline (Subsequent Versions)
When a revised version of a document is uploaded (parent_document_id is set), the pipeline runs a different flow at the report generation stage. The extraction, classification, and embedding stages are the same, but the analysis and reporting use a version-aware process:
The report generation and storage stages are where the version-aware logic differs from the standard pipeline. During STORING, the system:
- Runs a programmatic text diff against the previous version
- For each v1 clause: checks whether the diff overlaps the clause; if not, the clause is copied forward with no AI call; if so, AI validates it
- Assembles the final report from validated clauses
- Carries forward obligations, contract data, executive summary, and negotiation strategy
Key differences from initial ingestion:
| Aspect | Initial (v1) | Subsequent (v2+) |
|---|---|---|
| Clause analysis | AI analyzes from scratch | Per-clause validation against diff — unchanged clauses copied forward |
| Report generation | Full AI analysis | Diff-gated: only re-evaluate impacted clauses |
| Executive summary | AI generates fresh | Carried forward unless clause changes warrant update |
| Negotiation strategy | AI generates fresh | Carried forward from v1 |
| Contract data | AI extracts all fields | Carry forward, re-extract only fields touched by diff |
| Obligations | AI extracts all | Carry forward all (including custom), add genuinely new only |
| AI cost | Full (all clauses analyzed) | Reduced 70%+ (only impacted clauses use AI) |
If the raw text sizes between v1 and v2 differ by more than 2x (text extraction artifact, not actual content change), the per-clause validation is skipped entirely and all clauses are carried forward unchanged.
Querying Pipeline Status
# Count documents by pipeline status
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT pipeline_status, count(*) FROM documents GROUP BY pipeline_status ORDER BY count DESC;"
# Find stuck documents (in non-terminal state for > 30 minutes)
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT id, original_filename, pipeline_status, created_at
FROM documents
WHERE pipeline_status NOT IN ('complete', 'failed')
AND created_at < now() - interval '30 minutes'
ORDER BY created_at;"
# Recent failures with error details
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT id, original_filename, failed_at_stage, error_message, created_at
FROM documents
WHERE pipeline_status = 'failed'
ORDER BY created_at DESC
LIMIT 10;"
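The error-rate metric from the alert thresholds table can be derived directly from the GROUP BY query's output. A minimal sketch, assuming the counts have been loaded into a dict:

```python
def pipeline_error_rate(status_counts):
    """Compute the failed fraction from pipeline_status counts
    (as returned by the GROUP BY query above).
    status_counts maps pipeline_status -> document count."""
    total = sum(status_counts.values())
    if total == 0:
        return 0.0  # no documents processed; avoid division by zero
    return status_counts.get("failed", 0) / total
```

Feed this a count restricted to the last hour (add a `created_at > now() - interval '1 hour'` filter to the query) to match the 5%/15% thresholds defined earlier.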
Recommended Monitoring Tools
Uptime Kuma (Self-Hosted, Free)
Ideal for small to mid-size deployments. Monitors HTTP endpoints and sends alerts via email, Slack, Teams, Discord, etc.
Setup:
- Deploy Uptime Kuma alongside CL (e.g., on the same Docker host)
- Add a monitor for https://contractlucidity.com/api/health
- Set the check interval to 60 seconds
- Configure notification channels
Grafana + Prometheus (Self-Hosted)
For comprehensive dashboards combining application metrics, container stats, and database performance.
Key Dashboards to Create:
- Pipeline throughput (documents processed per hour)
- Average processing time by document classification
- AI provider token consumption and costs
- Database connection pool utilisation
- Redis queue depth over time
Cloud Provider Monitoring
| Platform | Service | Best For |
|---|---|---|
| AWS | CloudWatch | Container metrics, custom alarms, log aggregation |
| Azure | Azure Monitor + Application Insights | End-to-end tracing, smart alerts |
| GCP | Cloud Monitoring + Cloud Logging | Uptime checks, log-based metrics |
Alerting Recommendations
Configure alerts for these scenarios:
| Alert | Channel | Priority |
|---|---|---|
| Backend health check fails (3x) | Slack + Email | Critical |
| Worker not responding to ping | Slack + Email | Critical |
| Queue depth > 50 | Slack | Warning |
| Pipeline error rate > 10% (1h) | | Warning |
| Database disk > 80% | | Warning |
| Database disk > 90% | Slack + Email + PagerDuty | Critical |
| No documents processed in 24h (weekday) | | Informational |
Log Aggregation
Structured Logging
CL uses Python's standard logging module. The backend and worker both output structured log messages:
2026-03-19 10:30:15 INFO Document abc-123: status -> classifying
2026-03-19 10:30:18 INFO Document abc-123: classified as master_services_agreement
2026-03-19 10:30:45 INFO Document abc-123: extracted 23 clauses
2026-03-19 10:31:02 INFO Document abc-123: embedded 28 chunks
2026-03-19 10:31:15 INFO Document abc-123: pipeline COMPLETE
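Because the pipeline log lines follow a fixed shape, a log aggregator can extract structured fields from them with a single regular expression. A sketch matching the format shown above:

```python
import re

# Matches lines like:
# "2026-03-19 10:30:15 INFO Document abc-123: status -> classifying"
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) Document (?P<doc>[\w-]+): (?P<event>.*)$"
)

def parse_pipeline_log(line):
    """Extract timestamp, level, document id, and event text from one
    pipeline log line; returns None for lines that don't match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None
```

Grouping parsed records by the `doc` field reconstructs each document's journey through the pipeline, which is useful for computing per-document processing times in a log-based metric.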
Centralised Log Collection
For production deployments, forward Docker logs to a centralised logging system:
# Docker logging driver (add to docker-compose.yml)
# Example: forward to Loki, Fluentd, or CloudWatch
services:
cl-backend:
logging:
driver: "json-file"
options:
max-size: "50m"
max-file: "5"
Set the LOG_LEVEL environment variable in .env to control verbosity:
- INFO (default) -- pipeline progress, key events
- WARNING -- only errors and warnings
- DEBUG -- verbose output including AI prompts and playbook context (do not use in production -- may log sensitive contract text)