Monitoring & Health Checks
Proactive monitoring ensures Contract Lucidity delivers consistent performance for your legal team. This guide covers the key metrics to watch, built-in health checks, and integration with enterprise monitoring tools.
Health Check Endpoint
CL exposes a lightweight health check at:
GET /api/health
Response (200 OK):
{
"status": "healthy",
"service": "Contract Lucidity"
}
This endpoint confirms the FastAPI backend is running and responding. It does not verify downstream dependencies (database, Redis, worker). For comprehensive health monitoring, combine this with the component checks described below.
Use /api/health as the health check path for your load balancer or reverse proxy. Configure:
- Interval: 10 seconds
- Timeout: 5 seconds
- Healthy threshold: 2 consecutive successes
- Unhealthy threshold: 3 consecutive failures
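The healthy/unhealthy threshold behaviour above can be sketched as a small state machine. This is an illustrative model of how a load balancer interprets consecutive probe results, not CL code:

```python
class HealthTracker:
    """Models load-balancer health tracking: 2 consecutive successes
    mark the target healthy, 3 consecutive failures mark it unhealthy
    (the thresholds recommended above)."""

    def __init__(self, healthy_threshold=2, unhealthy_threshold=3):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.successes = 0
        self.failures = 0
        self.healthy = True  # assume healthy at start

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return the current health state."""
        if probe_ok:
            self.successes += 1
            self.failures = 0
            if self.successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self.failures += 1
            self.successes = 0
            if self.failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy
```

Note that a single failed probe does not flip the state; only three in a row do, which avoids flapping on transient network blips.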
What to Monitor
Key Metrics and Alert Thresholds
| Metric | How to Check | Warning Threshold | Critical Threshold |
|---|---|---|---|
| Backend response time | HTTP GET /api/health | > 2 seconds | > 5 seconds |
| Backend availability | HTTP GET /api/health | 1 failure | 3 consecutive failures |
| Worker queue depth | Redis LLEN on Celery queue | > 20 pending tasks | > 50 pending tasks |
| Database connections | pg_stat_activity | > 80% of max_connections | > 95% of max_connections |
| Database disk usage | pg_database_size() | > 80% of volume | > 90% of volume |
| Redis memory | redis-cli INFO memory | > 70% of maxmemory | > 90% of maxmemory |
| Document storage disk | df on /data/storage | > 80% capacity | > 90% capacity |
| Pipeline error rate | Failed documents / total | > 5% over 1 hour | > 15% over 1 hour |
| Average pipeline duration | Time from QUEUED to COMPLETE | > 5 minutes (20-page doc) | > 15 minutes (20-page doc) |
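A monitoring script that consumes the metrics above can classify each sample against its warning/critical thresholds with a helper like this (a minimal sketch; the example values are hypothetical):

```python
def severity(value: float, warning: float, critical: float) -> str:
    """Classify a metric sample against warning/critical thresholds.
    Thresholds are exclusive lower bounds, matching the '>' operators
    in the table above."""
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"

# Hypothetical readings checked against the table's thresholds:
queue_state = severity(35, warning=20, critical=50)        # "warning"
error_state = severity(0.18, warning=0.05, critical=0.15)  # "critical"
```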
Checking Individual Components
Backend (cl-backend)
# Health check
curl -s https://contractlucidity.com/api/health | jq .
# Response time (should be < 500ms)
curl -o /dev/null -s -w "%{time_total}\n" https://contractlucidity.com/api/health
Worker (cl-worker)
The Celery worker does not expose an HTTP endpoint. Monitor it via:
# Check if the worker process is running
docker exec cl-worker celery -A app.celery_app inspect ping
# List active tasks
docker exec cl-worker celery -A app.celery_app inspect active
# List reserved (queued) tasks
docker exec cl-worker celery -A app.celery_app inspect reserved
# Check worker stats
docker exec cl-worker celery -A app.celery_app inspect stats
Redis Queue Depth
# Check the number of pending tasks in the default Celery queue
docker exec cl-redis redis-cli LLEN celery
# Check Redis memory usage
docker exec cl-redis redis-cli INFO memory | grep used_memory_human
# Check Redis connectivity
docker exec cl-redis redis-cli ping
# Expected: PONG
A consistently growing queue (especially above 50 tasks) indicates the worker cannot keep up with incoming documents. Common causes:
- Worker concurrency too low -- increase CELERY_CONCURRENCY
- AI provider rate limiting -- upgrade your API tier
- Worker crashed -- check docker logs cl-worker
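Distinguishing a genuine backlog from a momentary spike requires more than one `LLEN` sample. A sketch of that logic, assuming queue-depth samples are collected periodically by a cron job or monitoring agent:

```python
def queue_backlog_alert(samples, critical_depth=50):
    """Given chronological queue-depth samples (e.g. periodic
    `redis-cli LLEN celery` readings), flag a backlog only when the
    queue is monotonically growing AND above the critical depth --
    a brief spike that drains on its own should not page anyone."""
    if len(samples) < 2:
        return False
    growing = all(b >= a for a, b in zip(samples, samples[1:]))
    return growing and samples[-1] > critical_depth
```

For example, readings of 10, 25, 40, 60 over four intervals indicate a worker that has fallen behind, while 60, 40, 20 shows a queue that is already draining.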
PostgreSQL Database
# Check active connections
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT count(*) FROM pg_stat_activity WHERE state = 'active';"
# Check database size
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT pg_size_pretty(pg_database_size('contract_lucidity'));"
# Check for long-running queries (> 60 seconds)
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active' AND now() - pg_stat_activity.query_start > interval '60 seconds';"
# Check replication lag (if using replicas)
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT * FROM pg_stat_replication;"
Docker Container Health
# Overview of all CL containers
docker ps --filter "name=cl-" --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
# Container resource usage
docker stats --no-stream --filter "name=cl-"
# Check container logs (last 100 lines)
docker logs --tail 100 cl-backend
docker logs --tail 100 cl-worker
docker logs --tail 100 cl-frontend
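To act on the `docker ps` output programmatically, a script can flag any CL container whose status line does not start with "Up". This sketch assumes the output has been reduced to tab-separated name/status pairs (e.g. via `--format "{{.Names}}\t{{.Status}}"` with `-a` so exited containers appear):

```python
def unhealthy_containers(ps_lines):
    """Return the names of containers that are not running, given
    'name<TAB>status' lines from `docker ps -a --format ...`.
    Running containers report a status beginning with 'Up'."""
    bad = []
    for line in ps_lines:
        name, status = line.split("\t", 1)
        if not status.startswith("Up"):
            bad.append(name)
    return bad
```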
Pipeline Processing Monitoring
Document Pipeline Stages
Each document progresses through a series of pipeline stages, from QUEUED through extraction, classification, embedding, and analysis to a terminal COMPLETE or FAILED state.
Version-Aware Pipeline (Subsequent Versions)
When a revised version of a document is uploaded (parent_document_id is set), the pipeline runs a different flow at the report generation stage. The extraction, classification, and embedding stages are the same, but the analysis and reporting use a version-aware process:
The report generation and storage stages are where the version-aware logic differs from the standard pipeline. During STORING, the system:
- Runs a programmatic text diff against the previous version
- For each v1 clause: checks whether the diff overlaps the clause; if not, the clause is copied forward with no AI call; if so, AI validates it
- Assembles the final report from validated clauses
- Carries forward obligations, contract data, executive summary, and negotiation strategy
Key differences from initial ingestion:
| Aspect | Initial (v1) | Subsequent (v2+) |
|---|---|---|
| Clause analysis | AI analyzes from scratch | Per-clause validation against diff — unchanged clauses copied forward |
| Report generation | Full AI analysis | Diff-gated: only re-evaluate impacted clauses |
| Executive summary | AI generates fresh | Carried forward unless clause changes warrant update |
| Negotiation strategy | AI generates fresh | Carried forward from v1 |
| Contract data | AI extracts all fields | Carry forward, re-extract only fields touched by diff |
| Obligations | AI extracts all | Carry forward all (including custom), add genuinely new only |
| AI cost | Full (all clauses analyzed) | Reduced 70%+ (only impacted clauses use AI) |
If the raw text sizes between v1 and v2 differ by more than 2x (text extraction artifact, not actual content change), the per-clause validation is skipped entirely and all clauses are carried forward unchanged.
Querying Pipeline Status
# Count documents by pipeline status
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT pipeline_status, count(*) FROM documents GROUP BY pipeline_status ORDER BY count DESC;"
# Find stuck documents (in non-terminal state for > 30 minutes)
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT id, original_filename, pipeline_status, created_at
FROM documents
WHERE pipeline_status NOT IN ('complete', 'failed')
AND created_at < now() - interval '30 minutes'
ORDER BY created_at;"
# Recent failures with error details
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT id, original_filename, failed_at_stage, error_message, created_at
FROM documents
WHERE pipeline_status = 'failed'
ORDER BY created_at DESC
LIMIT 10;"
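The error-rate metric from the alert thresholds table can be derived directly from the GROUP BY query's output. A minimal sketch, assuming the counts have been loaded into a dict:

```python
def pipeline_error_rate(status_counts):
    """Compute the failed fraction from pipeline_status counts
    (as returned by the GROUP BY query above).
    status_counts maps pipeline_status -> document count."""
    total = sum(status_counts.values())
    if total == 0:
        return 0.0  # no documents processed; avoid division by zero
    return status_counts.get("failed", 0) / total
```

Feed this a count restricted to the last hour (add a `created_at > now() - interval '1 hour'` filter to the query) to match the 5%/15% thresholds defined earlier.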
Recommended Monitoring Tools
Uptime Kuma (Self-Hosted, Free)
Ideal for small to mid-size deployments. Monitors HTTP endpoints and sends alerts via email, Slack, Teams, Discord, etc.
Setup:
- Deploy Uptime Kuma alongside CL (e.g., on the same Docker host)
- Add a monitor for https://contractlucidity.com/api/health
- Set the check interval to 60 seconds
- Configure notification channels
Grafana + Prometheus (Self-Hosted)
For comprehensive dashboards combining application metrics, container stats, and database performance.
Key Dashboards to Create:
- Pipeline throughput (documents processed per hour)
- Average processing time by document classification
- AI provider token consumption and costs
- Database connection pool utilisation
- Redis queue depth over time
Cloud Provider Monitoring
| Platform | Service | Best For |
|---|---|---|
| AWS | CloudWatch | Container metrics, custom alarms, log aggregation |
| Azure | Azure Monitor + Application Insights | End-to-end tracing, smart alerts |
| GCP | Cloud Monitoring + Cloud Logging | Uptime checks, log-based metrics |
Alerting Recommendations
Configure alerts for these scenarios:
| Alert | Channel | Priority |
|---|---|---|
| Backend health check fails (3x) | Slack + Email | Critical |
| Worker not responding to ping | Slack + Email | Critical |
| Queue depth > 50 | Slack | Warning |
| Pipeline error rate > 10% (1h) | | Warning |
| Database disk > 80% | | Warning |
| Database disk > 90% | Slack + Email + PagerDuty | Critical |
| No documents processed in 24h (weekday) | | Informational |
Log Aggregation
Structured Logging
CL uses Python's standard logging module. The backend and worker both output structured log messages:
2026-03-19 10:30:15 INFO Document abc-123: status -> classifying
2026-03-19 10:30:18 INFO Document abc-123: classified as master_services_agreement
2026-03-19 10:30:45 INFO Document abc-123: extracted 23 clauses
2026-03-19 10:31:02 INFO Document abc-123: embedded 28 chunks
2026-03-19 10:31:15 INFO Document abc-123: pipeline COMPLETE
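Because the pipeline log lines follow a fixed shape, a log aggregator can extract structured fields from them with a single regular expression. A sketch matching the format shown above:

```python
import re

# Matches lines like:
# "2026-03-19 10:30:15 INFO Document abc-123: status -> classifying"
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) Document (?P<doc>[\w-]+): (?P<event>.*)$"
)

def parse_pipeline_log(line):
    """Extract timestamp, level, document id, and event text from one
    pipeline log line; returns None for lines that don't match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None
```

Grouping parsed records by the `doc` field reconstructs each document's journey through the pipeline, which is useful for computing per-document processing times in a log-based metric.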
Centralised Log Collection
For production deployments, forward Docker logs to a centralised logging system:
# Docker logging driver (add to docker-compose.yml)
# Example: forward to Loki, Fluentd, or CloudWatch
services:
cl-backend:
logging:
driver: "json-file"
options:
max-size: "50m"
max-file: "5"
Set the LOG_LEVEL environment variable in .env to control verbosity:
- INFO (default) -- pipeline progress, key events
- WARNING -- only errors and warnings
- DEBUG -- verbose output including AI prompts and playbook context (do not use in production -- may log sensitive contract text)