
Monitoring & Health Checks

Proactive monitoring ensures Contract Lucidity delivers consistent performance for your legal team. This guide covers the key metrics to watch, built-in health checks, and integration with enterprise monitoring tools.

Health Check Endpoint

CL exposes a lightweight health check at:

GET /api/health

Response (200 OK):

{
  "status": "healthy",
  "service": "Contract Lucidity"
}

This endpoint confirms the FastAPI backend is running and responding. It does not verify downstream dependencies (database, Redis, worker). For comprehensive health monitoring, combine this with the component checks described below.

Load Balancer Configuration

Use /api/health as the health check path for your load balancer or reverse proxy. Configure:

  • Interval: 10 seconds
  • Timeout: 5 seconds
  • Healthy threshold: 2 consecutive successes
  • Unhealthy threshold: 3 consecutive failures
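The healthy/unhealthy thresholds above amount to a small state machine with hysteresis. A minimal bash sketch of that logic (the `next_state` helper is our own illustration, not part of CL or any load balancer):

```shell
# next_state STATE STREAK RESULT -> "NEWSTATE NEWSTREAK"
# STATE is healthy/unhealthy, STREAK counts consecutive opposite
# results, RESULT is ok/fail for the latest probe.
next_state() {
  local state=$1 streak=$2 result=$3
  if [ "$result" = ok ]; then
    if [ "$state" = healthy ]; then echo "healthy 0"; return; fi
    streak=$((streak + 1))
    # 2 consecutive successes flip unhealthy -> healthy
    if [ "$streak" -ge 2 ]; then echo "healthy 0"; else echo "unhealthy $streak"; fi
  else
    if [ "$state" = unhealthy ]; then echo "unhealthy 0"; return; fi
    streak=$((streak + 1))
    # 3 consecutive failures flip healthy -> unhealthy
    if [ "$streak" -ge 3 ]; then echo "unhealthy 0"; else echo "healthy $streak"; fi
  fi
}

# One probe every 10 seconds with a 5-second timeout (sketch):
#   curl -sf --max-time 5 https://contractlucidity.com/api/health \
#     >/dev/null && result=ok || result=fail
```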

What to Monitor

Component Health Matrix

Key Metrics and Alert Thresholds

| Metric | How to Check | Warning Threshold | Critical Threshold |
|---|---|---|---|
| Backend response time | HTTP GET /api/health | > 2 seconds | > 5 seconds |
| Backend availability | HTTP GET /api/health | 1 failure | 3 consecutive failures |
| Worker queue depth | Redis LLEN on Celery queue | > 20 pending tasks | > 50 pending tasks |
| Database connections | pg_stat_activity | > 80% of max_connections | > 95% of max_connections |
| Database disk usage | pg_database_size() | > 80% of volume | > 90% of volume |
| Redis memory | redis-cli INFO memory | > 70% of maxmemory | > 90% of maxmemory |
| Document storage disk | df on /data/storage | > 80% capacity | > 90% capacity |
| Pipeline error rate | Failed documents / total | > 5% over 1 hour | > 15% over 1 hour |
| Average pipeline duration | Time from QUEUED to COMPLETE | > 5 minutes (20-page doc) | > 15 minutes (20-page doc) |
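A minimal sketch of how these warning/critical thresholds can be evaluated in a cron-driven check script (the `classify` helper is hypothetical, not a CL command):

```shell
# classify VALUE WARN CRIT -> "ok" | "warning" | "critical"
# Works for any integer metric, e.g. disk percentage or queue depth.
classify() {
  local value=$1 warn=$2 crit=$3
  if [ "$value" -gt "$crit" ]; then
    echo critical
  elif [ "$value" -gt "$warn" ]; then
    echo warning
  else
    echo ok
  fi
}

# Examples matching the table above:
#   disk at 85% of volume   -> classify 85 80 90
#   queue depth of 60 tasks -> classify 60 20 50
```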

Checking Individual Components

Backend (cl-backend)

# Health check
curl -s https://contractlucidity.com/api/health | jq .

# Response time (should be < 500ms)
curl -o /dev/null -s -w "%{time_total}\n" https://contractlucidity.com/api/health

Worker (cl-worker)

The Celery worker does not expose an HTTP endpoint. Monitor it via:

# Check if the worker process is running
docker exec cl-worker celery -A app.celery_app inspect ping

# List active tasks
docker exec cl-worker celery -A app.celery_app inspect active

# List reserved (queued) tasks
docker exec cl-worker celery -A app.celery_app inspect reserved

# Check worker stats
docker exec cl-worker celery -A app.celery_app inspect stats

Redis Queue Depth

# Check the number of pending tasks in the default Celery queue
docker exec cl-redis redis-cli LLEN celery

# Check Redis memory usage
docker exec cl-redis redis-cli INFO memory | grep used_memory_human

# Check Redis connectivity
docker exec cl-redis redis-cli ping
# Expected: PONG

Queue Depth Alerts

A consistently growing queue (especially above 50 tasks) indicates the worker cannot keep up with incoming documents. Common causes:

  1. Worker concurrency too low -- increase CELERY_CONCURRENCY
  2. AI provider rate limiting -- upgrade your API tier
  3. Worker crashed -- check docker logs cl-worker
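The ping and queue-depth checks can be combined into one diagnostic step. A sketch under our own assumptions (the `diagnose` helper and its messages are illustrative; depth would come from `LLEN celery` and ping status from `celery inspect ping`):

```shell
# diagnose DEPTH PING_OK -> one-line hint, per the causes above.
# DEPTH is the pending-task count; PING_OK is yes/no.
diagnose() {
  local depth=$1 ping_ok=$2
  if [ "$ping_ok" != yes ]; then
    echo "worker down: check docker logs cl-worker"
  elif [ "$depth" -gt 50 ]; then
    echo "backlog critical: raise CELERY_CONCURRENCY or check AI rate limits"
  elif [ "$depth" -gt 20 ]; then
    echo "backlog warning: watch queue trend"
  else
    echo "ok"
  fi
}

# Gathering live values (sketch):
#   depth=$(docker exec cl-redis redis-cli LLEN celery)
#   docker exec cl-worker celery -A app.celery_app inspect ping \
#     >/dev/null && ping_ok=yes || ping_ok=no
```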

PostgreSQL Database

# Check active connections
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT count(*) FROM pg_stat_activity WHERE state = 'active';"

# Check database size
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT pg_size_pretty(pg_database_size('contract_lucidity'));"

# Check for long-running queries (> 60 seconds)
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active' AND now() - pg_stat_activity.query_start > interval '60 seconds';"

# Check replication lag (if using replicas)
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT * FROM pg_stat_replication;"
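To apply the connection thresholds from the metrics table, compare the active-connection count against `max_connections` (from `SHOW max_connections;`). A minimal sketch (`conn_pct` is hypothetical):

```shell
# conn_pct ACTIVE MAX -> integer percentage of max_connections in use.
# Alert at > 80% (warning) and > 95% (critical), per the table above.
conn_pct() {
  echo $(( $1 * 100 / $2 ))
}
```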

Docker Container Health

# Overview of all CL containers
docker ps --filter "name=cl-" --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

# Container resource usage
docker stats --no-stream --filter "name=cl-"

# Check container logs (last 100 lines)
docker logs --tail 100 cl-backend
docker logs --tail 100 cl-worker
docker logs --tail 100 cl-frontend

Pipeline Processing Monitoring

Document Pipeline Stages

Each document progresses through these stages:

Version-Aware Pipeline (Subsequent Versions)

When a revised version of a document is uploaded (parent_document_id is set), the pipeline takes a different path at the report generation stage. The extraction, classification, and embedding stages are unchanged; analysis and reporting use a version-aware process:

Gold stages in the pipeline diagram indicate where the version-aware logic differs from the standard pipeline. During the STORING stage, the system:

  1. Runs a programmatic text diff against the previous version
  2. For each v1 clause: checks whether any diff hunk overlaps it; unchanged clauses are copied forward with no AI call, while overlapping clauses are validated by AI
  3. Assembles the final report from validated clauses
  4. Carries forward obligations, contract data, executive summary, and negotiation strategy
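The overlap check in step 2 reduces to an interval intersection test. The character-offset representation below is our own illustration, not CL's actual data model:

```shell
# overlaps A_START A_END B_START B_END -> yes/no
# Half-open character ranges [start, end): two ranges intersect
# iff each starts before the other ends.
overlaps() {
  if [ "$1" -lt "$4" ] && [ "$3" -lt "$2" ]; then
    echo yes
  else
    echo no
  fi
}

# A clause untouched by every diff hunk is copied forward with no AI
# call; any overlap sends the clause to AI validation.
```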

Key differences from initial ingestion:

| Aspect | Initial (v1) | Subsequent (v2+) |
|---|---|---|
| Clause analysis | AI analyzes from scratch | Per-clause validation against diff; unchanged clauses copied forward |
| Report generation | Full AI analysis | Diff-gated: only re-evaluate impacted clauses |
| Executive summary | AI generates fresh | Carried forward unless clause changes warrant an update |
| Negotiation strategy | AI generates fresh | Carried forward from v1 |
| Contract data | AI extracts all fields | Carried forward; re-extract only fields touched by the diff |
| Obligations | AI extracts all | All carried forward (including custom); only genuinely new ones added |
| AI cost | Full (all clauses analyzed) | Reduced 70%+ (only impacted clauses use AI) |

Size Guard

If the raw text sizes between v1 and v2 differ by more than 2x (text extraction artifact, not actual content change), the per-clause validation is skipped entirely and all clauses are carried forward unchanged.
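A minimal sketch of the 2x size guard, assuming raw text sizes in bytes (`size_guard` is a hypothetical helper, not CL code):

```shell
# size_guard V1_BYTES V2_BYTES -> "skip" if the texts differ by more
# than 2x (likely an extraction artifact), else "validate".
size_guard() {
  local v1=$1 v2=$2
  if [ "$v2" -gt $(( 2 * v1 )) ] || [ "$v1" -gt $(( 2 * v2 )) ]; then
    echo skip      # carry all clauses forward unchanged
  else
    echo validate  # run per-clause diff validation
  fi
}
```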

Querying Pipeline Status

# Count documents by pipeline status
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT pipeline_status, count(*) FROM documents GROUP BY pipeline_status ORDER BY count DESC;"

# Find stuck documents (in non-terminal state for > 30 minutes)
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT id, original_filename, pipeline_status, created_at
FROM documents
WHERE pipeline_status NOT IN ('complete', 'failed')
AND created_at < now() - interval '30 minutes'
ORDER BY created_at;"

# Recent failures with error details
docker exec cl-postgres psql -U cl_user -d contract_lucidity \
-c "SELECT id, original_filename, failed_at_stage, error_message, created_at
FROM documents
WHERE pipeline_status = 'failed'
ORDER BY created_at DESC
LIMIT 10;"

Uptime Kuma (Self-Hosted, Free)

Ideal for small to mid-size deployments. Monitors HTTP endpoints and sends alerts via email, Slack, Teams, Discord, etc.

Setup:

  1. Deploy Uptime Kuma alongside CL (e.g., on the same Docker host)
  2. Add a monitor for https://contractlucidity.com/api/health
  3. Set check interval to 60 seconds
  4. Configure notification channels

Grafana + Prometheus (Self-Hosted)

For comprehensive dashboards combining application metrics, container stats, and database performance.

Key Dashboards to Create:

  • Pipeline throughput (documents processed per hour)
  • Average processing time by document classification
  • AI provider token consumption and costs
  • Database connection pool utilisation
  • Redis queue depth over time

Cloud Provider Monitoring

| Platform | Service | Best For |
|---|---|---|
| AWS | CloudWatch | Container metrics, custom alarms, log aggregation |
| Azure | Azure Monitor + Application Insights | End-to-end tracing, smart alerts |
| GCP | Cloud Monitoring + Cloud Logging | Uptime checks, log-based metrics |

Alerting Recommendations

Configure alerts for these scenarios:

| Alert | Channel | Priority |
|---|---|---|
| Backend health check fails (3x) | Slack + Email | Critical |
| Worker not responding to ping | Slack + Email | Critical |
| Queue depth > 50 | Slack | Warning |
| Pipeline error rate > 10% (1h) | Email | Warning |
| Database disk > 80% | Email | Warning |
| Database disk > 90% | Slack + Email + PagerDuty | Critical |
| No documents processed in 24h (weekday) | Email | Informational |

Log Aggregation

Structured Logging

CL uses Python's standard logging module. The backend and worker both output structured log messages:

2026-03-19 10:30:15 INFO Document abc-123: status -> classifying
2026-03-19 10:30:18 INFO Document abc-123: classified as master_services_agreement
2026-03-19 10:30:45 INFO Document abc-123: extracted 23 clauses
2026-03-19 10:31:02 INFO Document abc-123: embedded 28 chunks
2026-03-19 10:31:15 INFO Document abc-123: pipeline COMPLETE

Centralised Log Collection

For production deployments, forward Docker logs to a centralised logging system:

# Docker logging driver (add to docker-compose.yml)
# Example: forward to Loki, Fluentd, or CloudWatch
services:
  cl-backend:
    logging:
      driver: "json-file"
      options:
        max-size: "50m"
        max-file: "5"

Set the LOG_LEVEL environment variable in .env to control verbosity:

  • INFO (default) -- pipeline progress, key events
  • WARNING -- only errors and warnings
  • DEBUG -- verbose output including AI prompts and playbook context (do not use in production -- may log sensitive contract text)