
AI Provider Overview

Contract Lucidity uses large language models (LLMs) to power every stage of the document processing pipeline -- from text extraction and classification through clause analysis, embedding generation, and report writing. Rather than locking you into a single vendor, CL provides a capability-based routing layer that lets you assign the best model to each task.

Supported Providers

| Provider | Provider Key | Chat Models | Embedding Models |
| --- | --- | --- | --- |
| Anthropic | anthropic | Claude Opus 4.6, Sonnet 4.6, Haiku 4.5 | Voyage AI (via Anthropic key) |
| OpenAI | openai | GPT-5.4, GPT-5.4 mini, GPT-5.4 nano | text-embedding-3-small, text-embedding-3-large |
| Azure OpenAI | azure_openai | Same as OpenAI (deployed in your Azure tenant) | Same as OpenAI |

Capability Mapping

CL defines five AI capabilities. Each capability can be independently assigned to a different provider and model, giving you fine-grained control over cost, quality, and compliance.

Capability Definitions

| Capability | Enum Value | Pipeline Stage | Purpose |
| --- | --- | --- | --- |
| Extraction & Classification | extraction_classification | Stage 2 -- Classification | Reads raw text, classifies the document type, extracts parties/dates/jurisdiction |
| Document Understanding | document_understanding | Stage 2b + Stage 3 -- Structured Extraction & Clause Analysis | Extracts structured fields and identifies/categorises every significant clause |
| Reasoning | reasoning | Report generation, AI drafting | Complex multi-step reasoning for risk assessment and contract review reports |
| Generation | generation | AI-assisted drafting, clause suggestions | Generates alternative clause language, summaries, and negotiation recommendations |
| Embeddings | embeddings | Stage 4 -- Chunking & Embedding | Converts text chunks into vector representations for semantic search and cross-document intelligence |
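The five enum values above can be modelled as a string-valued Python Enum. The class name AICapability is illustrative -- the source specifies only the enum values:

```python
from enum import Enum

class AICapability(str, Enum):
    """The five AI capabilities CL routes independently (class name illustrative)."""
    EXTRACTION_CLASSIFICATION = "extraction_classification"
    DOCUMENT_UNDERSTANDING = "document_understanding"
    REASONING = "reasoning"
    GENERATION = "generation"
    EMBEDDINGS = "embeddings"
```

Because the enum inherits from `str`, the raw database value round-trips cleanly: `AICapability("embeddings")` resolves to the `EMBEDDINGS` member.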

How Routing Works

When the pipeline needs an AI call, it queries the ai_capability_mapping table joined to ai_providers:

  1. Looks up the default mapping for the requested capability
  2. Verifies the linked provider is active and has a valid API key
  3. Routes the call to the correct provider SDK (Anthropic Messages API, OpenAI Chat Completions API, or OpenAI Responses API for GPT-5/o-series models)
```python
# Simplified from backend/app/services/ai_client.py
api_key, model_name, provider_name = get_provider_for_capability(db, capability)
```

If no active provider is configured for a capability, the pipeline raises an AIConfigError and the document is marked as FAILED at the relevant stage with a clear error message directing the administrator to the Settings page.
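The lookup-verify-fail sequence can be sketched with in-memory dicts standing in for the ai_capability_mapping and ai_providers tables. The names get_provider_for_capability and AIConfigError mirror the ones used above; everything else (dict shapes, the dropped db session argument) is illustrative:

```python
class AIConfigError(Exception):
    """Raised when no active, keyed provider is mapped to a capability."""

# Stand-ins for the ai_providers and ai_capability_mapping tables.
PROVIDERS = {
    "anthropic": {"active": True, "api_key": "sk-ant-..."},
    "openai": {"active": False, "api_key": None},
}
DEFAULT_MAPPINGS = {
    "extraction_classification": {"provider": "anthropic", "model": "claude-sonnet-4-20250514"},
    "embeddings": {"provider": "openai", "model": "text-embedding-3-small"},
}

def get_provider_for_capability(capability: str):
    # 1. Look up the default mapping for the requested capability.
    mapping = DEFAULT_MAPPINGS.get(capability)
    if mapping is None:
        raise AIConfigError(f"No default mapping for capability {capability!r}")
    # 2. Verify the linked provider is active and has a valid API key.
    provider = PROVIDERS.get(mapping["provider"])
    if not provider or not provider["active"] or not provider["api_key"]:
        raise AIConfigError(f"Provider {mapping['provider']!r} is inactive or missing an API key")
    # 3. The caller then routes to the matching SDK using the provider name and model.
    return provider["api_key"], mapping["model"], mapping["provider"]
```

Note that both failure modes -- an unmapped capability and a mapped-but-inactive provider -- surface as the same AIConfigError, which is what lets the pipeline mark the document FAILED with a single consistent message.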

Configuring AI Providers

Step 1: Add a Provider

  1. Log in to Contract Lucidity as an administrator
  2. Navigate to Settings (gear icon in the sidebar)
  3. Under AI Providers, click Add Provider
  4. Select the provider type (Anthropic, OpenAI, or Azure OpenAI)
  5. Enter your API key
  6. Click Save & Verify -- CL will make a lightweight test call to confirm the key is valid

Step 2: Map Capabilities

  1. In the AI Capabilities section of Settings, you will see all five capabilities listed
  2. For each capability, select:
    • The provider to use
    • The model to use (e.g., claude-sonnet-4-6, gpt-5.4-mini)
  3. Mark one mapping per capability as the default
  4. Click Save
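The result of Step 2 can be pictured as rows in ai_capability_mapping, with the "mark one mapping per capability as the default" rule enforced on save. Field names here are illustrative:

```python
# Hypothetical row shape for saved capability mappings.
mappings = [
    {"capability": "extraction_classification", "provider": "anthropic",
     "model": "claude-sonnet-4-6", "is_default": True},
    {"capability": "embeddings", "provider": "openai",
     "model": "text-embedding-3-small", "is_default": True},
]

def one_default_per_capability(rows) -> bool:
    """Check that no capability has more than one default mapping."""
    counts = {}
    for row in rows:
        if row["is_default"]:
            counts[row["capability"]] = counts.get(row["capability"], 0) + 1
    return all(count == 1 for count in counts.values())
```

A second default for the same capability would make the router's "default mapping" lookup ambiguous, which is why the UI forces exactly one per capability.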

Recommended Starting Configuration

For most organisations, we recommend:

| Capability | Provider | Model | Rationale |
| --- | --- | --- | --- |
| Extraction & Classification | Anthropic | claude-sonnet-4-20250514 | Fast, accurate, cost-effective |
| Document Understanding | Anthropic | claude-sonnet-4-20250514 | Strong structured output |
| Reasoning | Anthropic | claude-opus-4-20250514 | Best-in-class for complex analysis |
| Generation | Anthropic | claude-sonnet-4-20250514 | Good balance of quality and speed |
| Embeddings | OpenAI | text-embedding-3-small | Low cost, high quality for retrieval |
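Expressed as a capability-to-(provider, model) mapping, the recommended configuration above looks like this:

```python
# Recommended default mappings from the table above.
RECOMMENDED_DEFAULTS = {
    "extraction_classification": ("anthropic", "claude-sonnet-4-20250514"),
    "document_understanding": ("anthropic", "claude-sonnet-4-20250514"),
    "reasoning": ("anthropic", "claude-opus-4-20250514"),
    "generation": ("anthropic", "claude-sonnet-4-20250514"),
    "embeddings": ("openai", "text-embedding-3-small"),
}
```

Only the reasoning capability is routed to the more expensive Opus model; the embeddings capability is the only one routed to OpenAI.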

Token Usage and Cost Considerations

Understanding Token Consumption

Each document processed through the pipeline consumes tokens at every AI-backed stage: classification, structured extraction, clause analysis, embedding generation, and report writing.

Estimated Cost per Document

| Document Size | Approx. Total Tokens | Cost (Sonnet 4.6) | Cost (GPT-5.4 mini) |
| --- | --- | --- | --- |
| 5 pages | ~50K input / ~5K output | ~$0.22 | ~$0.17 |
| 20 pages | ~150K input / ~15K output | ~$0.67 | ~$0.52 |
| 50 pages | ~350K input / ~30K output | ~$1.50 | ~$1.17 |
| 100+ pages | ~600K input / ~50K output | ~$2.55 | ~$2.00 |
Note: these are estimates. Actual costs vary based on document complexity, clause density, and the specific models chosen. Long documents are processed in 10-page windows for clause analysis, which may result in additional API calls.
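The Sonnet column in the table is consistent with assumed rates of $3 per million input tokens and $15 per million output tokens. The estimator below is a back-of-envelope sketch using those assumed rates, not published pricing:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate_per_m: float = 3.00,
                  output_rate_per_m: float = 15.00) -> float:
    """Estimate USD cost for one document; rates are assumptions, per million tokens."""
    return (input_tokens * input_rate_per_m + output_tokens * output_rate_per_m) / 1_000_000

# 20-page document: ~150K input / ~15K output tokens
cost = estimate_cost(150_000, 15_000)  # 0.675, close to the ~$0.67 row in the table
```

Re-running the same arithmetic for the 50-page row (350K input, 30K output) gives exactly $1.50, matching the table.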

Cost Optimisation Strategies

  1. Use smaller models for simple tasks -- GPT-5.4 nano or Claude Haiku for extraction/classification of straightforward documents
  2. Reserve expensive models for reasoning -- Claude Opus 4 or GPT-5 only for complex report generation
  3. Batch processing -- Upload documents during off-peak hours if your provider offers batch discounts (OpenAI Batch API offers 50% savings)
  4. Prompt caching -- Anthropic's prompt caching (automatic for Claude Sonnet 4 and Opus 4) can reduce costs by up to 90% for repeated system prompts
  5. Embedding model choice -- text-embedding-3-small at $0.02/M tokens is 6.5x cheaper than text-embedding-3-large at $0.13/M tokens, with minimal retrieval quality difference for contract text
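The price ratio in point 5 can be checked directly from the stated per-million-token rates; the 150K-token figure reuses the 20-page row of the cost table:

```python
SMALL_RATE = 0.02  # $ per million tokens, text-embedding-3-small (stated above)
LARGE_RATE = 0.13  # $ per million tokens, text-embedding-3-large (stated above)

ratio = LARGE_RATE / SMALL_RATE  # 6.5x, as claimed

# Embedding a 20-page document (~150K tokens of chunks):
small_cost = 150_000 * SMALL_RATE / 1_000_000  # about $0.003
```

At fractions of a cent per document, embedding cost is dominated by the chat-model stages, which is why the choice of embedding model matters far less to the total than the choice of reasoning model.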

Pre-flight Validation

Before processing any document, the pipeline performs a pre-flight check that verifies an active AI provider is configured for the EXTRACTION_CLASSIFICATION capability. If this check fails, the document is immediately marked as failed with a descriptive error -- no tokens are consumed.

```
AI provider not configured or unreachable.
Please go to Settings > AI Capabilities to assign a provider and model.
```
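The pre-flight check can be sketched as a guard that runs before any tokens are spent. The document statuses and helper bodies below are illustrative; get_provider_for_capability stands in for the real table lookup:

```python
class AIConfigError(Exception):
    """Raised when no active provider is configured for a capability."""

def get_provider_for_capability(capability: str):
    """Stand-in lookup: here, imagine no active provider rows exist."""
    configured = {}
    if capability not in configured:
        raise AIConfigError(
            "AI provider not configured or unreachable.\n"
            "Please go to Settings > AI Capabilities to assign a provider and model."
        )
    return configured[capability]

def preflight(document: dict) -> dict:
    """Fail fast, before any tokens are consumed, if the first stage is unmapped."""
    try:
        get_provider_for_capability("extraction_classification")
    except AIConfigError as exc:
        document["status"] = "FAILED"
        document["error"] = str(exc)
    return document

doc = preflight({"id": 1, "status": "QUEUED"})
```

Because only the EXTRACTION_CLASSIFICATION mapping is checked up front, a document can still pass pre-flight and fail later if a downstream capability (e.g., embeddings) is unmapped -- which is exactly the in-flight failure mode described in the warning below the error message.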
Warning: if you change or deactivate a provider while documents are in the queue, any in-flight documents that reach an AI stage after the change will fail. Those documents can be reprocessed once a new provider is configured.

Next Steps