
AI Provider Overview

Contract Lucidity uses large language models (LLMs) to power every stage of the document processing pipeline -- from text extraction and classification through clause analysis, embedding generation, and report writing. Rather than locking you into a single vendor, CL provides a capability-based routing layer that lets you assign the best model to each task.

Supported Providers

| Provider | Provider Key | Chat Models | Embedding Models |
| --- | --- | --- | --- |
| Anthropic | anthropic | Claude Opus 4.6, Sonnet 4.6, Haiku 4.5 | Voyage AI (via Anthropic key) |
| OpenAI | openai | GPT-5.4, GPT-5.4 mini, GPT-5.4 nano | text-embedding-3-small, text-embedding-3-large |
| Azure OpenAI | azure_openai | Same as OpenAI (deployed in your Azure tenant) | Same as OpenAI |

Capability Mapping

CL defines five AI capabilities. Each capability can be independently assigned to a different provider and model, giving you fine-grained control over cost, quality, and compliance.

Capability Definitions

| Capability | Enum Value | Pipeline Stage | Purpose |
| --- | --- | --- | --- |
| Extraction & Classification | extraction_classification | Stage 2 -- Classification | Reads raw text, classifies the document type, extracts parties/dates/jurisdiction |
| Document Understanding | document_understanding | Stage 2b + Stage 3 -- Structured Extraction & Clause Analysis | Extracts structured fields and identifies/categorises every significant clause |
| Reasoning | reasoning | Report generation, AI drafting | Complex multi-step reasoning for risk assessment and contract review reports |
| Generation | generation | AI-assisted drafting, clause suggestions | Generates alternative clause language, summaries, and negotiation recommendations |
| Embeddings | embeddings | Stage 4 -- Chunking & Embedding | Converts text chunks into vector representations for semantic search and cross-document intelligence |
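The five enum values above can be modelled as a string-valued Python Enum. The class name AICapability is illustrative -- the source specifies only the enum values:

```python
from enum import Enum

class AICapability(str, Enum):
    """The five AI capabilities CL routes independently (class name illustrative)."""
    EXTRACTION_CLASSIFICATION = "extraction_classification"
    DOCUMENT_UNDERSTANDING = "document_understanding"
    REASONING = "reasoning"
    GENERATION = "generation"
    EMBEDDINGS = "embeddings"
```

Because the enum inherits from `str`, the raw database value round-trips cleanly: `AICapability("embeddings")` resolves to the `EMBEDDINGS` member.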

How Routing Works

When the pipeline needs an AI call, it queries the ai_capability_mapping table joined to ai_providers:

  1. Looks up the default mapping for the requested capability
  2. Verifies the linked provider is active and has a valid API key
  3. Routes the call to the correct provider SDK (Anthropic Messages API, OpenAI Chat Completions API, or OpenAI Responses API for GPT-5/o-series models)
```python
# Simplified from backend/app/services/ai_client.py
api_key, model_name, provider_name = get_provider_for_capability(db, capability)
```

If no active provider is configured for a capability, the pipeline raises an AIConfigError and the document is marked as FAILED at the relevant stage with a clear error message directing the administrator to the Settings page.
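The lookup-verify-fail sequence can be sketched with in-memory dicts standing in for the ai_capability_mapping and ai_providers tables. The names get_provider_for_capability and AIConfigError mirror the ones used above; everything else (dict shapes, the dropped db session argument) is illustrative:

```python
class AIConfigError(Exception):
    """Raised when no active, keyed provider is mapped to a capability."""

# Stand-ins for the ai_providers and ai_capability_mapping tables.
PROVIDERS = {
    "anthropic": {"active": True, "api_key": "sk-ant-..."},
    "openai": {"active": False, "api_key": None},
}
DEFAULT_MAPPINGS = {
    "extraction_classification": {"provider": "anthropic", "model": "claude-sonnet-4-20250514"},
    "embeddings": {"provider": "openai", "model": "text-embedding-3-small"},
}

def get_provider_for_capability(capability: str):
    # 1. Look up the default mapping for the requested capability.
    mapping = DEFAULT_MAPPINGS.get(capability)
    if mapping is None:
        raise AIConfigError(f"No default mapping for capability {capability!r}")
    # 2. Verify the linked provider is active and has a valid API key.
    provider = PROVIDERS.get(mapping["provider"])
    if not provider or not provider["active"] or not provider["api_key"]:
        raise AIConfigError(f"Provider {mapping['provider']!r} is inactive or missing an API key")
    # 3. The caller then routes to the matching SDK using the provider name and model.
    return provider["api_key"], mapping["model"], mapping["provider"]
```

Note that both failure modes -- an unmapped capability and a mapped-but-inactive provider -- surface as the same AIConfigError, which is what lets the pipeline mark the document FAILED with a single consistent message.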

Configuring AI Providers

Step 1: Add a Provider

  1. Log in to Contract Lucidity as an administrator
  2. Navigate to Settings (gear icon in the sidebar)
  3. Under AI Providers, click Add Provider
  4. Select the provider type (Anthropic, OpenAI, or Azure OpenAI)
  5. Enter your API key
  6. Click Save & Verify -- CL will make a lightweight test call to confirm the key is valid

Step 2: Map Capabilities

  1. In the AI Capabilities section of Settings, you will see all five capabilities listed
  2. For each capability, select:
    • The provider to use
    • The model to use (e.g., claude-sonnet-4-6, gpt-5.4-mini)
  3. Mark one mapping per capability as the default
  4. Click Save
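The result of Step 2 can be pictured as rows in ai_capability_mapping, with the "mark one mapping per capability as the default" rule enforced on save. Field names here are illustrative:

```python
# Hypothetical row shape for saved capability mappings.
mappings = [
    {"capability": "extraction_classification", "provider": "anthropic",
     "model": "claude-sonnet-4-6", "is_default": True},
    {"capability": "embeddings", "provider": "openai",
     "model": "text-embedding-3-small", "is_default": True},
]

def one_default_per_capability(rows) -> bool:
    """Check that no capability has more than one default mapping."""
    counts = {}
    for row in rows:
        if row["is_default"]:
            counts[row["capability"]] = counts.get(row["capability"], 0) + 1
    return all(count == 1 for count in counts.values())
```

A second default for the same capability would make the router's "default mapping" lookup ambiguous, which is why the UI forces exactly one per capability.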

Recommended Starting Configuration

For most organisations, we recommend:

| Capability | Provider | Model | Rationale |
| --- | --- | --- | --- |
| Extraction & Classification | Anthropic | claude-sonnet-4-20250514 | Fast, accurate, cost-effective |
| Document Understanding | Anthropic | claude-sonnet-4-20250514 | Strong structured output |
| Reasoning | Anthropic | claude-opus-4-20250514 | Best-in-class for complex analysis |
| Generation | Anthropic | claude-sonnet-4-20250514 | Good balance of quality and speed |
| Embeddings | OpenAI | text-embedding-3-small | Low cost, high quality for retrieval |
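Expressed as a capability-to-(provider, model) mapping, the recommended configuration above looks like this:

```python
# Recommended default mappings from the table above.
RECOMMENDED_DEFAULTS = {
    "extraction_classification": ("anthropic", "claude-sonnet-4-20250514"),
    "document_understanding": ("anthropic", "claude-sonnet-4-20250514"),
    "reasoning": ("anthropic", "claude-opus-4-20250514"),
    "generation": ("anthropic", "claude-sonnet-4-20250514"),
    "embeddings": ("openai", "text-embedding-3-small"),
}
```

Only the reasoning capability is routed to the more expensive Opus model; the embeddings capability is the only one routed to OpenAI.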

Token Usage and Cost Considerations

Understanding Token Consumption

Each document processed through the pipeline consumes tokens at every AI-backed stage: classification, structured extraction, clause analysis, embedding generation, and report writing.

Estimated Cost per Document

| Document Size | Approx. Total Tokens | Cost (Sonnet 4.6) | Cost (GPT-5.4 mini) |
| --- | --- | --- | --- |
| 5 pages | ~50K input / ~5K output | ~$0.22 | ~$0.17 |
| 20 pages | ~150K input / ~15K output | ~$0.67 | ~$0.52 |
| 50 pages | ~350K input / ~30K output | ~$1.50 | ~$1.17 |
| 100+ pages | ~600K input / ~50K output | ~$2.55 | ~$2.00 |
Note: these are estimates. Actual costs vary based on document complexity, clause density, and the specific models chosen. Long documents are processed in 10-page windows for clause analysis, which may result in additional API calls.
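The Sonnet column in the table is consistent with assumed rates of $3 per million input tokens and $15 per million output tokens. The estimator below is a back-of-envelope sketch using those assumed rates, not published pricing:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate_per_m: float = 3.00,
                  output_rate_per_m: float = 15.00) -> float:
    """Estimate USD cost for one document; rates are assumptions, per million tokens."""
    return (input_tokens * input_rate_per_m + output_tokens * output_rate_per_m) / 1_000_000

# 20-page document: ~150K input / ~15K output tokens
cost = estimate_cost(150_000, 15_000)  # 0.675, close to the ~$0.67 row in the table
```

Re-running the same arithmetic for the 50-page row (350K input, 30K output) gives exactly $1.50, matching the table.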

Cost Optimisation Strategies

  1. Use smaller models for simple tasks -- GPT-5.4 nano or Claude Haiku for extraction/classification of straightforward documents
  2. Reserve expensive models for reasoning -- Claude Opus 4 or GPT-5 only for complex report generation
  3. Batch processing -- Upload documents during off-peak hours if your provider offers batch discounts (OpenAI Batch API offers 50% savings)
  4. Prompt caching -- Anthropic's prompt caching (automatic for Claude Sonnet 4 and Opus 4) can reduce costs by up to 90% for repeated system prompts
  5. Embedding model choice -- text-embedding-3-small at $0.02/M tokens is 6.5x cheaper than text-embedding-3-large at $0.13/M tokens, with minimal retrieval quality difference for contract text
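The price ratio in point 5 can be checked directly from the stated per-million-token rates; the 150K-token figure reuses the 20-page row of the cost table:

```python
SMALL_RATE = 0.02  # $ per million tokens, text-embedding-3-small (stated above)
LARGE_RATE = 0.13  # $ per million tokens, text-embedding-3-large (stated above)

ratio = LARGE_RATE / SMALL_RATE  # 6.5x, as claimed

# Embedding a 20-page document (~150K tokens of chunks):
small_cost = 150_000 * SMALL_RATE / 1_000_000  # about $0.003
```

At fractions of a cent per document, embedding cost is dominated by the chat-model stages, which is why the choice of embedding model matters far less to the total than the choice of reasoning model.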

Pre-flight Validation

Before processing any document, the pipeline performs a pre-flight check that verifies an active AI provider is configured for the EXTRACTION_CLASSIFICATION capability. If this check fails, the document is immediately marked as failed with a descriptive error -- no tokens are consumed.

```
AI provider not configured or unreachable.
Please go to Settings > AI Capabilities to assign a provider and model.
```
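The pre-flight check can be sketched as a guard that runs before any tokens are spent. The document statuses and helper bodies below are illustrative; get_provider_for_capability stands in for the real table lookup:

```python
class AIConfigError(Exception):
    """Raised when no active provider is configured for a capability."""

def get_provider_for_capability(capability: str):
    """Stand-in lookup: here, imagine no active provider rows exist."""
    configured = {}
    if capability not in configured:
        raise AIConfigError(
            "AI provider not configured or unreachable.\n"
            "Please go to Settings > AI Capabilities to assign a provider and model."
        )
    return configured[capability]

def preflight(document: dict) -> dict:
    """Fail fast, before any tokens are consumed, if the first stage is unmapped."""
    try:
        get_provider_for_capability("extraction_classification")
    except AIConfigError as exc:
        document["status"] = "FAILED"
        document["error"] = str(exc)
    return document

doc = preflight({"id": 1, "status": "QUEUED"})
```

Because only the EXTRACTION_CLASSIFICATION mapping is checked up front, a document can still pass pre-flight and fail later if a downstream capability (e.g., embeddings) is unmapped -- which is exactly the in-flight failure mode described in the warning below the error message.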
Warning: if you change or deactivate a provider while documents are in the queue, any in-flight documents that reach an AI stage after the change will fail. Those documents can be reprocessed once a new provider is configured.

Next Steps