Azure OpenAI Service
Azure OpenAI Service provides OpenAI models (GPT-5.4, GPT-5.4 mini, GPT-5.4 nano, embedding models) hosted within Microsoft Azure -- now accessed through Microsoft AI Foundry. This is the recommended path for enterprises that require Azure compliance, data residency guarantees, private networking, or Microsoft Entra ID integration.
GPT-5.4 mini and GPT-5.4 nano are now available on Azure via Microsoft AI Foundry in Standard Global deployment and Data Zone US, and are rolling out to Data Zone EU. See the official announcement.
Contract Lucidity treats Azure OpenAI as a variant of the OpenAI provider. Internally, both use the openai Python SDK -- Azure OpenAI simply requires an Azure-specific endpoint URL and API key instead of the standard OpenAI endpoint. The provider key is azure_openai.
When to Use Azure OpenAI
| Requirement | Standard OpenAI | Azure OpenAI |
|---|---|---|
| Data stays in your Azure tenant | No | Yes |
| Private endpoint (VNet integration) | No | Yes |
| Azure AD / Entra ID authentication | No | Yes |
| SOC 2 Type II via your Azure subscription | Shared | Dedicated |
| HIPAA BAA through Azure | No | Yes |
| Content filtering customisation | Limited | Full control |
| Model availability | All models, immediately | Per-region, may lag |
| Setup complexity | Low | Medium-High |
Prerequisites
Before configuring CL, you need:
- An Azure subscription with billing enabled
- Access approval for Azure OpenAI Service (may require an application form at aka.ms/oai/access)
- Permission to create resources in your Azure subscription (Contributor role or higher)
Setting Up Azure OpenAI
Step 1: Create an Azure OpenAI Resource
- Sign in to the Azure Portal
- Click Create a resource
- Search for "Azure OpenAI" and select it
- Click Create and fill in:
- Subscription: Select your Azure subscription
- Resource group: Create new or select existing
- Region: Choose a region that supports your desired models (see model availability by region)
- Name: A unique name (e.g., cl-openai-prod)
- Pricing tier: Standard S0
- Configure network access:
- All networks (simplest for initial setup)
- Selected networks (recommended for production -- restrict to your CL server's IP)
- Private endpoint (most secure -- requires VNet)
- Click Review + create, then Create
Not all models are available in all Azure regions. GPT-5.4 mini and nano are available in Standard Global and Data Zone US as of March 2026. Check the Azure OpenAI model availability page for current regional support.
Step 2: Deploy Models
After the resource is created:
- Open your Azure OpenAI resource
- Click Model deployments > Manage Deployments (opens Azure AI Studio)
- Click + Deploy model > Deploy base model
- Select the model you want to deploy:
| CL Capability | Recommended Model | Deployment Name Convention | Pricing (per 1M tokens) |
|---|---|---|---|
| Extraction & Classification | GPT-5.4 nano | cl-nano-extract | $0.20 input / $1.25 output |
| Document Understanding | GPT-5.4 mini | cl-mini-understand | $0.75 input / $4.50 output |
| Reasoning | GPT-5.4 | cl-54-reason | $2.50 input / $15.00 output |
| Generation | GPT-5.4 mini | cl-mini-generate | $0.75 input / $4.50 output |
| Embeddings | text-embedding-3-small | cl-embed-small | $0.02 input |
- Set the tokens per minute (TPM) quota for each deployment
- Click Deploy
Azure OpenAI requires you to use the deployment name (not the model name) in API calls. When configuring CL, enter the deployment name as the model name in the AI Capabilities settings.
For example, if you deploy GPT-5.4 nano with the deployment name cl-nano-extract, enter cl-nano-extract as the model in CL's capability mapping.
Step 3: Get Endpoint and API Key
- In the Azure Portal, navigate to your Azure OpenAI resource
- Click Keys and Endpoint in the left sidebar
- Copy:
- KEY 1 (or KEY 2) -- this is your API key
- Endpoint -- the URL (e.g., https://cl-openai-prod.openai.azure.com/)
Step 4: Configure in Contract Lucidity
- Navigate to Settings > AI Providers
- Click Add Provider
- Select Azure OpenAI as the provider type
- Enter:
- API Key: The key from Step 3
- Endpoint URL: The endpoint from Step 3
- Click Save & Verify
Then map capabilities as described in Step 2 of the Overview, using your deployment names as the model names.
Architecture
Quota and Scaling
Azure OpenAI uses a Tokens Per Minute (TPM) quota system per deployment. Default quotas vary by model and region.
Recommended Quotas for CL
| Deployment | Min TPM | Recommended TPM | Max Available |
|---|---|---|---|
| GPT-5.4 nano (extraction) | 30K | 120K | 600K+ (varies by region) |
| GPT-5.4 mini (understanding/generation) | 30K | 120K | 600K+ |
| GPT-5.4 (reasoning) | 30K | 80K | 300K+ |
| Embedding model | 120K | 350K | 2M+ |
Increasing Quotas
- In Azure AI Studio, click Quotas in the left sidebar
- Select your deployment
- Click Request quota increase
- Specify the desired TPM
If you are processing many documents simultaneously, prioritise increasing the embedding model's TPM quota. Embedding calls are made in batches of up to 100 texts and can consume quota quickly.
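A minimal sketch of that batching pattern. The deployment name is the example from the table above, and any client object exposing an embeddings.create method (such as the openai SDK's AzureOpenAI client) works:

```python
def embed_in_batches(client, texts, deployment="cl-embed-small", batch_size=100):
    """Embed texts in batches of up to `batch_size`, the per-call limit
    described above, collecting vectors in input order."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        resp = client.embeddings.create(model=deployment, input=batch)
        vectors.extend(item.embedding for item in resp.data)
    return vectors
```

Each batch of 100 texts consumes TPM quota in one burst, which is why the embedding deployment's quota is the first to raise under heavy document intake.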
Cost Considerations
Azure OpenAI pricing is generally equivalent to standard OpenAI pricing for the same models, with minor regional variations. Key differences:
| Factor | Standard OpenAI | Azure OpenAI |
|---|---|---|
| Per-token pricing | Same | Same (or very close) |
| Provisioned throughput | Not available | Available (reserved capacity at discount) |
| Billing | Direct to OpenAI | Through your Azure subscription |
| Committed use discounts | No | Yes (Azure reservations) |
| Cost visibility | OpenAI dashboard | Azure Cost Management + tags |
Provisioned Throughput Units (PTU)
For high-volume, predictable workloads, Azure offers Provisioned Throughput -- reserved capacity billed hourly rather than per-token. This can reduce costs by 30-50% for Am Law 100 deployments processing thousands of documents monthly.
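A quick way to decide whether Provisioned Throughput is worth evaluating is to estimate your pay-as-you-go spend first. The prices below are the per-1M-token figures from the deployment table above; the document volumes are placeholders, not benchmarks:

```python
# Per-1M-token prices (USD) from the deployment table above.
PRICES = {
    "cl-nano-extract": {"input": 0.20, "output": 1.25},
    "cl-mini-understand": {"input": 0.75, "output": 4.50},
}

def monthly_cost(deployment, input_tokens, output_tokens):
    """Pay-as-you-go cost in USD for one deployment's monthly token volume."""
    p = PRICES[deployment]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: 5,000 contracts/month, ~20K input and ~1K output tokens each,
# through the extraction deployment (placeholder volumes).
cost = monthly_cost("cl-nano-extract", 5_000 * 20_000, 5_000 * 1_000)
```

If the resulting figure is large and the load is steady month to month, compare it against your negotiated PTU rate; for spiky or low-volume workloads, per-token billing usually wins.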
Security Best Practices
Network Isolation
For production deployments, restrict network access to your Azure OpenAI resource:
Azure OpenAI Resource > Networking > Firewalls and virtual networks
Options:
- Allow specific IP addresses -- add your CL server's public IP
- Private endpoint -- create a private endpoint in the same VNet as your CL deployment
- Service endpoint -- if CL runs in an Azure VM or AKS within the same VNet
Key Rotation
Azure provides two API keys (KEY 1 and KEY 2) to enable zero-downtime key rotation:
- Update CL to use KEY 2
- Regenerate KEY 1
- (Next rotation) Update CL to use KEY 1
- Regenerate KEY 2
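One way to keep that rotation zero-downtime is to keep the key out of code entirely and read it from configuration at startup, so switching between KEY 1 and KEY 2 is an environment change followed by a restart. The variable names below are illustrative conventions, not settings CL defines:

```python
import os

def load_azure_openai_config():
    """Read Azure OpenAI credentials from the environment.

    Rotating keys then means updating AZURE_OPENAI_API_KEY (to KEY 2,
    then later back to KEY 1) with no code change.
    """
    return {
        "api_key": os.environ["AZURE_OPENAI_API_KEY"],
        "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    }
```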
Content Filtering
Azure OpenAI includes built-in content filtering that can be customised per deployment. For legal contract analysis, the default filter settings are generally appropriate. If you encounter false-positive content filtering on legitimate contract language (e.g., indemnification clauses discussing liability for bodily injury), you can adjust filters in Azure AI Studio.
Troubleshooting
| Symptom | Cause | Solution |
|---|---|---|
| 404 Resource Not Found | Wrong endpoint URL or deployment name | Verify the endpoint URL includes a trailing /; verify the deployment name matches exactly |
| 401 Access Denied | Invalid API key | Regenerate the key in Azure Portal > Keys and Endpoint |
| 429 Rate Limit Exceeded | TPM quota exceeded | Increase the deployment quota or reduce concurrency |
| DeploymentNotFound | Deployment name typo in CL config | Use the deployment name (not the model name) in CL capability mapping |
| Model not available in region | Region limitation | GPT-5.4 mini/nano: Standard Global, Data Zone US. Check model availability |
| Content filter triggered | Default filter blocking legal content | Customise content filter in Azure AI Studio |
| Slow responses | Low TPM allocation | Increase TPM quota; consider Provisioned Throughput for consistent latency |
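For the 429 case, client-side exponential backoff keeps batch jobs running through brief quota spikes. A generic sketch; with the openai SDK you would pass openai.RateLimitError as the exception type:

```python
import time

def with_backoff(fn, *, retries=5, base_delay=1.0, retry_on=Exception):
    """Call fn(), retrying with exponential backoff (1s, 2s, 4s, ...)
    whenever an exception of type `retry_on` is raised."""
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt))
```

Raising the deployment's TPM quota is still the real fix; backoff only smooths transient spikes.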