# Document Storage
Contract Lucidity stores uploaded documents and generated reports on a shared filesystem. The backend writes files when users upload documents, and the worker reads them during processing. Both services must access the same storage path.
## Why Shared Storage Matters
If the backend and worker cannot access the same filesystem at the same path, document processing will silently fail. The worker will not find the uploaded file and the task will error out. There is no fallback.
The `STORAGE_PATH` environment variable (default: `/data/storage`) must resolve to the same physical storage on both services. How this is achieved depends on your deployment method.
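One way to confirm the paths line up is a probe file: write where the backend would write an upload, read where the worker would look for it. This is a sketch in plain POSIX shell; in a real deployment you would run the write inside `cl-backend` and the read inside `cl-worker` (e.g. via `docker compose exec`), and the `/tmp` default exists only so the sketch runs anywhere.

```shell
#!/bin/sh
# Smoke-test sketch: simulate the backend writing a file and the worker
# reading it back through the same STORAGE_PATH. In production, run the
# write and the read in the two containers rather than one local shell.
STORAGE_PATH="${STORAGE_PATH:-/tmp/cl-storage}"   # illustrative default
mkdir -p "$STORAGE_PATH"
probe="$STORAGE_PATH/.shared-storage-probe"

echo "probe-$$" > "$probe"                        # "backend" side: write
if [ "$(cat "$probe")" = "probe-$$" ]; then       # "worker" side: read
  echo "shared storage OK"
else
  echo "worker cannot see the backend's file" >&2
fi
rm -f "$probe"
```

If the second step cannot read the file, the two services are not looking at the same storage, which is exactly the silent-failure mode described above.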
## Storage Options Comparison
| Option | Deployment Method | Shared Access | Durability | Scalability | Cost | Recommended For |
|---|---|---|---|---|---|---|
| Docker named volume | Docker Compose | Same host only | Host disk | Single server | Free | Dev, demos |
| EasyPanel volume + bind mount | EasyPanel | Same host only | Host disk | Single server | Free | Small teams |
| Azure Files | Azure | Multi-instance | Geo-redundant | Unlimited | ~$0.16/GB/mo (Premium) | Azure deployments |
| AWS EFS | AWS | Multi-instance | Multi-AZ | Unlimited | ~$0.30/GB/mo | AWS deployments |
| AWS S3 + FUSE | AWS | Multi-instance | 11 nines | Unlimited | ~$0.023/GB/mo | AWS (cost-optimized) |
| GCP Cloud Storage + FUSE | GCP | Multi-instance | Regional/Multi | Unlimited | ~$0.020/GB/mo | GCP deployments |
| GCP Filestore | GCP | Multi-instance | Regional | Up to 100 TB | ~$0.20/GB/mo | GCP (POSIX-required) |
| NFS share | Any | Multi-instance | Depends | Server-limited | Varies | On-premise, hybrid |
| Local disk | Any single-server | Single container only | No redundancy | Server disk | Free | Not recommended |
## Configuration

Set the storage path in your environment variables:

```bash
STORAGE_PATH=/data/storage
```
This is the path inside the container where documents are stored. The actual backing storage depends on your deployment:
| Deployment | Backend Mount | Worker Mount |
|---|---|---|
| Docker Compose | cl-storage volume at /data/storage | Same cl-storage volume at /data/storage |
| EasyPanel | Named volume cl-storage at /data/storage | Bind mount to /etc/easypanel/projects/{project}/cl-backend/volumes/cl-storage at /data/storage |
| AWS ECS | EFS mount at /data/storage | Same EFS mount at /data/storage |
| Azure Container Apps | Azure Files mount at /data/storage | Same Azure Files mount at /data/storage |
| GCP Cloud Run | GCS FUSE mount at /data/storage | Same GCS FUSE mount at /data/storage |
## Setup Instructions by Platform

### Docker Compose

No special setup needed. The docker-compose.yml defines a shared named volume:

```yaml
volumes:
  cl-storage:

services:
  cl-backend:
    volumes:
      - cl-storage:/data/storage
  cl-worker:
    volumes:
      - cl-storage:/data/storage
```

Both containers mount the same Docker volume. This only works when both containers run on the same host.
### EasyPanel

EasyPanel does not support sharing named volumes between services directly. The workaround:

- `cl-backend` gets a named volume `cl-storage` mounted at `/data/storage`
- `cl-worker` gets a bind mount pointing to the physical directory where EasyPanel stores the backend's volume: `/etc/easypanel/projects/{project-name}/cl-backend/volumes/cl-storage`

For production deployments, mount external storage (Azure Files, NFS, etc.) to the host and bind mount it to both services:

```text
# Both services mount the same host path
Host: /mnt/cl-storage → Container: /data/storage
```
### AWS: Elastic File System (EFS)

EFS provides a fully managed NFS filesystem that can be mounted by multiple ECS Fargate tasks simultaneously.

```bash
# Create filesystem
aws efs create-file-system --performance-mode generalPurpose --encrypted

# Create mount targets in each subnet
aws efs create-mount-target --file-system-id <efs-id> --subnet-id <subnet-id> --security-groups <sg-id>

# Create access point
aws efs create-access-point --file-system-id <efs-id> \
  --posix-user Uid=1000,Gid=1000 \
  --root-directory "Path=/data/storage,CreationInfo={OwnerUid=1000,OwnerGid=1000,Permissions=755}"
```
In your ECS task definition, add the EFS volume:

```json
{
  "volumes": [{
    "name": "cl-storage",
    "efsVolumeConfiguration": {
      "fileSystemId": "<efs-id>",
      "transitEncryption": "ENABLED",
      "authorizationConfig": {
        "accessPointId": "<access-point-id>",
        "iam": "ENABLED"
      }
    }
  }],
  "containerDefinitions": [{
    "mountPoints": [{
      "sourceVolume": "cl-storage",
      "containerPath": "/data/storage"
    }]
  }]
}
```
### Azure: Azure Files

Azure Files provides SMB/NFS shares that can be mounted by Azure Container Apps.

```bash
# Create storage account and share
az storage account create --name clstorage --resource-group cl-production --sku Premium_LRS --kind FileStorage
az storage share-rm create --storage-account clstorage --name cl-documents --quota 100

# Mount in Container Apps environment
az containerapp env storage set \
  --name cl-environment \
  --resource-group cl-production \
  --storage-name clstorage \
  --azure-file-account-name clstorage \
  --azure-file-account-key <key> \
  --azure-file-share-name cl-documents \
  --access-mode ReadWrite
```
Then in each container app's YAML, reference the volume:

```yaml
template:
  volumes:
    - name: cl-storage
      storageName: clstorage
      storageType: AzureFile
  containers:
    - volumeMounts:
        - volumeName: cl-storage
          mountPath: /data/storage
```
### GCP: Cloud Storage (GCS FUSE)

Cloud Run supports mounting Cloud Storage buckets as volumes using GCS FUSE.

```bash
# Create bucket
gcloud storage buckets create gs://cl-documents-<project-id> --location=us-central1

# Deploy with volume mount
gcloud run deploy cl-backend \
  --add-volume=name=cl-storage,type=cloud-storage,bucket=cl-documents-<project-id> \
  --add-volume-mount=volume=cl-storage,mount-path=/data/storage
```
GCS FUSE translates filesystem operations to Cloud Storage API calls. For Contract Lucidity's usage pattern (write-once, read-many, no random access), performance is excellent. If you need true POSIX semantics (file locking, random writes), use Filestore instead.
### NFS (On-Premise / Hybrid)

For on-premise or hybrid deployments, mount an NFS share to the host:

```bash
# On the host
sudo mount -t nfs nfs-server:/exports/cl-storage /mnt/cl-storage

# Add to /etc/fstab for persistence
echo "nfs-server:/exports/cl-storage /mnt/cl-storage nfs defaults 0 0" | sudo tee -a /etc/fstab
```
Then bind mount to containers:

```yaml
services:
  cl-backend:
    volumes:
      - /mnt/cl-storage:/data/storage
  cl-worker:
    volumes:
      - /mnt/cl-storage:/data/storage
```
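One failure mode with this setup: if the NFS mount is absent when the containers start, the bind mount silently points at an empty local directory. A small pre-start guard can catch this; the path and messages below are illustrative, not part of the product.

```shell
#!/bin/sh
# Guard sketch: check that /mnt/cl-storage is actually a mountpoint before
# starting containers, so documents are not written to the host's local disk.
# Reads /proc/mounts, where field 2 of each line is the mountpoint.
MOUNT="${MOUNT:-/mnt/cl-storage}"
if grep -qs " $MOUNT " /proc/mounts; then
  echo "$MOUNT is mounted; safe to start containers"
else
  echo "$MOUNT is NOT a mountpoint; do not start containers"
fi
```

Run this before `docker compose up` (or wire it into a systemd unit's `ExecStartPre`) so a missing mount fails loudly instead of silently.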
## Capacity Planning

### Average Document Size
| Content | Average Size |
|---|---|
| Uploaded PDF/DOCX | ~500 KB |
| Extracted text | ~50 KB |
| Analysis report (JSON) | ~25 KB |
| Total per document | ~575 KB |
### Sizing Formula

```text
Storage needed = Documents per month × 575 KB × Retention months × 1.2 (overhead)
```
| Documents/Month | 6 Months | 12 Months | 24 Months |
|---|---|---|---|
| 100 | ~400 MB | ~800 MB | ~1.6 GB |
| 500 | ~2 GB | ~4 GB | ~8 GB |
| 1,000 | ~4 GB | ~8 GB | ~16 GB |
| 5,000 | ~20 GB | ~40 GB | ~80 GB |
| 10,000 | ~40 GB | ~80 GB | ~160 GB |
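The table values can be reproduced with shell arithmetic. As a worked example, here is the 1,000 documents/month, 12-month row (overhead is expressed as an integer percentage so the sketch needs no floating point):

```shell
#!/bin/sh
# Worked example of the sizing formula: 1,000 docs/month, 12-month retention.
docs_per_month=1000
kb_per_doc=575        # upload (~500 KB) + extracted text (~50 KB) + report (~25 KB)
retention_months=12
overhead_pct=120      # the 1.2x overhead factor, as an integer percentage

kb=$(( docs_per_month * kb_per_doc * retention_months * overhead_pct / 100 ))
gb=$(( kb / 1000000 ))
echo "${kb} KB total (~${gb} GB)"
```

This yields about 8.3 million KB, i.e. the ~8 GB shown in the table; swap in your own volume and retention to size other rows.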
## Cost Projections by Platform
For 1,000 documents/month over 12 months (~8 GB):
| Platform | Storage Type | Monthly Cost |
|---|---|---|
| Docker Compose | Local disk | $0 (included in VPS) |
| EasyPanel | Local disk | $0 (included in VPS) |
| AWS EFS | Elastic File System | ~$2.40 |
| AWS S3 | Standard | ~$0.18 |
| Azure Files | Premium | ~$1.28 |
| GCP Cloud Storage | Standard | ~$0.16 |
| GCP Filestore | Basic HDD | ~$1.60 |
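These projections are simply capacity multiplied by the per-GB rates from the comparison table. Rates change, so it is worth recomputing with current pricing; a sketch using the approximate rates quoted above:

```shell
#!/bin/sh
# Recompute the monthly-cost column for ~8 GB using the per-GB rates from
# the comparison table. These are approximations, not live pricing.
gb=8
awk -v gb="$gb" 'BEGIN {
  printf "AWS EFS:           $%.2f\n", gb * 0.30
  printf "AWS S3 Standard:   $%.2f\n", gb * 0.023
  printf "Azure Files Prem.: $%.2f\n", gb * 0.16
  printf "GCP Cloud Storage: $%.2f\n", gb * 0.020
  printf "GCP Filestore:     $%.2f\n", gb * 0.20
}'
```

Change `gb` to your projected footprint from the sizing table to get a like-for-like comparison.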
## The Critical Warning About Local Disk

When using Docker Compose or EasyPanel without external storage, documents are stored on the server's local disk. This has severe limitations:

- No redundancy -- a single disk failure means total data loss
- No scaling -- you cannot add more servers without a storage migration
- Disk fills up -- storage grows without bound and shares the disk with the OS, database, and logs; on a typical 40 GB VPS, sustained uploads of large documents leave little headroom
- No backup -- unless you implement backup scripts yourself

Local disk storage is NOT recommended beyond the first 15-30 days for any deployment processing more than a handful of documents. Mount external storage (Azure Files, S3, NFS) before going to production.
The deploy script (`deploy-easypanel.sh`) warns about this:

```text
═══════════════════════════════════════════════════════════════
No external storage configured.
Documents will be stored on the server's local disk.
This WILL fill the disk in a production deployment.
NOT RECOMMENDED PAST 15-30 DAYS MAX.
For production: mount Azure Files / S3 / NFS at /mnt/cl-storage
═══════════════════════════════════════════════════════════════
```
## Monitoring Storage Usage

### Docker Compose

```bash
# Check volume size
docker system df -v | grep cl-storage

# Check host disk
df -h
```
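Raw usage numbers are only useful against a threshold. A small check like the following can run from cron and flag when the filesystem backing `STORAGE_PATH` is getting full; the path, threshold, and fallback are illustrative:

```shell
#!/bin/sh
# Alert sketch: report the usage of the filesystem backing STORAGE_PATH and
# warn past a threshold. df -P prints capacity as field 5 of the data line.
STORAGE_PATH="${STORAGE_PATH:-/data/storage}"
[ -d "$STORAGE_PATH" ] || STORAGE_PATH=/     # fallback so the sketch runs anywhere
THRESHOLD=80                                 # warn at 80% full

usage=$(df -P "$STORAGE_PATH" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
echo "storage at ${usage}% of capacity"
if [ "$usage" -ge "$THRESHOLD" ]; then
  echo "WARNING: above ${THRESHOLD}% -- expand storage or prune old documents"
fi
```

Pipe the warning into whatever alerting you already have (mail, Slack webhook, etc.) rather than letting the disk fill silently.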
### EasyPanel

```bash
# Check the volume directory
du -sh /etc/easypanel/projects/*/cl-backend/volumes/cl-storage/
```
### AWS EFS

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/EFS \
  --metric-name StorageBytes \
  --dimensions Name=FileSystemId,Value=<efs-id> \
  --start-time $(date -d '-1 day' -u +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Average
```
### Azure Files

```bash
az storage share-rm show \
  --resource-group cl-production \
  --storage-account clstorage \
  --name cl-documents \
  --query '{quota:shareQuota,usage:shareUsageBytes}'
```
### GCP Cloud Storage

```bash
gcloud storage du --summarize gs://cl-documents-<project-id>
```
## Backup Strategy
Regardless of storage backend, implement regular backups:
| Platform | Backup Method | RPO |
|---|---|---|
| Local disk | rsync to remote or scheduled tar | Manual |
| AWS EFS | EFS-to-EFS backup via AWS Backup | Daily (automatic) |
| AWS S3 | Cross-region replication + versioning | Near real-time |
| Azure Files | Azure Backup for file shares | Daily (automatic) |
| GCP Cloud Storage | Dual-region + object versioning | Near real-time |
| NFS | Snapshot + rsync | Depends on schedule |
For local disk deployments, at minimum set up a daily cron job:
```bash
#!/bin/bash
# /etc/cron.daily/cl-backup
BACKUP_DIR="/backups/cl-storage"
SOURCE="/etc/easypanel/projects/contract-lucidity/cl-backend/volumes/cl-storage"

mkdir -p "$BACKUP_DIR"
rsync -a --delete "$SOURCE/" "$BACKUP_DIR/"
```
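A backup job exiting 0 does not prove the backup is usable. A cheap sanity check is to compare file counts between source and backup after each run; in this sketch, temp directories and a `cp` stand in for the real paths and the rsync step:

```shell
#!/bin/sh
# Verification sketch: after copying, compare file counts between source and
# backup. mktemp dirs and cp stand in for the real paths and the rsync step.
SOURCE=$(mktemp -d)
BACKUP_DIR=$(mktemp -d)
printf 'fake pdf bytes' > "$SOURCE/contract-001.pdf"   # stand-in document

cp -R "$SOURCE/." "$BACKUP_DIR/"                       # stands in for rsync

src=$(find "$SOURCE" -type f | wc -l)
bak=$(find "$BACKUP_DIR" -type f | wc -l)
if [ "$src" -eq "$bak" ]; then
  echo "backup verified: $bak file(s)"
else
  echo "MISMATCH: source=$src backup=$bak"
fi
```

Appending a check like this to the cron job (and alerting on a mismatch) turns a silent bad backup into a visible failure.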