Document Storage

Contract Lucidity stores uploaded documents and generated reports on a shared filesystem. The backend writes files when users upload documents, and the worker reads them during processing. Both services must access the same storage path.

Why Shared Storage Matters

Critical Requirement

If the backend and worker cannot access the same filesystem at the same path, document processing will silently fail. The worker will not find the uploaded file and the task will error out. There is no fallback.

The STORAGE_PATH environment variable (default: /data/storage) must resolve to the same physical storage on both services. How this is achieved depends on your deployment method.
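A quick sanity check after deployment is to write a marker file from one service and read it from the other. This sketch assumes Docker-style container names `cl-backend` and `cl-worker`; adjust for your platform:

```shell
# Write a marker through the backend container, then read it through the worker.
# If the second command fails or prints nothing, the services are not sharing
# storage and uploaded documents will never be processed.
docker exec cl-backend sh -c 'echo storage-ok > /data/storage/.storage-check'
docker exec cl-worker cat /data/storage/.storage-check
```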

Storage Options Comparison

| Option | Deployment Method | Shared Access | Durability | Scalability | Cost | Recommended For |
| --- | --- | --- | --- | --- | --- | --- |
| Docker named volume | Docker Compose | Same host only | Host disk | Single server | Free | Dev, demos |
| EasyPanel volume + bind mount | EasyPanel | Same host only | Host disk | Single server | Free | Small teams |
| Azure Files | Azure | Multi-instance | Geo-redundant | Unlimited | ~$0.16/GB/mo (Premium) | Azure deployments |
| AWS EFS | AWS | Multi-instance | Multi-AZ | Unlimited | ~$0.30/GB/mo | AWS deployments |
| AWS S3 + FUSE | AWS | Multi-instance | 11 nines | Unlimited | ~$0.023/GB/mo | AWS (cost-optimized) |
| GCP Cloud Storage + FUSE | GCP | Multi-instance | Regional/Multi | Unlimited | ~$0.020/GB/mo | GCP deployments |
| GCP Filestore | GCP | Multi-instance | Regional | Up to 100 TB | ~$0.20/GB/mo | GCP (POSIX-required) |
| NFS share | Any | Multi-instance | Depends | Server-limited | Varies | On-premise, hybrid |
| Local disk | Any single-server | Single container only | No redundancy | Server disk | Free | Not recommended |

Configuration

Set the storage path in your environment variables:

STORAGE_PATH=/data/storage

This is the path inside the container where documents are stored. The actual backing storage depends on your deployment:

| Deployment | Backend Mount | Worker Mount |
| --- | --- | --- |
| Docker Compose | cl-storage volume at /data/storage | Same cl-storage volume at /data/storage |
| EasyPanel | Named volume cl-storage at /data/storage | Bind mount to /etc/easypanel/projects/{project}/cl-backend/volumes/cl-storage at /data/storage |
| AWS ECS | EFS mount at /data/storage | Same EFS mount at /data/storage |
| Azure Container Apps | Azure Files mount at /data/storage | Same Azure Files mount at /data/storage |
| GCP Cloud Run | GCS FUSE mount at /data/storage | Same GCS FUSE mount at /data/storage |

Setup Instructions by Platform

Docker Compose

No special setup needed. The docker-compose.yml defines a shared named volume:

volumes:
  cl-storage:

services:
  cl-backend:
    volumes:
      - cl-storage:/data/storage
  cl-worker:
    volumes:
      - cl-storage:/data/storage

Both containers mount the same Docker volume. This only works when both containers run on the same host.

EasyPanel

EasyPanel does not support sharing named volumes between services directly. The workaround:

  1. cl-backend gets a named volume cl-storage mounted at /data/storage
  2. cl-worker gets a bind mount pointing to the physical directory where EasyPanel stores the backend's volume:
     /etc/easypanel/projects/{project-name}/cl-backend/volumes/cl-storage

For production deployments, mount external storage (Azure Files, NFS, etc.) to the host and bind mount it to both services:

# Both services mount the same host path
Host: /mnt/cl-storage → Container: /data/storage

AWS: Elastic File System (EFS)

EFS provides a fully managed NFS filesystem that can be mounted by multiple ECS Fargate tasks simultaneously.

# Create filesystem
aws efs create-file-system --performance-mode generalPurpose --encrypted

# Create mount targets in each subnet
aws efs create-mount-target --file-system-id <efs-id> --subnet-id <subnet-id> --security-groups <sg-id>

# Create access point
aws efs create-access-point --file-system-id <efs-id> \
  --posix-user Uid=1000,Gid=1000 \
  --root-directory "Path=/data/storage,CreationInfo={OwnerUid=1000,OwnerGid=1000,Permissions=755}"
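The commands above also assume the mount targets' security group allows inbound NFS traffic from the ECS tasks. A sketch with placeholder security group IDs:

```shell
# Allow NFS (TCP 2049) from the ECS tasks' security group to the security
# group attached to the EFS mount targets; both IDs are placeholders.
aws ec2 authorize-security-group-ingress \
  --group-id <efs-sg-id> \
  --protocol tcp \
  --port 2049 \
  --source-group <ecs-task-sg-id>
```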

In your ECS task definition, add the EFS volume:

{
  "volumes": [{
    "name": "cl-storage",
    "efsVolumeConfiguration": {
      "fileSystemId": "<efs-id>",
      "transitEncryption": "ENABLED",
      "authorizationConfig": {
        "accessPointId": "<access-point-id>",
        "iam": "ENABLED"
      }
    }
  }],
  "containerDefinitions": [{
    "mountPoints": [{
      "sourceVolume": "cl-storage",
      "containerPath": "/data/storage"
    }]
  }]
}
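Because the volume sets "iam": "ENABLED", the ECS task role also needs EFS client permissions. A minimal policy sketch (region, account, and filesystem IDs are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "elasticfilesystem:ClientMount",
      "elasticfilesystem:ClientWrite"
    ],
    "Resource": "arn:aws:elasticfilesystem:<region>:<account-id>:file-system/<efs-id>"
  }]
}
```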

Azure: Azure Files

Azure Files provides SMB/NFS shares that can be mounted by Azure Container Apps.

# Create storage account and share
az storage account create --name clstorage --resource-group cl-production --sku Premium_LRS --kind FileStorage
az storage share-rm create --storage-account clstorage --name cl-documents --quota 100

# Mount in Container Apps environment
az containerapp env storage set \
  --name cl-environment \
  --resource-group cl-production \
  --storage-name clstorage \
  --azure-file-account-name clstorage \
  --azure-file-account-key <key> \
  --azure-file-share-name cl-documents \
  --access-mode ReadWrite
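The `<key>` value can be fetched from the storage account created above:

```shell
# Retrieve the primary account key for clstorage; pass the result as
# --azure-file-account-key when registering the storage.
az storage account keys list \
  --account-name clstorage \
  --resource-group cl-production \
  --query '[0].value' -o tsv
```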

Then in each container app's YAML, reference the volume:

template:
  volumes:
    - name: cl-storage
      storageName: clstorage
      storageType: AzureFile
  containers:
    - volumeMounts:
        - volumeName: cl-storage
          mountPath: /data/storage
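One way to apply that YAML to both apps (the file name app.yaml is an assumption):

```shell
# Apply the volume configuration to each container app. Both the backend
# and the worker need the same mount for processing to work.
az containerapp update --name cl-backend --resource-group cl-production --yaml app.yaml
az containerapp update --name cl-worker --resource-group cl-production --yaml app.yaml
```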

GCP: Cloud Storage (GCS FUSE)

Cloud Run supports mounting Cloud Storage buckets as volumes using GCS FUSE.

# Create bucket
gcloud storage buckets create gs://cl-documents-<project-id> --location=us-central1

# Deploy with volume mount (repeat with the same volume flags for cl-worker)
gcloud run deploy cl-backend \
  --add-volume=name=cl-storage,type=cloud-storage,bucket=cl-documents-<project-id> \
  --add-volume-mount=volume=cl-storage,mount-path=/data/storage

GCS FUSE Performance

GCS FUSE translates filesystem operations to Cloud Storage API calls. For Contract Lucidity's usage pattern (write-once, read-many, no random access), performance is excellent. If you need true POSIX semantics (file locking, random writes), use Filestore instead.
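The deploy commands above also assume the Cloud Run runtime identity can read and write the bucket; granting that is a separate step (the service account email is a placeholder):

```shell
# Grant the Cloud Run runtime service account object read/write on the bucket.
# By default Cloud Run runs as the Compute Engine default service account.
gcloud storage buckets add-iam-policy-binding gs://cl-documents-<project-id> \
  --member="serviceAccount:<service-account-email>" \
  --role="roles/storage.objectAdmin"
```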

NFS (On-Premise / Hybrid)

For on-premise or hybrid deployments, mount an NFS share to the host:

# On the host
sudo mount -t nfs nfs-server:/exports/cl-storage /mnt/cl-storage

# Add to /etc/fstab for persistence
echo "nfs-server:/exports/cl-storage /mnt/cl-storage nfs defaults 0 0" | sudo tee -a /etc/fstab

Then bind mount to containers:

# docker-compose.override.yml
services:
  cl-backend:
    volumes:
      - /mnt/cl-storage:/data/storage
  cl-worker:
    volumes:
      - /mnt/cl-storage:/data/storage

Capacity Planning

Average Document Size

| Content | Average Size |
| --- | --- |
| Uploaded PDF/DOCX | ~500 KB |
| Extracted text | ~50 KB |
| Analysis report (JSON) | ~25 KB |
| Total per document | ~575 KB |

Sizing Formula

Storage needed = Documents per month × 575 KB × Retention months × 1.2 (overhead)

| Documents/Month | 6 Months | 12 Months | 24 Months |
| --- | --- | --- | --- |
| 100 | ~400 MB | ~800 MB | ~1.6 GB |
| 500 | ~2 GB | ~4 GB | ~8 GB |
| 1,000 | ~4 GB | ~8 GB | ~16 GB |
| 5,000 | ~20 GB | ~40 GB | ~80 GB |
| 10,000 | ~40 GB | ~80 GB | ~160 GB |
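As a sketch, the formula can be checked with plain shell arithmetic, using the 1,000 docs/month, 12-month retention row:

```shell
# Sizing formula: docs/month × 575 KB × retention months × 1.2 overhead.
docs_per_month=1000
kb_per_doc=575
retention_months=12
kb=$(( docs_per_month * kb_per_doc * retention_months * 12 / 10 ))  # ×1.2 without floats
gb=$(( (kb + 524288) / 1048576 ))                                   # round to nearest GB
echo "${gb} GB"   # → 8 GB, matching the table
```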

Cost Projections by Platform

For 1,000 documents/month over 12 months (~8 GB):

| Platform | Storage Type | Monthly Cost |
| --- | --- | --- |
| Docker Compose | Local disk | $0 (included in VPS) |
| EasyPanel | Local disk | $0 (included in VPS) |
| AWS EFS | Elastic File System | ~$2.40 |
| AWS S3 | Standard | ~$0.18 |
| Azure Files | Premium | ~$1.28 |
| GCP Cloud Storage | Standard | ~$0.16 |
| GCP Filestore | Basic HDD | ~$1.60 |

The Critical Warning About Local Disk

Local Disk Storage is Not for Production

When using Docker Compose or EasyPanel without external storage, documents are stored on the server's local disk. This has severe limitations:

  1. No redundancy -- disk failure = total data loss
  2. No scaling -- cannot add more servers without storage migration
  3. Disk fills up -- documents accumulate with no cleanup on a disk shared with the OS, logs, and database; per the sizing table above, a busy deployment (5,000+ docs/month) consumes ~40 GB within a year
  4. No backup -- unless you manually implement backup scripts

Local disk storage is NOT recommended past 15-30 days for any deployment processing more than a handful of documents. Mount external storage (Azure Files, S3, NFS) before going to production.

The deploy script (deploy-easypanel.sh) warns about this:

═══════════════════════════════════════════════════════════════
No external storage configured.
Documents will be stored on the server's local disk.
This WILL fill the disk in a production deployment.
NOT RECOMMENDED PAST 15-30 DAYS MAX.
For production: mount Azure Files / S3 / NFS at /mnt/cl-storage
═══════════════════════════════════════════════════════════════

Monitoring Storage Usage

Docker Compose

# Check volume size
docker system df -v | grep cl-storage

# Check host disk
df -h

EasyPanel

# Check the volume directory
du -sh /etc/easypanel/projects/*/cl-backend/volumes/cl-storage/

AWS EFS

aws cloudwatch get-metric-statistics \
  --namespace AWS/EFS \
  --metric-name StorageBytes \
  --dimensions Name=FileSystemId,Value=<efs-id> \
  --start-time $(date -d '-1 day' -u +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Average

Azure Files

az storage share-rm show \
  --resource-group cl-production \
  --storage-account clstorage \
  --name cl-documents \
  --query '{quota:shareQuota,usage:shareUsageBytes}'

GCP Cloud Storage

gcloud storage du --summarize gs://cl-documents-<project-id>

Backup Strategy

Regardless of storage backend, implement regular backups:

| Platform | Backup Method | RPO |
| --- | --- | --- |
| Local disk | rsync to remote or scheduled tar | Manual |
| AWS EFS | EFS-to-EFS backup via AWS Backup | Daily (automatic) |
| AWS S3 | Cross-region replication + versioning | Near real-time |
| Azure Files | Azure Backup for file shares | Daily (automatic) |
| GCP Cloud Storage | Dual-region + object versioning | Near real-time |
| NFS | Snapshot + rsync | Depends on schedule |

For local disk deployments, at minimum set up a daily cron job:

#!/bin/bash
# /etc/cron.daily/cl-backup
BACKUP_DIR="/backups/cl-storage"
SOURCE="/etc/easypanel/projects/contract-lucidity/cl-backend/volumes/cl-storage"
mkdir -p "$BACKUP_DIR"
rsync -a --delete "$SOURCE/" "$BACKUP_DIR/"
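cron.daily only runs executable files, so the script must be marked executable; a manual dry run (paths from the script above) also verifies the source directory before the first scheduled run:

```shell
# Make the backup script executable, run it once by hand, and confirm
# the backup directory was populated.
sudo chmod +x /etc/cron.daily/cl-backup
sudo /etc/cron.daily/cl-backup
du -sh /backups/cl-storage
```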