Every time you upload a PDF to an online tool — whether you're merging invoices, compressing contracts, or converting spreadsheets — that document passes through someone else's server. For personal files, that might feel like an acceptable tradeoff. For business documents, legal agreements, medical records, or financial statements, it's a risk you shouldn't need to take. Stirling PDF is an open-source, self-hosted PDF toolkit that puts you in complete control. It offers over 50 tools for manipulating PDF files, all running on your own infrastructure with zero data leaving your server. In this guide, we'll deploy Stirling PDF on an Ubuntu VPS using Docker, set up a secure reverse proxy, configure OCR support, and integrate it into a broader document management workflow.
Why Self-Host Your PDF Tools
Online PDF services have become ubiquitous. Tools like iLovePDF, Smallpdf, and Adobe's online suite are convenient, but they come with serious drawbacks for anyone handling sensitive documents. When you upload a contract to a third-party service, you're trusting that provider with its contents. You're trusting their data retention policies, their encryption practices, their employee access controls, and their compliance posture. For organizations subject to GDPR, HIPAA, SOC 2, or similar frameworks, this can create compliance violations that are difficult to audit and impossible to fully mitigate.
Self-hosting Stirling PDF eliminates these concerns entirely. Documents never leave your server. There are no external API calls, no telemetry, no analytics, and no data retention by third parties. Every PDF operation — merge, split, compress, convert, OCR — happens locally on your VPS. You control the encryption at rest, the network access policies, and the authentication layer. For legal teams, finance departments, healthcare providers, and any organization that handles confidential documents, this is a meaningful security improvement over cloud-based alternatives.
Beyond privacy, self-hosting also means no subscription fees, no per-document limits, and no watermarks on output files. Stirling PDF is completely free and open source under the MIT license. You pay only for the infrastructure to run it.
Stirling PDF Feature Overview
Stirling PDF is not a single-purpose tool. It's a comprehensive PDF workstation with over 50 distinct operations organized into logical categories. The merge and split tools let you combine multiple PDFs into one document or extract specific page ranges into separate files. The compression engine reduces file sizes without destroying visual quality, which is critical for archiving large document sets.
The conversion tools handle bidirectional transformations between PDF and formats including images (PNG, JPEG, TIFF), office documents (DOCX, XLSX, PPTX), HTML, and Markdown. OCR (optical character recognition) can scan image-based PDFs and extract searchable text, making scanned documents indexable and searchable. You can add digital signatures, watermarks, headers, footers, and page numbers. There are tools for rotating pages, reordering them, removing specific pages, and overlaying content from one PDF onto another.
Security tools let you add or remove password protection, set permissions (prevent printing, copying, or editing), and redact sensitive content. The form tools can flatten interactive forms into static documents or extract form field data. For developers and automation workflows, Stirling PDF exposes a full REST API, meaning every operation available in the web interface can be called programmatically from scripts, CI/CD pipelines, or other applications.
Prerequisites
You'll need an Ubuntu VPS with Docker and Docker Compose installed. A MassiveGRID VPS with 1 vCPU and 1 GB RAM is sufficient for small teams doing occasional PDF operations. If you plan to use OCR heavily or process large batches, start with 2 vCPU and 4 GB RAM instead — OCR is CPU-intensive and benefits from additional cores.
If you haven't installed Docker yet, follow our Docker installation guide for Ubuntu VPS to get Docker Engine and Docker Compose set up properly. You'll also want a domain name pointed at your server's IP address if you plan to access Stirling PDF over HTTPS.
SSH into your server and create a project directory:
mkdir -p ~/stirling-pdf && cd ~/stirling-pdf
Docker Compose Deployment
Stirling PDF ships as a single Docker container that bundles the web interface, the processing engine, and all the underlying libraries (LibreOffice, Tesseract, Ghostscript, etc.). Create a docker-compose.yml file:
version: "3.9"
services:
  stirling-pdf:
    image: frooodle/s-pdf:latest
    container_name: stirling-pdf
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - ./trainingData:/usr/share/tessdata
      - ./extraConfigs:/configs
      - ./logs:/logs
      - ./customFiles:/customFiles
      - ./pipeline:/pipeline
    environment:
      - DOCKER_ENABLE_SECURITY=false
      - INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false
      - LANGS=en_GB
    tmpfs:
      - /tmp:size=2G
Start the container:
docker compose up -d
After the image pulls and the container starts, Stirling PDF will be available at http://your-server-ip:8080. The first startup may take a minute as it initializes internal components. The trainingData volume is where OCR language packs are stored. The tmpfs mount ensures temporary files created during PDF processing are stored in RAM and automatically cleaned, which improves both performance and security — processed documents don't linger on disk.
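Since the first startup can take a minute, it can help to poll the service rather than guess when it is ready. The sketch below is a generic retry wrapper; the name retry_until, the attempt count, and the delay are illustrative choices, not something Stirling PDF or Docker provides.

```shell
#!/bin/sh
# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Hypothetical helper for illustration only.
retry_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Pause between attempts, but not after the final one.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example, run on the server after `docker compose up -d`:
#   retry_until 30 2 curl -fsS -o /dev/null http://127.0.0.1:8080/ \
#     && echo "Stirling PDF is answering on port 8080"
```

If the command never succeeds, docker compose logs stirling-pdf is the next place to look.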
Nginx Reverse Proxy with SSL
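Exposing Stirling PDF directly on port 8080 without encryption is acceptable for local testing but not for production use. Set up Nginx as a reverse proxy with TLS termination so all traffic is encrypted. Because Docker publishes ports on all network interfaces by default, also change the port mapping in docker-compose.yml to "127.0.0.1:8080:8080" once Nginx is in place, so the container is reachable only through the proxy. If you haven't configured Nginx yet, our Nginx reverse proxy guide covers the full setup including Let's Encrypt certificates.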
The critical addition for Stirling PDF is the client_max_body_size directive. By default, Nginx limits upload sizes to 1 MB, which is far too small for PDF operations. Create or update your Nginx site configuration:
server {
    listen 443 ssl http2;
    server_name pdf.yourdomain.com;
    ssl_certificate /etc/letsencrypt/live/pdf.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/pdf.yourdomain.com/privkey.pem;
    client_max_body_size 256M;
    proxy_read_timeout 600s;
    proxy_send_timeout 600s;
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_buffering off;
    }
}
The client_max_body_size 256M directive allows uploads up to 256 MB, which accommodates most document workflows. The extended timeout values (proxy_read_timeout and proxy_send_timeout) prevent Nginx from dropping the connection during long-running operations like OCR on large documents or batch conversions. The proxy_buffering off directive ensures progress feedback reaches the browser without delay.
Test and reload Nginx:
sudo nginx -t && sudo systemctl reload nginx
Adding Authentication
Stirling PDF does not enforce user authentication by default; its internal login system is optional and is switched off in the Compose file above (DOCKER_ENABLE_SECURITY=false). For production deployments, you should add an authentication layer in front of it. There are two practical approaches.
The first option is Authentik forward authentication. If you're running Authentik as a central identity provider (see our Authentik self-hosting guide), you can configure it as a forward auth provider for Nginx. This gives you SSO, multi-factor authentication, and centralized user management across all your self-hosted services. Add the forward auth directives to your Nginx location block to validate each request against Authentik before proxying to Stirling PDF.
The second option is Nginx basic authentication, which is simpler but less flexible. Generate a password file using htpasswd:
sudo apt install apache2-utils
sudo htpasswd -c /etc/nginx/.htpasswd pdfuser
Then add authentication directives to your Nginx configuration inside the location / block:
auth_basic "Stirling PDF";
auth_basic_user_file /etc/nginx/.htpasswd;
Basic auth is adequate for single-user or small-team deployments. For organizations with multiple users or compliance requirements, Authentik provides a significantly more robust solution with audit logging and access policies.
OCR Language Packs
Stirling PDF uses Tesseract for optical character recognition. By default, only English language data is included. If you need to OCR documents in other languages, download the appropriate Tesseract training data files into the trainingData volume:
# Download German language pack
wget -P ./trainingData https://github.com/tesseract-ocr/tessdata/raw/main/deu.traineddata
# Download French language pack
wget -P ./trainingData https://github.com/tesseract-ocr/tessdata/raw/main/fra.traineddata
# Download Spanish language pack
wget -P ./trainingData https://github.com/tesseract-ocr/tessdata/raw/main/spa.traineddata
Language packs are picked up from the mounted directory automatically, so no container restart is needed. Each pack is roughly 4-15 MB depending on the language. The full list of available languages is maintained in the Tesseract GitHub repository. Tesseract does not detect a document's language on its own, so for best accuracy on mixed-language documents, install every relevant language pack and select all of those languages in the OCR tool before processing.
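If you need more than a handful of languages, the repeated wget calls can be folded into a loop. This is a minimal sketch that reuses the URL pattern from the commands above; the function names tessdata_url and fetch_langs are made up for this example.

```shell
#!/bin/sh
# Base URL taken from the wget commands in this section.
TESSDATA_BASE="https://github.com/tesseract-ocr/tessdata/raw/main"

# Print the download URL for one Tesseract language code (e.g. deu, fra).
tessdata_url() {
  printf '%s/%s.traineddata\n' "$TESSDATA_BASE" "$1"
}

# fetch_langs DEST LANG [LANG...]
# Downloads each pack into DEST and skips packs that are already present.
# Hypothetical helper name; not part of Stirling PDF or Tesseract.
fetch_langs() {
  dest="$1"
  shift
  mkdir -p "$dest"
  for lang in "$@"; do
    if [ -s "$dest/$lang.traineddata" ]; then
      echo "skip $lang (already present)"
    else
      wget -q -O "$dest/$lang.traineddata" "$(tessdata_url "$lang")" || {
        echo "failed: $lang" >&2
        rm -f "$dest/$lang.traineddata"   # do not leave a partial file behind
        return 1
      }
    fi
  done
}

# Example, run from ~/stirling-pdf:
#   fetch_langs ./trainingData deu fra spa
```

Skipping existing files keeps the script safe to re-run, for example from a provisioning script.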
File Size Limits and Temporary Storage
PDF processing can be memory-intensive, especially for operations like OCR, image conversion, and merging large document sets. The tmpfs mount in our Docker Compose configuration allocates 2 GB of RAM-backed temporary storage. This is where Stirling PDF stores intermediate files during processing. If you regularly work with very large PDFs (hundreds of megabytes) or batch-process many files simultaneously, increase this allocation:
    tmpfs:
      - /tmp:size=4G
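Keep in mind that tmpfs is backed by your server's RAM. It only consumes memory as files are written, but the size value is the ceiling it can grow to, so a server with 4 GB of total RAM shouldn't allocate more than 2 GB to tmpfs, or you risk out-of-memory (OOM) conditions. By the same reasoning, the 2 GB setting in the Compose file above is too generous for the 1 GB server mentioned in the prerequisites; lower it there or use disk-backed storage. If your workload demands more temporary space than your RAM allows, remove the tmpfs entry and add a regular bind mount to the service's volumes list instead: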
    volumes:
      - ./tmp:/tmp
This uses disk-backed storage, which is slower but not constrained by RAM. On a MassiveGRID VPS with NVMe-backed Ceph storage, disk I/O is fast enough that the performance difference is negligible for most workflows. Periodically clean the temporary directory with a cron job to prevent disk usage from growing unbounded:
# Clean temp files older than 1 hour, every 30 minutes
*/30 * * * * find ~/stirling-pdf/tmp -type f -mmin +60 -delete 2>/dev/null
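To turn the tmpfs sizing guideline from this section into a number for your own server, a quick calculation helps. The sketch below assumes that "no more than half of total RAM" is a fair generalization of the 4 GB to 2 GB example; that rule of thumb is an assumption, not an official Stirling PDF recommendation, and both function names are made up.

```shell
#!/bin/sh
# max_tmpfs_mb TOTAL_RAM_MB
# Prints an upper bound for the tmpfs size in MB.
# ASSUMPTION: half of total RAM, generalized from the 4 GB -> 2 GB example.
max_tmpfs_mb() {
  echo $(( $1 / 2 ))
}

# Total RAM in MB, read from /proc/meminfo (Linux only).
total_ram_mb() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# Example, run on the server:
#   echo "tmpfs ceiling: $(max_tmpfs_mb "$(total_ram_mb)") MB"
```

Treat the result as a ceiling rather than a target, since the Java process and any running OCR jobs need the remaining memory.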
Team Usage and Workflow Considerations
Stirling PDF works well for teams, not just individual users. Because it's a web application, anyone on your network (or with VPN access) can use it through their browser — no software installation required on client machines. This makes it particularly valuable in environments where installing desktop software is restricted or where you need a consistent toolset across different operating systems.
For team deployments, consider enabling Stirling PDF's built-in security mode by setting DOCKER_ENABLE_SECURITY=true in your Docker Compose environment variables and recreating the container with docker compose up -d. Depending on the release, the login page may also require SECURITY_ENABLELOGIN=true, so check the documentation for the version you run. This mode activates internal user accounts with role-based access, allowing you to create separate credentials for different team members and track usage. Combined with external authentication through Authentik, you get a layered security model where Authentik handles identity verification and Stirling PDF manages internal permissions.
The REST API opens up automation possibilities for teams. Common patterns include: a shared folder where staff drop PDFs that are automatically OCR-processed and filed, automated compression of all outgoing email attachments, batch watermarking of draft documents, and scheduled conversion of legacy document formats. Any scripting language that can make HTTP requests can interact with the Stirling PDF API.
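As a rough illustration of the compression pattern, the sketch below posts each PDF in a folder to the API with curl. The base URL, the endpoint path /api/v1/misc/compress-pdf, and the form fields fileInput and optimizeLevel are assumptions that do not come from this guide; confirm them against the API documentation served by your own instance. The helper names are hypothetical.

```shell
#!/bin/sh
# ASSUMPTIONS: base URL, endpoint path, and form field names are not taken
# from this guide; verify them against your instance's API documentation.
STIRLING_URL="${STIRLING_URL:-http://127.0.0.1:8080}"

# contracts/nda.pdf -> contracts/nda.min.pdf
compressed_name() {
  printf '%s.min.pdf\n' "${1%.pdf}"
}

# compress_pdf FILE [LEVEL]
# Uploads one PDF and saves the response beside the original.
compress_pdf() {
  curl -fsS -X POST "$STIRLING_URL/api/v1/misc/compress-pdf" \
    -F "fileInput=@$1" \
    -F "optimizeLevel=${2:-2}" \
    -o "$(compressed_name "$1")"
}

# compress_dir DIR
# Compresses every PDF in DIR that has no .min.pdf sibling yet.
compress_dir() {
  for f in "$1"/*.pdf; do
    [ -e "$f" ] || continue                        # directory had no PDFs
    case "$f" in *.min.pdf) continue ;; esac       # already an output file
    [ -e "$(compressed_name "$f")" ] && continue   # already processed
    compress_pdf "$f" || echo "failed: $f" >&2
  done
}

# Example: compress_dir /srv/shared/outgoing
```

Because already-processed files are skipped, the same function can run from cron against a shared drop folder. If you enable the built-in login mode, API calls will also need credentials.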
For a small team of 3-5 users doing occasional PDF work, a 1 vCPU / 1 GB VPS handles the load comfortably. Concurrent heavy operations (multiple OCR jobs running simultaneously) will benefit from scaling up to 2+ vCPU and 4 GB RAM. MassiveGRID's independent resource scaling lets you adjust CPU and RAM independently, so you can add processing power without overpaying for storage you don't need.
Integration with Paperless-ngx
If you're building a complete document management stack, Stirling PDF pairs exceptionally well with Paperless-ngx. Paperless-ngx is a self-hosted document management system that automatically OCRs, tags, and organizes incoming documents. The two tools complement each other: use Stirling PDF for active document manipulation (merging, splitting, converting, signing) and Paperless-ngx for long-term storage, search, and retrieval.
A practical workflow looks like this: incoming documents (scanned mail, email attachments, downloaded files) get ingested by Paperless-ngx, which OCRs them and files them automatically based on rules you define. When you need to manipulate a document — merge multiple invoices into a single PDF for a client, redact sensitive information before sharing, or convert a batch of images into a searchable PDF — you use Stirling PDF. The finished document can then be re-ingested into Paperless-ngx for archival.
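The merge-then-archive step of that workflow could be scripted along these lines. This is a sketch under several assumptions: the endpoint path /api/v1/general/merge-pdfs, the repeated fileInput form field, and the consume directory location are all illustrative and need checking against your Stirling PDF and Paperless-ngx setup. The function name merge_and_archive is hypothetical.

```shell
#!/bin/sh
# ASSUMPTIONS: endpoint path, form field name, and consume directory are
# illustrative; verify each against your own deployment.
STIRLING_URL="${STIRLING_URL:-http://127.0.0.1:8080}"
PAPERLESS_CONSUME="${PAPERLESS_CONSUME:-/srv/paperless/consume}"

# merge_and_archive OUT_NAME FILE FILE [FILE...]
# Merges the PDFs in argument order, then moves the result into the
# Paperless-ngx consume folder so it is ingested like any new document.
merge_and_archive() {
  out_name="$1"
  shift
  [ "$#" -ge 2 ] || { echo "need at least two input files" >&2; return 2; }

  tmp_out="$(mktemp)" || return 1

  # Replace each positional file argument with a "-F fileInput=@file" pair.
  for f in "$@"; do
    set -- "$@" -F "fileInput=@$f"
    shift
  done

  if curl -fsS -X POST "$STIRLING_URL/api/v1/general/merge-pdfs" "$@" -o "$tmp_out"; then
    chmod 644 "$tmp_out"   # mktemp creates the file readable by its owner only
    mv "$tmp_out" "$PAPERLESS_CONSUME/$out_name"
  else
    rm -f "$tmp_out"
    return 1
  fi
}

# Example: merge_and_archive client-invoices.pdf jan.pdf feb.pdf mar.pdf
```

Writing to a temporary file first means a failed request never leaves a partial PDF in the consume folder.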
Both services run well on the same server. Paperless-ngx is relatively lightweight on its own, and combined with Stirling PDF, a 2 vCPU / 4 GB RAM VPS handles both services and their databases without contention. If you add heavy OCR workloads to both services simultaneously, consider separating them onto different servers or scaling up your resources.
Scaling for OCR-Heavy Workloads
OCR is by far the most CPU-intensive operation in Stirling PDF. A single-page OCR job is fast, but batch-processing hundreds of scanned pages — common in legal document review, medical record digitization, or financial auditing — can saturate CPU cores for extended periods. If your workflow regularly involves large OCR batches, a shared VPS may not provide consistent performance, since CPU resources can be affected by neighboring tenants.
A MassiveGRID Dedicated VPS (VDS) gives you guaranteed CPU cores that aren't shared with anyone else. Starting at $19.80/mo, a VDS ensures that your OCR jobs run at full speed regardless of what other customers on the same hardware node are doing. For organizations processing thousands of pages daily, the dedicated CPU allocation makes a measurable difference in throughput and completion times.
Secure Infrastructure Without the Management Overhead
Running Stirling PDF in production means keeping the host OS patched, monitoring Docker container updates, managing TLS certificate renewals, maintaining backup schedules, and responding to security incidents. For organizations where PDF processing is a daily operational requirement — law firms, accounting practices, healthcare providers — the infrastructure management overhead can become a distraction from core business activities.
MassiveGRID's Managed Dedicated Cloud Servers handle all of this for you. The managed service includes OS-level maintenance, security patching, monitoring, backups, and 24/7 support. Your team focuses on using Stirling PDF rather than maintaining the server it runs on. With dedicated resources and a 100% uptime SLA backed by Proxmox HA clustering and Ceph replicated storage, your document processing infrastructure stays available even during hardware failures.
Whether you start with a $1.99/mo self-managed VPS for personal use, scale to a VDS for demanding OCR workloads, or opt for fully managed hosting for team-critical deployments, Stirling PDF gives you a private, capable PDF toolkit that respects your data and runs entirely under your control.