Paper documents are a liability. They get lost, damaged, misfiled, and they resist every attempt at efficient search and retrieval. Worse still, uploading scans to cloud services like Google Drive or Dropbox means surrendering control of your most sensitive information — tax records, medical documents, contracts, identity papers — to third-party platforms with opaque data handling practices. Paperless-ngx solves this by giving you a self-hosted, intelligent document management system that ingests physical and digital documents alike, applies optical character recognition, and makes everything instantly searchable from a clean web interface.
Running Paperless-ngx on your own Ubuntu VPS means your documents never leave infrastructure you control. Every page is OCR-processed, tagged, and indexed on your server. You get the convenience of a cloud document manager with the privacy guarantees of self-hosting. In this guide, we will walk through a complete production deployment of Paperless-ngx on an Ubuntu VPS using Docker Compose, including PostgreSQL for metadata, Redis for task queuing, Gotenberg and Tika for advanced document processing, Nginx as a reverse proxy with SSL, and a robust backup strategy to keep your archive safe.
MassiveGRID Ubuntu VPS includes: Ubuntu 24.04 LTS pre-installed · Proxmox HA cluster with automatic failover · Ceph 3x replicated NVMe storage · Independent CPU/RAM/storage scaling · 12 Tbps DDoS protection · 4 global datacenter locations · 100% uptime SLA · 24/7 human support rated 9.5/10
Why Go Paperless with a Self-Hosted System
The motivations for digitizing your documents go far beyond convenience. A self-hosted paperless system addresses three fundamental concerns that commercial cloud storage cannot.
Privacy and data sovereignty. Financial statements, medical records, legal contracts, and identity documents are among the most sensitive data you possess. When you upload these to a SaaS platform, you accept their terms of service, their data retention policies, and their vulnerability to breaches. With Paperless-ngx on your own VPS, documents are encrypted in transit via SSL, stored on disk you control, and accessible only through credentials you manage. No third party ever touches your data.
OCR-powered search. A scanned PDF sitting in a folder is barely better than paper in a filing cabinet. You cannot search its contents, and finding a specific document means scrolling through filenames you hopefully named well. Paperless-ngx runs every ingested document through Tesseract OCR, extracting the full text and indexing it. Need to find that insurance claim from 2023? Search for the claim number, the provider name, or any phrase from the document. Results appear in milliseconds.
Compliance and audit readiness. Whether you are a freelancer tracking invoices, a small business maintaining employee records, or a household organizing tax documents, having a structured, searchable archive with metadata, timestamps, and tags means you can produce any document on demand. Paperless-ngx assigns document types, correspondents, dates, and custom tags — transforming a pile of scans into an organized, auditable system.
Paperless-ngx Features Overview
Paperless-ngx is the community-maintained successor to the original Paperless project, and it has matured into a remarkably capable document management system. Here is what it brings to the table:
- Optical Character Recognition (OCR) — Tesseract-based OCR supports over 100 languages. Every document is processed on ingest, and the extracted text is stored alongside the original file for full-text search.
- Automatic tagging — Define matching rules (exact, regex, fuzzy, or auto) and Paperless-ngx assigns tags to incoming documents automatically. A document containing "Invoice" in the text can be tagged as invoice without manual intervention.
- Correspondents — Assign senders or organizations to documents. Your bank statements automatically get the correspondent "Chase Bank" based on content matching rules.
- Document types — Categorize documents as invoices, receipts, contracts, tax forms, medical records, or any custom type you define.
- Full-text search — Powered by Whoosh, the search engine indexes all OCR-extracted text, metadata, tags, correspondents, and notes. Queries return ranked results with highlighted matches.
- Multi-format support — PDFs, images (JPEG, PNG, TIFF), Office documents (DOCX, XLSX, PPTX), plain text, and more. Gotenberg and Tika extend format support further.
- Consumption workflows — Drop files into a watched folder, upload via the web UI, or configure email polling to automatically ingest attachments.
- Multi-user support — Role-based permissions let you share the system with family members or team members while controlling who sees what.
- REST API — Integrate with other tools, build custom workflows, or automate document uploads from scripts.
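To make the REST API item concrete, here is a minimal sketch of a scripted upload with curl. It assumes the /api/documents/post_document/ endpoint and token authentication as described in the Paperless-ngx API documentation; the function name upload_document and its argument order are our own illustration, and the base URL and token are placeholders you would replace.

```shell
#!/bin/sh
# upload_document BASE_URL API_TOKEN FILE
# Sketch only: posts FILE to a Paperless-ngx instance over the REST API.
# Assumptions: token authentication is enabled and API_TOKEN belongs to a
# user allowed to add documents. The function name is hypothetical.
upload_document() {
  base_url="$1"
  token="$2"
  file="$3"

  # Refuse early if the file is missing, before any network call is made.
  if [ ! -f "$file" ]; then
    echo "upload_document: no such file: $file" >&2
    return 2
  fi

  # -f makes curl exit non-zero on HTTP errors (4xx/5xx),
  # -F sends the file as the multipart form field "document".
  curl -fsS \
    -H "Authorization: Token ${token}" \
    -F "document=@${file}" \
    "${base_url}/api/documents/post_document/"
}
```

A call would look like: upload_document https://paperless.yourdomain.com "$PAPERLESS_TOKEN" scan.pdf. The uploaded file then goes through the same processing queue as a web upload.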
Prerequisites
Before deploying Paperless-ngx, you need an Ubuntu VPS with Docker and Docker Compose installed. For a personal document archive, a VPS with 2 vCPU cores and 4 GB RAM provides a solid starting point — OCR processing is CPU-intensive, and having dedicated cores prevents slowdowns during document ingestion.
Ensure your server meets these requirements:
- Ubuntu 24.04 LTS — Fresh installation with root or sudo access
- Docker and Docker Compose — Follow our guide to install Docker on Ubuntu VPS if you have not already set this up
- A domain name — Pointed to your VPS IP address (e.g., paperless.yourdomain.com)
- At least 20 GB of storage — More if you plan to archive thousands of documents; plan for roughly 1-5 MB per document depending on scan quality
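The requirements above can be checked with a short preflight script. This is a sketch rather than an official tool: the thresholds come from this guide, the function names are hypothetical, and the RAM check reads /proc/meminfo, so it assumes a Linux host.

```shell
#!/bin/sh
# Preflight sketch for the requirements listed in this guide.
# All function names here are illustrative assumptions.

# Number of CPU cores available to this system.
cpu_cores() { nproc; }

# Total RAM in MB, read from /proc/meminfo (Linux only).
total_ram_mb() { awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo; }

# Free space in GB on the filesystem holding the given path (default: /).
free_disk_gb() { df -Pk "${1:-/}" | awk 'NR==2 {print int($4 / 1024 / 1024)}'; }

# meets_minimum VALUE MINIMUM LABEL: prints a status line, returns 1 if below.
meets_minimum() {
  if [ "$1" -ge "$2" ]; then
    echo "OK    $3: $1 (minimum $2)"
  else
    echo "WARN  $3: $1 (minimum $2)"
    return 1
  fi
}

# Runs all checks and returns non-zero if any of them failed.
preflight() {
  status=0
  meets_minimum "$(cpu_cores)" 2 "CPU cores" || status=1
  # 4 GB plans often report slightly under 4096 MB, so 3500 is used as the floor.
  meets_minimum "$(total_ram_mb)" 3500 "RAM in MB" || status=1
  meets_minimum "$(free_disk_gb /)" 20 "free disk in GB" || status=1
  command -v docker >/dev/null 2>&1 || { echo "WARN  docker not found"; status=1; }
  return "$status"
}
```

Calling preflight prints one line per check; a non-zero exit status means at least one requirement is not met yet.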
Update your system before proceeding:
sudo apt update && sudo apt upgrade -y
Create a dedicated directory for the Paperless-ngx deployment:
sudo mkdir -p /opt/paperless-ngx
cd /opt/paperless-ngx
Docker Compose Setup
The production deployment of Paperless-ngx involves five services working together: the Paperless-ngx application itself, PostgreSQL for the metadata database, Redis for the task queue and caching layer, Gotenberg for document-to-PDF conversion, and Tika for advanced content extraction from Office documents. Create the Docker Compose file:
sudo nano /opt/paperless-ngx/docker-compose.yml
Add the following configuration:
version: "3.8"

services:
  broker:
    image: redis:7-alpine
    container_name: paperless-redis
    restart: unless-stopped
    volumes:
      - redis_data:/data
    networks:
      - paperless

  db:
    image: postgres:16-alpine
    container_name: paperless-db
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    networks:
      - paperless

  gotenberg:
    image: gotenberg/gotenberg:8
    container_name: paperless-gotenberg
    restart: unless-stopped
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"
    networks:
      - paperless

  tika:
    image: ghcr.io/paperless-ngx/tika:latest
    container_name: paperless-tika
    restart: unless-stopped
    networks:
      - paperless

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    container_name: paperless-web
    restart: unless-stopped
    depends_on:
      - db
      - broker
      - gotenberg
      - tika
    ports:
      # Bind to localhost only so the app is reachable solely through the Nginx reverse proxy
      - "127.0.0.1:8000:8000"
    volumes:
      - data:/usr/src/paperless/data
      - media:/usr/src/paperless/media
      - export:/usr/src/paperless/export
      - consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_DBNAME: paperless
      PAPERLESS_DBUSER: paperless
      PAPERLESS_DBPASS: ${POSTGRES_PASSWORD}
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
      PAPERLESS_URL: https://${PAPERLESS_DOMAIN}
      PAPERLESS_SECRET_KEY: ${PAPERLESS_SECRET_KEY}
      PAPERLESS_TIME_ZONE: ${PAPERLESS_TIMEZONE:-UTC}
      PAPERLESS_OCR_LANGUAGE: ${PAPERLESS_OCR_LANG:-eng}
      PAPERLESS_ADMIN_USER: ${PAPERLESS_ADMIN_USER:-admin}
      PAPERLESS_ADMIN_PASSWORD: ${PAPERLESS_ADMIN_PASSWORD}
      PAPERLESS_OCR_MODE: skip_noarchive
      PAPERLESS_CONSUMER_POLLING: 30
      PAPERLESS_TASK_WORKERS: 2
    networks:
      - paperless

volumes:
  data:
  media:
  export:
  consume:
  pgdata:
  redis_data:

networks:
  paperless:
Create the environment file with your credentials:
sudo nano /opt/paperless-ngx/.env
Populate it with secure values:
POSTGRES_PASSWORD=your_strong_database_password_here
PAPERLESS_DOMAIN=paperless.yourdomain.com
PAPERLESS_SECRET_KEY=your_generated_secret_key_here
PAPERLESS_TIMEZONE=America/New_York
PAPERLESS_OCR_LANG=eng
PAPERLESS_ADMIN_USER=admin
PAPERLESS_ADMIN_PASSWORD=your_strong_admin_password_here
Generate the secret key and paste it into the file:
openssl rand -hex 32
Set restrictive permissions on the environment file:
sudo chmod 600 /opt/paperless-ngx/.env
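If you would rather not paste secrets by hand, the environment file can be generated in one step. The sketch below is one possible approach, not part of Paperless-ngx: the function names gen_hex and write_env are hypothetical, the timezone and language values are the examples from this guide, and the generated admin password is a random hex string you would read back from the file.

```shell
#!/bin/sh
# Sketch: generate the .env file with random secrets.
# gen_hex and write_env are illustrative names, not Paperless-ngx tooling.

# Prints 32 random bytes as 64 hex characters.
gen_hex() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    # Fallback if openssl is not installed.
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
  fi
}

# write_env ENV_FILE DOMAIN: writes the variables used by the Compose file.
write_env() {
  env_file="$1"
  domain="$2"
  (
    # Create the file without group/other permissions from the start.
    umask 077
    {
      echo "POSTGRES_PASSWORD=$(gen_hex)"
      echo "PAPERLESS_DOMAIN=${domain}"
      echo "PAPERLESS_SECRET_KEY=$(gen_hex)"
      echo "PAPERLESS_TIMEZONE=America/New_York"
      echo "PAPERLESS_OCR_LANG=eng"
      echo "PAPERLESS_ADMIN_USER=admin"
      echo "PAPERLESS_ADMIN_PASSWORD=$(gen_hex)"
    } > "$env_file"
  )
  # Also tighten permissions if the file already existed.
  chmod 600 "$env_file"
}
```

Run as root, write_env /opt/paperless-ngx/.env paperless.yourdomain.com produces a file equivalent to the manual steps above. Note that it overwrites an existing file, so use it only for a fresh deployment.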
Launch the entire stack:
cd /opt/paperless-ngx
sudo docker compose up -d
Monitor the startup process to ensure all containers come up healthy:
sudo docker compose logs -f
Wait until you see the Paperless-ngx webserver report that it is ready and listening on port 8000. The first startup takes a few minutes as it runs database migrations and initializes the search index.
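Instead of watching the logs, you can poll the web port until it answers. The following is a small sketch under the assumption that curl is installed and that the application is published on port 8000 of the host as in the Compose file; wait_for_http is a hypothetical helper name.

```shell
#!/bin/sh
# wait_for_http URL [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Sketch: polls URL until it responds without an HTTP error or the timeout expires.
# The helper name and default values are assumptions, not Paperless-ngx features.
wait_for_http() {
  url="$1"
  timeout="${2:-300}"
  interval="${3:-5}"
  elapsed=0

  while [ "$elapsed" -lt "$timeout" ]; do
    # -f: treat HTTP 4xx/5xx as failure; --max-time: do not hang on one attempt.
    if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done

  # Timed out without a successful response.
  return 1
}
```

For example, wait_for_http http://127.0.0.1:8000/ 300 && echo "Paperless-ngx is up" returns as soon as the web server responds, or fails after five minutes.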
Nginx Reverse Proxy with SSL
Paperless-ngx handles sensitive documents, so exposing it over plain HTTP is not acceptable. Set up Nginx as a reverse proxy with Let's Encrypt SSL certificates. If you do not already have Nginx configured on your VPS, follow our comprehensive guide to set up Nginx as a reverse proxy on Ubuntu VPS.
Install Nginx and Certbot if needed:
sudo apt install -y nginx certbot python3-certbot-nginx
Create the Nginx site configuration:
sudo nano /etc/nginx/sites-available/paperless
Add the following server block:
server {
    listen 80;
    server_name paperless.yourdomain.com;

    client_max_body_size 50M;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support for live updates
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
Enable the site and obtain an SSL certificate:
sudo ln -s /etc/nginx/sites-available/paperless /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
sudo certbot --nginx -d paperless.yourdomain.com
The client_max_body_size 50M directive is important — without it, Nginx will reject document uploads larger than 1 MB. Adjust this value upward if you regularly scan at high resolution or upload large multi-page PDFs.
Initial Configuration
With the stack running and SSL configured, navigate to https://paperless.yourdomain.com in your browser. Log in with the admin credentials you specified in the environment file.
Start by configuring these essential settings from the admin panel:
OCR language settings. If your documents include multiple languages, update PAPERLESS_OCR_LANG in /opt/paperless-ngx/.env (the Compose file passes it to the container as PAPERLESS_OCR_LANGUAGE) to include all relevant language codes separated by plus signs. For example, English and German:
PAPERLESS_OCR_LANG=eng+deu
After changing environment variables, restart the stack:
cd /opt/paperless-ngx
sudo docker compose down
sudo docker compose up -d
Document types. Create common document types from the web interface under the admin section: Invoice, Receipt, Contract, Tax Document, Medical Record, Insurance, Bank Statement, Identity Document, and any others relevant to your use case.
Correspondents. Add frequently occurring senders — your bank, insurance company, employer, utility providers, and government agencies. Paperless-ngx will also create correspondents automatically as it processes documents if you enable automatic matching.
Tags. Set up a tagging taxonomy. Common tags include years (2024, 2025, 2026), status labels (action-required, archived, pending), and categories (personal, business, household). Tags can use color coding in the interface for visual organization.
Storage paths. By default, Paperless-ngx stores all documents in a flat structure within the media volume. You can configure storage path templates to organize files into folders based on metadata:
PAPERLESS_FILENAME_FORMAT={created_year}/{correspondent}/{title}
This creates a folder hierarchy like 2026/Chase Bank/January Statement.pdf on disk.
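To see how such a template expands, the sketch below substitutes the three placeholders by hand. It is purely illustrative: Paperless-ngx performs the real substitution internally, render_path is a hypothetical helper, and it does not handle values containing the characters |, & or \.

```shell
#!/bin/sh
# render_path TEMPLATE YEAR CORRESPONDENT TITLE
# Illustration only: mimics how the filename format placeholders are filled in.
# Not the Paperless-ngx implementation; values must not contain |, & or \.
render_path() {
  printf '%s.pdf\n' "$1" | sed \
    -e "s|{created_year}|$2|" \
    -e "s|{correspondent}|$3|" \
    -e "s|{title}|$4|"
}

render_path '{created_year}/{correspondent}/{title}' 2026 'Chase Bank' 'January Statement'
# prints: 2026/Chase Bank/January Statement.pdf
```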
Document Consumption Workflows
Paperless-ngx supports three primary methods for ingesting documents, and you can use all three simultaneously.
Web Upload
The simplest method. Click the upload button in the web interface, drag and drop files, or select them from your file browser. Paperless-ngx queues them for processing, runs OCR, applies matching rules, and adds them to your archive. This works well for documents you receive digitally — email attachments, downloaded statements, exported receipts.
Consume Folder
The consume directory is a watched folder. Any file placed in it is automatically ingested and processed. This is powerful when combined with scanner integrations. If your scanner supports scan-to-folder via SMB or FTP, point it at the consume directory. You can also use tools like rsync or scp to push files from other machines:
# From your local machine, push a scanned document to the consume folder
scp /path/to/scanned-document.pdf user@your-vps:/opt/paperless-ngx/consume/
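In the Compose file above, the consume folder is mapped to a Docker named volume, so files copied to /opt/paperless-ngx/consume on the host do not reach the container. To make the folder accessible from the host filesystem at a known path, replace the consume entry under the webserver service's volumes in docker-compose.yml with a bind mount, then recreate the stack:
    volumes:
      - /opt/paperless-ngx/consume:/usr/src/paperless/consume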
Email Consumption
Paperless-ngx can poll an email inbox at regular intervals and ingest attachments from matching messages. This is particularly effective for automated workflows: set up a dedicated email address (e.g., docs@yourdomain.com) and forward receipts, invoices, and statements to it. In Paperless-ngx, the mailbox is added as a mail account in the web interface rather than through environment variables (the PAPERLESS_EMAIL_* variables configure outgoing SMTP mail, not consumption). Create a mail account with values like these:
IMAP server: imap.yourdomain.com
IMAP port: 993
IMAP security: SSL
Username: docs@yourdomain.com
Password: your_email_password
You can then create mail rules for that account that filter by sender, subject, or other criteria and apply specific tags, document types, or correspondents to the ingested attachments.
Tagging and Organizing Documents
Effective organization in Paperless-ngx relies on a combination of automatic matching rules and manual curation. The system supports four matching algorithms:
- Exact match — The tag, correspondent, or document type is assigned when the document content contains the exact specified string
- Regular expression — Pattern-based matching for complex rules (e.g., matching invoice numbers with a specific format)
- Fuzzy match — Tolerates minor OCR errors and typos in the matching process
- Auto match — Uses machine learning to learn from your manual assignments and predict classifications for new documents
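The difference between exact and regular-expression matching can be illustrated with grep on a piece of OCR text. This is only an analogy under our own assumptions: Paperless-ngx applies its matching rules internally, the function names and the INV-YYYY-NNNN invoice number format are hypothetical, and fuzzy and auto matching are not modeled here.

```shell
#!/bin/sh
# Illustration of exact vs. regular-expression matching using grep.
# Not the Paperless-ngx implementation; names and patterns are examples.

# matches_exact TEXT NEEDLE: true when NEEDLE occurs literally in TEXT (case-insensitive).
matches_exact() { printf '%s\n' "$1" | grep -qiF -- "$2"; }

# matches_regex TEXT PATTERN: true when the extended regex PATTERN matches TEXT.
matches_regex() { printf '%s\n' "$1" | grep -qiE -- "$2"; }

# suggest_tag TEXT: prints the tag of the first rule that matches.
suggest_tag() {
  if matches_regex "$1" 'INV-[0-9]{4}-[0-9]{4}'; then
    echo "invoice"          # regex rule: a structured invoice number
  elif matches_exact "$1" "invoice"; then
    echo "invoice"          # exact rule: the word itself
  elif matches_exact "$1" "Chase Bank"; then
    echo "bank"
  else
    echo "untagged"
  fi
}
```

Here suggest_tag "Ref INV-2025-0042, amount due" prints invoice through the regex rule even though the word "invoice" never appears, which is the kind of case where a pattern is more useful than an exact string.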
A practical tagging strategy for a household archive might look like this:
- Create correspondents for each bank, insurer, employer, and government agency you deal with, each with a matching rule based on their letterhead text
- Create document types for invoices, receipts, contracts, tax documents, and medical records
- Use year tags (2024, 2025, 2026) assigned based on the document's detected date
- Add action tags like needs-review, tax-deductible, and keep-original for workflow management
Over time, the auto-matching algorithm becomes increasingly accurate. After manually classifying 20-30 documents from a particular correspondent, the system reliably assigns the correct correspondent, type, and tags to subsequent documents from the same source.
Bulk Document Import
If you are digitizing an existing paper archive, you may need to ingest hundreds or thousands of documents at once. Bulk import is where resource allocation matters significantly. Tesseract OCR is CPU-intensive — processing a single multi-page document can saturate a CPU core for several seconds, and processing thousands sequentially takes hours.
For bulk ingestion workloads, consider upgrading to a Dedicated VPS with guaranteed CPU resources. Unlike shared VPS plans where CPU is shared among tenants, a Dedicated VPS gives you physical CPU cores that are exclusively yours. When Tesseract runs continuously during a bulk import, you will not be throttled or affected by noisy neighbors. A 4-core Dedicated VPS can run multiple OCR workers in parallel, dramatically reducing import time.
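Increase the number of task workers to match your available CPU cores by changing the PAPERLESS_TASK_WORKERS value in the webserver service's environment block in docker-compose.yml (it is set to 2 in the file above), then recreate the stack:
PAPERLESS_TASK_WORKERS: 4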
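A rough estimate shows why the worker count matters for a bulk import. The sketch below is simple arithmetic under stated assumptions: the per-document OCR time (8 seconds here) is a made-up figure you would measure on your own server, work is assumed to divide evenly across workers, and estimate_minutes is a hypothetical helper.

```shell
#!/bin/sh
# estimate_minutes DOCS SECONDS_PER_DOC WORKERS
# Sketch: total OCR time in whole minutes, assuming work splits evenly across workers.
estimate_minutes() {
  # Guard against division by zero.
  [ "$3" -ge 1 ] || return 1
  echo $(( $1 * $2 / $3 / 60 ))
}

estimate_minutes 3000 8 1   # prints: 400
estimate_minutes 3000 8 4   # prints: 100
```

Under these assumptions, 3,000 documents take about 400 minutes with one worker and about 100 minutes with four, provided the server actually has four cores free for OCR.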
To bulk-import an existing collection of documents, place them all in the consume directory:
# Copy an entire directory of scanned documents
sudo cp -r /path/to/scanned-archive/* /opt/paperless-ngx/consume/
You can also use the built-in document importer for migrating from another Paperless-ngx instance:
sudo docker compose exec webserver document_importer /usr/src/paperless/export
Monitor the processing queue through the web interface. The dashboard shows active tasks, queued documents, and any processing errors. Documents that fail OCR — typically due to very low scan quality or unsupported formats — are flagged for manual review.
Full-Text Search
Full-text search is arguably the most compelling feature of Paperless-ngx and the primary reason to invest the effort in digitizing your documents. The search engine indexes every word extracted by OCR, along with all metadata fields.
From the search bar in the web interface, you can run queries like:
- insurance claim 2025 — finds documents containing all three terms
- "quarterly statement" — exact phrase search
- correspondent:chase tag:tax-deductible — filter by metadata fields
- type:invoice created:[2025-01 TO 2025-12] — combine type and date range filters
Search results are ranked by relevance, and you can further filter by date range, tags, document type, and correspondent using the sidebar filters. The combination of OCR text indexing and structured metadata means you can locate any document in your archive within seconds, regardless of whether you remember the filename, the date, or even the exact content — a partial phrase is usually enough.
The search index is stored on disk in the data volume. For archives exceeding 10,000 documents, ensure your VPS has sufficient storage and consider allocating more RAM to improve index performance.
Multi-User Access for Teams and Families
Paperless-ngx supports multiple user accounts with granular permissions, making it suitable for shared use in households, small businesses, and teams.
Create additional users from the admin panel. Each user can be configured with:
- View permissions — control which documents, tags, correspondents, and document types a user can see
- Edit permissions — control which objects a user can modify
- Object-level ownership — documents can be assigned to specific users, making them invisible to others without explicit sharing
For a family setup, you might create separate accounts for each household member, with shared tags for household documents (utilities, mortgage, insurance) and private ownership for personal documents (medical, employment). For a small business, create accounts for each department or role, with shared access to company-wide documents and restricted access to HR or financial records.
If you are deploying Paperless-ngx for a team, plan your resource allocation accordingly. Each concurrent user adds to the server load, and simultaneous OCR processing from multiple uploaders compounds CPU demand. A VPS with 4 vCPU and 8 GB RAM comfortably supports 5-10 concurrent users with moderate upload activity.
Backup Strategy
A document archive is only as reliable as your backup strategy. Paperless-ngx stores data across several Docker volumes — the PostgreSQL database, the media files (original and archived documents), the search index, and the application data. All of these must be backed up together to ensure a consistent restore.
Paperless-ngx includes a built-in document exporter that creates a portable backup:
sudo docker compose exec webserver document_exporter /usr/src/paperless/export
This exports all documents, metadata, tags, correspondents, and settings to the export directory. Schedule this as a daily cron job:
sudo crontab -e
Add the following line:
0 2 * * * cd /opt/paperless-ngx && docker compose exec -T webserver document_exporter /usr/src/paperless/export >> /var/log/paperless-backup.log 2>&1
Additionally, back up the PostgreSQL database separately for faster recovery:
sudo mkdir -p /opt/paperless-ngx/backups
sudo docker compose exec -T db pg_dump -U paperless paperless | sudo tee /opt/paperless-ngx/backups/db_backup_$(date +%Y%m%d).sql > /dev/null
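The export and the database dump can be combined into one script that also prunes old dumps. This is a sketch under assumptions rather than a finished backup tool: the paths and service names follow this guide, while BACKUP_DIR, KEEP_DAYS, and the function names are our own, and the script is meant to run as root (for example from root's crontab).

```shell
#!/bin/sh
# Backup sketch: export documents, dump the database, prune old dumps.
# BACKUP_DIR, KEEP_DAYS and the function names are assumptions for illustration.
COMPOSE_DIR="/opt/paperless-ngx"
BACKUP_DIR="${BACKUP_DIR:-/opt/paperless-ngx/backups}"
KEEP_DAYS="${KEEP_DAYS:-14}"

# prune_backups DIR DAYS: delete db_backup_*.sql files in DIR
# that were last modified more than DAYS days ago.
prune_backups() {
  find "$1" -maxdepth 1 -type f -name 'db_backup_*.sql' -mtime "+$2" -delete
}

# run_backup: runs in a subshell so the caller's working directory is unchanged.
run_backup() (
  mkdir -p "$BACKUP_DIR" || exit 1
  cd "$COMPOSE_DIR" || exit 1

  # -T disables TTY allocation, which is required under cron and keeps the dump clean.
  docker compose exec -T webserver document_exporter /usr/src/paperless/export || exit 1
  docker compose exec -T db pg_dump -U paperless paperless \
    > "$BACKUP_DIR/db_backup_$(date +%Y%m%d).sql" || exit 1

  prune_backups "$BACKUP_DIR" "$KEEP_DAYS"
)
```

Calling run_backup from a nightly cron entry keeps the most recent two weeks of dumps by default; set KEEP_DAYS to change the retention window.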
For comprehensive off-site backup automation, follow our guide to automate backups on Ubuntu VPS. A solid backup strategy includes local snapshots for quick recovery, off-site copies for disaster recovery, and regular test restores to verify backup integrity.
Store backups on a separate volume or remote location — not on the same disk as the production system. MassiveGRID's independent storage scaling lets you add backup storage without affecting your compute resources.
Storage Growth Planning
Document archives grow continuously, and running out of storage on a document management system is disruptive. Plan your storage allocation based on your expected ingestion rate and document characteristics.
Typical storage consumption per document:
- Text-heavy single-page documents (letters, invoices) — 200 KB to 1 MB each
- Scanned multi-page documents (contracts, reports) — 1 MB to 10 MB each
- High-resolution color scans (photographs, certificates) — 5 MB to 20 MB each
Paperless-ngx stores both the original file and an archived (OCR-processed) version, roughly doubling the raw storage requirement. For a household archiving 50 documents per month at an average of 2 MB each, plan for approximately 2.4 GB of new storage per year. A small business processing 500 documents per month needs closer to 24 GB per year.
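The arithmetic behind these figures is easy to reproduce for your own numbers. The sketch below applies the same assumptions as the paragraph above (twelve months, storage doubled for the archived copy, 1 GB counted as 1000 MB); the function names are hypothetical.

```shell
#!/bin/sh
# Storage growth sketch using the assumptions from this guide:
# original plus archived copy (factor 2), 1 GB counted as 1000 MB.

# yearly_storage_mb DOCS_PER_MONTH AVG_MB_PER_DOC
yearly_storage_mb() {
  echo $(( $1 * $2 * 12 * 2 ))
}

# months_until_full FREE_GB DOCS_PER_MONTH AVG_MB_PER_DOC
months_until_full() {
  echo $(( $1 * 1000 / ($2 * $3 * 2) ))
}

yearly_storage_mb 50 2       # prints: 2400   (household, about 2.4 GB per year)
yearly_storage_mb 500 2      # prints: 24000  (small business, about 24 GB per year)
months_until_full 20 50 2    # prints: 100    (20 GB free at the household rate)
```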
Monitor your storage usage regularly:
# Check Docker volume sizes
sudo docker system df -v
# Check overall disk usage
df -h /
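These checks can be turned into a simple threshold alert. The following is a minimal sketch: the 80 percent threshold is an arbitrary example, the function names are hypothetical, and how you deliver the warning (mail, chat webhook, log file) is left to your setup.

```shell
#!/bin/sh
# Disk usage alert sketch. Threshold and function names are assumptions.

# disk_usage_percent [PATH]: used capacity of the filesystem holding PATH, as a whole number.
disk_usage_percent() {
  df -P "${1:-/}" | awk 'NR==2 {sub(/%/, "", $5); print $5}'
}

# check_disk PATH THRESHOLD: prints a status line, returns 1 at or above THRESHOLD percent.
check_disk() {
  used=$(disk_usage_percent "$1")
  if [ "$used" -ge "$2" ]; then
    echo "WARNING: $1 is ${used}% full (threshold ${2}%)"
    return 1
  fi
  echo "OK: $1 is ${used}% full"
}
```

Running check_disk / 80 from a daily cron job gives you a warning well before the archive runs out of space.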
With MassiveGRID's independent resource scaling, you can increase your VPS storage allocation without changing your CPU or RAM configuration. This means you can start with a cost-effective storage tier and scale up as your archive grows, paying only for the storage you actually use.
Consider implementing a retention policy for documents that do not need to be kept indefinitely. Paperless-ngx does not have built-in retention automation, but you can use tags to flag documents for periodic review and deletion — for example, tagging utility bills older than seven years for removal.
Sensitive Documents Deserve Enterprise-Grade Security
A self-hosted document management system containing tax records, identity documents, medical files, and financial statements is a high-value target. The security of this system must match the sensitivity of its contents. For personal archives on a VPS, you are responsible for server hardening, firewall configuration, SSH key management, unattended security updates, intrusion detection, and backup encryption. That is a significant operational burden — especially for documents where a breach could mean identity theft or financial fraud.
If your document archive contains regulated data (HIPAA-covered medical records, financial documents subject to retention requirements, or client confidential materials), consider deploying on a MassiveGRID Managed Dedicated Cloud Server. With managed hosting, MassiveGRID handles security hardening, automatic patching, firewall management, intrusion detection, encrypted backups, and 24/7 monitoring. You focus on organizing your documents while a team of infrastructure engineers ensures the server hosting them meets enterprise security standards.
For teams and businesses, the managed option also includes proactive monitoring with alerting, so if a disk approaches capacity, a service degrades, or an anomalous access pattern is detected, the support team responds before you even notice. When your documents are irreplaceable and your compliance obligations are real, that level of operational assurance is not a luxury — it is a necessity.
Whether you start with a self-managed VPS for a personal paperless project or deploy a fully managed solution for a business-critical archive, Paperless-ngx on MassiveGRID infrastructure gives you the performance, reliability, and security that your most important documents demand.