When a Single Server Stops Being Enough

Nextcloud on a single server is remarkably capable. A well-configured instance on an 8 vCPU, 32 GB RAM server can comfortably serve 200-300 active users with responsive file operations, smooth document editing, and reliable background task processing. But somewhere between 300 and 500 concurrent users, you hit walls that no amount of single-server optimization can overcome.

The symptoms are predictable: PHP-FPM workers saturate during peak hours, database connections queue up waiting for locks, file operations slow as the storage subsystem handles thousands of simultaneous reads and writes, and preview generation backlogs grow faster than the cron system can process them. You have already followed every recommendation in our performance tuning guide, and the server is running at peak efficiency. The problem is no longer configuration — it is architecture.

Scaling Nextcloud to 1,000 or more users requires fundamentally rethinking the deployment architecture. Instead of a monolithic server running all components, you need a distributed system with dedicated tiers for each function: application processing, database operations, file storage, caching, and document editing. This guide walks through each architectural decision in detail.

Multi-Node Architecture Overview

An enterprise Nextcloud deployment at the 1,000+ user scale typically consists of five distinct tiers:

                    ┌─────────────────┐
                    │   Load Balancer  │
                    │  (HAProxy/Nginx) │
                    └────────┬────────┘
                             │
              ┌──────────────┼──────────────┐
              ▼              ▼              ▼
        ┌───────────┐ ┌───────────┐ ┌───────────┐
        │ Nextcloud │ │ Nextcloud │ │ Nextcloud │
        │  App #1   │ │  App #2   │ │  App #3   │
        └─────┬─────┘ └─────┬─────┘ └─────┬─────┘
              │              │              │
    ┌─────────┴──────────────┴──────────────┴──────────┐
    │                                                    │
    ▼              ▼                ▼                ▼
┌────────┐  ┌───────────┐  ┌──────────────┐  ┌──────────┐
│ Redis  │  │PostgreSQL │  │  S3/Ceph     │  │Collabora │
│Cluster │  │ Primary + │  │  Object      │  │  Online  │
│        │  │ Replicas  │  │  Storage     │  │  Cluster │
└────────┘  └───────────┘  └──────────────┘  └──────────┘

Each tier is independently scalable — you can add application servers without touching the database, expand storage without modifying the application, and scale document editing capacity independently. Let's examine each tier in depth.

Application Tier: Horizontal Scaling with Nextcloud Servers

How Many Application Servers?

The number of Nextcloud application servers you need depends on your concurrent user count and workload pattern. A single Nextcloud application server with 8 vCPUs and 32 GB RAM can handle approximately 200-300 concurrent users (not total users — concurrent active users during peak hours). For 1,000+ total users with a typical concurrency ratio of 20-30%, that means 200-300 concurrent users at peak: plan for at least two application servers to carry the load, plus one more for failover headroom, for three servers in total (matching the sizing table at the end of this guide).
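As a sanity check on that guidance, the arithmetic can be sketched in shell (the 30% concurrency ratio and 250-users-per-node figure are assumptions drawn from the ranges above, not measurements):

```shell
# Rough app-server sizing sketch. Assumptions (not measurements):
# 30% of total users active at peak, ~250 concurrent users per node,
# plus one extra node so a single failure does not saturate the rest.
total_users=1000
concurrent=$(( total_users * 30 / 100 ))                 # 300 concurrent at peak
per_node=250
nodes=$(( (concurrent + per_node - 1) / per_node + 1 ))  # ceil(300/250) + 1 spare
echo "${nodes} application servers"                      # prints: 3 application servers
```

The same arithmetic with your own concurrency ratio and per-node capacity gives a starting point; load testing should confirm it.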

Shared Configuration Requirements

All application servers must share identical Nextcloud configuration. The critical shared elements are:

// config.php — must be identical across all app servers
'config_is_read_only' => true,   // Prevent accidental changes via web UI

// Database connection (all nodes point to the same database)
'dbtype' => 'pgsql',
'dbhost' => 'db-primary.internal:5432',
'dbname' => 'nextcloud',
'dbuser' => 'nextcloud',
'dbpassword' => 'your_secure_password',

// Redis for distributed caching and locking
'memcache.distributed' => '\\OC\\Memcache\\Redis',
'memcache.locking' => '\\OC\\Memcache\\Redis',
'memcache.local' => '\\OC\\Memcache\\APCu',
'redis' => [
    'host' => 'redis-cluster.internal',
    'port' => 6379,
    'password' => 'your_redis_password',
],

// Object storage for files (all nodes access the same storage)
'objectstore' => [
    'class' => '\\OC\\Files\\ObjectStore\\S3',
    'arguments' => [
        'bucket' => 'nextcloud-data',
        'hostname' => 's3.internal',
        'port' => 443,
        'use_ssl' => true,
        'region' => 'us-east-1',
        'key' => 'your_access_key',
        'secret' => 'your_secret_key',
    ],
],

Local APCu vs Distributed Redis

A critical subtlety in multi-node deployments: use APCu for local caching (memcache.local) and Redis for distributed caching (memcache.distributed) and file locking (memcache.locking). APCu is faster than Redis for local lookups (no network round-trip) but cannot share data across servers. Redis is slower per-operation but provides a consistent cache visible to all nodes.

Never use APCu for distributed caching or file locking in a multi-node deployment — it creates cache inconsistencies where different application servers have different views of the data.

Background Job Processing

Nextcloud background tasks (cron jobs) must run on only one application server, or tasks will execute multiple times. Designate a single server as the cron runner:

# www-data's crontab on the designated cron server only
*/5 * * * * php -f /var/www/nextcloud/cron.php

For large deployments, consider running background tasks on a dedicated worker server that is not in the load balancer rotation. This prevents heavy background operations (like file scanning or preview generation) from impacting user-facing request processing.
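On the designated runner, a systemd timer is a more observable alternative to crontab (failures show up in `systemctl status` and the journal). A minimal sketch, assuming Debian/Ubuntu paths and the www-data user:

```ini
# /etc/systemd/system/nextcloudcron.service
[Unit]
Description=Nextcloud background jobs

[Service]
Type=oneshot
User=www-data
ExecStart=/usr/bin/php -f /var/www/nextcloud/cron.php

# /etc/systemd/system/nextcloudcron.timer
[Unit]
Description=Run Nextcloud background jobs every 5 minutes

[Timer]
OnCalendar=*:0/5
Persistent=true

[Install]
WantedBy=timers.target
```

Enable with `systemctl enable --now nextcloudcron.timer` on the worker only, never on the load-balanced application servers.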

Load Balancing: Sticky Sessions vs Distributed Sessions

The load balancer is the entry point for all user requests. Two session handling strategies are available, each with distinct trade-offs.

Sticky Sessions (Session Affinity)

Sticky sessions route all requests from a given user to the same backend server for the duration of their session. This is simpler to implement but has operational drawbacks:

# HAProxy sticky session configuration
frontend nextcloud_frontend
    bind *:443 ssl crt /etc/ssl/nextcloud.pem
    default_backend nextcloud_backend

backend nextcloud_backend
    balance roundrobin
    cookie SERVERID insert indirect nocache
    server nc1 10.0.1.10:80 check cookie nc1
    server nc2 10.0.1.11:80 check cookie nc2
    server nc3 10.0.1.12:80 check cookie nc3

Drawbacks of sticky sessions: load distributes unevenly because long-lived sessions stay pinned to one server, a backend failure loses all of its active sessions and forces those users to log in again, and maintenance becomes harder because draining a server means waiting for its sessions to expire.

Distributed Sessions via Redis (Recommended)

The preferred approach stores PHP sessions in Redis, allowing any application server to handle any request from any user:

# php.ini session configuration for Redis
session.save_handler = redis
session.save_path = "tcp://redis-cluster.internal:6379?auth=your_redis_password"

# HAProxy without sticky sessions
backend nextcloud_backend
    balance leastconn
    option httpchk GET /status.php
    http-check expect status 200
    server nc1 10.0.1.10:80 check
    server nc2 10.0.1.11:80 check
    server nc3 10.0.1.12:80 check

With Redis-backed sessions, the load balancer uses leastconn (or roundrobin) without session affinity. Any server can handle any request, load distributes evenly, and server failures are transparent — users are automatically routed to surviving servers without session loss.

Health Checking

Configure the load balancer to check each Nextcloud server's health via the /status.php endpoint. This endpoint returns Nextcloud's operational status without requiring authentication. Remove unhealthy servers from the rotation automatically to prevent users from being routed to failed nodes.
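To illustrate, the JSON below is representative of what /status.php returns (field values are examples). A check can go one step beyond plain HTTP 200 and also refuse a node that is in maintenance mode:

```shell
# Sample of the unauthenticated JSON that /status.php returns
# (values are illustrative, taken from a healthy node).
cat > /tmp/status.json <<'EOF'
{"installed":true,"maintenance":false,"needsDbUpgrade":false,"versionstring":"29.0.4"}
EOF

# Stricter than HTTP-200-only: treat a node in maintenance mode as unhealthy
if grep -q '"maintenance":false' /tmp/status.json; then
    echo "node healthy"
else
    echo "node draining"
fi
```

HAProxy's `http-check expect` can apply the same logic by matching on the response body instead of only the status code.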

Database Tier: PostgreSQL at Scale

The database is the most common bottleneck in large Nextcloud deployments. Every file operation, share action, user authentication, and metadata lookup generates database queries. At 1,000+ users, the database handles thousands of queries per second.

PostgreSQL Over MariaDB

For large-scale Nextcloud, PostgreSQL is the recommended database engine. It handles concurrent connections more efficiently than MariaDB, provides better query planning for complex joins (which Nextcloud's file cache queries generate), and offers more mature replication options.

Connection Pooling with PgBouncer

PHP creates a new database connection for every request and closes it when the request completes. At high concurrency, this creates connection storms — hundreds of simultaneous connect/disconnect cycles that overwhelm PostgreSQL's connection handling. PgBouncer solves this by pooling connections:

# /etc/pgbouncer/pgbouncer.ini
[databases]
nextcloud = host=127.0.0.1 port=5432 dbname=nextcloud

[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0
# Match PostgreSQL's password_encryption setting (md5 only on legacy setups)
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt

# Pool sizing
pool_mode = transaction
default_pool_size = 100
max_client_conn = 1000
min_pool_size = 20
reserve_pool_size = 20

# Timeouts
server_idle_timeout = 300
client_idle_timeout = 0
query_timeout = 120

Use pool_mode = transaction for Nextcloud — this releases connections back to the pool after each transaction completes, maximizing connection reuse. Application servers connect to PgBouncer (port 6432) instead of PostgreSQL directly (port 5432). One caveat: transaction pooling does not preserve session state such as server-side prepared statements; PgBouncer 1.21 and later can track prepared statements across pooled connections if you set max_prepared_statements to a nonzero value.
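With the pooler in place, the only application-side change is the database host in config.php; a minimal sketch, assuming the pooler is reachable as pgbouncer.internal:

```php
// config.php — route database traffic through PgBouncer (port 6432)
'dbtype' => 'pgsql',
'dbhost' => 'pgbouncer.internal:6432',   // previously db-primary.internal:5432
```

No other Nextcloud settings change; the pooler is transparent to the application.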

Read Replicas

For read-heavy workloads (which Nextcloud generates — file listings, metadata lookups, share resolution are all reads), PostgreSQL streaming replication to one or more read replicas offloads the primary server:

# Primary server: postgresql.conf
wal_level = replica
max_wal_senders = 10
synchronous_commit = on

# Replica server (PostgreSQL 12+): create an empty standby.signal file in the
# data directory, then point the replica at the primary in postgresql.conf:
primary_conninfo = 'host=db-primary.internal port=5432 user=replicator password=rep_password'

Nextcloud does not natively support read/write splitting, but you can achieve it through PgBouncer or a PostgreSQL proxy like Pgpool-II that routes SELECT queries to replicas and writes to the primary. Alternatively, use replicas for reporting, backup, and monitoring queries to reduce load on the primary.

Database Tuning for Scale

# postgresql.conf — tuned for 1,000+ user Nextcloud
# Memory
shared_buffers = 8GB                 # 25% of RAM for a dedicated DB server
effective_cache_size = 24GB          # 75% of RAM
work_mem = 64MB                      # Per-operation sort/hash memory
maintenance_work_mem = 2GB           # Vacuum, index creation

# WAL
wal_buffers = 64MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB
min_wal_size = 1GB

# Query Planner
random_page_cost = 1.1               # SSD-optimized (use 4.0 for HDD)
effective_io_concurrency = 200       # SSD-optimized (use 2 for HDD)

# Connections
max_connections = 200                # PgBouncer handles client connections
                                     # Keep this moderate for PostgreSQL

# Vacuum
autovacuum_max_workers = 4
autovacuum_naptime = 30s
autovacuum_vacuum_cost_delay = 2ms

The oc_filecache table is typically the largest and most frequently queried table in Nextcloud's database. Ensure regular VACUUM ANALYZE operations on this table to keep query plans optimal.
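A minimal maintenance sketch for that table (run on the primary during a quiet window; the autovacuum settings above cover the routine case):

```sql
-- Reclaim dead tuples and refresh planner statistics on the hottest table
VACUUM (ANALYZE) oc_filecache;

-- Keep an eye on its size; sudden growth often points at a runaway
-- file scan or an external storage mount being re-indexed
SELECT pg_size_pretty(pg_total_relation_size('oc_filecache'));
```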

Storage Tier: Architecture at Scale

File storage architecture is perhaps the most consequential decision in a multi-node Nextcloud deployment. Three primary options exist, each with distinct scalability characteristics.

Option 1: NFS Shared Filesystem

The simplest multi-node storage approach: an NFS server exports the Nextcloud data directory, and all application servers mount it.

Option 2: S3-Compatible Object Storage (Recommended)

Nextcloud's primary storage can be configured to use S3-compatible object storage (Ceph RADOS Gateway, MinIO, or cloud S3). This is the recommended approach for large deployments. For detailed configuration guidance, see our S3 object storage configuration guide.

# config.php — S3 primary storage
'objectstore' => [
    'class' => '\\OC\\Files\\ObjectStore\\S3',
    'arguments' => [
        'bucket' => 'nextcloud-primary',
        'hostname' => 'ceph-rgw.internal',
        'port' => 443,
        'use_ssl' => true,
        'region' => 'default',
        'key' => 'ACCESS_KEY',
        'secret' => 'SECRET_KEY',
        'use_path_style' => true,
    ],
],

Option 3: GlusterFS Distributed Filesystem

GlusterFS provides a distributed filesystem across multiple servers, presenting a POSIX-compatible mount point to Nextcloud while distributing and replicating data across the cluster.

Tiered Storage: Hot, Warm, and Cold

At scale, not all data has the same access pattern. Files accessed within the last 30 days (hot data) typically represent 20-30% of total storage but generate 80-90% of I/O operations. A tiered storage approach optimizes cost and performance:

| Tier | Storage Type | Access Pattern | Cost (relative) |
|---|---|---|---|
| Hot (0-30 days) | NVMe SSD | Frequent read/write | $$$ |
| Warm (30-180 days) | SATA SSD or fast HDD | Occasional read | $$ |
| Cold (180+ days) | High-density HDD or archive S3 | Rare read, mostly write-once | $ |

With Ceph as the storage backend, you can define CRUSH rules that automatically place data on different storage tiers based on pool configuration. Lifecycle policies move data from hot to warm to cold tiers as it ages, optimizing storage costs without manual intervention.
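Where the S3 backend supports bucket lifecycle transitions (Ceph RGW does), the aging policy can be expressed declaratively. A sketch, assuming a storage class named COLD_HDD has already been defined in your RGW zone (the class name is a placeholder):

```json
{
  "Rules": [
    {
      "ID": "cold-after-180-days",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 180, "StorageClass": "COLD_HDD" }
      ]
    }
  ]
}
```

Apply it with your S3 client's put-bucket-lifecycle operation; the gateway then migrates objects between tiers without any change on the Nextcloud side.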

Cache Tier: Redis at Scale

Redis serves three critical functions in a multi-node Nextcloud deployment: distributed caching, transactional file locking, and session storage. At 1,000+ users, a single Redis instance may become a bottleneck.

Redis Sentinel for High Availability

Redis Sentinel provides automatic failover for Redis. Deploy a primary Redis instance with one or two replicas, monitored by three Sentinel processes:

# redis-sentinel.conf
sentinel monitor nextcloud-redis redis-primary.internal 6379 2
sentinel down-after-milliseconds nextcloud-redis 5000
sentinel failover-timeout nextcloud-redis 60000
sentinel parallel-syncs nextcloud-redis 1

# Nextcloud config.php — Redis Sentinel
'redis' => [
    'seeds' => [
        'redis-sentinel-1.internal:26379',
        'redis-sentinel-2.internal:26379',
        'redis-sentinel-3.internal:26379',
    ],
    'timeout' => 1.5,
    'read_timeout' => 1.5,
    'password' => 'your_redis_password',
],

Redis Memory Sizing

Nextcloud's Redis memory usage scales primarily with the number of active sessions and cached file metadata. As a sizing guideline, plan for roughly 8 GB at 1,000 users, growing to 16 GB at the 5,000-user mark, as reflected in the infrastructure summary at the end of this guide.

Configure Redis with a maxmemory policy of allkeys-lru (evict least-recently-used keys when memory is full) to prevent out-of-memory crashes. Note that if the same instance also provides transactional file locking, allkeys-lru can evict lock keys under memory pressure; consider a dedicated Redis instance for locking so eviction only ever touches cache data:

# redis.conf
maxmemory 8gb
maxmemory-policy allkeys-lru

Collabora Tier: Scaling Document Editing

Collabora Online (or OnlyOffice) handles real-time document editing. It is resource-intensive — each active document editing session consumes 50-200 MB of RAM and measurable CPU. For 1,000+ users, a single Collabora instance is insufficient.

Multi-Server Collabora Deployment

Deploy multiple Collabora containers behind a load balancer. Collabora supports clustering natively with its HAProxy-compatible health check endpoint:

# HAProxy backend for Collabora
backend collabora_backend
    # Hash on the WOPISrc query parameter so all editors of the same
    # document reach the same server (required for collaborative editing)
    balance url_param WOPISrc
    option httpchk GET /hosting/capabilities
    server cool1 10.0.2.10:9980 check
    server cool2 10.0.2.11:9980 check
    server cool3 10.0.2.12:9980 check

Each Collabora server should be provisioned with 4-8 vCPUs and 8-16 GB RAM. One Collabora server can typically handle 50-100 simultaneous document editing sessions.
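A rough RAM-based capacity sketch follows (10% editing concurrency, 150 MB per session, and 16 GB per node are assumptions drawn from the ranges above; CPU may become the limit before memory does):

```shell
# Rough Collabora capacity sketch. Assumptions (not measurements):
# 10% of 1,000 users editing concurrently, ~150 MB RAM per session,
# 16 GB of RAM per Collabora node, plus one node for failover.
users=1000
sessions=$(( users * 10 / 100 ))            # 100 concurrent editing sessions
node_ram_mb=$(( 16 * 1024 ))
per_node=$(( node_ram_mb / 150 ))           # ~109 sessions fit in RAM per node
nodes=$(( (sessions + per_node - 1) / per_node + 1 ))   # ceil + 1 spare
echo "${nodes} Collabora nodes"             # prints: 2 Collabora nodes
```

The result lines up with the two-node recommendation for 1,000 users in the summary table; rerun with your own editing ratio before provisioning.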

Preview Generation at Scale

Nextcloud generates preview thumbnails for images, documents, and videos. At scale, this becomes a significant workload — uploading 10,000 photos triggers 10,000 preview generation tasks that consume CPU and I/O.

Dedicated Preview Worker

Offload preview generation to a dedicated worker server that is not in the load balancer rotation. The worker runs Nextcloud's cron process and the Preview Generator app's commands but does not serve user requests:

# On the preview worker server
# One-time initial pass: generate previews for all existing files
sudo -u www-data php /var/www/nextcloud/occ preview:generate-all

# Then process newly added files regularly (run repeatedly, e.g. from cron)
sudo -u www-data php /var/www/nextcloud/occ preview:pre-generate
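Since preview:pre-generate only processes files queued since its last run, it is typically driven from cron on the worker; a sketch for www-data's crontab (the 10-minute interval is an assumption, tune it to your upload volume):

```shell
# www-data crontab on the preview worker: process newly queued previews
*/10 * * * * php -f /var/www/nextcloud/occ preview:pre-generate
```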

Preview Configuration for Performance

# config.php — preview optimization
'enable_previews' => true,
'preview_max_x' => 2048,
'preview_max_y' => 2048,
'preview_max_filesize_image' => 50,    // MB — skip previews for files larger than this
'jpeg_quality' => 60,                  // Lower quality = faster generation, less storage

// Disable preview providers for file types you do not need
'enabledPreviewProviders' => [
    'OC\\Preview\\PNG',
    'OC\\Preview\\JPEG',
    'OC\\Preview\\GIF',
    'OC\\Preview\\HEIC',
    'OC\\Preview\\WebP',
    'OC\\Preview\\MP4',
    'OC\\Preview\\PDF',
],

For very large deployments, consider using Imaginary (a dedicated image processing microservice) as Nextcloud's preview backend. Imaginary can be scaled independently and processes images faster than PHP's native GD/Imagick libraries.

Monitoring Essentials at Scale

Operating a multi-node Nextcloud deployment without comprehensive monitoring is flying blind. You need visibility into every tier to identify bottlenecks before they impact users.

What to Monitor

| Component | Key Metrics | Alert Threshold |
|---|---|---|
| Application servers | PHP-FPM active/idle workers, request latency, error rate | Workers > 80% capacity, latency > 2s |
| Load balancer | Request rate, backend health, connection queues | Queue depth > 100, backend failures |
| PostgreSQL | Connections, query time, cache hit ratio, replication lag | Cache hit < 95%, replication lag > 10s |
| Redis | Memory usage, connected clients, hit ratio, evictions | Memory > 80%, evictions/sec > 100 |
| Storage | IOPS, throughput, latency, capacity | Latency > 10ms (SSD), capacity > 85% |
| Collabora | Active documents, memory usage, CPU | Memory > 80%, documents per server > 80 |

Deploy Prometheus with exporters for each component (Node Exporter, PHP-FPM Exporter, PostgreSQL Exporter, Redis Exporter) and Grafana for visualization. A dedicated monitoring server ensures you have visibility even when application components are under stress. For a complete walkthrough of setting up monitoring, see our companion guide on Nextcloud monitoring with Prometheus and Grafana.
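A minimal Prometheus scrape configuration for those exporters might look like the sketch below (hostnames are placeholders for your internal DNS; the ports are the exporters' defaults):

```yaml
# prometheus.yml — scrape sketch for a multi-node Nextcloud deployment
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['nc-app1.internal:9100', 'nc-app2.internal:9100', 'db-primary.internal:9100']
  - job_name: 'postgresql'
    static_configs:
      - targets: ['db-primary.internal:9187']
  - job_name: 'redis'
    static_configs:
      - targets: ['redis-primary.internal:9121']
  - job_name: 'phpfpm'
    static_configs:
      - targets: ['nc-app1.internal:9253', 'nc-app2.internal:9253']
```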

Network Architecture Considerations

At the 1,000+ user scale, internal network traffic between tiers becomes significant. Each user request may generate 5-10 internal network calls (load balancer to app server, app server to database, app server to Redis, app server to storage).

Internal Network Sizing

Between the application, database, and storage tiers, plan for 10 Gbps links as a baseline. User-facing traffic is modest by comparison; it is storage traffic (file transfers and Ceph replication) and database replication that saturate 1 Gbps links quickly at this scale.

Network Segmentation

Place each tier on a separate VLAN for security and traffic management: a public-facing segment for the load balancer, an application segment for the Nextcloud servers, a data segment for PostgreSQL and Redis, a storage segment for the object storage cluster, and a management segment for monitoring and administration. Only the load balancer should be reachable from outside; the data and storage segments should accept traffic from the application tier alone.

Deployment Strategy: Phased Scaling

You do not need to deploy the full multi-node architecture from day one. A phased approach lets you scale incrementally as user count grows:

Phase 1 (up to 300 users): Single Server

One server running all components. Follow our installation guide and performance tuning guide. This handles most organizations adequately.

Phase 2 (300-700 users): Separated Database and Cache

Move PostgreSQL and Redis to a dedicated server. The application server now has all CPU and memory available for PHP processing, and the database server can be tuned independently. This typically doubles capacity.

Phase 3 (700-1,500 users): Multi-Node Application + Object Storage

Add a load balancer and 2-3 application servers. Migrate file storage to S3-compatible object storage. Add PgBouncer in front of PostgreSQL. This is the architecture described in this guide.

Phase 4 (1,500+ users): Full Enterprise Architecture

Add PostgreSQL read replicas, Redis Sentinel, dedicated preview workers, and Collabora clusters. Consider multi-region deployment for geographic distribution. At this scale, consider high availability configurations that eliminate single points of failure in every tier.

Infrastructure Requirements Summary

| Component | 1,000 Users | 2,500 Users | 5,000 Users |
|---|---|---|---|
| Application servers | 3x (8 vCPU, 32 GB) | 5x (8 vCPU, 32 GB) | 10x (8 vCPU, 32 GB) |
| Load balancer | 1x (2 vCPU, 4 GB) | 2x (4 vCPU, 8 GB) — HA pair | 2x (4 vCPU, 8 GB) — HA pair |
| PostgreSQL primary | 1x (8 vCPU, 64 GB, NVMe) | 1x (16 vCPU, 128 GB, NVMe) | 1x (32 vCPU, 256 GB, NVMe) |
| PostgreSQL replicas | 1x (8 vCPU, 64 GB) | 2x (8 vCPU, 64 GB) | 2x (16 vCPU, 128 GB) |
| Redis | 1x (4 vCPU, 8 GB) | 3x Sentinel (4 vCPU, 8 GB) | 3x Sentinel (4 vCPU, 16 GB) |
| Object storage (Ceph) | 3x OSD nodes | 5x OSD nodes | 10x OSD nodes |
| Collabora | 2x (4 vCPU, 8 GB) | 3x (8 vCPU, 16 GB) | 5x (8 vCPU, 16 GB) |
| Preview worker | 1x (4 vCPU, 16 GB) | 2x (8 vCPU, 16 GB) | 3x (8 vCPU, 32 GB) |
| Monitoring | 1x (4 vCPU, 8 GB) | 1x (4 vCPU, 16 GB) | 1x (8 vCPU, 32 GB) |

Common Scaling Mistakes

After helping organizations scale Nextcloud deployments, certain mistakes recur frequently: using APCu for distributed caching or file locking across multiple nodes, running cron on every application server so background jobs execute repeatedly, letting config.php drift between nodes, connecting hundreds of PHP workers directly to PostgreSQL without a pooler, and scaling the application tier while leaving the database and storage tiers undersized. Each of these is avoidable with the architecture described in this guide.

Next Steps

Scaling Nextcloud to 1,000+ users is an infrastructure engineering challenge, not a software limitation. Nextcloud's architecture supports horizontal scaling across every tier, and organizations worldwide run Nextcloud for tens of thousands of users with excellent performance.

The key is matching your architecture to your scale — not over-engineering from the start, but having a clear scaling path as your organization grows. Start with a single well-tuned server, separate database and cache when you outgrow it, and progress to a full multi-node architecture as your user count demands.

For organizations that need enterprise-scale Nextcloud without building the infrastructure team to support it, contact MassiveGRID to discuss managed Nextcloud deployments. Our infrastructure spans multiple global data centers with the compute, storage, and network capacity to support Nextcloud deployments at any scale — from 50-user teams to 10,000-user enterprises.