A single-server Dokploy installation handles most workloads well. But at a certain scale, you hit the ceiling: builds compete with production traffic for CPU, a single point of failure means any hardware issue takes everything offline, and geographic distribution is impossible when all your containers run on one machine.
Dokploy supports multi-node deployments through Docker Swarm, allowing you to distribute workloads across multiple servers. This guide covers when to go multi-node, how to design your architecture, the step-by-step Swarm setup, and how to optimize resource allocation across your cluster.
When to Go Multi-Node
Not every Dokploy deployment needs multiple servers. Multi-node adds operational complexity, and that complexity is only justified when you hit specific limitations with a single server. Consider scaling to multiple nodes when:
- Build workloads affect application performance: Docker builds are CPU-intensive and compete with running containers for resources. If your application response times degrade during deployments, isolating builds to a separate server eliminates this contention entirely. See our guide on shared vs. dedicated resources for diagnosing this issue.
- Geographic distribution requirements: You need containers running in multiple regions for latency optimization or data residency compliance. A single-server setup limits you to one data center location.
- High availability requirements: Your SLA demands that a single hardware failure cannot take your applications offline. Multi-node Swarm provides container-level failover across nodes.
- Team or project isolation: Different teams or projects need resource isolation. Swarm node labels let you pin specific services to specific nodes, ensuring one team's workload cannot affect another's.
- Resource specialization: Your workloads have different resource profiles. Database-heavy services need RAM, build pipelines need CPU, and static file serving needs neither. Multi-node lets you optimize each server for its role.
If none of these apply, a single well-provisioned Dedicated VPS or Cloud Dedicated Server is simpler and more cost-effective. Do not add complexity before you need it.
Architecture Design
A Dokploy multi-node cluster has two types of nodes:
Manager Node (Dokploy + Swarm Manager)
The manager node runs the Dokploy web UI, the Dokploy API, the internal PostgreSQL database that stores your project configurations, and the Docker Swarm manager process. This is the control plane for your entire deployment infrastructure.
Because the manager node is a single point of failure for your deployment workflow, it should run on your most reliable infrastructure. A MassiveGRID Cloud Dedicated Server with HA is the recommended tier for the manager. The HA layer provides automatic failover: if the underlying hardware fails, the server is automatically migrated to healthy infrastructure. Your Dokploy dashboard, API, and Swarm manager remain available without manual intervention.
Resource requirements for the manager depend on your cluster size. For most setups (up to 10 worker nodes, up to 50 services):
- 4 vCPU (Swarm manager consensus + Dokploy API)
- 8 GB RAM (PostgreSQL + Dokploy + Swarm state)
- 100 GB SSD (Dokploy database, Docker images, logs)
Worker Nodes
Worker nodes run your application containers, databases, and build processes. They receive instructions from the Swarm manager and execute them. Workers do not need Dokploy installed -- they only need Docker with Swarm mode enabled.
The infrastructure tier for workers depends on the workload:
- Production application serving: VDS (Dedicated VPS) for guaranteed CPU performance and consistent response times
- Build servers: VDS with high CPU allocation for fast, predictable builds
- Development/staging: Cloud VPS for cost-effective shared resources
- Database nodes: VDS with high RAM allocation for in-memory database performance
Step-by-Step Swarm Setup
This section assumes you have already installed Dokploy on your manager node and have one or more additional servers provisioned for worker nodes.
Step 1: Initialize Swarm on the Manager Node
SSH into your manager node (the server running Dokploy) and initialize the Swarm:
# Replace MANAGER_IP with your manager node's public IP
docker swarm init --advertise-addr MANAGER_IP
This command initializes the current node as a Swarm manager and outputs a join token. The output looks like:
Swarm initialized: current node (abc123def) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-xxxxx-yyyyy MANAGER_IP:2377
Save this join command. You will run it on each worker node.
Step 2: Open Swarm Ports on All Nodes
Docker Swarm requires specific ports for inter-node communication. On every node (manager and workers), open these ports:
# TCP port 2377 - Swarm management (inbound needed on manager nodes only)
sudo ufw allow 2377/tcp
# TCP and UDP port 7946 - Node communication
sudo ufw allow 7946/tcp
sudo ufw allow 7946/udp
# UDP port 4789 - Overlay network traffic (VXLAN)
sudo ufw allow 4789/udp
If your nodes are in different data centers, these ports must be reachable between the nodes. MassiveGRID's internal network allows direct communication between servers in the same data center. For cross-datacenter communication, nodes reach each other over their public IP addresses, so the ports above must be open to the other nodes' public IPs.
Step 3: Join Worker Nodes to the Swarm
SSH into each worker node and run the join command from Step 1:
# On each worker node
docker swarm join --token SWMTKN-1-xxxxx-yyyyy MANAGER_IP:2377
You should see: This node joined a swarm as a worker.
Step 4: Verify the Cluster
Back on the manager node, verify all nodes are connected:
docker node ls
Expected output:
ID            HOSTNAME     STATUS   AVAILABILITY   MANAGER STATUS
abc123def *   manager-01   Ready    Active         Leader
ghi456jkl     worker-01    Ready    Active
mno789pqr     worker-02    Ready    Active
All nodes should show Ready status and Active availability.
Step 5: Add Worker Nodes in Dokploy
In the Dokploy web UI, navigate to the Servers section. Add each worker node by its IP address. Dokploy will establish an SSH connection to the worker and configure it for deployments. Once added, you can assign deployments to specific nodes or let Swarm distribute them automatically.
Step 6: Label Nodes for Workload Placement
Use Docker node labels to control which services run on which nodes:
# Label a node for production workloads
docker node update --label-add role=production worker-01
# Label a node for builds
docker node update --label-add role=build worker-02
# Label by location
docker node update --label-add datacenter=frankfurt worker-01
docker node update --label-add datacenter=london worker-02
In your Dokploy service configuration, you can then use placement constraints to pin services to labeled nodes. This ensures your production database only runs on the production-labeled node with high RAM, while builds only run on the build-labeled node with high CPU.
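If you deploy a service through a Docker Compose file in Dokploy, the same labels can be referenced with Swarm placement constraints. A minimal sketch, assuming the `role=production` label from above and a hypothetical `web` service and image name:

```yaml
version: "3.8"
services:
  web:
    image: myapp:latest   # placeholder image name
    deploy:
      replicas: 2
      placement:
        constraints:
          # Only schedule this service on nodes labeled role=production
          - node.labels.role == production
```

Swarm silently leaves a service unscheduled if no node satisfies its constraints, so after deploying, check that running replicas match desired replicas with `docker service ls`.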
The Dedicated Build Server Pattern
One of the most impactful multi-node patterns for Dokploy is isolating build workloads to a dedicated server. This eliminates the primary single-server pain point: builds degrading production performance.
Dokploy supports custom build servers. Instead of building Docker images on the same node that serves production traffic, Dokploy can SSH into a separate build server, run the build there, push the resulting image to a registry, and then pull and deploy it on the production node.
Build Server Sizing
Docker builds are CPU-bound during compilation and dependency installation, and lean heavily on RAM and fast storage for layer caching. The ideal build server has:
- High CPU: 6-8 dedicated CPU cores (VDS) for fast compilation. Builds are the quintessential bursty workload -- they spike to 100% CPU, then drop to zero. Dedicated cores ensure no contention during the spike.
- Moderate RAM: 8-16 GB for Docker layer caching and concurrent builds
- Fast storage: SSD for Docker image layer I/O during builds
Production Node Sizing
Without build overhead, production nodes can be optimized purely for serving:
- Moderate CPU: 4 cores is usually sufficient for web serving workloads
- High RAM: 16-32 GB if running databases (PostgreSQL, Redis, MongoDB) alongside application containers
- High storage: SSD sized for database storage and logs
This separation means your production node never experiences CPU spikes from builds, and your build server never competes with database queries for RAM. Each node is optimized for its specific role.
Geographic Distribution
Docker Swarm supports worker nodes in different physical locations. With MassiveGRID's four data center locations (New York, London, Frankfurt, Singapore), you can distribute your Dokploy cluster across continents.
Example: Multi-Region Architecture
| Node | Location | Role | Infrastructure |
|---|---|---|---|
| manager-01 | Frankfurt | Dokploy + Swarm manager | Cloud Dedicated (HA) |
| worker-eu | London | EU production serving | VDS (4 CPU / 16 GB RAM) |
| worker-us | New York | US production serving | VDS (4 CPU / 16 GB RAM) |
| worker-asia | Singapore | Asia production serving | VDS (4 CPU / 8 GB RAM) |
| build-01 | Frankfurt | Dedicated build server | VDS (8 CPU / 8 GB RAM) |
In this architecture, the manager and build server are co-located in Frankfurt for low-latency communication during deployments. Production workers are distributed globally, each labeled with their datacenter location. When deploying, Dokploy builds the image on build-01, pushes it to a registry, and Swarm distributes it to the appropriate regional workers based on placement constraints.
Cross-Region Considerations
Swarm's management traffic (heartbeats, task scheduling, service updates) travels between the manager and all workers. For cross-datacenter deployments, this traffic traverses the public internet. The bandwidth requirements are minimal (kilobytes per second for management), but latency affects how quickly Swarm detects node failures and reschedules containers.
Key considerations:
- Heartbeat interval: Swarm's dispatcher heartbeat defaults to 5 seconds; a worker that stops responding to heartbeats is marked down and its tasks are rescheduled elsewhere. For cross-region setups the default is usually fine, but on high-latency links you can relax it with docker swarm update --dispatcher-heartbeat 15s.
- Image distribution: Use a container registry (Docker Hub, GitHub Container Registry, or a self-hosted registry) rather than relying on Swarm's built-in image distribution. The registry provides a pull-based model that handles high-latency connections better.
- Database placement: Databases should run in a single region to avoid cross-region replication latency. Place your primary database on the worker closest to the majority of your users, and use application-level caching (Redis) in other regions.
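Combining the registry and placement advice above, a Compose sketch for one region-pinned service might look like this (the registry hostname, image tag, and label value are illustrative):

```yaml
version: "3.8"
services:
  api-eu:
    # Pulled from a registry rather than distributed by Swarm itself
    image: registry.example.com/myapp:1.4.2   # hypothetical registry and tag
    deploy:
      replicas: 2
      placement:
        constraints:
          # datacenter label applied in Step 6 of the Swarm setup
          - node.labels.datacenter == london
```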
Independent Scaling Across the Cluster
One of the primary advantages of running Dokploy on MassiveGRID's infrastructure is that each Swarm node can be scaled independently. This is particularly powerful in a multi-node setup where different nodes serve different purposes.
Per-Node Scaling Examples
Database node needs more RAM: Your PostgreSQL container on worker-eu is hitting memory limits as your dataset grows. Scale that specific node's RAM from 16 GB to 32 GB without touching CPU or storage. The node stays online; no migration to a different plan required.
Build server needs more CPU: Build times are increasing as your codebase grows. Add 2 more CPU cores to build-01, bringing it from 8 to 10 cores. Your production workers remain unchanged.
New region launch: You are expanding to serve Asian users. Provision a new VDS in Singapore with 4 CPU and 8 GB RAM, join it to the Swarm, label it appropriately, and deploy your application services to it. The existing cluster is unaffected.
With fixed-tier providers, scaling one node means upgrading to the next plan and paying for resources you do not need. With independent scaling, you pay for exactly the resource each node requires.
Monitoring Across Nodes
Dokploy's built-in monitoring dashboard provides per-server metrics including CPU utilization, RAM usage, disk I/O, and network throughput. In a multi-node setup, you can view these metrics for each node individually through the Dokploy UI.
For deeper visibility, consider deploying a monitoring stack to your Swarm cluster:
- Prometheus for metrics collection from all nodes (deploy as a Swarm service with global mode to run on every node)
- Grafana for dashboards and alerting (deploy to the manager node)
- cAdvisor for container-level resource metrics (deploy as a global service)
Deploy the monitoring stack through Dokploy itself as a Docker Compose project. This gives you a unified view of resource utilization across all nodes, helping you identify which specific node and resource needs scaling.
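A minimal sketch of such a stack as a Compose project, assuming the standard public images for each tool (the Prometheus scrape configuration is omitted for brevity):

```yaml
version: "3.8"
services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    deploy:
      mode: global          # one instance on every node in the cluster
  prometheus:
    image: prom/prometheus:latest
    deploy:
      placement:
        constraints:
          - node.role == manager   # keep metrics storage on the manager
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    deploy:
      placement:
        constraints:
          - node.role == manager
```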
Key Metrics to Watch
- CPU utilization per node: Build nodes should spike during builds and idle between them. Production nodes should show steady utilization below 70%. Sustained 100% on any node indicates a need for more CPU.
- RAM utilization per node: Database nodes should show stable, high utilization (databases use available RAM for caching). Swap usage on any node is a red flag -- scale RAM immediately.
- Swarm service health: Track the number of running vs. desired replicas for each service. A mismatch indicates scheduling failures, usually due to resource constraints on target nodes.
- Cross-node network latency: For geographically distributed clusters, monitor latency between the manager and each worker. Sustained latency above 200ms can cause Swarm management issues.
MassiveGRID for Dokploy
- Cloud Dedicated with HA — Managed, high-availability infrastructure for your Dokploy manager node. 100% uptime SLA with automatic failover
- Dedicated VPS (VDS) — Guaranteed physical CPU cores for worker nodes. Zero contention, predictable performance for production and builds
- 4 Global Data Centers — New York, London, Frankfurt, Singapore. Distribute your Swarm cluster across continents
- Independent Resource Scaling — Scale CPU, RAM, and storage per node. Optimize each server for its role in the cluster
- 12 Tbps DDoS Protection — Network-edge mitigation across all data centers, protecting every node in your cluster
Recommended Cluster Configurations
Here are three reference architectures based on scale:
Small Team (2-5 applications)
| Node | Specs | Tier | Role |
|---|---|---|---|
| Manager + Production | 4 CPU / 8 GB / 100 GB | Cloud Dedicated (HA) | Dokploy + apps + databases |
| Build Server | 4 CPU / 4 GB / 50 GB | VDS | Isolated builds only |
Total: 2 nodes. The primary benefit here is build isolation. Production applications no longer compete with Docker builds for CPU.
Growing Team (5-20 applications)
| Node | Specs | Tier | Role |
|---|---|---|---|
| Manager | 4 CPU / 8 GB / 100 GB | Cloud Dedicated (HA) | Dokploy + Swarm manager |
| Production Worker | 6 CPU / 16 GB / 200 GB | VDS | Apps + databases |
| Build Server | 8 CPU / 8 GB / 80 GB | VDS | Builds + staging |
Total: 3 nodes. The manager is dedicated to Dokploy and cluster coordination. Production workloads are isolated from builds. Each node can be scaled independently as needs grow.
Scale-Out (20+ applications, multi-region)
| Node | Specs | Tier | Role |
|---|---|---|---|
| Manager (Frankfurt) | 4 CPU / 16 GB / 100 GB | Cloud Dedicated (HA) | Dokploy + Swarm manager |
| EU Worker (London) | 6 CPU / 32 GB / 300 GB | VDS | EU apps + databases |
| US Worker (New York) | 6 CPU / 32 GB / 300 GB | VDS | US apps + databases |
| Asia Worker (Singapore) | 4 CPU / 16 GB / 200 GB | VDS | Asia apps + caching |
| Build Server (Frankfurt) | 8 CPU / 16 GB / 100 GB | VDS | All builds |
Total: 5 nodes across 4 data centers. Each regional worker is optimized for its local workload. The build server is co-located with the manager for fast communication. All nodes use dedicated CPU (VDS) for predictable performance.
Security in a Multi-Node Cluster
Multi-node deployments expand the attack surface. Each node needs the same security hardening as a single-server setup, plus additional measures for inter-node communication:
- Swarm encryption: Docker Swarm encrypts management traffic by default (mutual TLS authentication between nodes). Verify this is active with docker info | grep -i encrypt.
- Overlay network encryption: Enable encryption on overlay networks used for inter-service communication: docker network create --opt encrypted --driver overlay my-network. This encrypts all data-plane traffic between containers on different nodes.
- SSH key management: Dokploy connects to worker nodes via SSH. Use unique SSH keys per node and restrict key access to the Dokploy manager only.
- Firewall rules: Swarm ports (2377, 7946, 4789) should only be accessible from other cluster nodes, not from the public internet. Use UFW rules that restrict source IP addresses.
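A small script can generate those restricted rules for every peer in the cluster. The IP addresses below are placeholders from the TEST-NET range; the script only prints the ufw commands so you can review them before applying anything:

```shell
#!/usr/bin/env bash
# Print UFW rules that allow Swarm ports only from other cluster nodes.
# NODE_IPS is a placeholder list -- substitute your real node addresses.
NODE_IPS="203.0.113.10 203.0.113.11 203.0.113.12"

for ip in $NODE_IPS; do
  for rule in 2377/tcp 7946/tcp 7946/udp 4789/udp; do
    port="${rule%/*}"   # part before the slash
    proto="${rule#*/}"  # part after the slash
    echo "ufw allow from ${ip} to any port ${port} proto ${proto}"
  done
done
```

Once you have verified the output, run the printed commands with sudo on each node, and remove any broader rules (such as a blanket allow on 2377/tcp) added during initial setup.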
Wrapping Up
Dokploy's multi-node support through Docker Swarm transforms it from a single-server deployment tool into a distributed infrastructure platform. The key decisions are:
- Manager node: Cloud Dedicated with HA for reliability. This is your control plane -- it must stay available.
- Worker nodes: VDS for production (guaranteed dedicated resources) or Cloud VPS for development and staging (cost-effective shared resources).
- Build isolation: A dedicated build server is the single most impactful multi-node pattern. It eliminates the primary performance complaint for Dokploy deployments.
- Geographic distribution: Place workers in the data centers closest to your users. Use node labels and placement constraints to control service distribution.
- Per-node scaling: Each node in your cluster has independent resource allocation. Scale exactly the resource that is needed, on exactly the node that needs it.
For the initial single-server setup, start with our Dokploy installation guide. If you are experiencing inconsistent build performance, the dedicated build server pattern described above is the most effective solution. And before going to production on any node, follow the security hardening guide to lock down each server in your cluster.