A single-server Dokploy installation handles most workloads well. But at a certain scale, you hit the ceiling: builds compete with production traffic for CPU, a single point of failure means any hardware issue takes everything offline, and geographic distribution is impossible when all your containers run on one machine.
Dokploy supports multi-node deployments through Docker Swarm, allowing you to distribute workloads across multiple servers. This guide covers when to go multi-node, how to design your architecture, the step-by-step Swarm setup, and how to optimize resource allocation across your cluster.
When to Go Multi-Node
Not every Dokploy deployment needs multiple servers. Multi-node adds operational complexity, and that complexity is only justified when you hit specific limitations with a single server. Consider scaling to multiple nodes when:
- Build workloads affect application performance: Docker builds are CPU-intensive and compete with running containers for resources. If your application response times degrade during deployments, isolating builds to a separate server eliminates this contention entirely. See our guide on shared vs. dedicated resources for diagnosing this issue.
- Geographic distribution requirements: You need containers running in multiple regions for latency optimization or data residency compliance. A single-server setup limits you to one data center location.
- High availability requirements: Your SLA demands that a single hardware failure cannot take your applications offline. Multi-node Swarm provides container-level failover across nodes.
- Team or project isolation: Different teams or projects need resource isolation. Swarm node labels let you pin specific services to specific nodes, ensuring one team's workload cannot affect another's.
- Resource specialization: Your workloads have different resource profiles. Database-heavy services need RAM, build pipelines need CPU, and static file serving needs neither. Multi-node lets you optimize each server for its role.
If none of these apply, a single well-provisioned Dedicated VPS or Cloud Dedicated Server is simpler and more cost-effective. Do not add complexity before you need it.
Architecture Design
A Dokploy multi-node cluster has two types of nodes:
Manager Node (Dokploy + Swarm Manager)
The manager node runs the Dokploy web UI, the Dokploy API, the internal PostgreSQL database that stores your project configurations, and the Docker Swarm manager process. This is the control plane for your entire deployment infrastructure.
Because the manager node is a single point of failure for your deployment workflow, it should run on your most reliable infrastructure. A MassiveGRID Cloud Dedicated Server with HA is the recommended tier for the manager. The HA layer provides automatic failover: if the underlying hardware fails, the server is automatically migrated to healthy infrastructure. Your Dokploy dashboard, API, and Swarm manager remain available without manual intervention.
Resource requirements for the manager depend on your cluster size. For most setups (up to 10 worker nodes, up to 50 services):
- 4 vCPU (Swarm manager consensus + Dokploy API)
- 8 GB RAM (PostgreSQL + Dokploy + Swarm state)
- 100 GB SSD (Dokploy database, Docker images, logs)
Worker Nodes
Worker nodes run your application containers, databases, and build processes. They receive instructions from the Swarm manager and execute them. Workers do not need Dokploy installed -- they only need Docker with Swarm mode enabled.
The infrastructure tier for workers depends on the workload:
- Production application serving: VDS (Dedicated VPS) for guaranteed CPU performance and consistent response times
- Build servers: VDS with high CPU allocation for fast, predictable builds
- Development/staging: Cloud VPS for cost-effective shared resources
- Database nodes: VDS with high RAM allocation for in-memory database performance
Step-by-Step Swarm Setup
This section assumes you have already installed Dokploy on your manager node and have one or more additional servers provisioned for worker nodes.
Step 1: Initialize Swarm on the Manager Node
SSH into your manager node (the server running Dokploy) and initialize the Swarm:
# Replace MANAGER_IP with your manager node's public IP
docker swarm init --advertise-addr MANAGER_IP
This command initializes the current node as a Swarm manager and outputs a join token. The output looks like:
Swarm initialized: current node (abc123def) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-xxxxx-yyyyy MANAGER_IP:2377
Save this join command. You will run it on each worker node.
Step 2: Open Swarm Ports on All Nodes
Docker Swarm requires specific ports for inter-node communication. On every node (manager and workers), open these ports:
# TCP port 2377 - Swarm management (inbound needed on manager nodes only)
sudo ufw allow 2377/tcp
# TCP and UDP port 7946 - Node communication
sudo ufw allow 7946/tcp
sudo ufw allow 7946/udp
# UDP port 4789 - Overlay network traffic (VXLAN)
sudo ufw allow 4789/udp
If your nodes are in different data centers, these ports must be reachable between the nodes. MassiveGRID's internal network allows direct communication between servers in the same data center. For cross-datacenter communication, nodes reach each other over their public IP addresses, so the ports above must be open to the other nodes' public IPs.
Step 3: Join Worker Nodes to the Swarm
SSH into each worker node and run the join command from Step 1:
# On each worker node
docker swarm join --token SWMTKN-1-xxxxx-yyyyy MANAGER_IP:2377
You should see: This node joined a swarm as a worker.
Step 4: Verify the Cluster
Back on the manager node, verify all nodes are connected:
docker node ls
Expected output:
ID            HOSTNAME     STATUS   AVAILABILITY   MANAGER STATUS
abc123def *   manager-01   Ready    Active         Leader
ghi456jkl     worker-01    Ready    Active
mno789pqr     worker-02    Ready    Active
All nodes should show Ready status and Active availability.
Step 5: Add Worker Nodes in Dokploy
In the Dokploy web UI, navigate to the Servers section. Add each worker node by its IP address. Dokploy will establish an SSH connection to the worker and configure it for deployments. Once added, you can assign deployments to specific nodes or let Swarm distribute them automatically.
Step 6: Label Nodes for Workload Placement
Use Docker node labels to control which services run on which nodes:
# Label a node for production workloads
docker node update --label-add role=production worker-01
# Label a node for builds
docker node update --label-add role=build worker-02
# Label by location
docker node update --label-add datacenter=frankfurt worker-01
docker node update --label-add datacenter=london worker-02
In your Dokploy service configuration, you can then use placement constraints to pin services to labeled nodes. This ensures your production database only runs on the production-labeled node with high RAM, while builds only run on the build-labeled node with high CPU.
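If you deploy a service through a Docker Compose file in Dokploy, the same labels can be referenced with Swarm placement constraints. A minimal sketch, assuming the `role=production` label from above and a hypothetical `web` service and image name:

```yaml
version: "3.8"
services:
  web:
    image: myapp:latest   # placeholder image name
    deploy:
      replicas: 2
      placement:
        constraints:
          # Only schedule this service on nodes labeled role=production
          - node.labels.role == production
```

Swarm silently leaves a service unscheduled if no node satisfies its constraints, so after deploying, check that running replicas match desired replicas with `docker service ls`.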
The Dedicated Build Server Pattern
One of the most impactful multi-node patterns for Dokploy is isolating build workloads to a dedicated server. This eliminates the primary single-server pain point: builds degrading production performance.
Dokploy supports custom build servers. Instead of building Docker images on the same node that serves production traffic, Dokploy can SSH into a separate build server, run the build there, push the resulting image to a registry, and then pull and deploy it on the production node.
Build Server Sizing
Docker builds are CPU-bound during compilation and dependency installation, and lean heavily on RAM and fast storage for layer caching. The ideal build server has:
- High CPU: 6-8 dedicated CPU cores (VDS) for fast compilation. Builds are the quintessential bursty workload -- they spike to 100% CPU, then drop to zero. Dedicated cores ensure no contention during the spike.
- Moderate RAM: 8-16 GB for Docker layer caching and concurrent builds
- Fast storage: SSD for Docker image layer I/O during builds
Production Node Sizing
Without build overhead, production nodes can be optimized purely for serving:
- Moderate CPU: 4 cores is usually sufficient for web serving workloads
- High RAM: 16-32 GB if running databases (PostgreSQL, Redis, MongoDB) alongside application containers
- High storage: SSD sized for database storage and logs
This separation means your production node never experiences CPU spikes from builds, and your build server never competes with database queries for RAM. Each node is optimized for its specific role.
Geographic Distribution
Docker Swarm supports worker nodes in different physical locations. With MassiveGRID's four data center locations (New York, London, Frankfurt, Singapore), you can distribute your Dokploy cluster across continents.
Example: Multi-Region Architecture
| Node | Location | Role | Infrastructure |
|---|---|---|---|
| manager-01 | Frankfurt | Dokploy + Swarm manager | Cloud Dedicated (HA) |
| worker-eu | London | EU production serving | VDS (4 CPU / 16 GB RAM) |
| worker-us | New York | US production serving | VDS (4 CPU / 16 GB RAM) |
| worker-asia | Singapore | Asia production serving | VDS (4 CPU / 8 GB RAM) |
| build-01 | Frankfurt | Dedicated build server | VDS (8 CPU / 8 GB RAM) |
In this architecture, the manager and build server are co-located in Frankfurt for low-latency communication during deployments. Production workers are distributed globally, each labeled with their datacenter location. When deploying, Dokploy builds the image on build-01, pushes it to a registry, and Swarm distributes it to the appropriate regional workers based on placement constraints.
Cross-Region Considerations
Swarm's management traffic (heartbeats, task scheduling, service updates) travels between the manager and all workers. For cross-datacenter deployments, this traffic traverses the public internet. The bandwidth requirements are minimal (kilobytes per second for management), but latency affects how quickly Swarm detects node failures and reschedules containers.
Key considerations:
- Heartbeat interval: Swarm's dispatcher heartbeat defaults to 5 seconds; a worker that stops responding to heartbeats is marked down and its tasks are rescheduled elsewhere. For cross-region setups the default is usually fine, but on high-latency links you can relax it with docker swarm update --dispatcher-heartbeat 15s.
- Image distribution: Use a container registry (Docker Hub, GitHub Container Registry, or a self-hosted registry) rather than relying on Swarm's built-in image distribution. The registry provides a pull-based model that handles high-latency connections better.
- Database placement: Databases should run in a single region to avoid cross-region replication latency. Place your primary database on the worker closest to the majority of your users, and use application-level caching (Redis) in other regions.
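Combining the registry and placement advice above, a Compose sketch for one region-pinned service might look like this (the registry hostname, image tag, and label value are illustrative):

```yaml
version: "3.8"
services:
  api-eu:
    # Pulled from a registry rather than distributed by Swarm itself
    image: registry.example.com/myapp:1.4.2   # hypothetical registry and tag
    deploy:
      replicas: 2
      placement:
        constraints:
          # datacenter label applied in Step 6 of the Swarm setup
          - node.labels.datacenter == london
```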
Independent Scaling Across the Cluster
One of the primary advantages of running Dokploy on MassiveGRID's infrastructure is that each Swarm node can be scaled independently. This is particularly powerful in a multi-node setup where different nodes serve different purposes.
Per-Node Scaling Examples
Database node needs more RAM: Your PostgreSQL container on worker-eu is hitting memory limits as your dataset grows. Scale that specific node's RAM from 16 GB to 32 GB without touching CPU or storage. The node stays online; no migration to a different plan required.
Build server needs more CPU: Build times are increasing as your codebase grows. Add 2 more CPU cores to build-01, bringing it from 8 to 10 cores. Your production workers remain unchanged.
New region launch: You are expanding to serve Asian users. Provision a new VDS in Singapore with 4 CPU and 8 GB RAM, join it to the Swarm, label it appropriately, and deploy your application services to it. The existing cluster is unaffected.
With fixed-tier providers, scaling one node means upgrading to the next plan and paying for resources you do not need. With independent scaling, you pay for exactly the resource each node requires.
Monitoring Across Nodes
Dokploy's built-in monitoring dashboard provides per-server metrics including CPU utilization, RAM usage, disk I/O, and network throughput. In a multi-node setup, you can view these metrics for each node individually through the Dokploy UI.
For deeper visibility, consider deploying a monitoring stack to your Swarm cluster:
- Prometheus for metrics collection from all nodes (deploy as a Swarm service with global mode to run on every node)
- Grafana for dashboards and alerting (deploy to the manager node)
- cAdvisor for container-level resource metrics (deploy as a global service)
Deploy the monitoring stack through Dokploy itself as a Docker Compose project. This gives you a unified view of resource utilization across all nodes, helping you identify which specific node and resource needs scaling.
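A minimal sketch of such a stack as a Compose project, assuming the standard public images for each tool (the Prometheus scrape configuration is omitted for brevity):

```yaml
version: "3.8"
services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    deploy:
      mode: global          # one instance on every node in the cluster
  prometheus:
    image: prom/prometheus:latest
    deploy:
      placement:
        constraints:
          - node.role == manager   # keep metrics storage on the manager
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    deploy:
      placement:
        constraints:
          - node.role == manager
```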
Key Metrics to Watch
- CPU utilization per node: Build nodes should spike during builds and idle between them. Production nodes should show steady utilization below 70%. Sustained 100% on any node indicates a need for more CPU.
- RAM utilization per node: Database nodes should show stable, high utilization (databases use available RAM for caching). Swap usage on any node is a red flag -- scale RAM immediately.
- Swarm service health: Track the number of running vs. desired replicas for each service. A mismatch indicates scheduling failures, usually due to resource constraints on target nodes.
- Cross-node network latency: For geographically distributed clusters, monitor latency between the manager and each worker. Sustained latency above 200ms can cause Swarm management issues.
MassiveGRID for Dokploy
- Cloud Dedicated with HA — Managed, high-availability infrastructure for your Dokploy manager node. 100% uptime SLA with automatic failover
- Dedicated VPS (VDS) — Guaranteed physical CPU cores for worker nodes. Zero contention, predictable performance for production and builds
- 4 Global Data Centers — New York, London, Frankfurt, Singapore. Distribute your Swarm cluster across continents
- Independent Resource Scaling — Scale CPU, RAM, and storage per node. Optimize each server for its role in the cluster
- 12 Tbps DDoS Protection — Network-edge mitigation across all data centers, protecting every node in your cluster
Recommended Cluster Configurations
Here are three reference architectures based on scale:
Small Team (2-5 applications)
| Node | Specs | Tier | Role |
|---|---|---|---|
| Manager + Production | 4 CPU / 8 GB / 100 GB | Cloud Dedicated (HA) | Dokploy + apps + databases |
| Build Server | 4 CPU / 4 GB / 50 GB | VDS | Isolated builds only |
Total: 2 nodes. The primary benefit here is build isolation. Production applications no longer compete with Docker builds for CPU.
Growing Team (5-20 applications)
| Node | Specs | Tier | Role |
|---|---|---|---|
| Manager | 4 CPU / 8 GB / 100 GB | Cloud Dedicated (HA) | Dokploy + Swarm manager |
| Production Worker | 6 CPU / 16 GB / 200 GB | VDS | Apps + databases |
| Build Server | 8 CPU / 8 GB / 80 GB | VDS | Builds + staging |
Total: 3 nodes. The manager is dedicated to Dokploy and cluster coordination. Production workloads are isolated from builds. Each node can be scaled independently as needs grow.
Scale-Out (20+ applications, multi-region)
| Node | Specs | Tier | Role |
|---|---|---|---|
| Manager (Frankfurt) | 4 CPU / 16 GB / 100 GB | Cloud Dedicated (HA) | Dokploy + Swarm manager |
| EU Worker (London) | 6 CPU / 32 GB / 300 GB | VDS | EU apps + databases |
| US Worker (New York) | 6 CPU / 32 GB / 300 GB | VDS | US apps + databases |
| Asia Worker (Singapore) | 4 CPU / 16 GB / 200 GB | VDS | Asia apps + caching |
| Build Server (Frankfurt) | 8 CPU / 16 GB / 100 GB | VDS | All builds |
Total: 5 nodes across 4 data centers. Each regional worker is optimized for its local workload. The build server is co-located with the manager for fast communication. All nodes use dedicated CPU (VDS) for predictable performance.
Security in a Multi-Node Cluster
Multi-node deployments expand the attack surface. Each node needs the same security hardening as a single-server setup, plus additional measures for inter-node communication:
- Swarm encryption: Docker Swarm encrypts management traffic by default (mutual TLS authentication between nodes). Verify this is active with docker info | grep -i encrypt.
- Overlay network encryption: Enable encryption on overlay networks used for inter-service communication: docker network create --opt encrypted --driver overlay my-network. This encrypts all data-plane traffic between containers on different nodes.
- SSH key management: Dokploy connects to worker nodes via SSH. Use unique SSH keys per node and restrict key access to the Dokploy manager only.
- Firewall rules: Swarm ports (2377, 7946, 4789) should only be accessible from other cluster nodes, not from the public internet. Use UFW rules that restrict source IP addresses.
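A small script can generate those restricted rules for every peer in the cluster. The IP addresses below are placeholders from the TEST-NET range; the script only prints the ufw commands so you can review them before applying anything:

```shell
#!/usr/bin/env bash
# Print UFW rules that allow Swarm ports only from other cluster nodes.
# NODE_IPS is a placeholder list -- substitute your real node addresses.
NODE_IPS="203.0.113.10 203.0.113.11 203.0.113.12"

for ip in $NODE_IPS; do
  for rule in 2377/tcp 7946/tcp 7946/udp 4789/udp; do
    port="${rule%/*}"   # part before the slash
    proto="${rule#*/}"  # part after the slash
    echo "ufw allow from ${ip} to any port ${port} proto ${proto}"
  done
done
```

Once you have verified the output, run the printed commands with sudo on each node, and remove any broader rules (such as a blanket allow on 2377/tcp) added during initial setup.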
Wrapping Up
Dokploy's multi-node support through Docker Swarm transforms it from a single-server deployment tool into a distributed infrastructure platform. The key decisions are:
- Manager node: Cloud Dedicated with HA for reliability. This is your control plane -- it must stay available.
- Worker nodes: VDS for production (guaranteed dedicated resources) or Cloud VPS for development and staging (cost-effective shared resources).
- Build isolation: A dedicated build server is the single most impactful multi-node pattern. It eliminates the primary performance complaint for Dokploy deployments.
- Geographic distribution: Place workers in the data centers closest to your users. Use node labels and placement constraints to control service distribution.
- Per-node scaling: Each node in your cluster has independent resource allocation. Scale exactly the resource that is needed, on exactly the node that needs it.
For the initial single-server setup, start with our Dokploy installation guide. If you are experiencing inconsistent build performance, the dedicated build server pattern described above is the most effective solution. And before going to production on any node, follow the security hardening guide to lock down each server in your cluster.