You've chosen a VPS plan, deployed your application, and you're ready to launch. But how do you know your server can handle the traffic you expect? Launching without benchmarking is like opening a restaurant without checking if the kitchen can handle a dinner rush — you'll find out the hard way, and your users will be the ones who suffer.
This guide covers every layer of VPS performance testing: CPU, memory, disk I/O, network, and realistic web application load testing. You'll learn to identify your actual bottleneck before a single real user hits your server, and you'll know exactly which resource to scale when it's time.
MassiveGRID Ubuntu VPS includes: Ubuntu 24.04 LTS pre-installed · Proxmox HA cluster with automatic failover · Ceph 3x replicated NVMe storage · Independent CPU/RAM/storage scaling · 12 Tbps DDoS protection · 4 global datacenter locations · 100% uptime SLA · 24/7 human support rated 9.5/10
Deploy a self-managed VPS — from $1.99/mo
Need dedicated resources? — from $19.80/mo
Want fully managed hosting? — we handle everything
Why Benchmark Before Launch
Benchmarking serves three critical purposes:
- Prevent over-provisioning — if your 4-vCPU server barely uses 20% CPU under peak load, you're paying for resources you don't need. Scale down and save money.
- Prevent under-provisioning — if your server hits 95% CPU at half your expected traffic, you need to scale up before launch, not during a traffic spike.
- Identify the bottleneck — is your application limited by CPU, memory, disk I/O, or network? The answer determines which resource to scale. On MassiveGRID, you can scale each resource independently, so knowing the bottleneck saves you from upgrading everything when only one dimension needs more capacity.
Benchmark your Cloud VPS before going live. If results show you need more capacity, scale the bottleneck resource independently — no need to upgrade your entire plan.
Prerequisites
Install the benchmarking tools we'll use throughout this guide:
sudo apt update
sudo apt install -y sysbench fio iperf3 wrk
For k6 (advanced HTTP load testing), install separately:
sudo gpg -k   # Run once to initialize the GnuPG keyring directory
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt update
sudo apt install k6 -y
Verify installations:
sysbench --version
fio --version
iperf3 --version
wrk --version
k6 version
CPU Benchmark: sysbench
The CPU benchmark tests raw computational throughput by calculating prime numbers. This measures how fast your vCPUs process mathematical operations — a good proxy for application code execution speed.
Single-Threaded Test
This tests the performance of a single CPU core — relevant for applications that are single-threaded (many PHP applications, Node.js event loop, database single-query performance):
sysbench cpu --cpu-max-prime=20000 --threads=1 run
Sample output:
CPU speed:
events per second: 1847.32
General statistics:
total time: 10.0005s
total number of events: 18476
Latency (ms):
min: 0.52
avg: 0.54
max: 2.34
95th percentile: 0.56
sum: 9994.27
How to interpret: The key metric is events per second. Higher is better. Compare this number against:
| Events/sec (single-thread) | Assessment |
|---|---|
| 500–1,000 | Budget vCPU — fine for low-traffic sites |
| 1,000–2,000 | Standard vCPU — handles moderate workloads |
| 2,000–3,500 | High-performance vCPU — good for compute tasks |
| 3,500+ | Premium CPU — database servers, heavy computation |
Multi-Threaded Test
This tests aggregate performance across all vCPUs — relevant for applications that use multiple workers (Nginx workers, PHP-FPM pool, multi-threaded applications):
# Replace 4 with your vCPU count (check with: nproc)
sysbench cpu --cpu-max-prime=20000 --threads=$(nproc) run
Sample output (4 vCPU):
CPU speed:
events per second: 7283.94
General statistics:
total time: 10.0003s
total number of events: 72848
How to interpret: The multi-threaded result should be approximately N times the single-threaded result, where N is your vCPU count. If it's significantly less (e.g., 2.5x instead of 4x), there may be CPU contention from other tenants on the host.
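The scaling check above can be scripted. A minimal sketch using the sample figures from the two runs in this guide (substitute your own events/sec results and vCPU count):

```shell
# Sample values from the runs above -- replace with your own results
single=1847.32   # events/sec, --threads=1
multi=7283.94    # events/sec, --threads=$(nproc)
vcpus=4

# Scaling factor and efficiency (awk handles the floating-point math)
awk -v s="$single" -v m="$multi" -v n="$vcpus" 'BEGIN {
  factor = m / s
  printf "Scaling: %.2fx across %d vCPUs (%.0f%% efficiency)\n", factor, n, factor / n * 100
  if (factor / n < 0.8)
    print "Warning: below 80% -- possible CPU contention from other tenants"
}'
```

Anything at or above roughly 80% efficiency is normal; the sample numbers work out to about 99%.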
Memory Benchmark: sysbench
Memory bandwidth affects how fast data moves between RAM and CPU. This is important for in-memory databases (Redis), large dataset processing, and applications with high memory allocation rates.
# Sequential read test (1MB block size, 10GB total)
sysbench memory --memory-block-size=1M --memory-total-size=10G --memory-oper=read run
# Sequential write test
sysbench memory --memory-block-size=1M --memory-total-size=10G --memory-oper=write run
Sample output:
Total operations: 10240 (8347.62 per second)
10240.00 MiB transferred (8347.62 MiB/sec)
General statistics:
total time: 1.2269s
total number of events: 10240
How to interpret: The key metric is MiB/sec transferred. Modern servers should achieve 5,000–15,000 MiB/sec. If you're below 3,000 MiB/sec, memory bandwidth could bottleneck memory-intensive workloads.
Also verify your actual available memory matches your plan and check swap usage — if swap is being used during normal operations, you need more RAM. See our swap memory management guide for details:
free -h
# If "Swap: used" is more than 0, your application is exceeding available RAM
Disk I/O Benchmark: fio
Disk performance is often the most critical factor for database servers, file-heavy applications, and any workload that reads or writes data frequently. On MassiveGRID, storage uses Ceph with 3x replicated NVMe SSDs — meaning your data is written to three separate NVMe drives for redundancy.
Sequential Read/Write
Sequential I/O measures throughput for large file operations (backups, log processing, large file transfers):
# Sequential write test (1GB file, 4 jobs)
fio --name=seq-write \
--ioengine=libaio \
--direct=1 \
--bs=1M \
--size=1G \
--numjobs=4 \
--runtime=30 \
--rw=write \
--group_reporting
Sample output:
WRITE: bw=412MiB/s (432MB/s), 412MiB/s-412MiB/s (432MB/s-432MB/s), io=4096MiB (4295MB), run=9934-9934msec
# Sequential read test
fio --name=seq-read \
--ioengine=libaio \
--direct=1 \
--bs=1M \
--size=1G \
--numjobs=4 \
--runtime=30 \
--rw=read \
--group_reporting
Random IOPS
Random I/O (IOPS) is the most important metric for databases. Database queries involve many small random reads and writes across different parts of the storage:
# Random read IOPS (4K block size, simulates database reads)
fio --name=rand-read \
--ioengine=libaio \
--direct=1 \
--bs=4k \
--size=1G \
--numjobs=4 \
--iodepth=64 \
--runtime=30 \
--rw=randread \
--group_reporting
Sample output:
READ: bw=187MiB/s (196MB/s), 187MiB/s-187MiB/s (196MB/s-196MB/s), io=5621MiB (5894MB), run=30001-30001msec
iops : min=42186, max=52847, avg=47892.14
# Random write IOPS
fio --name=rand-write \
--ioengine=libaio \
--direct=1 \
--bs=4k \
--size=1G \
--numjobs=4 \
--iodepth=64 \
--runtime=30 \
--rw=randwrite \
--group_reporting
# Mixed random read/write (70% read, 30% write — typical database pattern)
fio --name=rand-rw \
--ioengine=libaio \
--direct=1 \
--bs=4k \
--size=1G \
--numjobs=4 \
--iodepth=64 \
--runtime=30 \
--rw=randrw \
--rwmixread=70 \
--group_reporting
How to interpret IOPS:
| Random 4K IOPS | Storage Type | Suitable For |
|---|---|---|
| 500–2,000 | HDD or slow SSD | Static sites, basic applications |
| 2,000–10,000 | Standard SSD | Most web applications |
| 10,000–50,000 | NVMe SSD | Databases, high-traffic applications |
| 50,000+ | Premium NVMe | Heavy database workloads, real-time analytics |
Note on Ceph storage: Ceph distributes data across multiple NVMe drives with 3x replication. Write performance includes the overhead of writing three copies. Read performance benefits from being able to read from any of the three replicas. This trade-off gives you enterprise-grade data durability with strong I/O performance.
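For scripted before/after comparisons, the average IOPS can be pulled out of a saved fio summary with awk (fio also supports `--output-format=json` for fully machine-readable results). A sketch using the sample output shown above:

```shell
# Save a fio summary to a file (sample lines from the random-read run above)
cat > /tmp/fio-rand-read.txt <<'EOF'
   READ: bw=187MiB/s (196MB/s), 187MiB/s-187MiB/s (196MB/s-196MB/s), io=5621MiB (5894MB), run=30001-30001msec
   iops        : min=42186, max=52847, avg=47892.14
EOF

# Extract the avg= field from the iops line
avg_iops=$(awk -F'avg=' '/iops/ {print $2}' /tmp/fio-rand-read.txt)
echo "Average random-read IOPS: $avg_iops"
```

Logging this number after each benchmark run makes it easy to compare storage performance across plan changes or datacenter migrations.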
Clean up test files after benchmarking:
rm -f seq-write.* seq-read.* rand-read.* rand-write.* rand-rw.*
Network Benchmark: iperf3
Network throughput determines how quickly your server can serve data to users. This is critical for file downloads, video streaming, API-heavy applications, and any service where response payload sizes are large.
Testing with a Public iperf3 Server
# Test upload speed (your VPS sending data, which is what users experience)
iperf3 -c iperf.he.net -p 5201 -t 10
# Test download speed (your VPS receiving data; -R reverses the direction)
iperf3 -c iperf.he.net -p 5201 -t 10 -R
Sample output:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.09 GBytes 937 Mbits/sec 0 sender
[ 5] 0.00-10.04 sec 1.09 GBytes 934 Mbits/sec receiver
Testing Between Two VPS Instances
For a more controlled test, use two VPS instances — one as server, one as client:
# On VPS 1 (server):
iperf3 -s -p 5201
# On VPS 2 (client):
iperf3 -c VPS1_IP -p 5201 -t 30 -P 4
The -P 4 flag uses 4 parallel streams, which gives a more realistic throughput measurement (a single TCP stream often can't saturate a high-bandwidth link).
How to interpret: Most VPS plans include 1 Gbps ports. You should see 800–950 Mbits/sec on a good connection. If you're significantly below that, check if your VPS plan has bandwidth throttling or if the test is going through a congested path.
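To sanity-check a result against a 1 Gbps port, a quick conversion sketch (the 937 Mbits/sec figure is the sample output above; substitute your own measurement):

```shell
measured=937        # Mbits/sec, from the iperf3 sample output above
port_speed=1000     # Mbits/sec (1 Gbps port)

# Utilization as a percentage, plus the effective transfer rate in MB/s
util=$(awk -v m="$measured" -v p="$port_speed" 'BEGIN { printf "%.1f", m / p * 100 }')
echo "Port utilization: ${util}% of a 1 Gbps link (~$((measured / 8)) MB/s effective)"
```

Anything above roughly 80% utilization means the link itself is healthy and any remaining gap is protocol overhead or path congestion.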
Web Application Load Testing: wrk
System-level benchmarks tell you about raw hardware performance. But what you really need to know is: how many HTTP requests per second can my application serve? That's where wrk comes in.
Basic HTTP Benchmark
# 2 threads, 100 connections, 30-second test
wrk -t2 -c100 -d30s http://localhost
Sample output:
Running 30s test @ http://localhost
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 12.34ms 5.67ms 189.23ms 87.65%
Req/Sec 4.12k 412.89 5.23k 72.33%
246,847 requests in 30.02s, 1.87GB read
Requests/sec: 8,222.45
Transfer/sec: 63.82MB
Key metrics:
- Requests/sec — how many requests your application can serve per second
- Latency (Avg) — average response time; under 50ms is good, under 200ms is acceptable
- Latency (Max) — worst-case response time; indicates tail latency problems
- Stdev — consistency; low stdev means predictable performance
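wrk can also print a full latency distribution with the `--latency` flag (e.g. `wrk -t2 -c100 -d30s --latency http://localhost/`), which exposes the tail latency that averages hide. A sketch of extracting the 99th percentile from saved output for logging (the percentile values here are hypothetical):

```shell
# Hypothetical saved output from a `wrk --latency` run
cat > /tmp/wrk-latency.txt <<'EOF'
  Latency Distribution
     50%   10.89ms
     75%   14.21ms
     90%   21.34ms
     99%   98.76ms
EOF

# Extract the 99th percentile -- the tail latency slow requests actually experience
p99=$(awk '$1 == "99%" {print $2}' /tmp/wrk-latency.txt)
echo "p99 latency: $p99"
```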
Testing Different Endpoints
Don't just test your homepage. Test the endpoints that matter most:
# Homepage (static or cached)
wrk -t2 -c100 -d30s http://localhost/
# Dynamic page (database queries involved)
wrk -t2 -c100 -d30s http://localhost/products
# API endpoint
wrk -t2 -c100 -d30s http://localhost/api/v1/users
# Search (typically most expensive)
wrk -t2 -c100 -d30s "http://localhost/search?q=test"
Gradual Load Increase
The most useful test increases load gradually to find your breaking point:
# Test with increasing connections
for connections in 10 50 100 200 500 1000; do
echo "=== $connections connections ==="
wrk -t2 -c$connections -d15s http://localhost/ 2>&1 | grep -E "Requests/sec|Latency"
echo ""
sleep 5 # Let the server recover between tests
done
Sample results:
=== 10 connections ===
Latency 2.14ms 0.89ms 15.67ms 91.23%
Requests/sec: 4,621.33
=== 50 connections ===
Latency 5.87ms 2.34ms 45.12ms 88.45%
Requests/sec: 8,134.67
=== 100 connections ===
Latency 12.34ms 5.67ms 189.23ms 87.65%
Requests/sec: 8,222.45
=== 200 connections ===
Latency 28.91ms 14.56ms 423.78ms 82.34%
Requests/sec: 6,891.23
=== 500 connections ===
Latency 112.45ms 67.89ms 1234.56ms 76.12%
Requests/sec: 4,234.56
=== 1000 connections ===
Latency 478.23ms 234.56ms 3456.78ms 68.90%
Requests/sec: 1,987.34
How to interpret: Notice that requests/sec peaked at 100 connections and declined after 200. The optimal operating point is around 100 concurrent connections for this server. Beyond that, latency increases dramatically and throughput drops — the server is overloaded.
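Finding the peak can be automated once you've reduced each run to a (connections, requests/sec) pair. A sketch, using the sample numbers from the ramp above (the two-column log format is an assumption; adapt it to however you record your runs):

```shell
# Connections and requests/sec pairs from the gradual-load runs above
cat > /tmp/ramp.log <<'EOF'
10 4621.33
50 8134.67
100 8222.45
200 6891.23
500 4234.56
1000 1987.34
EOF

# Pick the connection count that delivered the highest throughput
summary=$(awk '$2 > best {best = $2; conns = $1} END {
  printf "peak %.0f req/s at %d connections", best, conns
}' /tmp/ramp.log)
echo "$summary"
```

The reported peak is your optimal operating point; size capacity plans around it rather than around the highest connection count the server merely survives.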
Realistic Load Testing: k6
While wrk tests raw throughput, k6 lets you simulate realistic user behavior — page navigations, form submissions, API calls with think time between requests. This gives you a much more accurate picture of real-world capacity.
Basic k6 Script
Create a test script that simulates a user browsing your site:
cat > ~/load-test.js << 'EOF'
import http from 'k6/http';
import { check, sleep } from 'k6';
// Ramp up to 50 users over 2 minutes, sustain for 5 minutes, ramp down
export const options = {
stages: [
{ duration: '2m', target: 50 }, // Ramp up
{ duration: '5m', target: 50 }, // Sustain
{ duration: '1m', target: 0 }, // Ramp down
],
thresholds: {
http_req_duration: ['p(95)<500'], // 95% of requests under 500ms
http_req_failed: ['rate<0.01'], // Less than 1% failure rate
},
};
const BASE_URL = 'http://localhost';
export default function () {
// Visit homepage
let res = http.get(`${BASE_URL}/`);
check(res, {
'homepage status 200': (r) => r.status === 200,
'homepage load time < 500ms': (r) => r.timings.duration < 500,
});
sleep(Math.random() * 3 + 1); // 1-4 seconds think time
// Visit a product page
res = http.get(`${BASE_URL}/products`);
check(res, {
'products status 200': (r) => r.status === 200,
});
sleep(Math.random() * 3 + 1);
// Simulate an API call
res = http.get(`${BASE_URL}/api/v1/items?page=1&limit=20`);
check(res, {
'API status 200': (r) => r.status === 200,
'API response time < 200ms': (r) => r.timings.duration < 200,
});
sleep(Math.random() * 2 + 1);
}
EOF
Run the k6 Test
k6 run ~/load-test.js
Sample output:
execution: local
script: /root/load-test.js
output: -
scenarios: (100.00%) 1 scenario, 50 max VUs, 8m30s max duration
default: Up to 50 looping VUs for 8m0s
✓ homepage status 200
✓ homepage load time < 500ms
✓ products status 200
✓ API status 200
✓ API response time < 200ms
checks.........................: 100.00% ✓ 12847 ✗ 0
data_received..................: 145 MB 302 kB/s
data_sent......................: 1.2 MB 2.5 kB/s
http_req_blocked...............: avg=0.12ms min=0µs med=0.004ms max=23.4ms p(90)=0.006ms p(95)=0.008ms
http_req_duration..............: avg=24.3ms min=1.2ms med=18.7ms max=487.3ms p(90)=52.1ms p(95)=78.4ms
{ expected_response:true }...: avg=24.3ms min=1.2ms med=18.7ms max=487.3ms p(90)=52.1ms p(95)=78.4ms
✓ http_req_failed................: 0.00% ✓ 0 ✗ 12847
http_reqs......................: 12847 26.76/s
iteration_duration.............: avg=5.62s min=2.14s med=5.43s max=12.3s p(90)=8.12s p(95)=9.45s
iterations.....................: 4282 8.92/s
vus............................: 1 min=1 max=50
vus_max........................: 50 min=50 max=50
running (8m00.4s), 00/50 VUs, 4282 complete iterations
default ✓ [======================================] 00/50 VUs 8m0s
Key metrics to evaluate:
- http_req_duration p(95) — 95th percentile response time. Under 500ms is good for web pages, under 200ms for APIs.
- http_req_failed — failure rate. Should be 0% or very close to it.
- http_reqs — total requests per second the server handled.
- checks — percentage of assertion checks that passed.
Stress Test to Find the Breaking Point
cat > ~/stress-test.js << 'EOF'
import http from 'k6/http';
import { sleep } from 'k6';
export const options = {
stages: [
{ duration: '2m', target: 50 },
{ duration: '2m', target: 100 },
{ duration: '2m', target: 200 },
{ duration: '2m', target: 300 },
{ duration: '2m', target: 500 },
{ duration: '2m', target: 0 },
],
};
export default function () {
http.get('http://localhost/');
sleep(1);
}
EOF
k6 run ~/stress-test.js
Watch for the stage where response times spike or errors begin. That's your server's breaking point under load.
Interpreting Results: How Many Visitors Can My VPS Handle?
Converting benchmark results to "visitors per day" requires understanding the relationship between concurrent users and total daily visitors.
The Calculation Framework
# Key formula:
# Concurrent users = (Daily visitors × Pages per visit × Avg page load time) / Seconds per day
# Reverse the formula to find capacity:
# Daily visitors = (Max concurrent users × 86400) / (Pages per visit × Avg page load time)
# Example calculation:
# Your server handles 100 concurrent connections well (from wrk tests)
# Average visitor views 3 pages
# Average page load takes 0.5 seconds (from k6 tests)
# Daily capacity = (100 × 86400) / (3 × 0.5) = 5,760,000 visitors/day
But traffic isn't evenly distributed across the day. Most sites see 50–70% of traffic in 8 peak hours:
# More realistic calculation accounting for peak hours:
# Peak hour visitors = Daily visitors × 0.10 (10% of daily traffic in busiest hour)
# Peak concurrent = (Peak visitors per hour × Pages per visit × Avg load time) / 3600
# If your server handles 100 concurrent:
# Peak visitors/hour = (100 × 3600) / (3 × 0.5) = 240,000 visitors/hour
# Daily visitors (assuming peak hour = 10% of daily) = 240,000 / 0.10 = 2,400,000/day
In practice, with overhead for database queries, session management, and other factors, apply a 40–60% efficiency factor:
# Conservative estimate:
# 2,400,000 × 0.5 = 1,200,000 visitors/day
# For a typical content site on a 2 vCPU / 4GB VPS:
# Realistic capacity: 50,000-200,000 visitors/day (depending on application complexity)
| VPS Spec | Static Site | WordPress (cached) | Dynamic App (Node/PHP) | Database-heavy App |
|---|---|---|---|---|
| 1 vCPU / 2 GB | 500K+/day | 20K-50K/day | 10K-30K/day | 5K-15K/day |
| 2 vCPU / 4 GB | 1M+/day | 50K-150K/day | 30K-80K/day | 15K-40K/day |
| 4 vCPU / 8 GB | 2M+/day | 150K-400K/day | 80K-200K/day | 40K-100K/day |
| 8 vCPU / 16 GB | 5M+/day | 400K-1M/day | 200K-500K/day | 100K-300K/day |
These are rough estimates. Your actual capacity depends on your application's complexity, caching strategy, database query efficiency, and many other factors. That's why benchmarking your specific application matters more than generic numbers.
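The arithmetic above can be wrapped in a small script so you can plug in your own benchmark numbers. A sketch using the same assumptions as the worked example (10% of daily traffic in the peak hour, 50% efficiency factor):

```shell
# Inputs from your own benchmarks -- sample values from this guide
concurrent=100      # max comfortable concurrent connections (from wrk)
pages_per_visit=3
page_load_s=0.5     # average page time in seconds (from k6)
peak_share=0.10     # fraction of daily traffic in the busiest hour (assumption)
efficiency=0.5      # headroom for DB queries, sessions, etc. (assumption)

# Visitors/hour the server sustains at its peak operating point
peak_hour=$(awk -v c="$concurrent" -v p="$pages_per_visit" -v t="$page_load_s" \
  'BEGIN { printf "%.0f", c * 3600 / (p * t) }')

# Conservative daily estimate after peak-hour and efficiency adjustments
daily=$(awk -v ph="$peak_hour" -v ps="$peak_share" -v e="$efficiency" \
  'BEGIN { printf "%.0f", ph / ps * e }')

echo "Peak-hour capacity: $peak_hour visitors/hour"
echo "Conservative daily capacity: $daily visitors/day"
```

With the sample inputs this reproduces the 240,000 visitors/hour and 1,200,000 visitors/day figures from the worked example above.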
The Consistency Test
Here's a test most benchmarking guides skip — and it's one of the most revealing. Run the same benchmark at different times of day and compare results:
cat > ~/consistency-test.sh << 'SCRIPT'
#!/bin/bash
# Run this script every 4 hours for 24 hours (use cron)
# Records CPU benchmark and disk I/O at each run
TIMESTAMP=$(date +"%Y-%m-%d_%H:%M")
LOGFILE=~/benchmark-consistency.log
echo "=== $TIMESTAMP ===" >> $LOGFILE
# CPU benchmark (quick 5-second test)
echo "CPU:" >> $LOGFILE
sysbench cpu --cpu-max-prime=10000 --threads=$(nproc) --time=5 run 2>&1 | grep "events per second" >> $LOGFILE
# Disk IOPS (quick 10-second test)
echo "Disk IOPS:" >> $LOGFILE
fio --name=quick-iops --ioengine=libaio --direct=1 --bs=4k --size=256M --numjobs=1 --iodepth=32 --runtime=10 --rw=randread --group_reporting 2>&1 | grep "IOPS" >> $LOGFILE
rm -f quick-iops.*
echo "" >> $LOGFILE
SCRIPT
chmod +x ~/consistency-test.sh
Schedule it to run every 4 hours:
# Add to crontab
crontab -e
# Add this line:
0 */4 * * * /root/consistency-test.sh
After 24-48 hours, review the results:
cat ~/benchmark-consistency.log
What to look for: If CPU events/sec varies by more than 15% between runs, or if disk IOPS swings significantly by time of day, you're experiencing resource contention from other tenants on the same hardware. Dedicated resources produce consistent benchmarks — your CPU cores and RAM are exclusively yours, not shared with other tenants.
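The spread can be checked with a one-liner once a day or two of samples has accumulated. A sketch that computes the CPU variation across logged runs, assuming the log format written by the consistency script above (the timestamps and values here are illustrative):

```shell
# Illustrative excerpt of the consistency log written by the script above
cat > /tmp/consistency.log <<'EOF'
=== 2025-01-10_00:00 ===
CPU:
    events per second:  7283.94
=== 2025-01-10_04:00 ===
CPU:
    events per second:  6102.51
EOF

# Min, max, and percentage variation of events/sec across all runs
variation=$(awk -F': *' '/events per second/ {
  v = $2 + 0
  if (!n++ || v < min) min = v
  if (v > max) max = v
} END { printf "%.1f", (max - min) / max * 100 }' /tmp/consistency.log)

echo "CPU events/sec variation across runs: ${variation}%"
```

The illustrative values above work out to about 16% variation, which would exceed the 15% threshold and point to contention.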
Monitoring During Load Tests
While running load tests, monitor your server's resource usage in real time to identify the bottleneck:
# Open a second SSH session and run:
# Real-time system overview
top -d 1
# Or for more detail:
vmstat 1
# Watch for:
# - %cpu (us+sy) near 100% = CPU bottleneck
# - free memory near 0 + swap active = memory bottleneck
# - wa% high = disk I/O bottleneck
# - si/so high = swapping (need more RAM)
For more comprehensive monitoring during load tests, see our monitoring setup guide or our guide on VPS performance optimization.
# Quick bottleneck identifier during load test
echo "=== CPU ===" && top -bn1 | head -5 && echo "" && \
echo "=== Memory ===" && free -h && echo "" && \
echo "=== Disk I/O ===" && iostat -x 1 3 | tail -10 && echo "" && \
echo "=== Network ===" && ss -s
When Benchmarks Say "Upgrade"
Your benchmark results point to specific scaling decisions:
| Bottleneck Found | Evidence | Action |
|---|---|---|
| CPU | CPU at 100% while memory and disk are fine | Add more vCPU cores |
| Memory | Swap usage during load, OOM kills in logs | Add more RAM |
| Disk I/O | High iowait %, low IOPS in fio test | Add more storage or optimize queries |
| Network | Bandwidth saturated in iperf3 test | Upgrade bandwidth tier |
| Application | Resources not maxed but response times high | Optimize code, add caching, tune database |
On MassiveGRID, you can scale each resource independently. If your benchmarks show CPU is the bottleneck while memory and storage are underutilized, add more vCPUs without paying for RAM or storage you don't need.
Prefer Managed Capacity Planning?
Performance benchmarking, capacity planning, and scaling decisions require ongoing attention as your traffic grows. If you'd rather have experts handle the infrastructure tuning and scaling, MassiveGRID's fully managed dedicated hosting includes proactive performance monitoring, capacity planning, and scaling recommendations — we handle the infrastructure so you can focus on your application.
Summary
Before going live, benchmark every layer of your VPS:
- CPU — `sysbench cpu` for raw computation speed (single and multi-threaded)
- Memory — `sysbench memory` for bandwidth and `free -h` for available capacity
- Disk I/O — `fio` for sequential throughput and random IOPS
- Network — `iperf3` for bandwidth
- HTTP throughput — `wrk` for raw requests per second at your application level
- Realistic load — `k6` for simulated user scenarios with think time
- Consistency — run benchmarks at different times to check for resource contention
Record your baseline results. Re-run benchmarks after any significant change — new deployment, configuration change, traffic pattern shift. And when the numbers tell you it's time to scale, you'll know exactly which resource needs it.