Why Monitor Nextcloud?
Running Nextcloud in production is about more than just deploying the application and walking away. Your users depend on it for file synchronization, collaboration, and communication. When something goes wrong — a disk fills up, memory runs out, or response times spike — you need to know about it before your users start filing tickets.
Monitoring transforms Nextcloud administration from reactive firefighting into proactive management. Instead of discovering that storage ran out at 3 AM from an angry email, you get an alert when free space drops below 15%, giving you hours or days to respond. Instead of guessing why the server feels slow, you have dashboards showing exactly which resource is the bottleneck.
The combination of Prometheus and Grafana has become the industry standard for infrastructure monitoring. Prometheus collects and stores time-series metrics with a powerful query language, while Grafana transforms those metrics into actionable dashboards and alerts. Together with purpose-built Nextcloud exporters, they give you complete visibility into your Nextcloud deployment.
This guide walks you through the complete setup — from enabling the Nextcloud metrics API to building production-ready dashboards and alerting rules. Whether you are running a small team instance or an enterprise deployment serving thousands of users, the monitoring stack described here scales to match.
Monitoring Architecture Overview
Before diving into configuration, it helps to understand how the monitoring components connect and how data flows through the system.
Components
- Nextcloud — Your application, which exposes metrics through its Server Info API
- Nextcloud Exporter — A lightweight service that queries the Nextcloud API and translates metrics into Prometheus format
- Node Exporter — Collects OS-level metrics (CPU, memory, disk, network) from the host system
- Prometheus — Time-series database that scrapes metrics from exporters at regular intervals, stores them, and evaluates alerting rules
- Grafana — Visualization platform that queries Prometheus and renders dashboards, also handles alert notifications
Data Flow
┌─────────────────┐ ┌──────────────────────┐ ┌──────────────┐ ┌──────────────┐
│ │ │ │ │ │ │ │
│ Nextcloud │─────▶│ Nextcloud Exporter │◀─────│ Prometheus │─────▶│ Grafana │
│ (Server Info │ HTTP │ (port 9205) │scrape│ (port 9090) │query │ (port 3000) │
│ API) │ │ │ │ │ │ │
└─────────────────┘ └──────────────────────┘ │ │ │ │
│ │ │ Dashboards │
┌─────────────────┐ │ │ │ Alerts │
│ │ │ │ │ │
│ Node Exporter │◀───────────────────────────────────│ │ │ │
│ (port 9100) │ scrape │ │ │ │
│ │ │ │ │ │
└─────────────────┘ └──────────────┘ └──────────────┘
│
▼
┌──────────────┐
│ Alertmanager│
│ (port 9093) │
│ Email/Slack │
└──────────────┘
Prometheus operates on a pull model — it actively scrapes each exporter endpoint at configured intervals (typically every 15–60 seconds). This design means exporters do not need to know where Prometheus is; they simply expose an HTTP endpoint with current metrics. Prometheus stores these data points with timestamps, enabling both real-time monitoring and historical trend analysis.
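The exporter side of this contract is nothing more than an HTTP endpoint that serves current values as plain text. A minimal sketch in Python (the demo_users_total metric name and the ephemeral port are illustrative, not part of any real exporter):

```python
# Minimal sketch of the exporter contract: expose current metric values as
# plain text over HTTP and let Prometheus pull them on its own schedule.
import http.server
import threading
import urllib.request

def render_metrics(values):
    """Render a dict of metric name -> value in Prometheus text format."""
    return "".join(f"{name} {value}\n" for name, value in values.items())

class MetricsHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = render_metrics({"demo_users_total": 42}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

if __name__ == "__main__":
    # Bind to an ephemeral port and fetch our own metrics once, the way
    # Prometheus would on each scrape.
    server = http.server.HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/metrics") as resp:
        print(resp.read().decode())  # prints: demo_users_total 42
    server.shutdown()
```

Because the exporter only answers requests, it holds no state about who scrapes it, which is why adding a second Prometheus server later requires no exporter changes.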
Prerequisites
Before starting the monitoring setup, ensure you have the following in place:
Nextcloud Instance
- A working Nextcloud installation (version 20 or later recommended). If you are starting fresh, follow our production Nextcloud installation guide first.
- Administrative access to Nextcloud (for enabling the Server Info app and creating a monitoring user)
- HTTPS enabled with a valid SSL certificate
Server Requirements for the Monitoring Stack
- CPU: 2 vCPUs minimum (Prometheus is CPU-intensive during compaction)
- RAM: 2 GB minimum, 4 GB recommended (Prometheus keeps recent data in memory)
- Disk: 20 GB minimum for Prometheus data retention (plan ~1.5 MB per day per target with default metrics)
- OS: Ubuntu 22.04/24.04, Debian 12, or AlmaLinux 9
Deployment note: You can run the monitoring stack on the same server as Nextcloud for small deployments, but dedicated monitoring infrastructure is recommended for production. This keeps resource contention from affecting either service and ensures monitoring remains available even if the Nextcloud server has issues.
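The disk figure above can be sanity-checked with quick arithmetic. The ~1.5 MB/day/target estimate is the one given in this guide; the target count and retention below are example values, not requirements:

```python
# Back-of-envelope Prometheus disk sizing using the ~1.5 MB/day/target
# estimate from this guide. Target count and retention are illustrative.
MB_PER_DAY_PER_TARGET = 1.5
targets = 3            # e.g. prometheus itself, node_exporter, nextcloud-exporter
retention_days = 90    # matches --storage.tsdb.retention.time=90d used later

total_mb = MB_PER_DAY_PER_TARGET * targets * retention_days
print(f"Estimated TSDB size: {total_mb:.0f} MB")  # roughly 405 MB
```

Even a small multi-target setup lands well under the 20 GB minimum, which leaves headroom for additional exporters and cardinality growth.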
Network Considerations
- Port 9090 — Prometheus web UI and API
- Port 9100 — Node Exporter metrics endpoint
- Port 9205 — Nextcloud Exporter metrics endpoint
- Port 3000 — Grafana web UI
- Port 9093 — Alertmanager (if using separate alerting)
If Prometheus runs on a different server than Nextcloud, ensure the firewall allows Prometheus to reach the exporter ports on the Nextcloud host. Use a private network or VPN for exporter traffic — never expose raw metrics endpoints to the public internet.
Step 1: Enable the Nextcloud Server Info API
Nextcloud includes a built-in serverinfo app that exposes system metrics through an API endpoint. This is the data source that the Nextcloud Exporter will query.
Enable the Server Info App
The serverinfo app is typically enabled by default, but verify and enable it if needed:
# Check if serverinfo is enabled
sudo -u www-data php /var/www/nextcloud/occ app:list | grep serverinfo
# Enable if not already active
sudo -u www-data php /var/www/nextcloud/occ app:enable serverinfo
Verify the API Endpoint
Test the API endpoint to confirm it returns data:
# Test with curl using admin credentials
curl -s -u admin:YOUR_PASSWORD \
"https://your-nextcloud.example.com/ocs/v2.php/apps/serverinfo/api/v1/info?format=json" \
-H "OCS-APIREQUEST: true" | python3 -m json.tool
You should see a JSON response containing sections for system, storage, shares, server, and activeUsers.
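If you want to script a quick health check against the same endpoint, the JSON unpacks in a few lines. The sample below is heavily abbreviated and the numbers are made up; real responses carry many more fields under ocs.data:

```python
import json

# Abbreviated, illustrative serverinfo response. A real response nests
# these sections under ocs.data alongside server, shares, and more.
sample = json.loads("""
{
  "ocs": {
    "data": {
      "nextcloud": {
        "system": {"freespace": 53687091200},
        "storage": {"num_users": 42, "num_files": 128456}
      },
      "activeUsers": {"last5minutes": 7}
    }
  }
}
""")

data = sample["ocs"]["data"]
storage = data["nextcloud"]["storage"]
print("users:", storage["num_users"])                    # → users: 42
print("files:", storage["num_files"])                    # → files: 128456
print("active (5m):", data["activeUsers"]["last5minutes"])
```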
Create a Dedicated Monitoring User
Do not use your admin account for monitoring. Create a dedicated user with minimal permissions:
# Create monitoring user
sudo -u www-data php /var/www/nextcloud/occ user:add \
--display-name="Monitoring" \
--group="admin" \
monitoring
# Set a strong password when prompted
# The user needs admin group membership to access the serverinfo API
Security note: The serverinfo API requires admin-level access, so the monitoring user's permissions cannot be reduced much further. To limit credential exposure, use an app password instead of the main account password. Generate one from Nextcloud Settings > Security > Devices & sessions. For comprehensive security practices, see our Nextcloud security hardening guide.

Step 2: Install and Configure the Nextcloud Exporter
The Nextcloud Exporter is a lightweight Go application that queries the Nextcloud Server Info API and exposes the metrics in Prometheus format.
Option A: Install from Binary
# Download the latest release
wget https://github.com/xperimental/nextcloud-exporter/releases/download/v0.7.0/nextcloud-exporter-v0.7.0-linux-amd64.tar.gz
# Extract the binary
tar xzf nextcloud-exporter-v0.7.0-linux-amd64.tar.gz
# Move to system path
sudo mv nextcloud-exporter /usr/local/bin/
sudo chmod +x /usr/local/bin/nextcloud-exporter
Option B: Deploy with Docker
# Run as a Docker container
docker run -d \
--name nextcloud-exporter \
--restart unless-stopped \
-p 9205:9205 \
-e NEXTCLOUD_SERVER="https://your-nextcloud.example.com" \
-e NEXTCLOUD_USERNAME="monitoring" \
-e NEXTCLOUD_PASSWORD="your-secure-password" \
ghcr.io/xperimental/nextcloud-exporter:latest
Create a Systemd Service (Binary Install)
For the binary installation, create a systemd service file for reliable operation:
# /etc/systemd/system/nextcloud-exporter.service
[Unit]
Description=Nextcloud Exporter for Prometheus
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=nextcloud-exporter
Group=nextcloud-exporter
ExecStart=/usr/local/bin/nextcloud-exporter \
--server https://your-nextcloud.example.com \
--username monitoring \
--password your-secure-password \
--listen-address :9205 \
--timeout 30s
Restart=on-failure
RestartSec=5
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
[Install]
WantedBy=multi-user.target
# Create the service user and start the exporter
sudo useradd --system --no-create-home --shell /usr/sbin/nologin nextcloud-exporter
sudo systemctl daemon-reload
sudo systemctl enable --now nextcloud-exporter
Verify the Metrics Endpoint
# Check that the exporter is serving metrics
curl -s http://localhost:9205/metrics | head -30
# You should see lines like:
# nextcloud_system_info{version="28.0.4"} 1
# nextcloud_users_total 42
# nextcloud_files_total 128456
# nextcloud_storage_free_bytes 5.3687091e+10
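For automated verification, the exposition format is simple enough to parse by hand. A sketch that handles simple lines like the ones above (it ignores HELP/TYPE comments and does not handle label values containing spaces):

```python
# Parse simple lines of Prometheus text exposition format, as returned by
# curl http://localhost:9205/metrics. Sample values mirror the ones above.
def parse_metrics(text):
    """Return {metric_name_with_labels: float_value} for simple lines."""
    out = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        name, _, value = line.rpartition(" ")
        out[name] = float(value)
    return out

sample = """\
# HELP nextcloud_users_total Total users
nextcloud_users_total 42
nextcloud_storage_free_bytes 5.3687091e+10
"""
metrics = parse_metrics(sample)
print(metrics["nextcloud_users_total"])  # → 42.0
```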
Configuration Options Reference
| Flag / Environment Variable | Description | Default |
|---|---|---|
| --server / NEXTCLOUD_SERVER | Nextcloud instance URL | (required) |
| --username / NEXTCLOUD_USERNAME | API username | (required) |
| --password / NEXTCLOUD_PASSWORD | API password or app token | (required) |
| --listen-address | Address and port to listen on | :9205 |
| --timeout | HTTP request timeout | 5s |
| --tls-skip-verify | Skip TLS certificate verification | false |
Step 3: Prometheus Configuration
With the exporter running, configure Prometheus to scrape metrics from both the Nextcloud Exporter and the Node Exporter.
Install Prometheus
# Download Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.51.0/prometheus-2.51.0.linux-amd64.tar.gz
tar xzf prometheus-2.51.0.linux-amd64.tar.gz
sudo mv prometheus-2.51.0.linux-amd64/prometheus /usr/local/bin/
sudo mv prometheus-2.51.0.linux-amd64/promtool /usr/local/bin/
# Create directories
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo useradd --system --no-create-home --shell /usr/sbin/nologin prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
Prometheus Configuration File
Create the main configuration at /etc/prometheus/prometheus.yml:
# /etc/prometheus/prometheus.yml
global:
scrape_interval: 30s # How often to scrape targets
evaluation_interval: 30s # How often to evaluate alerting rules
scrape_timeout: 15s # Timeout for each scrape request
external_labels:
environment: production
service: nextcloud
# Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
- localhost:9093
# Load alerting rules
rule_files:
- /etc/prometheus/rules/*.yml
# Scrape configurations
scrape_configs:
# Prometheus self-monitoring
- job_name: "prometheus"
static_configs:
- targets: ["localhost:9090"]
# Nextcloud application metrics
- job_name: "nextcloud"
scrape_interval: 60s # Nextcloud API can be slow; use longer interval
scrape_timeout: 30s
static_configs:
- targets: ["localhost:9205"]
labels:
instance: "nextcloud-prod"
datacenter: "nyc1"
# Node Exporter for OS-level metrics
- job_name: "node"
static_configs:
- targets: ["localhost:9100"]
labels:
instance: "nextcloud-prod"
datacenter: "nyc1"
Scrape Interval Recommendations
- Nextcloud Exporter (60s): The Server Info API queries the database on every call. A 60-second interval balances freshness against load. For large instances with many users, consider 120s.
- Node Exporter (30s): OS-level metrics are cheap to collect. A 30-second interval gives good resolution for resource monitoring.
- Prometheus self-monitoring (30s): Track Prometheus's own health at the default interval.
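The trade-off behind these recommendations is plain arithmetic: halving the interval doubles both the resolution and the per-target scrape load. A quick illustration:

```python
# Scrape volume per target at the intervals recommended above.
def scrapes_per_day(interval_seconds):
    return 24 * 3600 // interval_seconds

for name, interval in [("nextcloud (60s)", 60), ("node (30s)", 30)]:
    print(f"{name}: {scrapes_per_day(interval)} scrapes/day per target")
# 60s → 1440 scrapes/day, 30s → 2880 scrapes/day
```

For the Nextcloud job, each of those scrapes triggers database queries on the Nextcloud side, which is why the guide doubles the interval there rather than using the global 30s default.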
Create the Prometheus Systemd Service
# /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Monitoring System
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus \
--storage.tsdb.retention.time=90d \
--web.listen-address=:9090 \
--web.enable-lifecycle
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5
NoNewPrivileges=true
[Install]
WantedBy=multi-user.target
# Start Prometheus
sudo systemctl daemon-reload
sudo systemctl enable --now prometheus
# Verify it is running
curl -s http://localhost:9090/-/healthy
# Expected: Prometheus Server is Healthy.
Validate Configuration
# Check configuration syntax before restarting
promtool check config /etc/prometheus/prometheus.yml
# Verify targets are being scraped
curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool
Step 4: Key Metrics to Monitor
Understanding which metrics matter is critical to building effective dashboards and alerts. The Nextcloud Exporter exposes dozens of metrics, but these are the ones you should focus on.
System Health Metrics
| Metric Name | Description | Alert Threshold |
|---|---|---|
| nextcloud_system_cpuload | System CPU load averages (1m, 5m, 15m) | > number of CPU cores |
| nextcloud_system_mem_total_bytes | Total system memory | — |
| nextcloud_system_mem_free_bytes | Available system memory | < 10% of total |
| nextcloud_system_swap_total_bytes | Total swap space | — |
| nextcloud_system_swap_free_bytes | Available swap space | < 20% of total |
Storage Metrics
| Metric Name | Description | Alert Threshold |
|---|---|---|
| nextcloud_storage_free_bytes | Free storage space on data directory | < 10% of total or < 10 GB |
| nextcloud_storage_num_files | Total number of files managed | Trend monitoring only |
| nextcloud_storage_num_storages | Number of configured storage backends | — |
| nextcloud_storage_num_storages_local | Number of local storage mounts | — |
| nextcloud_storage_num_storages_other | Number of external storage mounts | — |
User Activity Metrics
| Metric Name | Description | Alert Threshold |
|---|---|---|
| nextcloud_users_total | Total registered users | Trend / license limits |
| nextcloud_active_users_last5min | Users active in last 5 minutes | Anomaly detection |
| nextcloud_active_users_last1hour | Users active in last hour | Capacity planning |
| nextcloud_active_users_last24hours | Users active in last 24 hours | Engagement tracking |
| nextcloud_shares_num_fed_shares_received | Federated shares received | — |
| nextcloud_shares_num_fed_shares_sent | Federated shares sent | — |
Performance Metrics
| Metric Name | Description | Alert Threshold |
|---|---|---|
| nextcloud_php_opcache_hit_rate | PHP OPcache hit rate percentage | < 95% |
| nextcloud_php_memory_limit_bytes | PHP memory limit | < 512 MB |
| nextcloud_php_max_execution_time | PHP max execution time | — |
| nextcloud_php_upload_max_size_bytes | Maximum upload file size | — |
| nextcloud_database_size_bytes | Nextcloud database size | Growth rate monitoring |
Application Metrics
| Metric Name | Description | Alert Threshold |
|---|---|---|
| nextcloud_system_info | Nextcloud version information (label) | Version change detection |
| nextcloud_apps_installed | Number of installed apps | — |
| nextcloud_apps_updates_available | Number of apps with pending updates | > 0 for extended period |
| nextcloud_up | Whether Nextcloud is reachable (1 = up, 0 = down) | == 0 |
For a deeper dive into the performance-related metrics and how to tune the settings they reflect, see our Nextcloud performance tuning guide.
Step 5: Grafana Dashboard Setup
Grafana transforms raw Prometheus metrics into visual dashboards that make it easy to assess system health at a glance.
Install Grafana
# Add the Grafana repository (Ubuntu/Debian)
sudo apt-get install -y apt-transport-https software-properties-common
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
# Install and start Grafana
sudo apt-get update
sudo apt-get install -y grafana
sudo systemctl enable --now grafana-server
Add Prometheus as a Data Source
- Open Grafana at http://your-server:3000 (default credentials: admin/admin)
- Navigate to Connections > Data Sources > Add data source
- Select Prometheus
- Set the URL to http://localhost:9090
- Click Save & Test to verify connectivity
You can also configure the data source via provisioning for reproducible setups:
# /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
access: proxy
url: http://localhost:9090
isDefault: true
editable: false
Community Dashboard vs Custom
The Grafana community maintains several Nextcloud dashboards you can import directly. Search for "Nextcloud" on grafana.com/grafana/dashboards and import by ID. However, community dashboards often include outdated metrics or miss important panels. We recommend starting with a community dashboard and then customizing it.
Key Dashboard Panels
Overview Row: System Status at a Glance
Create stat panels for the most critical indicators:
# Active Users (last 5 minutes) — Stat panel
nextcloud_active_users_last5min
# Nextcloud Up/Down — Stat panel with value mapping (1=Up, 0=Down)
nextcloud_up
# Free Storage Space — Stat panel
nextcloud_storage_free_bytes
# Total Files Managed — Stat panel
nextcloud_storage_num_files
# Nextcloud Version — Stat panel (use label value)
nextcloud_system_info
Performance Row: Response Times and PHP Metrics
# PHP OPcache Hit Rate — Gauge panel (target: >95%)
nextcloud_php_opcache_hit_rate
# System CPU Load (1 minute) — Time series panel
nextcloud_system_cpuload{period="1"}
# Memory Usage Percentage — Time series panel
(1 - (nextcloud_system_mem_free_bytes / nextcloud_system_mem_total_bytes)) * 100
# Database Size Over Time — Time series panel
nextcloud_database_size_bytes
Storage Row: Disk Usage Trends
# Free Disk Space Over Time — Time series panel with threshold line
nextcloud_storage_free_bytes
# Disk Usage Percentage — Gauge panel
# (Requires node_exporter metrics for total disk size)
(1 - (node_filesystem_avail_bytes{mountpoint="/var/www/nextcloud/data"} / node_filesystem_size_bytes{mountpoint="/var/www/nextcloud/data"})) * 100
# File Count Growth — Time series panel
nextcloud_storage_num_files
# Predicted Disk Full — Stat panel
# Uses linear regression to predict when disk runs out
predict_linear(node_filesystem_avail_bytes{mountpoint="/var/www/nextcloud/data"}[7d], 30 * 24 * 3600) / (1024^3)
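Under the hood, predict_linear() fits a least-squares line to the sampled (timestamp, value) pairs and extrapolates it forward. A sketch of the same calculation in Python, using synthetic samples for a disk losing 1 GB of free space per day:

```python
# What predict_linear() computes: least-squares fit over (timestamp, value)
# samples, extrapolated seconds_ahead past the last sample. Data is synthetic.
def predict_linear(samples, seconds_ahead):
    """samples: list of (unix_ts, value). Returns the extrapolated value."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = num / den                       # bytes per second
    last_t = samples[-1][0]
    return mean_v + slope * (last_t + seconds_ahead - mean_t)

DAY = 24 * 3600
GB = 1024 ** 3
# 7 daily samples: free space shrinking from 100 GB by 1 GB/day.
history = [(day * DAY, (100 - day) * GB) for day in range(7)]
in_30_days = predict_linear(history, 30 * DAY)
print(f"Free space in 30 days: {in_30_days / GB:.0f} GB")  # → 64 GB
```

The PromQL version divides the result by 1024^3 the same way, so the stat panel reads directly in gigabytes.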
User Activity Row: Login Patterns and Operations
# Active Users Over Time (5min, 1hr, 24hr) — Time series panel
nextcloud_active_users_last5min
nextcloud_active_users_last1hour
nextcloud_active_users_last24hours
# Total Users — Stat panel
nextcloud_users_total
# Shares Overview — Bar gauge panel
nextcloud_shares_num_fed_shares_sent
nextcloud_shares_num_fed_shares_received
Dashboard JSON Export
Once you have built your dashboard, export it as JSON from Dashboard Settings > JSON Model. Store this JSON in version control alongside your infrastructure code. This enables you to recreate the dashboard automatically in disaster recovery scenarios. For backup strategies that include monitoring configuration, see our Nextcloud backup and disaster recovery guide.
Step 6: Alerting Configuration
Dashboards are for humans watching screens. Alerts are for when nobody is watching. A well-configured alerting system ensures critical issues get immediate attention.
Prometheus Alerting Rules
Create alerting rules at /etc/prometheus/rules/nextcloud.yml:
# /etc/prometheus/rules/nextcloud.yml
groups:
- name: nextcloud_alerts
interval: 60s
rules:
# Nextcloud instance is down
- alert: NextcloudDown
expr: nextcloud_up == 0
for: 3m
labels:
severity: critical
annotations:
summary: "Nextcloud instance is unreachable"
description: "The Nextcloud exporter has been unable to reach the instance for more than 3 minutes."
runbook_url: "https://wiki.internal/runbooks/nextcloud-down"
# Low disk space on data directory
- alert: NextcloudDiskSpaceLow
expr: nextcloud_storage_free_bytes < 10737418240
for: 5m
labels:
severity: warning
annotations:
summary: "Nextcloud storage free space below 10 GB"
description: "Free space is {{ $value | humanize1024 }}. Investigate and expand storage or clean up old files."
# Critical disk space
- alert: NextcloudDiskSpaceCritical
expr: nextcloud_storage_free_bytes < 2147483648
for: 2m
labels:
severity: critical
annotations:
summary: "Nextcloud storage free space below 2 GB"
description: "Free space is {{ $value | humanize1024 }}. Immediate action required to prevent service disruption."
# High memory usage
- alert: NextcloudHighMemoryUsage
expr: (1 - (nextcloud_system_mem_free_bytes / nextcloud_system_mem_total_bytes)) * 100 > 90
for: 10m
labels:
severity: warning
annotations:
summary: "Nextcloud server memory usage above 90%"
description: "Memory usage has been at {{ $value | printf \"%.1f\" }}% for more than 10 minutes."
# PHP OPcache hit rate low
- alert: NextcloudOPcacheHitRateLow
expr: nextcloud_php_opcache_hit_rate < 90
for: 15m
labels:
severity: warning
annotations:
summary: "PHP OPcache hit rate below 90%"
description: "OPcache hit rate is {{ $value | printf \"%.1f\" }}%. Consider increasing opcache.max_accelerated_files or opcache.memory_consumption."
# High CPU load
- alert: NextcloudHighCPULoad
expr: nextcloud_system_cpuload{period="5"} > 4
for: 15m
labels:
severity: warning
annotations:
summary: "Nextcloud server CPU load is high"
description: "5-minute CPU load average is {{ $value | printf \"%.2f\" }}. Check for runaway cron jobs or heavy user activity."
# App updates available (informational)
- alert: NextcloudAppUpdatesAvailable
expr: nextcloud_apps_updates_available > 0
for: 24h
labels:
severity: info
annotations:
summary: "{{ $value }} Nextcloud app updates are available"
description: "App updates have been pending for more than 24 hours. Review and apply updates during maintenance window."
- name: ssl_alerts
rules:
# SSL certificate expiry (requires blackbox exporter or probe)
- alert: SSLCertificateExpiringSoon
expr: probe_ssl_earliest_cert_expiry - time() < 14 * 24 * 3600
for: 1h
labels:
severity: warning
annotations:
summary: "SSL certificate expires in less than 14 days"
description: "Certificate for {{ $labels.instance }} expires in {{ $value | humanizeDuration }}."
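The raw byte thresholds in the rules above are easier to audit when derived rather than hand-typed. A two-line check confirms they are exact GiB multiples:

```python
# The literal thresholds used in the disk-space rules above, derived from GiB.
GiB = 1024 ** 3
print(10 * GiB)  # 10737418240 -- NextcloudDiskSpaceLow
print(2 * GiB)   # 2147483648  -- NextcloudDiskSpaceCritical
```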
Validate Alerting Rules
# Check rule syntax
promtool check rules /etc/prometheus/rules/nextcloud.yml
# Reload Prometheus to pick up new rules
curl -X POST http://localhost:9090/-/reload
Alertmanager Configuration
Install and configure Alertmanager to route alerts to the appropriate channels:
# /etc/alertmanager/alertmanager.yml
global:
resolve_timeout: 5m
smtp_smarthost: 'smtp.example.com:587'
smtp_from: 'alerts@example.com'
smtp_auth_username: 'alerts@example.com'
smtp_auth_password: 'smtp-password'
smtp_require_tls: true
route:
group_by: ['alertname', 'severity']
group_wait: 30s
group_interval: 5m
repeat_interval: 4h
receiver: 'default-email'
routes:
# Critical alerts go to Slack immediately
- match:
severity: critical
receiver: 'slack-critical'
group_wait: 10s
repeat_interval: 1h
# Info alerts aggregated and sent via email
- match:
severity: info
receiver: 'default-email'
group_wait: 1h
repeat_interval: 24h
receivers:
- name: 'default-email'
email_configs:
- to: 'ops-team@example.com'
send_resolved: true
- name: 'slack-critical'
slack_configs:
- api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
channel: '#nextcloud-alerts'
title: '{{ .CommonAnnotations.summary }}'
text: '{{ .CommonAnnotations.description }}'
send_resolved: true
inhibit_rules:
- source_match:
severity: 'critical'
target_match:
severity: 'warning'
equal: ['alertname']
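The routing tree above picks a receiver by walking child routes in order: the first route whose matchers all fit the alert's labels wins, otherwise the parent's receiver applies. A simplified sketch of that selection logic (it ignores the continue flag and all grouping/timing behavior):

```python
# Simplified model of Alertmanager route selection: first matching child
# route wins, recursing into its sub-routes; otherwise use this receiver.
ROOT = {
    "receiver": "default-email",
    "routes": [
        {"match": {"severity": "critical"}, "receiver": "slack-critical"},
        {"match": {"severity": "info"}, "receiver": "default-email"},
    ],
}

def pick_receiver(labels, route=ROOT):
    for child in route.get("routes", []):
        if all(labels.get(k) == v for k, v in child["match"].items()):
            return pick_receiver(labels, child)
    return route["receiver"]

print(pick_receiver({"alertname": "NextcloudDown", "severity": "critical"}))
# → slack-critical
print(pick_receiver({"alertname": "NextcloudHighMemoryUsage",
                     "severity": "warning"}))
# → default-email (no child route matches, root receiver applies)
```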
Grafana Alert Notifications
You can also configure alert rules directly in Grafana as an alternative or supplement to Prometheus alerting rules. Grafana supports notification channels including email, Slack, Microsoft Teams, PagerDuty, OpsGenie, and generic webhooks. Configure these under Alerting > Contact points in the Grafana UI.
Step 7: Node Exporter for System Metrics
While the Nextcloud Exporter provides application-level metrics, Node Exporter gives you the complete picture of the underlying operating system. Many Nextcloud performance issues stem from system-level resource exhaustion that only Node Exporter can reveal.
Install Node Exporter
# Download and install
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.1/node_exporter-1.8.1.linux-amd64.tar.gz
tar xzf node_exporter-1.8.1.linux-amd64.tar.gz
sudo mv node_exporter-1.8.1.linux-amd64/node_exporter /usr/local/bin/
# Create systemd service
sudo useradd --system --no-create-home --shell /usr/sbin/nologin node_exporter
# /etc/systemd/system/node_exporter.service
[Unit]
Description=Prometheus Node Exporter
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=node_exporter
Group=node_exporter
ExecStart=/usr/local/bin/node_exporter \
--collector.filesystem.mount-points-exclude="^/(sys|proc|dev|host|etc)($$|/)" \
--collector.netclass.ignored-devices="^(veth.*|br.*|docker.*|lo)$$" \
--web.listen-address=:9100
Restart=on-failure
RestartSec=5
NoNewPrivileges=true
[Install]
WantedBy=multi-user.target
# Enable and start
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
Key System Metrics for Nextcloud
These Node Exporter metrics directly affect Nextcloud performance and should be included in your dashboards:
- node_cpu_seconds_total — CPU utilization per core. High iowait indicates disk bottlenecks that slow file uploads/downloads.
- node_memory_MemAvailable_bytes — Actual available memory (accounts for buffers/cache). More accurate than MemFree.
- node_filesystem_avail_bytes — Available disk space per mount point. Monitor both the Nextcloud data directory and the system root.
- node_disk_io_time_seconds_total — Disk I/O utilization. Values consistently near 100% indicate the disk is saturated.
- node_network_receive_bytes_total and node_network_transmit_bytes_total — Network throughput. Useful for detecting sync storms or DDoS patterns.
- node_filefd_allocated — Open file descriptors. Nextcloud can exhaust file descriptors under heavy concurrent access.
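Note that node_cpu_seconds_total is a counter: the useful number is its rate of change between scrapes, which is what PromQL's rate() computes. A sketch with made-up sample values:

```python
# node_cpu_seconds_total is a cumulative counter; utilization comes from its
# rate of change, as PromQL's rate() computes. Sample values are made up.
def cpu_mode_fraction(prev, curr, window_seconds):
    """Fraction of one core's time spent in a mode between two scrapes."""
    return (curr - prev) / window_seconds

# Two scrapes of node_cpu_seconds_total{cpu="0",mode="iowait"}, 30s apart:
iowait = cpu_mode_fraction(prev=1200.0, curr=1209.0, window_seconds=30)
print(f"iowait: {iowait:.0%} of cpu0")  # → 30%, a likely disk bottleneck
```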
Combining Nextcloud and System Metrics
The real power emerges when you correlate Nextcloud metrics with system metrics in the same dashboard. For example:
- A spike in nextcloud_active_users_last5min correlating with high node_cpu_seconds_total tells you the server needs more CPU for your user count.
- Growing nextcloud_database_size_bytes alongside increasing node_disk_io_time_seconds_total suggests it is time to move the database to faster storage.
- Low nextcloud_php_opcache_hit_rate combined with high node_memory_MemAvailable_bytes means you have memory to spare — increase OPcache allocation.
This correlation approach is especially important when you need to decide whether to optimize your existing deployment or migrate to a different architecture.
Advanced Monitoring
Once the core monitoring stack is in place, consider these extensions for more comprehensive observability.
Log Aggregation with Loki
Metrics tell you something is wrong; logs tell you why. Grafana Loki is a log aggregation system designed to work seamlessly with Grafana. Install Promtail on your Nextcloud server to ship logs to Loki, then correlate log events with metric anomalies in the same Grafana dashboard.
Key log files to monitor:
- /var/www/nextcloud/data/nextcloud.log — Application-level errors and warnings
- /var/log/nginx/error.log or /var/log/apache2/error.log — Web server errors
- /var/log/mysql/error.log or PostgreSQL logs — Database errors
- /var/log/php-fpm/*.log — PHP process manager logs
Uptime Monitoring and Synthetic Checks
Add the Blackbox Exporter to perform synthetic HTTP checks against your Nextcloud instance. This catches issues that internal metrics might miss, such as DNS resolution failures, certificate problems, or reverse proxy misconfigurations.
# Prometheus scrape config for Blackbox Exporter
- job_name: "blackbox-nextcloud"
metrics_path: /probe
params:
module: [http_2xx]
static_configs:
- targets:
- https://your-nextcloud.example.com/status.php
- https://your-nextcloud.example.com/login
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: instance
- target_label: __address__
replacement: localhost:9115
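The relabel_configs above can be confusing on first read; modeled as a dict transformation, each target's URL becomes a query parameter and the scrape address is rewritten to point at the exporter itself:

```python
# What the relabel_configs above do to each blackbox target, sketched as a
# plain dict transformation. The exporter address matches the config above.
def relabel(target_url, exporter_address="localhost:9115"):
    labels = {"__address__": target_url}
    labels["__param_target"] = labels["__address__"]  # source: __address__
    labels["instance"] = labels["__param_target"]     # keep the URL visible
    labels["__address__"] = exporter_address          # scrape the exporter
    return labels

result = relabel("https://your-nextcloud.example.com/status.php")
print(result["__address__"])  # → localhost:9115
print(result["instance"])     # → https://your-nextcloud.example.com/status.php
```

The net effect: Prometheus calls http://localhost:9115/probe?target=..., while the instance label still identifies which URL was probed.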
Database-Specific Monitoring
If you use MySQL/MariaDB, add the mysqld_exporter. For PostgreSQL, use the postgres_exporter. These provide granular database metrics including query performance, connection pool utilization, replication lag, and lock contention — all of which directly impact Nextcloud responsiveness.
# Key database metrics to watch:
# MySQL: mysql_global_status_slow_queries, mysql_global_status_connections
# PostgreSQL: pg_stat_activity_count, pg_stat_user_tables_seq_scan
Scaling Monitoring for Multi-Node Setups
For high-availability Nextcloud deployments with multiple application nodes, load balancers, and clustered databases:
- Run Node Exporter on every server in the cluster
- Run the Nextcloud Exporter against each application node separately
- Use Prometheus federation or Thanos for multi-site monitoring
- Add labels for node, role, and datacenter to distinguish metrics from different cluster members
- Create separate dashboard rows for per-node and cluster-aggregate views
Monitoring Nextcloud on MassiveGRID
Setting up and maintaining a monitoring stack takes time and expertise. If you are running Nextcloud on MassiveGRID's managed cloud infrastructure, much of this monitoring comes built-in.
Built-in Infrastructure Monitoring
MassiveGRID's managed hosting platform includes 24/7 infrastructure monitoring for every deployment. This covers server health, network performance, storage utilization, and hardware fault detection — all handled by the MassiveGRID operations team without any configuration on your part.
24/7 NOC Team for Alert Response
When alerts fire at 3 AM, MassiveGRID's Network Operations Center is already watching. The NOC team responds to infrastructure alerts around the clock, handling issues like disk space expansion, memory allocation adjustments, and failover orchestration. This means your on-call engineers only get paged for application-level issues that require domain-specific knowledge.
Managed Monitoring as Part of the Service
With MassiveGRID, you still have full access to deploy your own Prometheus and Grafana stack for application-level Nextcloud metrics. But the infrastructure layer — the part that requires constant vigilance and rapid response — is handled for you. This hybrid approach gives you the best of both worlds: deep application visibility with your custom dashboards, plus reliable infrastructure monitoring from a team that does it professionally.
For organizations that want to focus on their Nextcloud deployment without worrying about the underlying infrastructure monitoring, MassiveGRID provides a complete solution.
Get Fully Monitored Nextcloud Hosting
Deploy Nextcloud on MassiveGRID's managed cloud with 24/7 infrastructure monitoring, automated alerts, and NOC support included.
Explore Nextcloud Hosting
Troubleshooting Common Issues
Even with careful setup, monitoring systems occasionally need debugging. Here are the most common issues and their solutions.
Exporter Connection Refused
Symptom: Prometheus shows the Nextcloud target as DOWN with a "connection refused" error.
- Verify the exporter is running: systemctl status nextcloud-exporter
- Check the exporter is listening on the expected port: ss -tlnp | grep 9205
- Confirm the firewall allows traffic: sudo ufw status or sudo iptables -L -n
- Test connectivity from the Prometheus server: curl http://nextcloud-host:9205/metrics
Metrics Not Updating
Symptom: Grafana dashboards show stale data or flat lines.
- Check the exporter logs for authentication errors: journalctl -u nextcloud-exporter -f
- Verify the monitoring user credentials are still valid (passwords expire, app tokens get revoked)
- Confirm the Nextcloud Server Info API returns fresh data: query the API directly with curl
- Check Prometheus scrape errors: navigate to http://prometheus:9090/targets and inspect the "Last Scrape" and "Error" columns
Grafana Dashboard Showing "No Data"
Symptom: Panels display "No data" or "N/A" despite Prometheus having data.
- Verify the data source is configured correctly in Grafana (Connections > Data Sources > test)
- Check the time range selector — it might be set to a period before monitoring was configured
- Test the PromQL query directly in the Prometheus UI (http://prometheus:9090/graph) to confirm data exists
- Ensure metric names match exactly (Nextcloud Exporter metric names may change between versions)
- Check for label mismatches in queries that use label selectors like {instance="..."}
High Cardinality Warnings
Symptom: Prometheus logs warnings about high cardinality or runs out of memory.
- This typically occurs when custom exporters or applications generate metrics with unbounded label values (e.g., user IDs, file paths)
- The standard Nextcloud Exporter does not produce high-cardinality metrics, so check other exporters in your setup
- Use promtool tsdb analyze /var/lib/prometheus to identify which metrics have the most series
- Consider adding metric_relabel_configs to drop unnecessary high-cardinality labels at scrape time
Exporter Timing Out
Symptom: Prometheus marks the Nextcloud target as DOWN with timeout errors, but the exporter is running.
- The Nextcloud Server Info API can be slow on large instances. Increase the exporter timeout: --timeout 60s
- Increase the Prometheus scrape timeout for the Nextcloud job (it must be less than or equal to the scrape interval)
- Check Nextcloud server load — the API queries the database, and a heavily loaded server will respond slowly
- Consider caching the Server Info API response if you use a reverse proxy
From Reactive to Proactive: Transforming Nextcloud Administration
Setting up Prometheus and Grafana for Nextcloud monitoring is an investment that pays dividends every day your server runs. With the stack described in this guide, you gain:
- Early warning — Alerts fire when resources approach critical thresholds, not after users are impacted
- Root cause analysis — Correlated metrics pinpoint exactly which resource (CPU, memory, disk, network) is causing performance issues
- Capacity planning — Historical trends show growth patterns, helping you plan upgrades before they become emergencies
- Accountability — Dashboards provide objective SLA metrics you can share with stakeholders
- Faster incident response — When something does go wrong, dashboards immediately show what changed and when
The monitoring architecture is also extensible. As your Nextcloud deployment grows — adding more users, enabling more apps, or scaling to a multi-node architecture — the monitoring stack scales with it. Add more exporters, create more dashboards, and refine your alerting rules as you learn what matters most for your specific environment.
Start with the basics: get the Nextcloud Exporter and Node Exporter running, configure a handful of critical alerts, and build a single overview dashboard. From there, iterate based on the incidents and questions that arise. Within a few weeks, you will wonder how you ever managed Nextcloud without it.