Prometheus and Grafana: Monitoring Stack Setup Guide

Prometheus collects metrics from your infrastructure, and Grafana visualizes them in real-time dashboards. Together, they form the industry-standard open-source monitoring solution used by teams managing everything from microservices to Kubernetes clusters. This guide walks you through a production-ready setup in under an hour.

Why Prometheus and Grafana Matter

Most DevOps teams start with basic monitoring: a few log files and manual checks. That approach doesn't scale. You need automatic metric collection, long-term storage, and visual insight into system behavior. That's where Prometheus steps in. It scrapes metrics from applications and infrastructure at regular intervals, stores them in a time-series database, and lets you query them with PromQL.

Grafana does something different: it's a visualization layer. You point it at Prometheus as a data source, and suddenly you're building dashboards that update in real-time. You can see CPU usage, memory consumption, request latency, and custom application metrics all in one place.

The combination works well because the two tools have complementary roles. Prometheus handles collection, storage, and alert rule evaluation. Grafana handles presentation and can also alert on the data it queries. Neither depends on the other, so you can swap either component out if needed.

Prerequisites and Architecture

Before you start, you'll need:

A Linux server (Ubuntu 20.04+ or CentOS 8+) with 2+ CPU cores and 2GB+ RAM
Docker and Docker Compose installed (optional but recommended)
Basic familiarity with command-line tools and YAML configuration
Network access to expose ports 9090 (Prometheus) and 3000 (Grafana)

The architecture is straightforward: Prometheus scrapes metrics from exporters and applications, stores them in its time-series database, Grafana queries Prometheus for data, and you access Grafana's web UI to view dashboards. You'll also want a Node Exporter running on servers you want to monitor—it's a small agent that exposes system metrics.

Installing Prometheus

Start by downloading the latest Prometheus binary. At the time of writing, v2.53+ is recommended. Visit the official Prometheus download page to grab the latest version.

cd /opt
sudo wget https://github.com/prometheus/prometheus/releases/download/v2.53.0/prometheus-2.53.0.linux-amd64.tar.gz
sudo tar xvfz prometheus-2.53.0.linux-amd64.tar.gz
sudo mv prometheus-2.53.0.linux-amd64 prometheus
sudo chown -R nobody:nogroup /opt/prometheus

Next, create a systemd service file so Prometheus runs automatically:

sudo tee /etc/systemd/system/prometheus.service > /dev/null <
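The heredoc in the command above is cut off, so the body of the unit file is missing. As a minimal sketch of what such a unit might contain, assuming the /opt/prometheus layout and the nobody:nogroup ownership from the previous step (the data directory /opt/prometheus/data and the helper name prometheus_unit are illustrative choices, not part of the original), the function below prints a unit file to stdout so it can be reviewed before it is installed:

```shell
# Print a minimal systemd unit for Prometheus to stdout.
# Paths assume the tarball was unpacked to /opt/prometheus as shown earlier;
# the data directory is an assumption, adjust it to your layout.
prometheus_unit() {
  cat <<'EOF'
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=nobody
Group=nogroup
ExecStart=/opt/prometheus/prometheus \
  --config.file=/opt/prometheus/prometheus.yml \
  --storage.tsdb.path=/opt/prometheus/data
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Install it (requires root):
#   prometheus_unit | sudo tee /etc/systemd/system/prometheus.service > /dev/null
```

On CentOS the nobody user's group is usually nobody rather than nogroup, so adjust the Group line (and the earlier chown) accordingly.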



Before starting the service, you need to configure Prometheus. The main configuration file is prometheus.yml. Let's create a basic one that scrapes itself and a Node Exporter:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']

Save this to /opt/prometheus/prometheus.yml. Now enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus

Prometheus should now be running on http://localhost:9090. You can see the status page, query metrics, and check what's being scraped.
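Besides opening the web UI, you can check from the command line that the server is up. Prometheus serves a readiness endpoint at /-/ready; the sketch below wraps it in a small helper (the function name and the 5-second timeout are illustrative, and curl is assumed to be installed):

```shell
# Return 0 once Prometheus reports ready, non-zero otherwise.
# $1 is the base URL; the default matches the setup in this guide.
prometheus_ready() {
  local base_url="${1:-http://localhost:9090}"
  # --fail makes curl exit non-zero on HTTP errors such as 503.
  curl --silent --fail --max-time 5 "$base_url/-/ready" > /dev/null
}

# Usage:
#   prometheus_ready && echo "Prometheus is ready"
```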

Setting Up Node Exporter

Node Exporter exposes system metrics like CPU, memory, disk, and network usage. Install it on any host you want to monitor. Here's the quickest way:

cd /opt
sudo wget https://github.com/prometheus/node_exporter/releases/download/v1.8.0/node_exporter-1.8.0.linux-amd64.tar.gz
sudo tar xvfz node_exporter-1.8.0.linux-amd64.tar.gz
sudo mv node_exporter-1.8.0.linux-amd64 node_exporter
sudo chown -R nobody:nogroup /opt/node_exporter

Create a systemd service for Node Exporter:

sudo tee /etc/systemd/system/node_exporter.service > /dev/null <
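This command is also cut off before the unit file body. A minimal sketch of a Node Exporter unit, assuming the /opt/node_exporter path and the nobody:nogroup ownership used above (the helper name node_exporter_unit is illustrative):

```shell
# Print a minimal systemd unit for Node Exporter to stdout.
# The binary path assumes the tarball was unpacked to /opt/node_exporter.
node_exporter_unit() {
  cat <<'EOF'
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=nobody
Group=nogroup
ExecStart=/opt/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Install it (requires root):
#   node_exporter_unit | sudo tee /etc/systemd/system/node_exporter.service > /dev/null
```

With no extra flags, Node Exporter listens on port 9100, which matches the node scrape target in prometheus.yml.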


Enable and start it:

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter

Verify it's working by visiting http://localhost:9100/metrics. You'll see hundreds of system metrics in the Prometheus exposition format. The node scrape job is already in the Prometheus config from the previous section, so the target should show as up on the Prometheus targets page within one scrape interval. If you add the job after Prometheus has started, restart Prometheus (sudo systemctl restart prometheus) to pick up the new target.

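Installing and Configuring Grafana

Grafana's installation varies by platform. On Ubuntu, the easiest approach is using the official APT repository:

# Import Grafana's signing key and register the repository
sudo apt-get install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update
# The package is named grafana; the service it installs is grafana-server
sudo apt-get install -y grafana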

Start Grafana:

sudo systemctl enable grafana-server
sudo systemctl start grafana-server
sudo systemctl status grafana-server

Grafana runs on port 3000. Open http://localhost:3000 in your browser. The default credentials are admin/admin. You'll be prompted to change the password on first login; do that immediately.

Now add Prometheus as a data source. Go to Configuration → Data Sources → Add Data Source (in recent Grafana versions the menu is Connections → Data sources). Select Prometheus. Set the URL to http://localhost:9090. Click Save & Test. If everything's connected, you'll see a green message confirming the link works.
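The same data source can also be defined in a file instead of through the UI, which Grafana calls provisioning. A minimal sketch, assuming the package-default provisioning directory /etc/grafana/provisioning/datasources (the file name prometheus.yml and the helper name are illustrative):

```shell
# Print a Grafana data source provisioning file to stdout.
grafana_datasource_yaml() {
  cat <<'EOF'
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF
}

# Install it, then restart Grafana so the file is read (requires root):
#   grafana_datasource_yaml | sudo tee /etc/grafana/provisioning/datasources/prometheus.yml > /dev/null
#   sudo systemctl restart grafana-server
```

Keeping the data source in a file makes the setup reproducible across hosts, at the cost of one more file to manage.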

Creating Your First Dashboard

With Prometheus feeding data into Grafana, you're ready to build dashboards. Let's start simple: create a panel showing CPU usage.

Click the + icon in the sidebar and select Dashboard. Add a new panel. In the query editor, enter a PromQL query. Here's a useful one for CPU:

100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

This calculates the percentage of CPU time not spent idle, averaged across all cores of each instance. Because irate uses only the two most recent samples inside the 5-minute window, the result is a near-instantaneous value; swap in rate if you want an average over the full 5 minutes. Name the panel "CPU Usage", set the units to percent, and save. You now have a working dashboard panel.
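To see what the expression computes, take a single instance whose cores spend, on average, 0.85 seconds of every second idle. A small sketch of the same arithmetic (the function name and the sample value are made up for illustration):

```shell
# Convert an average idle rate (idle CPU-seconds per second, 0..1)
# into a busy percentage, mirroring 100 - (idle_rate * 100).
cpu_busy_percent() {
  awk -v idle="$1" 'BEGIN { printf "%.1f\n", 100 - (idle * 100) }'
}

cpu_busy_percent 0.85   # prints 15.0
```

The per-second rate of the idle counter is the fraction of time a core was idle, so subtracting it from 100% leaves the time spent doing work.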

Want to add more panels? Try memory usage:

(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

Or disk usage:

(1 - (node_filesystem_avail_bytes{fstype!~"tmpfs|fuse.lxcfs|squashfs|vfat"} / node_filesystem_size_bytes{fstype!~"tmpfs|fuse.lxcfs|squashfs|vfat"})) * 100

Grafana's public dashboard repository hosts thousands of pre-built dashboards. You can import them by ID in seconds. Dashboard 1860 (Node Exporter Full) is particularly popular for system monitoring: it's comprehensive and well-maintained.

Configuring Alerts

Metrics alone aren't enough. You need to be notified when something's wrong. Prometheus and Grafana both support alerting, though they work differently.

In Prometheus, create alert rules by adding a rules file. Create /opt/prometheus/rules.yml:

groups:
  - name: system_alerts
    interval: 30s
    rules:
      - alert: HighCPUUsage
        # rate averages over the whole window, which is steadier for alerting than irate
        expr: (100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)) > 80
        for: 5m
        annotations:
          summary: "High CPU usage detected on {{ $labels.instance }}"
          description: "CPU usage is {{ $value }}%"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is {{ $value }}%"

Update prometheus.yml to include these rules. The relative path is resolved against the directory that contains prometheus.yml:

rule_files:
  - "rules.yml"

Reload Prometheus to activate alerts. They'll appear in the Alerts section of the web UI. To actually get notified, you need an Alertmanager instance. For now, focus on getting the alerts firing correctly in the UI.
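A syntax error in either file can stop Prometheus from loading its configuration, so it is worth validating both before reloading. The release tarball ships a promtool binary with check rules and check config subcommands; the wrapper below is a sketch that assumes the /opt/prometheus layout from this guide (the function name and the PROMTOOL override variable are illustrative):

```shell
# Validate the rules file and the main config with promtool before reloading.
# PROMTOOL can be overridden; the default assumes the /opt/prometheus layout.
validate_prometheus() {
  local promtool="${PROMTOOL:-/opt/prometheus/promtool}"
  local dir="${1:-/opt/prometheus}"
  if [ ! -x "$promtool" ]; then
    echo "promtool not found at $promtool" >&2
    return 127
  fi
  "$promtool" check rules "$dir/rules.yml" &&
    "$promtool" check config "$dir/prometheus.yml"
}

# Usage:
#   validate_prometheus && sudo systemctl restart prometheus
```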

Using Docker Compose (Optional)

If you prefer containerization, here's a docker-compose.yml that brings everything up in one command:

version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    restart: always

  node_exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"
    restart: always

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
    volumes:
      - grafana_data:/var/lib/grafana
    restart: always

volumes:
  prometheus_data:
  grafana_data:

Run docker-compose up -d (or docker compose up -d with the Compose v2 plugin) and everything starts together. This is great for testing and development. For production, pin image versions instead of using latest, set a real admin password, and plan persistent storage.
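One detail to watch: inside the Compose network, localhost refers to the Prometheus container itself, so the localhost:9100 target from the earlier prometheus.yml would not reach Node Exporter. A sketch of a Compose-friendly config, assuming the service names from the file above (the helper name is illustrative):

```shell
# Print a prometheus.yml suited to the Compose setup to stdout.
# Targets use Compose service names, which resolve on the Compose network.
compose_prometheus_yaml() {
  cat <<'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['node_exporter:9100']
EOF
}

# Write it next to docker-compose.yml before starting the stack:
#   compose_prometheus_yaml > prometheus.yml
```

The same applies to Grafana: when adding the data source in this setup, the URL would be http://prometheus:9090 rather than http://localhost:9090.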

Best Practices for Production

Once you're up and running, here's what separates a hobby setup from a production-grade monitoring stack:

Retention Policy: Prometheus keeps recent metrics in memory and persists them to disk. By default, it retains 15 days of data. For production, choose a retention period based on how far back you need to query. High-traffic systems might drop this to 7 days to save disk space. Use --storage.tsdb.retention.time=7d when starting Prometheus.
Remote Storage: Local storage doesn't scale forever. For long-term retention, consider remote storage backends like Thanos, Cortex, or VictoriaMetrics, which are designed for long-term, large-scale metric storage.
Scrape Interval: The 15-second scrape interval used in this guide is fine for most setups (Prometheus's built-in default is 1 minute). Lower it to 5 seconds only if you need high granularity and can handle the storage overhead.
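Retention and scrape interval together determine how much disk Prometheus needs. The Prometheus storage documentation gives a rule of thumb: disk space is roughly retention time in seconds, times ingested samples per second, times bytes per sample, with 1 to 2 bytes per sample as a typical figure. The sketch below applies that formula; the function name and the 100,000 active series are made-up inputs for illustration, and 2 bytes per sample is the conservative end of that range:

```shell
# Estimate TSDB disk usage in GiB.
# Args: active series, scrape interval (s), retention (days), bytes per sample (default 2).
estimate_tsdb_gib() {
  awk -v series="$1" -v interval="$2" -v days="$3" -v bps="${4:-2}" 'BEGIN {
    samples_per_second = series / interval
    bytes = samples_per_second * days * 86400 * bps
    printf "%.1f\n", bytes / (1024 ^ 3)
  }'
}

estimate_tsdb_gib 100000 15 7   # 15s interval, 7 days: prints 7.5
estimate_tsdb_gib 100000 5 7    # 5s interval, 7 days: prints 22.5
```

Dropping the interval from 15 to 5 seconds triples the sample rate, and the estimate triples with it, which is the storage overhead the scrape interval advice refers to.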