This project provides a comprehensive solution for continuous network validation within a Kubernetes cluster. Leveraging industry-standard tools like iperf3, Prometheus, and Grafana, it offers proactive monitoring of network performance between nodes, helping to identify and troubleshoot latency, bandwidth, and packet loss issues before they impact applications.
The service is based on a decoupled architecture:
- iperf3 server (DaemonSet): an iperf3 server pod on every node acts as a test endpoint. It runs on the host network to measure raw node performance.
- Metrics exporter (Deployment): a central exporter runs iperf3 client tests against the server pods, parses the JSON output, and exposes performance metrics via an HTTP endpoint.
- Monitoring stack (Prometheus and Grafana, e.g. kube-prometheus-stack): scrapes the exporter's metrics and visualizes them in a custom dashboard.

This separation of concerns ensures scalability and resilience, and aligns with Kubernetes operational principles.
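Once the chart is installed (see the installation steps below), a quick way to sanity-check the exporter half of this architecture is to port-forward its Service and read the metrics directly. The sketch below assumes a release named `iperf3-monitor` in the `monitoring` namespace and the default service port 9876 from values.yaml; the metric names are the ones used by the Grafana dashboard later in this README.

```bash
# Minimal sketch: read the exporter's Prometheus metrics by hand.
# Service name, namespace, and port are assumptions for a default install.
kubectl -n monitoring port-forward svc/iperf3-monitor 9876:9876 &
sleep 2
# Gauges such as iperf_network_bandwidth_mbps and iperf_network_jitter_ms
# are labeled by source_node, destination_node, and protocol.
curl -s http://localhost:9876/metrics | grep iperf_network
```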
Prerequisites: a running Kubernetes cluster, Helm installed, and kubectl configured to connect to your cluster.

Add the Helm chart repository (replace with your actual repo URL once published):
helm repo add iperf3-monitor https://malarinv.github.io/iperf3-monitor/
Update your Helm repositories:
helm repo update
Install the chart:
# --namespace: use your preferred namespace.
# --values: optional custom values file.
helm install iperf3-monitor iperf3-monitor/iperf3-monitor \
  --namespace monitoring \
  --create-namespace \
  --values values.yaml
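After the install completes, it is worth confirming that the DaemonSet has placed an iperf3 server pod on every node and that the single exporter pod is running. The selector below assumes the standard Helm instance label and a release named `iperf3-monitor`; adjust it to your release and namespace.

```bash
# Sketch: list the workloads created by the chart.
# The instance label value matches the Helm release name used above.
kubectl -n monitoring get daemonset,deployment,pods \
  -l app.kubernetes.io/instance=iperf3-monitor
# Expect one ready DaemonSet pod per schedulable node plus one exporter pod.
```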
Note: Ensure your Prometheus instance is configured to scrape services in the namespace where you install the chart and that it recognizes `ServiceMonitor` resources with the label `release: prometheus-operator` (if using the standard `kube-prometheus-stack` setup).
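If the metrics never appear in Prometheus, the usual culprit is a ServiceMonitor label or namespace selector mismatch. A quick way to see what your Prometheus instance actually selects (assuming a kube-prometheus-stack installation in the `monitoring` namespace):

```bash
# Print the ServiceMonitor selectors of every Prometheus resource in the
# monitoring namespace; the chart's ServiceMonitor labels must match.
kubectl -n monitoring get prometheus \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.serviceMonitorSelector}{" / "}{.spec.serviceMonitorNamespaceSelector}{"\n"}{end}'
```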
The Helm chart is highly configurable via the values.yaml file. You can override default settings by creating your own values.yaml and passing it during installation (--values my-values.yaml).
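For example, a small override file might shorten the test cycle and switch to UDP so that jitter and packet loss are measured. This is only a sketch; the keys mirror the default values.yaml reproduced below.

```bash
# Sketch: write a minimal override file and apply it to the release.
cat > my-values.yaml <<'EOF'
exporter:
  testInterval: 120   # run a full test cycle every 2 minutes
  testProtocol: udp   # UDP tests report jitter and packet loss
serviceMonitor:
  interval: 30s       # scrape the exporter more often
EOF

helm upgrade --install iperf3-monitor iperf3-monitor/iperf3-monitor \
  --namespace monitoring \
  --values my-values.yaml
```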
Refer to the comments in the default values.yaml for a detailed explanation of each parameter:
# Default values for iperf3-monitor.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
# -- Override the name of the chart.
nameOverride: ""
# -- Override the fully qualified app name.
fullnameOverride: ""
exporter:
# -- Configuration for the exporter container image.
image:
# -- The container image repository for the exporter.
repository: ghcr.io/malarinv/iperf3-monitor
# -- The container image tag for the exporter. If not set, the chart's appVersion is used.
tag: ""
# -- The image pull policy for the exporter container.
pullPolicy: IfNotPresent
# -- Number of exporter pod replicas. Typically 1 is sufficient.
replicaCount: 1
# -- Interval in seconds between complete test cycles (i.e., testing all server nodes).
testInterval: 300
# -- Timeout in seconds for a single iperf3 test run.
testTimeout: 10
# -- Protocol to use for testing (tcp or udp).
testProtocol: tcp
# -- CPU and memory resource requests and limits for the exporter pod.
# @default -- A small default is provided if commented out.
resources: {}
# requests:
# cpu: "100m"
# memory: "128Mi"
# limits:
# cpu: "500m"
# memory: "256Mi"
server:
# -- Configuration for the iperf3 server container image (DaemonSet).
image:
# -- The container image repository for the iperf3 server.
repository: networkstatic/iperf3
# -- The container image tag for the iperf3 server.
tag: latest
# -- CPU and memory resource requests and limits for the iperf3 server pods (DaemonSet).
# These should be very low as the server is mostly idle.
# @default -- A small default is provided if commented out.
resources: {}
# requests:
# cpu: "50m"
# memory: "64Mi"
# limits:
# cpu: "100m"
# memory: "128Mi"
# -- Node selector for scheduling iperf3 server pods.
# Use this to restrict the DaemonSet to a subset of nodes.
# @default -- {} (schedule on all nodes)
nodeSelector: {}
# -- Tolerations for scheduling iperf3 server pods on tainted nodes (e.g., control-plane nodes).
# This is often necessary to include master nodes in the test mesh.
# @default -- Tolerates control-plane and master taints.
tolerations:
- key: "node-role.kubernetes.io/control-plane"
operator: "Exists"
effect: "NoSchedule"
- key: "node-role.kubernetes.io/master"
operator: "Exists"
effect: "NoSchedule"
rbac:
# -- If true, create ServiceAccount, ClusterRole, and ClusterRoleBinding for the exporter.
# Set to false if you manage RBAC externally.
create: true
serviceAccount:
# -- The name of the ServiceAccount to use for the exporter pod.
# Only used if rbac.create is false. If not set, it defaults to the chart's fullname.
name: ""
serviceMonitor:
# -- If true, create a ServiceMonitor resource for integration with Prometheus Operator.
# Requires a running Prometheus Operator in the cluster.
enabled: true
# -- Scrape interval for the ServiceMonitor. How often Prometheus scrapes the exporter metrics.
interval: 60s
# -- Scrape timeout for the ServiceMonitor. How long Prometheus waits for metrics response.
scrapeTimeout: 30s
# -- Configuration for the exporter Service.
service:
# -- Service type. ClusterIP is typically sufficient.
type: ClusterIP
# -- Port on which the exporter service is exposed.
port: 9876
# -- Target port on the exporter pod.
targetPort: 9876
# -- Optional configuration for a network policy to allow traffic to the iperf3 server DaemonSet.
# This is often necessary if you are using a network policy controller.
networkPolicy:
# -- If true, create a NetworkPolicy resource.
enabled: false
# -- Specify source selectors if needed (e.g., pods in a specific namespace).
from: []
# -- Specify namespace selectors if needed.
namespaceSelector: {}
# -- Specify pod selectors if needed.
podSelector: {}
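Individual options can also be toggled on the command line with --set instead of a values file. For instance, the sketch below enables the NetworkPolicy and restricts the server DaemonSet to Linux nodes via the standard kubernetes.io/os label; the release name and namespace are assumptions.

```bash
# Sketch: flip individual chart values without a custom values file.
# Dots inside the nodeSelector key must be escaped for --set.
helm upgrade --install iperf3-monitor iperf3-monitor/iperf3-monitor \
  --namespace monitoring \
  --set networkPolicy.enabled=true \
  --set server.nodeSelector."kubernetes\.io/os"=linux
```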
A custom Grafana dashboard is provided to visualize the collected iperf3 metrics.
To import it into Grafana:
1. Navigate to Dashboards -> Import.
2. Paste the dashboard JSON below (or upload it as a file) and click Load.
3. Click Import.
{
"__inputs": [],
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "8.0.0"
},
{
"type": "datasource",
"id": "prometheus",
"name": "Prometheus",
"version": "1.0.0"
}
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"gnetId": null,
"graphTooltip": 0,
"id": null,
"links": [],
"panels": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"gridPos": {
"h": 9,
"w": 24,
"x": 0,
"y": 0
},
"id": 2,
"targets": [
{
"expr": "avg(iperf_network_bandwidth_mbps) by (source_node, destination_node)",
"format": "heatmap",
"legendFormat": " -> ",
"refId": "A"
}
],
"cards": { "cardPadding": null, "cardRound": null },
"color": {
"mode": "spectrum",
"scheme": "red-yellow-green",
"exponent": 0.5,
"reverse": false
},
"dataFormat": "tsbuckets",
"yAxis": { "show": true, "format": "short" },
"xAxis": { "show": true }
},
{
"title": "Bandwidth Over Time (Source: $source_node, Dest: $destination_node)",
"type": "timeseries",
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 9
},
"targets": [
{
"expr": "iperf_network_bandwidth_mbps{source_node=~\"^$source_node$\", destination_node=~\"^$destination_node$\", protocol=~\"^$protocol$\"}",
"legendFormat": "Bandwidth",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "mbps"
}
}
},
{
"title": "Jitter Over Time (Source: $source_node, Dest: $destination_node)",
"type": "timeseries",
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 9
},
"targets": [
{
"expr": "iperf_network_jitter_ms{source_node=~\"^$source_node$\", destination_node=~\"^$destination_node$\", protocol=\"udp\"}",
"legendFormat": "Jitter",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "ms"
}
}
}
],
"refresh": "30s",
"schemaVersion": 36,
"style": "dark",
"tags": ["iperf3", "network", "kubernetes"],
"templating": {
"list": [
{
"current": {},
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"definition": "label_values(iperf_network_bandwidth_mbps, source_node)",
"hide": 0,
"includeAll": false,
"multi": false,
"name": "source_node",
"options": [],
"query": "label_values(iperf_network_bandwidth_mbps, source_node)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"type": "query"
},
{
"current": {},
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"definition": "label_values(iperf_network_bandwidth_mbps{source_node=~\"^$source_node$\"}, destination_node)",
"hide": 0,
"includeAll": false,
"multi": false,
"name": "destination_node",
"options": [],
"query": "label_values(iperf_network_bandwidth_mbps{source_node=~\"^$source_node$\"}, destination_node)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"type": "query"
},
{
"current": { "selected": true, "text": "tcp", "value": "tcp" },
"hide": 0,
"includeAll": false,
"multi": false,
"name": "protocol",
"options": [
{ "selected": true, "text": "tcp", "value": "tcp" },
{ "selected": false, "text": "udp", "value": "udp" }
],
"query": "tcp,udp",
"skipUrlSync": false,
"type": "custom"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "browser",
"title": "Kubernetes iperf3 Network Performance",
"uid": "k8s-iperf3-dashboard",
"version": 1,
"weekStart": ""
}
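If Grafana was deployed by kube-prometheus-stack with its dashboard sidecar enabled, the manual import can be skipped by shipping the JSON as a labeled ConfigMap. The sketch below assumes the sidecar watches its default grafana_dashboard label and that the JSON above has been saved locally as iperf3-dashboard.json.

```bash
# Sketch: let the Grafana sidecar provision the dashboard automatically.
kubectl -n monitoring create configmap iperf3-dashboard \
  --from-file=iperf3-dashboard.json
kubectl -n monitoring label configmap iperf3-dashboard grafana_dashboard=1
```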
The project follows a standard structure:
.
├── .github/
│   └── workflows/
│       └── release.yml            # GitHub Actions workflow for CI/CD
├── charts/
│   └── iperf3-monitor/            # The Helm chart for the service
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
│           ├── _helpers.tpl
│           ├── server-daemonset.yaml
│           ├── exporter-deployment.yaml
│           ├── rbac.yaml
│           ├── service.yaml
│           └── servicemonitor.yaml
├── exporter/
│   ├── Dockerfile                 # Dockerfile for the exporter
│   ├── requirements.txt           # Python dependencies
│   └── exporter.py                # Exporter source code
├── .gitignore                     # Specifies intentionally untracked files
├── LICENSE                        # Project license
└── README.md                      # This file
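For local development of the exporter, the dependencies and container image can be built straight from the exporter/ directory; the image tag below is arbitrary and only used locally.

```bash
# Sketch: local development workflow for the exporter.
python3 -m venv .venv && . .venv/bin/activate
pip install -r exporter/requirements.txt                # Python dependencies
docker build -t iperf3-monitor-exporter:dev exporter/   # build the image
```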
The project includes a GitHub Actions workflow (.github/workflows/release.yml) triggered on Git tags (v*.*.*) to automate:
- Building and pushing the exporter container image to the GitHub Container Registry (ghcr.io).
- Packaging and publishing the Helm chart.

This project is licensed under the GNU Affero General Public License v3. See the LICENSE file for details.