This project provides a comprehensive solution for continuous network validation within a Kubernetes cluster. Leveraging industry-standard tools like iperf3, Prometheus, and Grafana, it offers proactive monitoring of network performance between nodes, helping to identify and troubleshoot latency, bandwidth, and packet loss issues before they impact applications.
The service is based on a decoupled architecture:
- iperf3 server (DaemonSet): an iperf3 server pod on every node acts as a test endpoint. It runs on the host network to measure raw node performance.
- Metrics exporter (Deployment): a central exporter runs iperf3 client tests against the server pods, parses the JSON output, and exposes performance metrics via an HTTP endpoint.
- Monitoring stack (Prometheus and Grafana, e.g. kube-prometheus-stack): scrapes the exporter's metrics and visualizes them in a custom dashboard.

This separation of concerns ensures scalability and resilience, and aligns with Kubernetes operational principles.
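Once the chart is installed (see the installation steps below), a quick way to sanity-check the exporter half of this architecture is to port-forward its Service and read the metrics directly. The sketch below assumes a release named `iperf3-monitor` in the `monitoring` namespace and the default service port 9876 from values.yaml; the metric names are the ones used by the Grafana dashboard later in this README.

```bash
# Minimal sketch: read the exporter's Prometheus metrics by hand.
# Service name, namespace, and port are assumptions for a default install.
kubectl -n monitoring port-forward svc/iperf3-monitor 9876:9876 &
sleep 2
# Gauges such as iperf_network_bandwidth_mbps and iperf_network_jitter_ms
# are labeled by source_node, destination_node, and protocol.
curl -s http://localhost:9876/metrics | grep iperf_network
```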
Prerequisites: a running Kubernetes cluster, Helm installed, and kubectl configured to connect to your cluster.

Add the Helm chart repository (replace with your actual repo URL once published):
helm repo add iperf3-monitor https://malarinv.github.io/iperf3-monitor/
Update your Helm repositories:
helm repo update
Install the chart:
# --namespace: use your preferred namespace.
# --values: optional custom values file.
helm install iperf3-monitor iperf3-monitor/iperf3-monitor \
  --namespace monitoring \
  --create-namespace \
  --values values.yaml
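After the install completes, it is worth confirming that the DaemonSet has placed an iperf3 server pod on every node and that the single exporter pod is running. The selector below assumes the standard Helm instance label and a release named `iperf3-monitor`; adjust it to your release and namespace.

```bash
# Sketch: list the workloads created by the chart.
# The instance label value matches the Helm release name used above.
kubectl -n monitoring get daemonset,deployment,pods \
  -l app.kubernetes.io/instance=iperf3-monitor
# Expect one ready DaemonSet pod per schedulable node plus one exporter pod.
```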
Note: Ensure your Prometheus instance is configured to scrape services in the namespace where you install the chart and that it recognizes `ServiceMonitor` resources with the label `release: prometheus-operator` (if using the standard `kube-prometheus-stack` setup).
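If the metrics never appear in Prometheus, the usual culprit is a ServiceMonitor label or namespace selector mismatch. A quick way to see what your Prometheus instance actually selects (assuming a kube-prometheus-stack installation in the `monitoring` namespace):

```bash
# Print the ServiceMonitor selectors of every Prometheus resource in the
# monitoring namespace; the chart's ServiceMonitor labels must match.
kubectl -n monitoring get prometheus \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.serviceMonitorSelector}{" / "}{.spec.serviceMonitorNamespaceSelector}{"\n"}{end}'
```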
The Helm chart is highly configurable via the values.yaml file. You can override default settings by creating your own values.yaml and passing it during installation (--values my-values.yaml).
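For example, a small override file might shorten the test cycle and switch to UDP so that jitter and packet loss are measured. This is only a sketch; the keys mirror the default values.yaml reproduced below.

```bash
# Sketch: write a minimal override file and apply it to the release.
cat > my-values.yaml <<'EOF'
exporter:
  testInterval: 120   # run a full test cycle every 2 minutes
  testProtocol: udp   # UDP tests report jitter and packet loss
serviceMonitor:
  interval: 30s       # scrape the exporter more often
EOF

helm upgrade --install iperf3-monitor iperf3-monitor/iperf3-monitor \
  --namespace monitoring \
  --values my-values.yaml
```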
Refer to the comments in the default values.yaml for a detailed explanation of each parameter:
# Default values for iperf3-monitor.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
# -- Override the name of the chart.
nameOverride: ""
# -- Override the fully qualified app name.
fullnameOverride: ""
exporter:
# -- Configuration for the exporter container image.
image:
# -- The container image repository for the exporter.
repository: ghcr.io/malarinv/iperf3-monitor
# -- The container image tag for the exporter. If not set, the chart's appVersion is used.
tag: ""
# -- The image pull policy for the exporter container.
pullPolicy: IfNotPresent
# -- Number of exporter pod replicas. Typically 1 is sufficient.
replicaCount: 1
# -- Interval in seconds between complete test cycles (i.e., testing all server nodes).
testInterval: 300
# -- Timeout in seconds for a single iperf3 test run.
testTimeout: 10
# -- Protocol to use for testing (tcp or udp).
testProtocol: tcp
# -- CPU and memory resource requests and limits for the exporter pod.
# @default -- A small default is provided if commented out.
resources: {}
# requests:
# cpu: "100m"
# memory: "128Mi"
# limits:
# cpu: "500m"
# memory: "256Mi"
server:
# -- Configuration for the iperf3 server container image (DaemonSet).
image:
# -- The container image repository for the iperf3 server.
repository: networkstatic/iperf3
# -- The container image tag for the iperf3 server.
tag: latest
# -- CPU and memory resource requests and limits for the iperf3 server pods (DaemonSet).
# These should be very low as the server is mostly idle.
# @default -- A small default is provided if commented out.
resources: {}
# requests:
# cpu: "50m"
# memory: "64Mi"
# limits:
# cpu: "100m"
# memory: "128Mi"
# -- Node selector for scheduling iperf3 server pods.
# Use this to restrict the DaemonSet to a subset of nodes.
# @default -- {} (schedule on all nodes)
nodeSelector: {}
# -- Tolerations for scheduling iperf3 server pods on tainted nodes (e.g., control-plane nodes).
# This is often necessary to include master nodes in the test mesh.
# @default -- Tolerates control-plane and master taints.
tolerations:
- key: "node-role.kubernetes.io/control-plane"
operator: "Exists"
effect: "NoSchedule"
- key: "node-role.kubernetes.io/master"
operator: "Exists"
effect: "NoSchedule"
rbac:
# -- If true, create ServiceAccount, ClusterRole, and ClusterRoleBinding for the exporter.
# Set to false if you manage RBAC externally.
create: true
serviceAccount:
# -- The name of the ServiceAccount to use for the exporter pod.
# Only used if rbac.create is false. If not set, it defaults to the chart's fullname.
name: ""
serviceMonitor:
# -- If true, create a ServiceMonitor resource for integration with Prometheus Operator.
# Requires a running Prometheus Operator in the cluster.
enabled: true
# -- Scrape interval for the ServiceMonitor. How often Prometheus scrapes the exporter metrics.
interval: 60s
# -- Scrape timeout for the ServiceMonitor. How long Prometheus waits for metrics response.
scrapeTimeout: 30s
# -- Configuration for the exporter Service.
service:
# -- Service type. ClusterIP is typically sufficient.
type: ClusterIP
# -- Port on which the exporter service is exposed.
port: 9876
# -- Target port on the exporter pod.
targetPort: 9876
# -- Optional configuration for a network policy to allow traffic to the iperf3 server DaemonSet.
# This is often necessary if you are using a network policy controller.
networkPolicy:
# -- If true, create a NetworkPolicy resource.
enabled: false
# -- Specify source selectors if needed (e.g., pods in a specific namespace).
from: []
# -- Specify namespace selectors if needed.
namespaceSelector: {}
# -- Specify pod selectors if needed.
podSelector: {}
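Individual options can also be toggled on the command line with --set instead of a values file. For instance, the sketch below enables the NetworkPolicy and restricts the server DaemonSet to Linux nodes via the standard kubernetes.io/os label; the release name and namespace are assumptions.

```bash
# Sketch: flip individual chart values without a custom values file.
# Dots inside the nodeSelector key must be escaped for --set.
helm upgrade --install iperf3-monitor iperf3-monitor/iperf3-monitor \
  --namespace monitoring \
  --set networkPolicy.enabled=true \
  --set server.nodeSelector."kubernetes\.io/os"=linux
```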
A custom Grafana dashboard is provided to visualize the collected iperf3 metrics.
To import it into Grafana:
1. Navigate to Dashboards -> Import.
2. Paste the dashboard JSON below (or upload it as a file) and click Load.
3. Click Import.
{
"__inputs": [],
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "8.0.0"
},
{
"type": "datasource",
"id": "prometheus",
"name": "Prometheus",
"version": "1.0.0"
}
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"gnetId": null,
"graphTooltip": 0,
"id": null,
"links": [],
"panels": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"gridPos": {
"h": 9,
"w": 24,
"x": 0,
"y": 0
},
"id": 2,
"targets": [
{
"expr": "avg(iperf_network_bandwidth_mbps) by (source_node, destination_node)",
"format": "heatmap",
"legendFormat": " -> ",
"refId": "A"
}
],
"cards": { "cardPadding": null, "cardRound": null },
"color": {
"mode": "spectrum",
"scheme": "red-yellow-green",
"exponent": 0.5,
"reverse": false
},
"dataFormat": "tsbuckets",
"yAxis": { "show": true, "format": "short" },
"xAxis": { "show": true }
},
{
"title": "Bandwidth Over Time (Source: $source_node, Dest: $destination_node)",
"type": "timeseries",
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 9
},
"targets": [
{
"expr": "iperf_network_bandwidth_mbps{source_node=~\"^$source_node$\", destination_node=~\"^$destination_node$\", protocol=~\"^$protocol$\"}",
"legendFormat": "Bandwidth",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "mbps"
}
}
},
{
"title": "Jitter Over Time (Source: $source_node, Dest: $destination_node)",
"type": "timeseries",
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 9
},
"targets": [
{
"expr": "iperf_network_jitter_ms{source_node=~\"^$source_node$\", destination_node=~\"^$destination_node$\", protocol=\"udp\"}",
"legendFormat": "Jitter",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "ms"
}
}
}
],
"refresh": "30s",
"schemaVersion": 36,
"style": "dark",
"tags": ["iperf3", "network", "kubernetes"],
"templating": {
"list": [
{
"current": {},
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"definition": "label_values(iperf_network_bandwidth_mbps, source_node)",
"hide": 0,
"includeAll": false,
"multi": false,
"name": "source_node",
"options": [],
"query": "label_values(iperf_network_bandwidth_mbps, source_node)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"type": "query"
},
{
"current": {},
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"definition": "label_values(iperf_network_bandwidth_mbps{source_node=~\"^$source_node$\"}, destination_node)",
"hide": 0,
"includeAll": false,
"multi": false,
"name": "destination_node",
"options": [],
"query": "label_values(iperf_network_bandwidth_mbps{source_node=~\"^$source_node$\"}, destination_node)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"type": "query"
},
{
"current": { "selected": true, "text": "tcp", "value": "tcp" },
"hide": 0,
"includeAll": false,
"multi": false,
"name": "protocol",
"options": [
{ "selected": true, "text": "tcp", "value": "tcp" },
{ "selected": false, "text": "udp", "value": "udp" }
],
"query": "tcp,udp",
"skipUrlSync": false,
"type": "custom"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "browser",
"title": "Kubernetes iperf3 Network Performance",
"uid": "k8s-iperf3-dashboard",
"version": 1,
"weekStart": ""
}
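If Grafana was deployed by kube-prometheus-stack with its dashboard sidecar enabled, the manual import can be skipped by shipping the JSON as a labeled ConfigMap. The sketch below assumes the sidecar watches its default grafana_dashboard label and that the JSON above has been saved locally as iperf3-dashboard.json.

```bash
# Sketch: let the Grafana sidecar provision the dashboard automatically.
kubectl -n monitoring create configmap iperf3-dashboard \
  --from-file=iperf3-dashboard.json
kubectl -n monitoring label configmap iperf3-dashboard grafana_dashboard=1
```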
The project follows a standard structure:
.
├── .github/
│   └── workflows/
│       └── release.yml            # GitHub Actions workflow for CI/CD
├── charts/
│   └── iperf3-monitor/            # The Helm chart for the service
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
│           ├── _helpers.tpl
│           ├── server-daemonset.yaml
│           ├── exporter-deployment.yaml
│           ├── rbac.yaml
│           ├── service.yaml
│           └── servicemonitor.yaml
├── exporter/
│   ├── Dockerfile                 # Dockerfile for the exporter
│   ├── requirements.txt           # Python dependencies
│   └── exporter.py                # Exporter source code
├── .gitignore                     # Specifies intentionally untracked files
├── LICENSE                        # Project license
└── README.md                      # This file
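For local development of the exporter, the dependencies and container image can be built straight from the exporter/ directory; the image tag below is arbitrary and only used locally.

```bash
# Sketch: local development workflow for the exporter.
python3 -m venv .venv && . .venv/bin/activate
pip install -r exporter/requirements.txt                # Python dependencies
docker build -t iperf3-monitor-exporter:dev exporter/   # build the image
```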
The project includes a GitHub Actions workflow (.github/workflows/release.yml) triggered on Git tags (v*.*.*) to automate:
- Building and pushing the exporter container image to the GitHub Container Registry (ghcr.io).
- Packaging and publishing the Helm chart.

This project is licensed under the GNU Affero General Public License v3. See the LICENSE file for details.