July 31, 2022
Redis is a simple, but very well optimized, open source key-value database that is widely used in cloud-native applications. In this article, you will learn how to monitor Redis with Prometheus, and the most important metrics you should be looking at.
Despite its simplicity, Redis has become a key component of many Kubernetes and cloud applications. As a result, performance issues or problems with its resources can cause other components of the application to fail.
By knowing how to monitor Redis with Prometheus, and alerting on the top metrics and key performance indicators, you will be able to ensure the correct functioning of your Redis database and troubleshoot any potential issues.
Prometheus has become the de-facto standard for monitoring applications in Kubernetes environments. To monitor Redis with Prometheus, you can use the Redis Exporter.
First, you will need to create a user for the exporter in Redis with enough privileges to get stats, allowing it to generate metrics. This is done by using redis-cli to run the following command, replacing USERNAME and PASSWORD with the username and password you want for the exporter user:
ACL SETUSER USERNAME +client +ping +info +config|get +cluster|info +slowlog +latency +memory +select +get +scan +xinfo +type +pfcount +strlen +llen +scard +zcard +hlen +xlen +eval allkeys on >PASSWORD
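Keep in mind that users created with ACL SETUSER live only in memory and are lost on restart unless you persist them (for example, with ACL SAVE when an ACL file is configured). If your Redis runs in Kubernetes and is configured through a ConfigMap, one option is to declare the same user in an ACL file. This is only a sketch, assuming your redis.conf loads users with the directive aclfile /etc/redis/users.acl; the ConfigMap name is illustrative:

# Illustrative ConfigMap holding the exporter user as an ACL file entry.
# Assumes redis.conf contains: aclfile /etc/redis/users.acl
# and that this ConfigMap is mounted at /etc/redis in the Redis pod.
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-acl
data:
  users.acl: |
    user USERNAME +client +ping +info +config|get +cluster|info +slowlog +latency +memory +select +get +scan +xinfo +type +pfcount +strlen +llen +scard +zcard +hlen +xlen +eval allkeys on >PASSWORD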
Next, create a Secret in the Kubernetes namespace where you will install the exporter with the user and password that you just created in Redis:
kubectl create -n redis-exporter-namespace secret generic redis-exporter-auth \
  --from-literal=user=USERNAME \
  --from-literal=password=PASSWORD
Now you just need to create a deployment for the exporter. You can use this example to create yours:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-exporter
spec:
  selector:
    matchLabels:
      app: redis-exporter
  replicas: 1
  template:
    metadata:
      labels:
        app: redis-exporter
      annotations:
        prometheus.io/port: "9121"
        prometheus.io/scrape: "true"
    spec:
      containers:
        - name: redis-exporter
          image: oliver006/redis_exporter:latest
          ports:
            - containerPort: 9121
          env:
            - name: REDIS_ADDR
              value: 'redis://redis:6379'
            - name: REDIS_USER
              valueFrom:
                secretKeyRef:
                  name: redis-exporter-auth
                  key: user
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: redis-exporter-auth
                  key: password
          resources:
            limits:
              memory: "256Mi"
              cpu: "256m"
Remember to edit the value of the REDIS_ADDR environment variable in the deployment manifest so that it points to the endpoint of your Redis server.
After you apply the exporter Deployment, Prometheus will automatically start scraping the Redis metrics, since the pod already carries the standard prometheus.io annotations.
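This works as long as your Prometheus is set up for annotation-based pod discovery. If it is not, you can add a scrape job similar to this sketch to your Prometheus configuration (the job name is illustrative; the relabeling follows the common community convention for the prometheus.io annotations):

scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only scrape pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Scrape the port declared in the prometheus.io/port annotation
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__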
The first thing to monitor in your Redis server with Prometheus is its uptime. The metric redis_uptime_in_seconds tells you how long ago the server was last restarted, and an alert on it can help you identify unscheduled restarts:
redis_uptime_in_seconds < 300
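If you manage alerting rules with the Prometheus Operator, you can wrap this expression in a PrometheusRule resource. Here is a minimal sketch; the object, group, and alert names are illustrative:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: redis-alerts
spec:
  groups:
    - name: redis.uptime
      rules:
        - alert: RedisRecentlyRestarted
          expr: redis_uptime_in_seconds < 300
          labels:
            severity: warning
          annotations:
            summary: "Redis instance {{ $labels.instance }} restarted less than five minutes ago"

Depending on how your Prometheus Operator is configured, the PrometheusRule object may need additional labels to be picked up. The other alerts described in this article can be added to the same group in the same way.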
Let’s write a PromQL query to detect rejected connections with the redis_rejected_connections_total metric:
(rate(redis_rejected_connections_total[5m])) > 0
Connections are rejected when Redis has reached its maximum number of client connections. You can use the redis_connected_clients and redis_config_maxclients metrics to calculate the number of available connections in your Redis server:
redis_config_maxclients - redis_connected_clients
Also, you can create an alert to detect when client connection usage is over 85 percent:
(redis_connected_clients / redis_config_maxclients) > 0.85
Want to dig deeper on how to write Prometheus queries?
Check out our PromQL getting started guide. It includes a nice PromQL cheatsheet too!
There are different ways to detect that a Redis server is suffering from high latency. One of them is to monitor the redis_slowlog_length metric. The slow log, as in other databases, records the queries that took too long to execute. The following PromQL query tracks how quickly new entries are added to the slow log:
rate(redis_slowlog_length[5m])
Also, you can calculate the average response time of the Redis server by using the redis_commands_duration_seconds_total and redis_commands_processed_total metrics. The following query will alert when the average response time is over 250ms:
(rate(redis_commands_duration_seconds_total[5m]) / rate(redis_commands_processed_total[5m]) ) > 0.250
Like most databases, Redis uses an in-memory cache to accelerate response times and improve performance. A low hit ratio can increase the response time of your Redis database. To calculate the cache hit ratio, you can use the following PromQL query:
(rate(redis_keyspace_hits_total[5m]) / (rate(redis_keyspace_misses_total[5m]) + rate(redis_keyspace_hits_total[5m])))
A healthy hit ratio is typically above 0.9. You can also monitor the redis_evicted_keys_total metric to track how many keys are being evicted because Redis has hit its memory limit:
rate(redis_evicted_keys_total[5m])
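Since the hit ratio expression above is useful both in dashboards and in alerts, you may want to precompute it with a Prometheus recording rule. A minimal sketch, with an illustrative rule name:

groups:
  - name: redis.cache
    rules:
      # Precomputed keyspace hit ratio over the last 5 minutes
      - record: redis:keyspace_hit_ratio:rate5m
        expr: |
          rate(redis_keyspace_hits_total[5m])
            /
          (rate(redis_keyspace_hits_total[5m]) + rate(redis_keyspace_misses_total[5m]))

You can then alert whenever redis:keyspace_hit_ratio:rate5m drops below 0.9.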
A low cache hit ratio or a high number of evicted keys can be a symptom of low available memory. You can use the redis_memory_max_bytes and redis_memory_used_bytes metrics to calculate the memory usage of your Redis server:
100 * redis_memory_used_bytes / redis_memory_max_bytes
Also, to optimize how Redis uses the available memory, you can use the redis_mem_fragmentation_ratio metric to check the fragmentation ratio. Values over 1.5 indicate excessive fragmentation, and restarting the Redis server helps reclaim that memory and restore optimal use of it. You can create an alert like this:
redis_mem_fragmentation_ratio > 1.5
In this article, you learned how to monitor Redis with Prometheus by using an open source exporter. You also learned which Redis top performance metrics you should look at to ensure the correct functioning of your Redis instances, and to troubleshoot possible issues in your database.
Now, you can try out these free Redis monitoring dashboards for both Grafana and Sysdig Monitor.
Register now for the free Sysdig Monitor trial and start monitoring your Redis server right away with Sysdig’s managed Prometheus service. You’ll find several Prometheus monitoring integrations and lots of out-of-the-box dashboards that will help you monitor your infrastructure and guide you through troubleshooting situations.