The context
SNMP
Devices expose many values that can be monitored; we will call them metrics here. These metrics can be sensitive, so we will protect them here. Note that in some places they are not considered sensitive and are exposed publicly; I do not recommend that.
To solve the problem of every device having its own kind of metrics, its own interfaces to them and its own formats, a standard called SNMP (Simple Network Management Protocol) was developed. It has gone through several iterations; the current version is SNMP v3, which is the one we are going to implement.
SNMP metrics can be accessed in different ways. Either the monitored device sends its values to a trap server that gathers them, or, as we are going to configure it here, the device runs SNMP as a service listening on an interface that allows authenticated and encrypted communication.
Many things can be done with SNMP, including configuring devices. I didn’t need that here, so the user is read-only.
Prometheus and Grafana
I am using Prometheus to collect and store metrics and Grafana to build the dashboards. Both are easy to set up and use, and the dashboards look great out of the box.
The problem
- I have several devices that can report their status using SNMP.
- I want to monitor these devices using nice looking dashboards and have alerts.
- Documentation on how all these moving parts fit together ranges from sparse to non-existent (or my search engine skills are rusty).
The solution
Configuration of a Ubiquiti router
The test device is an EdgeRouter Lite, a nice (and inexpensive) piece of equipment.
Here is the configuration I used in the console. It is really generic: it gives that user read-only access to all metrics, using SNMP v3 with authentication and privacy (encryption).
Replace all the values between < and > with your own (for a demo, simply removing the < and > should work). I couldn’t find a way to generate the encrypted keys, so the encrypted-key fields are left empty. This should have happened automatically, but it seems, according to this thread, that the feature was never added.
For simple setups, we do not really care about the engineID; it becomes useful in large installations where you want different contexts. See RFC 5343 for more details on how all of this fits together.
user@machine# show service snmp
    contact <you@provider.com>
    description <myroutersnmp>
    listen-address <10.0.0.1> {
    }
    location "<Closet>"
    v3 {
        engineid <0x1234>
        group viewer {
            mode ro
            seclevel priv
            view simpleview
        }
        user <username> {
            auth {
                encrypted-key ""
                plaintext-key <mysecretpassword>
                type sha
            }
            engineid <0x1234>
            group viewer
            privacy {
                encrypted-key ""
                plaintext-key <mysecretpassword>
                type aes
            }
        }
        view simpleview {
            oid 1 {
            }
        }
    }
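For reference, SNMP v3 agents typically derive their stored keys from the passphrase with the algorithm defined in RFC 3414: the passphrase is stretched into a key Ku by hashing 1,048,576 bytes of the repeated passphrase, then bound to one agent by hashing Ku, the engineID and Ku again (the "localized" key). The Python sketch below implements that algorithm with SHA-1; for AES-128 privacy, RFC 3826 uses the first 16 bytes of the privacy passphrase's localized key, computed with the authentication hash. I have not verified that EdgeOS’s encrypted-key field accepts this value, nor in which format; the "0x" hex output is an assumption, so keep the plaintext-key lines as a fallback. Note that the result depends on the engineID.

```python
import hashlib

# RFC 3414, Appendix A.2: the password is repeated until exactly 1 MiB has been hashed.
PASSWORD_STREAM_LEN = 1_048_576


def password_to_key(password: bytes, hash_name: str = "sha1") -> bytes:
    """Return Ku, the hash of the password repeated to fill 1,048,576 bytes."""
    if not password:
        raise ValueError("password must not be empty")
    repeats = PASSWORD_STREAM_LEN // len(password) + 1
    stream = (password * repeats)[:PASSWORD_STREAM_LEN]
    return hashlib.new(hash_name, stream).digest()


def localize_key(ku: bytes, engine_id: bytes, hash_name: str = "sha1") -> bytes:
    """Return Kul = H(Ku || engineID || Ku), which ties the key to one engineID."""
    return hashlib.new(hash_name, ku + engine_id + ku).digest()


def localized_key_hex(password: str, engine_id_hex: str, key_len: int = 20) -> str:
    """Hex-encode a SHA-1 localized key, truncated to key_len bytes.

    key_len=20 for the SHA authentication key, 16 for the AES-128 privacy key.
    ASSUMPTION: the "0x" prefix is a guess at the format EdgeOS's encrypted-key
    field expects; this has not been tested on a router.
    """
    digits = engine_id_hex[2:] if engine_id_hex.lower().startswith("0x") else engine_id_hex
    ku = password_to_key(password.encode("utf-8"))
    kul = localize_key(ku, bytes.fromhex(digits))
    return "0x" + kul[:key_len].hex()


if __name__ == "__main__":
    # Placeholder values from the configuration above; replace them with your own.
    print("auth   :", localized_key_hex("mysecretpassword", "0x1234"))
    print("privacy:", localized_key_hex("mysecretpassword", "0x1234", key_len=16))
```

The functions can be checked against the SHA sample in RFC 3414 (password "maplesyrup", engineID 000000000000000000000002) before trusting them with a real router.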
Configuration of the SNMP Prometheus exporter
Generating the configuration
I’m using the official SNMP exporter. Using it with SNMP v3 requires a little tweaking. The configuration described here is really simple; you will need to add the metrics you want. Adding every metric just because “one day we may need it” is usually not recommended.
Clone the repository, build it (follow the instructions in the README), go into the generator directory and create a generator.yml file.
Again, customize the values between < and >. You can also add other metrics at this stage, under walk.
modules:
  <cpu_net_uptime>:
    walk:
      - 1.3.6.1.2.1.2            # Same as "interfaces"
      - sysUpTime                # Same as "1.3.6.1.2.1.1.3"
      - 1.3.6.1.2.1.25.3.3.1.2   # CPUs
    version: 3
    max_repetitions: 25
    retries: 3
    timeout: 10s
    auth:
      username: <username>
      security_level: authPriv
      password: <mysecretpassword>
      auth_protocol: SHA
      priv_protocol: AES
      priv_password: <mysecretpassword>
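Each entry under walk names an OID subtree, and everything below it is walked at scrape time. "Below" is decided on whole numeric components, not on string prefixes: 1.3.6.1.2.1.25.3.3.1.2 (the CPUs) is not part of 1.3.6.1.2.1.2 (interfaces), even though its string starts with it. A small illustrative Python check, with hypothetical helper names that are not part of the exporter or the generator:

```python
from typing import List, Tuple

# Numeric forms of the walk entries above ("sysUpTime" is 1.3.6.1.2.1.1.3).
WALKED = ["1.3.6.1.2.1.2", "1.3.6.1.2.1.1.3", "1.3.6.1.2.1.25.3.3.1.2"]


def parse_oid(oid: str) -> Tuple[int, ...]:
    """Turn "1.3.6.1" (optionally with a leading dot) into (1, 3, 6, 1)."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def in_subtree(oid: str, root: str) -> bool:
    """True if oid equals root or lies below it, comparing whole components."""
    o, r = parse_oid(oid), parse_oid(root)
    return o[: len(r)] == r


def covered_by_walk(oid: str, walked: List[str] = WALKED) -> bool:
    """True if some walk entry's subtree contains this OID."""
    return any(in_subtree(oid, root) for root in walked)


if __name__ == "__main__":
    # ifInOctets for interface 2 sits under "interfaces" (1.3.6.1.2.1.2).
    print(covered_by_walk("1.3.6.1.2.1.2.2.1.10.2"))  # True
    # A string prefix check would wrongly match this hrStorageUsed OID.
    print(in_subtree("1.3.6.1.2.1.25.2.3.1.6.1", "1.3.6.1.2.1.2"))  # False
```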
Then run:
MIBDIRS=mibs ./generator generate
This generates an snmp.yml file that will be used by the SNMP exporter.
Starting the SNMP exporter
From the root of the repository, run:
./snmp_exporter --config.file=generator/snmp.yml
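Before touching Prometheus, you can check the exporter by hand: its /snmp endpoint takes the device as the target query parameter and the generator module as module. The Python sketch below assumes the exporter listens on its default port 9116 on the same machine and uses the placeholder target and module from above; the helper names are hypothetical.

```python
import urllib.parse
import urllib.request

EXPORTER = "http://127.0.0.1:9116"  # ASSUMPTION: exporter on its default port, same host
TARGET = "10.0.0.1"                  # placeholder: the router's listen-address
MODULE = "cpu_net_uptime"            # placeholder: the module name from generator.yml


def scrape_url(exporter: str, target: str, module: str) -> str:
    """Build the exporter URL for one device: /snmp?target=...&module=..."""
    query = urllib.parse.urlencode({"target": target, "module": module})
    return f"{exporter}/snmp?{query}"


def metric_names(body: str) -> set:
    """Collect metric names from Prometheus text output, skipping comment lines."""
    names = set()
    for line in body.splitlines():
        if line and not line.startswith("#"):
            names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def main() -> None:
    url = scrape_url(EXPORTER, TARGET, MODULE)
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            body = resp.read().decode("utf-8")
    except OSError as err:  # URLError and HTTPError are subclasses of OSError
        print(f"could not scrape {url}: {err}")
        return
    print(sorted(metric_names(body)))


if __name__ == "__main__":
    main()
```

If the module works, the list should contain names derived from the MIB objects you walked, such as sysUpTime.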
Add a job to Prometheus
Add the following to the scrape_configs section of your Prometheus configuration (replace <10.0.0.1>, <127.0.0.1:9116> and <cpu_net_uptime> to match your setup):
- job_name: 'snmp'
  static_configs:
    - targets:
        - <10.0.0.1> # The SNMP device (you can add more here).
  metrics_path: /snmp
  params:
    module: [<cpu_net_uptime>]
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: <127.0.0.1:9116>
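The relabel_configs section is the least obvious part. Prometheus puts each entry of targets into the __address__ label; the first rule copies it into __param_target, which becomes the target URL parameter, the second keeps it as the instance label, and the third sends the actual scrape to the exporter. The Python sketch below mimics only Prometheus's default replace action (regex (.*), replacement $1, separator ;) for these three rules, to show the resulting labels and scrape URL. It is an illustration, not Prometheus's implementation, and the function names are hypothetical.

```python
import re
import urllib.parse
from typing import Dict, List, Optional

# The three rules from the job above, with the placeholder exporter address.
RULES = [
    {"source_labels": ["__address__"], "target_label": "__param_target"},
    {"source_labels": ["__param_target"], "target_label": "instance"},
    {"source_labels": [], "target_label": "__address__", "replacement": "127.0.0.1:9116"},
]


def relabel(labels: Dict[str, str], rules: List[dict]) -> Dict[str, str]:
    """Apply "replace" rules with Prometheus defaults: regex (.*), replacement $1, separator ;."""
    out = dict(labels)
    for rule in rules:
        value = ";".join(out.get(name, "") for name in rule.get("source_labels", []))
        match = re.fullmatch(rule.get("regex", "(.*)"), value)
        if match is None:
            continue  # no match: the rule leaves the labels unchanged
        # Simplification: only plain $N references are translated to Python's \N.
        template = rule.get("replacement", "$1").replace("$", "\\")
        out[rule["target_label"]] = match.expand(template)
    return out


def final_scrape_url(labels: Dict[str, str], metrics_path: str = "/snmp",
                     params: Optional[Dict[str, str]] = None) -> str:
    """Combine __address__, metrics_path, params and __param_* labels into a URL."""
    query = dict(params or {})
    for name, value in labels.items():
        if name.startswith("__param_"):
            query[name[len("__param_"):]] = value
    return f"http://{labels['__address__']}{metrics_path}?{urllib.parse.urlencode(query)}"


if __name__ == "__main__":
    labels = relabel({"__address__": "10.0.0.1"}, RULES)
    print(labels["instance"])  # 10.0.0.1
    print(final_scrape_url(labels, params={"module": "cpu_net_uptime"}))
    # http://127.0.0.1:9116/snmp?module=cpu_net_uptime&target=10.0.0.1
```

The instance label keeps the router's address, so dashboards and alerts show the device rather than the exporter.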
Conclusion
It works. It took more work than expected, but…