Overview
NVIDIA System Management Interface (nvidia-smi) is a command line utility, based on top of the NVIDIA Management Library (NVML), intended to aid in the management and monitoring of NVIDIA GPU devices. Telegraf is a plug-in driven server agent for collecting and sending metrics and events from databases, systems and IoT sensors.
To send your Prometheus-format NVIDIA SMI metrics to Logz.io, you need to add the inputs.nvidia_smi and outputs.http plug-ins to your Telegraf configuration file.
Configuring Telegraf to send your metrics data to Logz.io
Set up Telegraf v1.17 or higher on the same machine as the NVIDIA card
Ubuntu & Debian
sudo apt-get update && sudo apt-get install telegraf
The configuration file is located at /etc/telegraf/telegraf.conf
.
RedHat and CentOS
sudo yum install telegraf
The configuration file is located at /etc/telegraf/telegraf.conf
.
SLES & openSUSE
# add go repository
zypper ar -f obs://devel:languages:go/ go
# install latest telegraf
zypper in telegraf
The configuration file is located at /etc/telegraf/telegraf.conf
.
FreeBSD/PC-BSD
sudo pkg install telegraf
The configuration file is located at /etc/telegraf/telegraf.conf
.
Add the inputs.nvidia_smi plug-in
First you need to configure the input plug-in to enable Telegraf to scrape the NVIDIA SMI data from your hosts. To do this, add the following code to the configuration file:
[[inputs.nvidia_smi]]
## Optional: path to nvidia-smi binary, defaults to $PATH via exec.LookPath
# bin_path = "/usr/bin/nvidia-smi"
## Optional: timeout for GPU polling
# timeout = "5s"
The database name is only required for instantiating a connection with the server and does not restrict the databases that we collect metrics from. The full list of data scraping and configuring options can be found here.
Add the outputs.http plug-in
After you create the configuration file, configure the output plug-in to enable Telegraf to send your data to Logz.io in Prometheus-format. To do this, add the following code to the configuration file:
[[outputs.http]]
url = "https://<<LISTENER-HOST>>:8053"
data_format = "prometheusremotewrite"
[outputs.http.headers]
Content-Type = "application/x-protobuf"
Content-Encoding = "snappy"
X-Prometheus-Remote-Write-Version = "0.1.0"
Authorization = "Bearer <<PROMETHEUS-METRICS-SHIPPING-TOKEN>>"
Replace the placeholders to match your specifics. (They are indicated by the double angle brackets << >>
):
- Replace
<<PROMETHEUS-METRICS-SHIPPING-TOKEN>>
with a token for the Metrics account you want to ship to.
Here’s how to look up your Metrics token. - Replace
<<LISTENER-HOST>>
with the Logz.io Listener URL for your region, configured to use port 8052 for http traffic, or port 8053 for https traffic. For example,listener.logz.io
if your account is hosted on AWS US East, orlistener-nl.logz.io
if hosted on Azure West Europe.
Check Logz.io for your metrics
Give your data some time to get from your system to ours, then log in to your Logz.io Metrics account, and open the Logz.io Metrics tab.