Monitoring Vault with Datadog
Challenge
Monitoring is a critical part of administering any software system, and Vault is no different. Proactively collecting, visualizing, and analyzing data gives you insight into how your Vault instance(s) are performing, which is important to decision making, quality assurance, and troubleshooting.
Solution
To get aggregated data about your Vault instance(s), you can use Datadog, a mature monitoring solution that reports common resource metrics such as CPU, memory, and network usage out of the box.
This tutorial covers the setup and configuration of the Datadog Agent to monitor an instance of Vault Enterprise. You will then review the available metrics, and finally clean up the Datadog Agent and the local Vault installation.
Prerequisites
To enable the Datadog Agent to gather metrics from Vault Enterprise, you will need to have:
- A free-tier Datadog account. Sign up for a free account at Datadog Pricing.
- A Vault Enterprise environment. Refer to the Vault install guide to install Vault.
- A Mac workstation with macOS 10.12 or higher, though most of the steps are the same on other platforms.
Set up the Datadog agent locally
You will need an API key from Datadog. From the Datadog dashboard, select your user name at the bottom of the left navigation.
Select Organization Settings, and then API Keys, which lists the existing API keys.
Select an API key created for your username.
Copy the API Key value, and then store it as a DATADOG_API_KEY environment variable.
$ export DATADOG_API_KEY=<DATADOG_API_KEY>
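Before running the installer, it can help to confirm that the variable is actually set in the current shell. The following is a minimal sketch, not part of the Datadog tooling: `check_dd_api_key` is a hypothetical helper, and the 32-hexadecimal-character check is an assumption about the usual shape of Datadog API keys rather than something the installer requires.

```shell
# Hypothetical helper (not provided by Datadog): sanity-check an API key value.
# Assumption: Datadog API keys are 32 hexadecimal characters long.
check_dd_api_key() {
  key="$1"
  if [ -z "$key" ]; then
    echo "DATADOG_API_KEY is empty or unset"
    return 1
  fi
  case "$key" in
    # Any character outside 0-9, a-f, A-F means the value is not hexadecimal.
    *[!0-9a-fA-F]*)
      echo "DATADOG_API_KEY contains non-hex characters"
      return 1
      ;;
  esac
  if [ "${#key}" -ne 32 ]; then
    echo "DATADOG_API_KEY has ${#key} characters, expected 32"
    return 1
  fi
  echo "DATADOG_API_KEY looks well-formed"
}

# Check the value exported in the previous step.
check_dd_api_key "${DATADOG_API_KEY:-}" || echo "Fix the key before running the installer"
```

A check like this catches the common mistake of leaving the `<DATADOG_API_KEY>` placeholder in place, because the angle brackets are not hexadecimal characters.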
Open a terminal and run the following command to install Datadog Agent.
$ DD_AGENT_MAJOR_VERSION=7 DD_API_KEY=$DATADOG_API_KEY \
    DD_SITE="datadoghq.com" \
    bash -c "$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_mac_os.sh)"
Output:
...
Your Agent is running properly. It will continue to run in the background and submit metrics to Datadog.
You can check the agent status using the "datadog-agent status" command or by opening the webui using the "datadog-agent launch-gui" command.
If you ever want to stop the Agent, please use the Datadog Agent App or the launchctl command. It will start automatically at login.
Verify that the agent is running.
$ datadog-agent status
Example output:
2022-07-29 15:59:23 PDT | CORE | WARN | (pkg/util/log/log.go:591 in func1) | Deactivating Autoconfig will disable most components. It's recommended to use autoconfig_exclude_features and autoconfig_include_features to activate/deactivate features selectively
2022-07-29 15:59:23 PDT | CORE | INFO | (cmd/system-probe/config/config.go:118 in Merge) | no config exists at system-probe.yaml, ignoring...
2022-07-29 15:59:23 PDT | CORE | ERROR | (cmd/system-probe/config/config.go:179 in load) | Could not parse system_probe_config.sysprobe_socket: system-probe unsupported
2022-07-29 15:59:23 PDT | CORE | INFO | (cmd/agent/app/status.go:125 in requestStatus) | Getting the status from the agent.
2022-07-29 15:59:28 PDT | CORE | INFO | (cmd/agent/app/status.go:163 in requestStatus) |
===============
Agent (v7.38.0)
===============
  Status date: 2022-07-29 15:59:23.776 PDT / 2022-07-29 22:59:23.776 UTC (1659135563776)
  Agent start: 2022-07-29 15:59:02.482 PDT / 2022-07-29 22:59:02.482 UTC (1659135542482)
  Pid: 34339
  Go Version: go1.17.11
  Python Version: 3.8.13
  Build arch: amd64
  Agent flavor: agent
  Check Runners: 6
  Log Level: info

  Paths
  =====
    Config File: /opt/datadog-agent/etc/datadog.yaml
    conf.d: /opt/datadog-agent/etc/conf.d
    checks.d: /opt/datadog-agent/etc/checks.d
...snip...
If you encounter an error, run datadog-agent stop and then datadog-agent run again.
Set up the Vault integration
For the Datadog Agent to collect metrics from Vault, the agent needs access to the Vault API. For this tutorial, use unauthenticated access.
Open another terminal and start a Vault dev server with root as the root token.
$ vault server -dev -dev-root-token-id root
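Before configuring the integration, you may want to confirm that the dev server is answering on the address the agent will query. The sketch below assumes the dev server's default listener at http://127.0.0.1:8200 and uses the sys/health endpoint, which does not require a token; `vault_reachable` is a hypothetical helper name introduced only for this example.

```shell
# Assumption: the dev server listens on its default address.
export VAULT_ADDR="http://127.0.0.1:8200"

# Hypothetical helper: report whether Vault answers on VAULT_ADDR.
# sys/health is unauthenticated; -f makes curl fail on HTTP error codes,
# and --max-time stops the request from hanging if nothing is listening.
vault_reachable() {
  if curl -sf --max-time 5 "$VAULT_ADDR/v1/sys/health" >/dev/null 2>&1; then
    echo "Vault is reachable at $VAULT_ADDR"
  else
    echo "Vault is not reachable at $VAULT_ADDR"
  fi
}

vault_reachable
```

Run this from a terminal other than the one where the dev server is running in the foreground.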
Navigate to /opt/datadog-agent/etc/conf.d/vault.d.
$ cd /opt/datadog-agent/etc/conf.d/vault.d
In the vault.d directory, make a copy of conf.yaml.example.
$ cp conf.yaml.example conf.yaml
Open the conf.yaml file with an editor of your choice. Find the instances: section, and notice that the api_url parameter, which is the Vault address to pull metrics from, points to the locally running Vault (http://localhost:8200/v1).
Locate the no_token parameter and set it to true for the convenience of this tutorial.
conf.yaml
...snip...
init_config:

instances:

    ## @param api_url - string - required
    ## URL of the Vault to query.
  - api_url: http://localhost:8200/v1

    ## @param no_token - boolean - optional - default: false
    ## Attempt metric collection without a token.
    no_token: true
Restart the Datadog Agent to apply the configuration changes.
Stop the Datadog Agent.
$ datadog-agent stop
Agent successfully stopped
Run the agent again.
$ datadog-agent run
2022-07-21 10:20:15 CDT | CORE | INFO | (pkg/util/log/log.go:571 in func1) | runtime: final GOMAXPROCS value is: 10
2022-07-21 10:20:15 CDT | CORE | INFO | (pkg/util/log/log.go:571 in func1) | Features detected from environment:
2022-07-21 10:20:15 CDT | CORE | INFO | (cmd/agent/app/run.go:252 in StartAgent) | Starting Datadog Agent v7.37.1
2022-07-21 10:20:16 CDT | CORE | INFO | (cmd/agent/app/run.go:310 in StartAgent)
...
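Once the agent is running again, one way to confirm that the new conf.yaml was picked up is to run the Vault check a single time from a second terminal. This is a minimal sketch: `run_vault_check` is a hypothetical wrapper name, and it assumes datadog-agent is on your PATH, as it was for the earlier status command.

```shell
# Hypothetical wrapper: run the Vault check once if the agent CLI is available.
run_vault_check() {
  if command -v datadog-agent >/dev/null 2>&1; then
    # Executes the vault check one time and prints what it collected,
    # along with any errors reported by the check.
    datadog-agent check vault
  else
    echo "datadog-agent not found on PATH"
  fi
}

run_vault_check
```

If the check reports connection or permission errors, revisit the api_url and no_token values in conf.yaml before looking for metrics in Datadog.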
Monitor the Vault metrics
Now that the agent is installed and running, validate that it is correctly sending data to Datadog. Metrics about both your workstation and your Vault Enterprise instance should be streaming to Datadog.
In your Datadog dashboard, select Metrics > Explorer.
The Explorer page shows the default metric, system.cpu.user.
With Metrics selected, start typing vault. to see the available Vault metrics.
Explore the available metrics collected by Datadog.
Clean up
When you are done exploring, clean up both the Datadog Agent and the Vault environment.
Refer to Uninstall the Agent for instructions on how to uninstall the agent on your platform of choice.
If you are running Vault locally in dev mode, stop the Vault dev server by pressing Ctrl+C where the server is running. Alternatively, run the following command, which terminates every process whose command line contains vault.
$ pgrep -f vault | xargs kill
References
- Details on Vault metrics available through Datadog: Metrics for Vault Cluster Health and Leader Changes
- How to monitor HashiCorp Vault with Datadog
- Datadog Integrations
- Datadog Agent Troubleshooting
- Datadog Agent Commands
- Datadog Agent's Configuration Directory