Logging

Structured Audit Logging

Hotrod is a distributed system with remote structured logging, saved in a database. The various log commands can help you track the server, the targets and the pipes running on those targets.

We'll start with a simple pipe:

# uptimex.yml
name: uptimex
context:
    interval: 1s
input:
    exec:
        command: uptime
        interval: '{{interval}}'
actions:
    - extract:
        input-field: _raw
        remove: true
        pattern: 'load average: (\S+), (\S+), (\S+)'
        output-fields: [m1, m5, m15]
    - convert:
        - m1: num
        - m5: num
        - m15: num
    - script:
        let:
        - k: count()
    - filter:
        condition: k % 2 == 0
output:
    exec:
        command: cat

Assume we have a target, have loaded this pipe, and attached it to the target:

$ hotrod targets add test0 --id 0
$ hotrod pipes add --file uptimex.yml
$ hotrod targets update test0 --add-pipe uptimex
[INFO] user=admin adding pipe=uptimex to target=test0

At this point, the server logs look like this - you have an audit trail detailing these operations and the user responsible.

$ hotrod daemon logs
 timestamp           | level | message
---------------------+-------+--------------------------------------------------
 2020-06-16T14:17:50 | INFO  | user=admin successful login from 127.0.0.1:38212
 2020-06-16T14:17:59 | INFO  | user=admin assigning id=0 to target='test0'
 2020-06-16T14:18:22 | INFO  | user=admin adding pipe=uptimex
 2020-06-16T14:18:22 | INFO  | user=admin rebuilding targets: []
 2020-06-16T14:18:22 | INFO  | user=admin rebuilding targets: []
 2020-06-16T14:18:36 | INFO  | user=admin adding pipe=uptimex to target=test0

The target logs show the pipe being attached:

$ hotrod targets logs
 timestamp           | target | level | message
---------------------+--------+-------+----------------------------------------
 2020-06-16T14:18:13 | test0  | INFO  | current []
 2020-06-16T14:18:13 | test0  | INFO  | allowed []
 2020-06-16T14:18:13 | test0  | INFO  | agent connected to hotrodd
 2020-06-16T14:18:22 | test0  | INFO  | current []
 2020-06-16T14:18:22 | test0  | INFO  | allowed []
 2020-06-16T14:18:38 | test0  | INFO  | New Pipe detected: hotrod-pipe-uptimex
 2020-06-16T14:18:39 | test0  | INFO  | current ["hotrod-pipe-uptimex"]
 2020-06-16T14:18:39 | test0  | INFO  | allowed ["hotrod-pipe-uptimex"]
 2020-06-16T14:18:39 | test0  | INFO  | Pipes Updated...

And the pipe logs show the pipe being launched:

$ hotrod pipes logs
timestamp           | target | pipe    | level | step | function | message
---------------------+--------+---------+-------+------+----------+----------------------
2020-06-16T14:18:38 | test0  | uptimex | WARN  | 0    |          | pipe started
2020-06-16T14:18:39 | test0  | uptimex | INFO  | 0    |          | systemd pipe started

Pipe Logging

Logging from pipes themselves is not enabled by default. It is opt-in, on a target-by-target basis.

$ hotrod targets settings --enable-pipe-logging test0

Can now switch debug mode on for the pipe:

 $ hotrod pipes trace uptimex --enable-debugging --target test0

The pipe may only be scheduled to run in a few minutes time, but you can always force a restart:

$ hotrod pipes trace uptimex --command restart --target test0

After a few seconds waiting for a few samples to collect, can inspect the pipe logs:

$ hotrod pipes logs
 timestamp           | target | pipe    | level | step | function    | message
---------------------+--------+---------+-------+------+-------------+-------------------------------------------
 2020-06-16T14:18:38 | test0  | uptimex | WARN  | 0    |             | pipe started
 2020-06-16T14:18:39 | test0  | uptimex | INFO  | 0    |             | systemd pipe started
 2020-06-16T14:42:49 | test0  | uptimex | DEBUG | 5    | output-exec | {"m1":0.29,"m5":0.38,"m15":0.44,"k":1452}
 2020-06-16T14:42:51 | test0  | uptimex | DEBUG | 5    | output-exec | {"m1":0.29,"m5":0.38,"m15":0.44,"k":1454}
 2020-06-16T14:42:53 | test0  | uptimex | DEBUG | 5    | output-exec | {"m1":0.27,"m5":0.37,"m15":0.44,"k":1456}

Debug mode prints out the event that will be written out. In this case, events just go to the journal, but they may be sent using HTTP POST, plain TCP, etc. In any case, inspecting pipe data should not require shell access to remote devices. 'step' is the zero-based index of the pipe step, where the input is 0 and here the output is 5.

The field k comes from the Lua function count() which increments endlessly, and it is even because we use filter to drop out odd values.

Debugging needs to be switched off explicitly with --disable-debugging.

Tracing gives a lot more insight into the operation of a pipe. It prints out the trace id which thereafter references the result of this trace.

$ hotrod pipes trace --enable-tracing uptimex -t test0
trace id 1

By default, this will trace 5 samples and then switch off. After a few seconds, pass the trace id to see the resulting pipe logs:

$ hotrod pipes logs --trace-id 1
 timestamp           | target | pipe    | level | step | function       | message
---------------------+--------+---------+-------+------+----------------+------------------------------------------------------------------------------------------------------------------------------
 2020-06-16T15:41:14 | test0  | uptimex | TRACE | 1    | action-extract | {"_raw":" 15:41:14 up 50 days, 21:20,  1 user,  load average: 0.42, 0.44, 0.44"} -> [{"m1":"0.42","m5":"0.44","m15":"0.44"}]
 2020-06-16T15:41:14 | test0  | uptimex | TRACE | 2    | action-convert | {"m1":"0.42","m5":"0.44","m15":"0.44"} -> [{"m1":0.42,"m5":0.44,"m15":0.44}]
 2020-06-16T15:41:14 | test0  | uptimex | TRACE | 3    | action-script  | {"m1":0.42,"m5":0.44,"m15":0.44} -> [{"m1":0.42,"m5":0.44,"m15":0.44,"k":4957}]
 2020-06-16T15:41:14 | test0  | uptimex | TRACE | 4    | action-filter  | {"m1":0.42,"m5":0.44,"m15":0.44,"k":4957} -> []
 2020-06-16T15:41:15 | test0  | uptimex | TRACE | 1    | action-extract | {"_raw":" 15:41:15 up 50 days, 21:20,  1 user,  load average: 0.39, 0.43, 0.44"} -> [{"m1":"0.39","m5":"0.43","m15":"0.44"}]
 2020-06-16T15:41:15 | test0  | uptimex | TRACE | 2    | action-convert | {"m1":"0.39","m5":"0.43","m15":"0.44"} -> [{"m1":0.39,"m5":0.43,"m15":0.44}]
 2020-06-16T15:41:15 | test0  | uptimex | TRACE | 3    | action-script  | {"m1":0.39,"m5":0.43,"m15":0.44} -> [{"m1":0.39,"m5":0.43,"m15":0.44,"k":4958}]
 2020-06-16T15:41:15 | test0  | uptimex | TRACE | 4    | action-filter  | {"m1":0.39,"m5":0.43,"m15":0.44,"k":4958} -> [{"m1":0.39,"m5":0.43,"m15":0.44,"k":4958}]
 2020-06-16T15:41:15 | test0  | uptimex | TRACE | 5    | output-exec    | {"m1":0.39,"m5":0.43,"m15":0.44,"k":4958}
 ...

This literally provides a step-by-step view of the data flow; we can see that when t is 4957, it is not passed through by action-filter. In the next sample, data arrives at the output.

Traces can be enabled for pipes running on multiple targets. How can the outputs be distinguished? hotrod targets logs and hotrod pipes logs both have a --target flag to specify one or more targets to search for.

Pipe Errors

The general principle with the pipe framework is: do not discard data. This leads to a tolerant way of viewing errors. But without good diagnostics this makes it hard to troubleshoot pipe problems, since we are often only one typo away from success.

Consider this pipe, with a flaw. The intention is to extract the number of days this particular machine has been up.

# err.yml
name: err
input:
    exec:
        command: uptime
        interval: 2s
actions:
- extract:
    warning: true
    pattern: UP (\d+)
    output-fields: [up]
- convert:
  - up: num
output:
    write: console

Why warning: true? Because not matching a pattern is not always an error. However, we want to be sure in this case.

After adding this pipe and attaching it, here are the logs (with --pipe err to exclude uptimex logs):

$ hotrod pipes logs --pipe err
 timestamp           | target | pipe | level | step | function       | message
---------------------+--------+------+-------+------+----------------+----------------------------------
 2020-06-16T16:10:19 | test0  | err  | INFO  | 0    |                | systemd pipe started
 2020-06-16T16:10:20 | test0  | err  | WARN  | 1    | action-extract | extract: no captures with regex
 2020-06-16T16:10:22 | test0  | err  | WARN  | 1    | action-extract | extract: no captures with regex
 2020-06-16T16:10:24 | test0  | err  | WARN  | 1    | action-extract | extract: no captures with regex
 2020-06-16T16:10:26 | test0  | err  | WARN  | 1    | action-extract | extract: no captures with regex
 2020-06-16T16:10:28 | test0  | err  | WARN  | 1    | action-extract | extract: no captures with regex

To get all the complaints, switch on debugging as before:

$ hotrod pipes logs --pipe err
 timestamp           | target | pipe | level | step | function       | message
---------------------+--------+------+-------+------+----------------+----------------------------------------------------------------------------------
 2020-06-16T16:06:14 | test0  | err  | DEBUG | 3    | output-exec    | {"_raw":" 16:06:14 up 50 days, 21:45,  1 user,  load average: 0.59, 0.57, 0.51"}
 2020-06-16T16:06:16 | test0  | err  | WARN  | 1    | action-extract | extract: no captures with regex
 2020-06-16T16:06:16 | test0  | err  | WARN  | 2    | action-convert | fields not found up
 2020-06-16T16:06:16 | test0  | err  | DEBUG | 3    | output-exec    | {"_raw":" 16:06:16 up 50 days, 21:45,  1 user,  load average: 0.94, 0.64, 0.53"}
 2020-06-16T16:06:18 | test0  | err  | WARN  | 1    | action-extract | extract: no captures with regex
 2020-06-16T16:06:18 | test0  | err  | WARN  | 2    | action-convert | fields not found up
 2020-06-16T16:06:18 | test0  | err  | DEBUG | 3    | output-exec    | {"_raw":" 16:06:18 up 50 days, 21:45,  1 user,  load average: 0.94, 0.64, 0.53"}

We have two problems, flowing from the fact that 'UP' does not match 'up' - the pattern extraction, and then the failed conversion. Again, a field not found in convert might be fine, if we are dealing with events with different schemas.

So, the raw data - which may be precious - will always flow through.

Controlling Log Output

Logs have a level: trace, debug, warn, error. The minimum logging level is set with --level (default 'trace'). Use --level error to only show errors.

It is always possible to display logs in CSV format - this is one of the strengths of structured logging.

$ hotrod pipes logs --file csv
timestamp,target,pipe,level,step,function,message
2020-06-16T16:32:56,test0,err,WARN,1,action-extract,extract: no captures with regex
2020-06-16T16:32:58,test0,err,WARN,1,action-extract,extract: no captures with regex
2020-06-16T16:33:00,test0,err,WARN,1,action-extract,extract: no captures with regex
2020-06-16T16:33:02,test0,err,WARN,1,action-extract,extract: no captures with regex
...

By default, you will get the last 25 records - use --num-records to change this.

A start time can be specified, in a fairly relaxed format similar to that used by the Unix date command:

$ hotrod pipes logs --start-time '16 June' --num-records 10
 timestamp           | target | pipe    | level | step | function    | message
---------------------+--------+---------+-------+------+-------------+-------------------------------------------
 2020-06-16T14:18:38 | test0  | uptimex | WARN  | 0    |             | pipe started
 2020-06-16T14:18:39 | test0  | uptimex | INFO  | 0    |             | systemd pipe started
 2020-06-16T14:42:49 | test0  | uptimex | DEBUG | 5    | output-exec | {"m1":0.29,"m5":0.38,"m15":0.44,"k":1452}
 2020-06-16T14:42:51 | test0  | uptimex | DEBUG | 5    | output-exec | {"m1":0.29,"m5":0.38,"m15":0.44,"k":1454}
 2020-06-16T14:42:53 | test0  | uptimex | DEBUG | 5    | output-exec | {"m1":0.27,"m5":0.37,"m15":0.44,"k":1456}
 2020-06-16T14:42:55 | test0  | uptimex | DEBUG | 5    | output-exec | {"m1":0.27,"m5":0.37,"m15":0.44,"k":1458}
 2020-06-16T14:42:57 | test0  | uptimex | DEBUG | 5    | output-exec | {"m1":0.32,"m5":0.38,"m15":0.44,"k":1460}
 2020-06-16T14:42:59 | test0  | uptimex | DEBUG | 5    | output-exec | {"m1":0.32,"m5":0.38,"m15":0.44,"k":1462}
 2020-06-16T14:43:01 | test0  | uptimex | DEBUG | 5    | output-exec | {"m1":0.32,"m5":0.38,"m15":0.44,"k":1464}
 2020-06-16T14:43:03 | test0  | uptimex | DEBUG | 5    | output-exec | {"m1":0.3,"m5":0.37,"m15":0.44,"k":1466}
$

Specifying a date implies the time starts from midnight - in this case the data was first created at 14:00.

Can also specify a time-range explicitly. Times without dates are assumed to refer to the current day:

$ hotrod pipes logs --start-time '14:00' --end-time '17:00' --file csv > logs.csv

Metrics and Volumes

temp$ hotrod daemon traffic metrics 43.47 kiB logs 2.80 MiB unsent logs 0.00 B tarballs sent 599.50 kiB

temp$ hotrod volumes date | targets | bboxes | pipes | lines | volume ------------+---------+--------+-------+-------+------------ 2020.06.17 | 2 | 0 | 3 | 3629 | 287.11 kiB 2020.06.16 | 1 | 0 | 2 | 22992 | 1.78 MiB