add readmes

Signed-off-by: Wangchong Zhou <fffonion@gmail.com>
This commit is contained in:
Wangchong Zhou 2018-10-03 16:30:12 -07:00
parent c10e80c44b
commit 4d9ce8c70a
No known key found for this signature in database
GPG key ID: B607274584E8D5E5
2 changed files with 167 additions and 1 deletions

View file

@ -72,6 +72,8 @@ NOTE: Version 0.7.0 switched to the [kingpin](https://github.com/alecthomas/king
read buffer associated with the UDP connection. Please
make sure the kernel parameters net.core.rmem_max is
set to a value greater than the value specified.
--debug.dump-fsm="" The path to dump internal FSM generated for glob
matching as Dot file.
--log.level="info" Only log messages with the given severity or above.
Valid levels: [debug, info, warn, error, fatal]
--log.format="logger:stderr"
@ -181,6 +183,8 @@ mappings:
code: "$1"
```
### StatsD timers
By default, statsd timers are represented as a Prometheus summary with
quantiles. You may optionally configure the [quantiles and acceptable
error](https://prometheus.io/docs/practices/histograms/#quantiles):
@ -223,6 +227,8 @@ mappings:
job: "${1}_server"
```
### Regular expression matching
Another capability when using YAML configuration is the ability to define matches
using raw regular expressions as opposed to the default globbing style of match.
This may allow for pulling structured data from otherwise poorly named statsd
@ -249,14 +255,19 @@ automatically.
only used when the statsd metric type is a timerand the `timer_type` is set to
"histogram."
### Global defaults
One may also set defaults for the timer type, buckets or quantiles, and match_type. These will be used
by all mappings that do not define these.
An option that can only be configured in `defaults` is `glob_disable_ordering`, which is `false` if omitted. By setting this to `true`, `glob` match type will not honor the occurance of rules in the mapping rules file and always treat `*` as lower priority than a general string.
```yaml
defaults:
timer_type: histogram
buckets: [.005, .01, .025, .05, .1, .25, .5, 1, 2.5 ]
match_type: glob
glob_disable_ordering: false
mappings:
# This will be a histogram using the buckets set in `defaults`.
- match: test.timing.*.*.*
@ -275,7 +286,34 @@ mappings:
job: "${1}_server_other"
```
You may also drop metrics by specifying a "drop" action on a match. For example:
### Choose between glob or regex match type
Despite from the missing flexibility of using regular expression in mapping and
formatting labels, `glob` matching is optimized to have better performance than
`regex` in certain use cases. In short, glob will have best performance if the
rules amount is not so less and captures (using of *) is not to much in a
single rule. Whether disabling ordering in glob or not won't have a noticable
effect on performance in general use cases. In edge cases like the below however,
disabling ordering will be beneficial:
a.*.*.*.*
a.b.*.*.*
a.b.c.*.*
a.b.c.d.*
The reason is the list assignment of captures (using of *) is the most
expensive operation in glob. Honoring ordering will result fsm to do 10
times of list assignment at most, while disabling ordering it will need
only 4 at most.
See also [pkg/mapper/fsm/README.md](pkg/mapper/fsm/README.md).
Also running `go test -bench .` in **pkg/mapper** directory will produce
a detailed comparation between the two match type.
### `drop` action
You may also drop metrics by specifying a "drop" action on a match. For
example:
```yaml
mappings:
@ -296,6 +334,8 @@ mappings:
You can drop any metric using the normal match syntax.
The default action is "map" which does the normal metrics mapping.
### Explicit metric type mapping
StatsD allows emitting of different metric types under the same metric name,
but the Prometheus client library can't merge those. For this use-case the
mapping definition allows you to specify which metric type to match:

126
pkg/mapper/fsm/README.md Normal file
View file

@ -0,0 +1,126 @@
# FSM Mapping
## Overview
This package implements a fast and efficient algorithm for generic glob style
string matching using finite state machine (FSM).
### Source Hierachy
```
'-- fsm
'-- dump.go // functionality to dump the FSM to Dot file
'-- formatter.go // format glob templates using captured * groups
'-- fsm.go // manipulating and searching of FSM
'-- minmax.go // min() max() function for interger
```
## FSM Explained
Per [Wikipedia](https://en.wikipedia.org/wiki/Finite-state_machine):
A finite-state machine (FSM) or finite-state automaton (FSA, plural: automata), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number of states at any given time. The FSM can change from one state to another in response to some external inputs; the change from one state to another is called a transition. An FSM is defined by a list of its states, its initial state, and the conditions for each transition.
In our use case, each *state* is a substring after the input StatsD metric name is splitted by `.`.
### Add state to FSM
`func (f *FSM) AddState(match string, matchMetricType string,
maxPossibleTransitions int, result interface{}) int`
At first, the fsm only contains three states, representing three possible metric types:
____ [gauge]
/
(start)---- [counter]
\
'--- [ timer ]
Adding a rule `client.*.request.count` with type `counter` will make the fsm to be:
____ [gauge]
/
(start)---- [counter] -- [client] -- [*] -- [request] -- [count] -- {R1}
\
'--- [timer]
`{R1}` is short for result 1, which is the match result for `client.*.request.count`.
Adding a rule `client.*.*.size` with type `counter` will make the fsm to be:
____ [gauge] __ [request] -- [count] -- {R1}
/ /
(start)---- [counter] -- [client] -- [*]
\ \__ [*] -- [size] -- {R2}
'--- [timer]
### Finding a result state in FSM
`func (f *FSM) GetMapping(statsdMetric string, statsdMetricType string)
(*mappingState, []string)`
For example, try to map `client.aaa.request.count` with `counter` type in the
fsm, the `^1` to `^7` symbols indicate how fsm will traversal in its tree:
____ [gauge] __ [request] -- [count] -- {R1}
/ / ^5 ^6 ^7
(start)---- [counter] -- [client] -- [*]
^1 \ ^2 ^3 \__ [*] -- [size] -- {R2}
'--- [timer] ^4
To map `client.bbb.request.size`, fsm will do a backtracking:
____ [gauge] __ [request] -- [count] -- {R1}
/ / ^5 ^6
(start)---- [counter] -- [client] -- [*]
^1 \ ^2 ^3 \__ [*] -- [size] -- {R2}
'--- [timer] ^4
^7 ^8 ^9
## Debugging
To see all the states of the current FSM, use `func (f *FSM) DumpFSM(w io.Writer)`
to dump into a Dot file. The Dot file can be further renderer into image using:
```shell
$ dot -Tpng dump.dot > dump.png
```
In StatsD exporter, one could use the following:
```shell
$ statsd_exporter --statsd.mapping-config=statsd.rules --debug.dump-fsm=dump.dot
$ dot -Tpng dump.dot > dump.png
```
For example, the following rules:
```yaml
mappings:
- match: client.*.request.count
name: request_count
match_metric_type: counter
labels:
client: $1
- match: client.*.*.size
name: sizes
match_metric_type: counter
labels:
client: $1
direction: $2
```
will be rendered as:
![fsm](https://i.imgur.com/Wao4tsI.png)
The `dot` program is part of [Graphviz](https://www.graphviz.org/) and is
available in most of popular operating systems.