Merge pull request #354 from prometheus/mr/glob-regex-documentation

Reorder and rework glob vs. regex documentation
This commit is contained in:
Matthias Rampke 2021-01-22 12:58:01 +00:00 committed by GitHub
commit 64dd103e3f
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23

View file

@ -191,6 +191,11 @@ In general, the different metric types are translated as follows:
StatsD timer, histogram, distribution -> Prometheus summary or histogram
### Glob matching
The default (and fastest) `glob` mapping style uses `*` to denote parts of the statsd metric name that may vary.
These varying parts can then be referenced in the construction of the Prometheus metric name and labels.
An example mapping configuration:
```yaml
@ -234,6 +239,33 @@ mappings:
provider: "$1"
```
Glob matching offers the best performance for common mappings.
#### Ordering glob rules
List more specific matches before wildcards, from left to right:
a.b.c
a.b.*
a.*.d
a.*.*
This avoids unexpected shadowing of later rules, and performance impact from backtracking.
Alternatively, you can disable mapping ordering altogether.
With unordered mapping, at each hierarchy level the most specific match wins.
This has the same effect as using the recommended ordering.
### Regular expression matching
The `regex` mapping style uses regular expressions to match the full statsd metric name.
Use it if the glob mapping is not flexible enough to pull structured data from the available statsd metric names.
Regular expression matching is significantly slower than glob mapping as all mappings must be tested in order.
Because of this, **regex mappings are only executed after all glob mappings**.
In other words, glob mappings take preference over regex matches, irrespective of the order in which they are specified.
Regular expression matches are always evaluated in order, and the first match wins.
The metric name can also contain references to regex matches. The mapping above
could be written as:
@ -244,6 +276,15 @@ mappings:
name: "${2}_total"
labels:
provider: "$1"
mappings:
- match: "(.*)\.(.*)--(.*)\.status\.(.*)\.count"
match_type: regex
name: "request_total"
labels:
hostname: "$1"
exec: "$2"
protocol: "$3"
code: "$4"
```
Be aware about yaml escape rules as a mapping like the following one will not work.
@ -256,6 +297,8 @@ mappings:
provider: "$1"
```
### Naming, labels, and help
Please note that metrics with the same name must also have the same set of
label names.
@ -402,29 +445,6 @@ mappings:
job: "${1}_server_other"
```
### Choosing between glob or regex match type
Despite from the missing flexibility of using regular expression in mapping and
formatting labels, `glob` matching is optimized to have better performance than
`regex` in certain use cases. In short, glob will have best performance if the
rules amount is not so less and captures (using of `*`) is not to much in a
single rule. Whether disabling ordering in glob or not won't have a noticable
effect on performance in general use cases. In edge cases like the below however,
disabling ordering will be beneficial:
a.*.*.*.*
a.b.*.*.*
a.b.c.*.*
a.b.c.d.*
The reason is that the list assignment of captures (using of `*`) is the most
expensive operation in glob. Honoring ordering will result in up to 10 list
assignments, while without ordering it will need only 4 at most.
For details, see [pkg/mapper/fsm/README.md](pkg/mapper/fsm/README.md).
Running `go test -bench .` in **pkg/mapper** directory will produce
a detailed comparison between the two match type.
### `drop` action
You may also drop metrics by specifying a "drop" action on a match. For