Merge pull request #354 from prometheus/mr/glob-regex-documentation

Reorder and rework glob vs. regex documentation
2024-11-22 15:30:59 +00:00 · 2021-01-22 12:58:01 +00:00 · 2021-01-22 12:58:01 +00:00 · 64dd103e3f
commit 64dd103e3f
parent 4171ba0c9b bade06f775
1 changed files with 43 additions and 23 deletions
--- a/README.md
+++ b/README.md
@ -191,6 +191,11 @@ In general, the different metric types are translated as follows:
    StatsD timer, histogram, distribution   -> Prometheus summary or histogram
 ### Glob matching
 The default (and fastest) `glob` mapping style uses `*` to denote parts of the statsd metric name that may vary.
 These varying parts can then be referenced in the construction of the Prometheus metric name and labels.
 An example mapping configuration:
 ```yaml
@ -234,6 +239,33 @@ mappings:
    provider: "$1"
 ```
 Glob matching offers the best performance for common mappings.
 #### Ordering glob rules
 List more specific matches before wildcards, from left to right:
    a.b.c
    a.b.*
    a.*.d
    a.*.*
 This avoids unexpected shadowing of later rules, and performance impact from backtracking.
 Alternatively, you can disable mapping ordering altogether.
 With unordered mapping, at each hierarchy level the most specific match wins.
 This has the same effect as using the recommended ordering.
 ### Regular expression matching
 The `regex` mapping style uses regular expressions to match the full statsd metric name.
 Use it if the glob mapping is not flexible enough to pull structured data from the available statsd metric names.
 Regular expression matching is significantly slower than glob mapping as all mappings must be tested in order.
 Because of this, **regex mappings are only executed after all glob mappings**.
 In other words, glob mappings take preference over regex matches, irrespective of the order in which they are specified.
 Regular expression matches are always evaluated in order, and the first match wins.
 The metric name can also contain references to regex matches. The mapping above
 could be written as:
@ -244,6 +276,15 @@ mappings:
  name: "${2}_total"
  labels:
    provider: "$1"
 mappings:
 - match: "(.*)\.(.*)--(.*)\.status\.(.*)\.count"
  match_type: regex
  name: "request_total"
  labels:
    hostname: "$1"
    exec: "$2"
    protocol: "$3"
    code: "$4"
 ```
 Be aware about yaml escape rules as a mapping like the following one will not work.
@ -256,6 +297,8 @@ mappings:
    provider: "$1"
 ```
 ### Naming, labels, and help
 Please note that metrics with the same name must also have the same set of
 label names.
@ -402,29 +445,6 @@ mappings:
    job: "${1}_server_other"
 ```
 ### Choosing between glob or regex match type
 Despite from the missing flexibility of using regular expression in mapping and
 formatting labels, `glob` matching is optimized to have better performance than
 `regex` in certain use cases. In short, glob will have best performance if the
 rules amount is not so less and captures (using of `*`) is not to much in a
 single rule. Whether disabling ordering in glob or not won't have a noticable
 effect on performance in general use cases. In edge cases like the below however,
 disabling ordering will be beneficial:
    a.*.*.*.*
    a.b.*.*.*
    a.b.c.*.*
    a.b.c.d.*
 The reason is that the list assignment of captures (using of `*`) is the most
 expensive operation in glob. Honoring ordering will result in up to 10 list
 assignments, while without ordering it will need only 4 at most.
 For details, see [pkg/mapper/fsm/README.md](pkg/mapper/fsm/README.md).
 Running `go test -bench .` in **pkg/mapper** directory will produce
 a detailed comparison between the two match type.
 ### `drop` action
 You may also drop metrics by specifying a "drop" action on a match. For