Woodpecker is a community fork of the Drone CI system.
Fernando Barbosa 01699eaaab fix: k8s agent fails to tail logs starving the cpu
Proposal to fix https://github.com/woodpecker-ci/woodpecker/issues/2253

We have observed several possibly-related issues on a Kubernetes
backend:

1. Agents behave erratically when dealing with certain log payloads. A common
   observation here is that steps that produce a large volume of logs will cause
   some steps to be stuck "pending" forever.

2. Agents use far more CPU than expected: we often see 200-300
   millicores of CPU per Workflow per agent (as reported on #2253).

3. We commonly see Agents displaying thousands of error lines about
   parsing logs, often with very close timestamps, which may explain issues 1
   and 2 (as reported on #2253).

```
{"level":"error","error":"rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8","time":"2024-04-05T21:32:25Z","caller":"/src/agent/rpc/client_grpc.go:335","message":"grpc error: log(): code: Internal"}
{"level":"error","error":"rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8","time":"2024-04-05T21:32:25Z","caller":"/src/agent/rpc/client_grpc.go:335","message":"grpc error: log(): code: Internal"}
{"level":"error","error":"rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8","time":"2024-04-05T21:32:25Z","caller":"/src/agent/rpc/client_grpc.go:335","message":"grpc error: log(): code: Internal"}
```

4. We've also observed that agents will sometimes drop out of the worker queue,
also as reported on #2253.
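
For context on the error in item 3: protobuf `string` fields must hold valid UTF-8, so a log line with a malformed byte sequence fails at marshaling time. The sketch below only illustrates that condition with the standard library. The helper name `validForProto` and the idea that a chunk boundary cut a multi-byte character in half are assumptions for illustration, not something confirmed by the logs above.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// validForProto is a hypothetical helper: it reports whether a raw log
// line could be placed in a protobuf string field, i.e. is valid UTF-8.
func validForProto(line []byte) bool {
	return utf8.Valid(line)
}

func main() {
	full := []byte("build ✓ done")

	// Assumed scenario: the line is cut one byte into the 3-byte "✓",
	// leaving a dangling lead byte at the end of the chunk.
	truncated := full[:len("build ")+1]

	fmt.Println(validForProto(full), validForProto(truncated)) // prints "true false"
}
```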

Seeing as the logs point to `client_grpc.go:335`, this pull request
fixes the issue by:

1. Removing codes.Internal from the set of retryable gRPC statuses. Agent gRPC
calls that fail with codes.Internal will no longer be retried. There is no
agreement on which gRPC codes should be retried, but Internal does not seem to be
a common one to retry -- if ever.

2. Adding a timeout of 30 seconds to any retries. Currently, the exponential
retries have a maximum timeout of _15 minutes_. I assume this might be
required by some other functions so Agents resume their operation in
case the webserver restarts. Still, this is likely the cause behind the
large CPU increase, as agents can be stuck retrying thousands of requests over
a large window of time. The previous change alone should be enough to
solve this issue, but I think this might be a good idea to prevent
similar problems from arising in the future.
2024-04-08 16:32:29 -03:00
.github Temp exclude huh/spinner from renovate (#3588) 2024-04-03 16:19:52 +02:00
.vscode Add spellcheck config (#3018) 2024-01-27 21:15:10 +01:00
.woodpecker Update docker.io/golang Docker tag to v1.22.2 (#3596) 2024-04-08 07:54:28 +02:00
agent fix: k8s agent fails to tail logs starving the cpu 2024-04-08 16:32:29 -03:00
cli Fix cli version comparison and improve setup command (#3518) 2024-03-28 10:36:39 +01:00
cmd Allow to disable deployments (#3570) 2024-04-02 22:03:37 +02:00
contrib/woodpecker-test-repo/.woodpecker Cleanups + prefer .yaml (#3069) 2024-01-11 18:43:54 +01:00
docker chore(deps): update dependency alpine_3_18/ca-certificates to v20240226 (#3501) 2024-03-18 14:28:17 +01:00
docs Allow to disable deployments (#3570) 2024-04-02 22:03:37 +02:00
nfpm build: fix nfpm path for server binary (#3246) 2024-01-21 23:08:53 +01:00
pipeline fix: k8s agent fails to tail logs starving the cpu 2024-04-08 16:32:29 -03:00
server Update module github.com/google/go-github/v60 to v61 (#3595) 2024-04-06 08:00:59 +02:00
shared Enable golangci linter gomnd (#3171) 2024-03-15 18:00:25 +01:00
version Add spellcheck config (#3018) 2024-01-27 21:15:10 +01:00
web Translated using Weblate (German) 2024-04-06 15:57:38 +00:00
woodpecker-go Fix linter (#3354) 2024-02-08 22:49:07 +01:00
.cspell.json Add spellcheck config (#3018) 2024-01-27 21:15:10 +01:00
.ecrc Add spellcheck config (#3018) 2024-01-27 21:15:10 +01:00
.editorconfig Use editorconfig-checker (#982) 2022-06-17 12:03:34 +02:00
.gitattributes Fix "check_swagger" step (#2024) 2023-07-20 22:12:32 +02:00
.gitignore Remove datastore testfiles (#3584) 2024-04-02 10:10:29 +02:00
.gitpod.yml Fix Gitpod: Gitea auth token creation (#3299) 2024-01-30 18:39:59 +01:00
.golangci.yaml Enable golangci linter gomnd (#3171) 2024-03-15 18:00:25 +01:00
.hadolint.yaml Cleanups + prefer .yaml (#3069) 2024-01-11 18:43:54 +01:00
.markdownlint.yaml Add spellcheck config (#3018) 2024-01-27 21:15:10 +01:00
.pre-commit-config.yaml Update pre-commit hook pre-commit/pre-commit-hooks to v4.6.0 (#3597) 2024-04-08 07:27:22 +02:00
.prettierignore Do not run prettier with pre-commit (#3196) 2024-01-14 21:14:00 +01:00
.prettierrc.json Remove old files (#3077) 2023-12-30 15:10:31 +01:00
.yamllint.yaml Add spellcheck config (#3018) 2024-01-27 21:15:10 +01:00
CHANGELOG.md 🎉 Release 2.4.1 (#3517) 2024-03-20 21:51:24 +01:00
docker-compose.example.yaml Cleanups + prefer .yaml (#3069) 2024-01-11 18:43:54 +01:00
docker-compose.gitpod.yaml chore(deps): update postgres docker tag to v16.2 (#3461) 2024-03-04 08:47:39 +01:00
go.mod Update module github.com/google/go-github/v60 to v61 (#3595) 2024-04-06 08:00:59 +02:00
go.sum Update module github.com/google/go-github/v60 to v61 (#3595) 2024-04-06 08:00:59 +02:00
LICENSE Check for correct license header (#2137) 2023-08-10 11:06:00 +02:00
Makefile Switch back to latest version of golangci (#3527) 2024-03-21 12:22:36 +02:00
README.md docs: fix contributions link (#3363) 2024-02-10 07:51:13 +01:00
release-config.ts Add release helper (#1976) 2023-09-07 17:17:17 +02:00

Woodpecker



Woodpecker is a simple yet powerful CI/CD engine with great extensibility.

🫶 Support

Please consider donating and becoming a backer. 🙏 [Become a backer]

📖 Documentation

https://woodpecker-ci.org/

Contribute

See Contributing Guide

📣 Translate

We use our own Weblate instance at translate.woodpecker-ci.org.

👋 Who uses Woodpecker?

Woodpecker is used by the project itself, multiple well-known companies, organizations like Codeberg, hobbyists, and many others.

Leave a comment if you're using it as well.

Also consider using the topic WoodpeckerCI in your repository, so others can learn from your config and use the hashtag #WoodpeckerCI when talking about the project on social media!

Here are some places where people mention Woodpecker:

License

Woodpecker is Apache 2.0 licensed with the source files in this repository having a header indicating which license they are under and what copyrights apply.

Files under the docs/ folder are licensed under Creative Commons Attribution-ShareAlike 4.0 International Public License.