close: https://github.com/woodpecker-ci/woodpecker/issues/3555
Put the same logic from `waitStep` and call the function
`isImagePullBackOffState` in the `tailStep` function.
---------
Co-authored-by: elias.souza <elias.souza@quintoandar.com.br>
Co-authored-by: Anbraten <6918444+anbraten@users.noreply.github.com>
Proposal to fix https://github.com/woodpecker-ci/woodpecker/issues/2253
We have observed several possibly-related issues on a Kubernetes
backend:
1. Agents behave erractly when dealing with certain log payloads. A common
observation here is that steps that produce a large volume of logs will cause
some steps to be stuck "pending" forever.
2. Agents use way more CPU than should be expected, we often see 200-300
millicores of CPU per Workflow per agent (as reported on #2253).
3. We commonly see Agents displaying thousands of error lines about
parsing logs, often with very close timestamps, which may explain issues 1
and 2 (as reported on #2253).
```
{"level":"error","error":"rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8","time":"2024-04-05T21:32:25Z","caller":"/src/agent/rpc/client_grpc.go:335","message":"grpc error: log(): code: Internal"}
{"level":"error","error":"rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8","time":"2024-04-05T21:32:25Z","caller":"/src/agent/rpc/client_grpc.go:335","message":"grpc error: log(): code: Internal"}
{"level":"error","error":"rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8","time":"2024-04-05T21:32:25Z","caller":"/src/agent/rpc/client_grpc.go:335","message":"grpc error: log(): code: Internal"}
```
4. We've also observed that agents will sometimes drop out of the worker queue,
also as reported on #2253.
Seeing as the logs point to `client_grpc.go:335`, this pull request
fixes the issue by:
1. Removing codes.Internal from being a retryable GRPC status. Now agent GRPC
calls that fail with codes. Internal will not be retried. There's not an
agreement on what GRPC codes should be retried but Internal does not seem to be
a common one to retry -- if ever.
2. Add a timeout of 30 seconds to any retries. Currently, the exponential
retries have a maximum timeout of _15 minutes_. I assume this might be
required by some other functions so Agents resume their operation in
case the webserver restarts. Still this is likely the cause behind the
large cpu increase as agents can be stuck trying thousands of requests for
a large windown of time. The previous change alone should be enough to
solve this issue but I think this might be a good idea to prevent
similar problems from arising in the future.
closes#3515
I think after this is fixed, we should publish a new release as this can
be quite important.
Co-authored-by: Robert Kaussow <mail@thegeeklab.de>
Closes https://github.com/woodpecker-ci/woodpecker/discussions/2274
# deprecation of alternative names
Instead of
```yaml
secrets:
- source: some_secret
target: some_env
```
you now write:
```yaml
environment:
some_env:
from_secret: some_secret
```
Also, it's possible to use complex yaml objects in `environment`,
they're turned into json (just like `settings`).
Fix Issue: https://github.com/woodpecker-ci/woodpecker/issues/3288
The way the pod service starts up makes it impossible to run two or more
pipelines at the same time when we have a service section.
The idea is to set the name of the service in the same way we did for
the pod name.
Pipeline:
```yaml
services:
mydb:
image: mysql
environment:
- MYSQL_DATABASE=test
- MYSQL_ROOT_PASSWORD=example
ports:
- 3306/tcp
steps:
get-version:
image: ubuntu
commands:
- ( apt update && apt dist-upgrade -y && apt install -y mysql-client 2>&1 )> /dev/null
- sleep 30s # need to wait for mysql-server init
- echo 'SHOW VARIABLES LIKE "version"' | mysql -uroot -hmydb test -pexample
```
Running more than one pipeline result:
![image](https://github.com/woodpecker-ci/woodpecker/assets/22245125/e512309f-0d1e-4125-bab9-2357a710fedd)
---------
Co-authored-by: elias.souza <elias.souza@quintoandar.com.br>
Closes https://github.com/woodpecker-ci/woodpecker/discussions/2174
- return bad habit error if no event filter is set
- If this is applied, it's useless to allow `exclude`s on events.
Therefore, deprecate it together with `include`s which should be
replaced by `base.StringOrSlice` later.
Currently, backend options are parsed in the yaml parser.
This has some issues:
- backend specific code should be in the backend folders
- it is not possible to add backend options for backends added via
addons
Fixes https://github.com/woodpecker-ci/woodpecker/issues/3330
This adds error handling on the agent's WaitStep function, on two
sections where it could encounter a `panic: runtime error: invalid
memory address or nil pointer dereference` in case it could no longer
access complete information about a specific pod.
This error was found to happen if the node in which the pod was running
was terminated during the step's execution.
spite active pipelines being executed on the node.
Now instead of a panic on the agent's logs and undefined behavior on the
UI it will display a more helpful error message on the UI.
### Additional context
We observed the bug first on v2.1.1, but tested the fix internally on
top of 2.3.0.
![image](https://github.com/woodpecker-ci/woodpecker/assets/7269710/dfbcf089-85f7-4b5d-8102-f21af95c5cda)