pict-rs/releases/0.4.0.md

565 lines
19 KiB
Markdown

# pict-rs 0.4.0
I am excited to announce a new stable version of pict-rs, version 0.4.0. There's a few big things in
0.4 that might be exciting to developers and admins. Below you'll find upgrade notes for
administrators, as well as a list of changes from 0.3 to 0.4.
## Overview
### Headline Features:
- [Reworked Configuration](#reworked-configuration)
- [Reworked Commandline](#reworked-commandline)
- [Object Storage](#object-storage)
- [Backgrounded Uploads](#backgrounded-uploads)
- [Full Video](#full-video)
### Changes
- [Default video codec](#default-video-codec)
### Other Features:
- [Published Dependencies](#published-dependencies)
- [HEAD Endpoints](#head-endpoints)
- [Healthcheck Endpoints](#healthcheck-endpoints)
- [Image Preprocessing](#image-preprocessing)
- [GIF configuration](#gif-configuration)
- [JXL and AVIF](#jxl-and-avif)
- [H265, VP8, VP9, and AV1](#h265-vp8-vp9-and-av1)
- [Audio Codecs](#audio-codecs)
- [404 Image](#404-image)
- [DB Export](#db-export)
- [Identifier Endpoint](#identifier-endpoint)
- [Variant Cleanup Endpoint](#variant-cleanup-endpoint)
- [Client Pool Configuration](#client-pool-configuration)
### Fixes
- [GIF/webm Transparency](#gif-webm-transparency)
- [Image Orientation](#image-orientation)
- [io_uring file write](#io_uring-file-write)
- [Blur Sigma](#blur-sigma)
### Removals
- [Purge by disk name](#purge-by-disk-name)
- [Aliases by disk name](#aliases-by-disk-name)
- [Filename endpoint](#filename-endpoint)
## Upgrade Notes
If you're running the provided docker container with default configuration values, the 0.4 container
can be pulled and launched with no changes. There is a migration from the 0.3 database format to the
0.4 database format that will occur automatically on first launch. After the migration is complete,
the server will start.
If you have any custom configuration, or are running outside of docker, see [Reworked
Configuration](#reworked-configuration) below. It is likely that you will need to change
configuration values for the migration to run properly.
More information can be found in the [pict-rs
readme](https://git.asonix.dog/asonix/pict-rs#user-content-0-3-to-0-4-migration-guide)
## Descriptions
### Reworked Configuration
Starting off with the most important information for server admins: The configuration format has
changed. The `pict-rs.toml` is now far better organized, and includes more configuration options
than before. Every field has been moved, so please take a look at the [example
pict-rs.toml](https://git.asonix.dog/asonix/pict-rs/src/branch/main/pict-rs.toml) file for
information on how configuration works in 0.4.
Notable changes:
- `RUST_LOG` no longer has an effect, see `PICTRS__TRACING__LOGGING__TARGETS` and
`PICTRS__TRACING__OPENTELEMETRY__TARGETS` for configuring log levels.
- `PICTRS__ADDRESS` has become `PICTRS__SERVER__ADDRESS`
- `PICTRS__PATH` has become `PICTRS__OLD_DB__PATH`, `PICTRS__REPO__PATH` and
`PICTRS__STORE__PATH`
The pict-rs.toml equivalents:
```toml
# pict-rs.toml
[server]
# same format as the previous PICTRS__ADDRESS
address = "0.0.0.0:8080"
[tracing.logging]
# same format as RUST_LOG
targets = "info"
[tracing.opentelemetry]
# same format as RUST_LOG
targets = "debug"
[old_db]
# same as the previous PICTRS__PATH
path = "/mnt"
[repo]
# Added the `sled-repo` subdirectory to the previous PICTRS__PATH
path = "/mnt/sled-repo"
[store]
# Added the `files` subdirectory to the previous PICTRS__PATH
path = "/mnt/files"
```
As mentioned above, every configuration option has moved. Check the linked example configuration
file for more information.
### Reworked Commandline
Also notable for server admins: the commandline interface has changed. All configuration that can be
expressed via the pict-rs.toml file or related environment variables can be set via the commandline
as well.
Running the pict-rs server now requires invoking the `run` subcommand. Here are some examples:
```bash
$ pict-rs run
$ pict-rs -c /etc/pict-rs.toml run
$ pict-rs run filesystem -p /path/to/files sled -p /path/to/metadata
```
There is one flag that I will call out here: `--save-to <path>`. This can be incredibly helpful when
debugging pict-rs. It dumps the configuration that pict-rs is currently using to a file of your
choosing. Example:
```bash
$ PICTRS__SERVER__ADDRESS=127.0.0.1:1234 pict-rs --save-to debug.toml run
CTRL^C
$ cat debug.toml
```
It can also be useful to save your existing commandline arguments into a reusable configuration:
```bash
$ pict-rs --save-to config.toml \
--log-targets warn \
--log-format json \
--console-address '127.0.0.1:6669' \
--opentelemetry-url 'http://localhost:4317' \
run \
-a '127.0.0.1:1234' \
--api-key "$API_KEY" \
--client-pool-size 10 \
--media-max-width 200 \
--media-max-height 200 \
--media-max-file-size 1 \
--media-enable-full-video true \
--media-video-codec av1 \
--media-format webp \
filesystem \
-p /opt/pict-rs/files \
sled \
-p /opt/pict-rs/metadata \
-e /opt/pict-rs/exports
CTRL^C
$ cat config.toml
$ pict-rs -c config.toml run
```
### Object Storage
This is the biggest feature for pict-rs 0.4, but it isn't new. pict-rs 0.3 had initial support for
uploading to object storage hidden behind the `object-storage` compile flag. My docker images and
provided binaries did not enable this feature, so only folks compiling from source could even have
tried using it. In 0.4 I am now much more confident in the object storage support, and use it myself
for my personal setup.
Although pict-rs now supports object storage as a backend for storing uploaded media, it doesn't
support generating URLs that bypass the pict-rs server for serving images. I've seen this
misconception a couple times during the release candidate process and I would like to put these
rumors to rest.
Using object storage when you're in a cloud environment is sensible. The cost for adding object
storage is far less than for adding block storage, and although there are egress fees, they are
typically very low. Admins currently running pict-rs on a VPS should consider moving to object
storage if they find their disk usage is quickly growing.
Using object storage does not fully remove pict-rs' dependence on a local disk. pict-rs stores its
metadata in an embedded key/value store called [sled](https://github.com/spacejam/sled/), which
means it still requires disk access to function.
For new admins, using object storage from the start might be worth considering, but is not
required. pict-rs provides a built-in migration path for moving from filesystem storage to object
storage, which is documented in the [pict-rs
readme](https://git.asonix.dog/asonix/pict-rs/src/branch/main#user-content-filesystem-to-object-storage-migration).
### Backgrounded Uploads
This feature is important for keeping pict-rs fast. Since media needs to be processed on upload,
upload times may grow long. This is especially true when considering pict-rs' other new features
like [full video](#full-video) and [image preprocessing](#image-preprocessing).
Backgrounded uploads work by deferring the image processing work to a background task in pict-rs,
and responding to the upload HTTP request as soon as the provided file has been stored
successfully. Instead of returning a file alias like the inline upload endpoint, it instead returns
a UUID that represents the current upload process. That UUID can be given to a new `claim` endpoint
to retrieve the uploaded media's alias, or an error in the event the media was rejected or failed to
process. The claim endpoint uses a technique called "long polling" in order to notify clients on
completion of the background processing. Clients can request to claim uploaded media as soon as the
backgrounded endpoint reponds with an upload ID, and the server will hold the connection open for at
most 10 seconds while it waits for processing to complete. If processing did not complete in time,
the client can request again to claim the upload.
### Full Video
pict-rs now supports uploading video with audio. In 0.3, pict-rs would always strip audio from
uploaded videos. This was intended as a gif-like feature, since pict-rs' primary use is for storing
pictures and not videos. By default, full video is disabled in pict-rs 0.4, but server admins can
opt into full video uploads by setting `PICTRS__MEDIA__ENABLE_FULL_VIDEO=true`.
Along with supporting full video, an additional configuration option has been added to help keep
down on file sizes and processing time: `PICTRS__MEDIA__MAX_FRAME_COUNT`. This option limits how
many frames an uploaded video is allowed to have, rejecting the upload if it exceeds the provided
value.
It is important to note that while pict-rs is capable of storing videos, it's processing
functionality is limited to just the video's thumbnail.
### Default Video Codec
pict-rs 0.4 has changed the default video codec from h264 to vp9. This was done because mp4 is still
patent-encumbered, and I don't want to ship software the defaults to using it. This change may
result is slower transcoding times.
### Published Dependencies
pict-rs 0.3's experimental object storage support relied on a custom fork of the `rust-s3` library,
which did useful things like enable re-use of an HTTP Client between connections, and making the
Client a `trait`, so it could be implemented for any arbitrary HTTP Client. Since then, the PR that
was meant to upstream those changes was closed, and so pict-rs needed to find a replacement that
suited it's needs.
pict-rs now relies on the [`rusty-s3`](https://github.com/paolobarbolini/rusty-s3) library for
enabling object storage. This library takes a sans-io approach, and was simple to integrate with
`awc`, my HTTP Client of choice.
### HEAD Endpoints
For the `/image/original/{filename}` and `/image/process.{ext}?src={filename}` endpoints, there are
now analogous `HEAD` endpoints for fetching just the headers their `GET` counterparts would return.
### Healthcheck Endpoints
pict-rs now provides a `/healthz` endpoint that can be used to check if the running process is
healthy. It currently attempts to write to the sled database and fetch information from the
configured store, returning 200 if all operations succeeded.
### Image Preprocessing
pict-rs 0.4 introduces a new configuration option: `PICTRS__MEDIA__PREPROCESS_STEPS`. This option
accepts the same syntax as the `process` endpoint, and can be used to apply transformations to all
uploaded images.
Here are some examples:
```toml
# pict-rs.toml
[media]
# Crop all uploaded media to a 16x9 aspect ratio, and then shrink to fit within a 1200 by 1200 pixel box.
preprocess_steps = "crop=16x9&resize=1200"
```
```yaml
# docker-compose.yml
services:
pictrs:
image: asonix/pictrs:0.4.0
ports:
- "127.0.0.1:8080:8080"
restart: always
environment:
# Blur all uploaded images with a sigma value of 1.5
- PICTRS__MEDIA__PREPROCESS_STEPS=blur=1.5
volumes:
- ./volumes/pictrs:/mnt
```
### GIF configuration
pict-rs 0.3 would transcode all uploaded GIF files to video formats. This is no longer the case.
Some new configuration options have been introduced to determine when GIF files should and should
not be converted to silent videos. By default, GIF files will remain GIFs if their dimensions are
smaller than 128x128 px, and they don't have more than 100 frames. GIFs to that do fit these
restrictions will be transcoded to a video format.
To tweak these values, the following environment variables can be set:
```yaml
# docker-compose.yml
services:
pictrs:
# ...
environment:
- PICTRS__MEDIA__GIF__MAX_WIDTH=128
- PICTRS__MEDIA__GIF__MAX_HEIGHT=128
- PICTRS__MEDIA__GIF__MAX_AREA=16384
- PICTRS__MEDIA__GIF__MAX_FRAME_COUNT=100
```
Alternatively, these values can be set in pict-rs.toml
```toml
# pict-rs.toml
[media.gif]
max_width = 128
max_height = 128
max_area = 16384
max_frame_count = 100
```
### JXL and AVIF
pict-rs 0.4 introduces support for two new image formts: JPEG XL (JXL) and AVIF. These two image
formats offer better compression to quality ratios than other standard formats, although they may be
less well-supported by browsers or other clients. Not only can these formats be uploaded and handled
now, but they can also be used with the `PICTRS__MEDIA__FORMAT` variable for transcoding all
uploaded images.
Exmaples:
```yaml
# docker-compose.yml
services:
pictrs:
# ...
environment:
- PICTRS__MEDIA__FORMAT=avif
```
```toml
# pict-rs.toml
[media]
format = "jxl"
```
### H265, VP8, VP9, and AV1
As mentioned earlier, the default video codec has changed from `h264` to `vp9`, however, pict-rs 0.4
introduces support for a broader range of video codecs:
- in the mp4 container
- h264
- h265
- in the webm container
- vp8
- vp9
- av1
The default video codec can be changed with the following settings:
```yaml
# docker-compose.yml
services:
pictrs:
# ...
environment:
- PICTRS__MEDIA__VIDEO_CODEC=av1
```
```toml
# pict-rs.toml
[media]
video_codec = "h265"
```
Note that h264 and h265 in the mp4 container are still patent-encumbered, and you may wish to avoid
them.
### Audio Codecs
With the introduction of full video, pict-rs adds the ability to configure the audio codec used for
uploaded videos. By default, pict-rs will choose a reasonable codec for the provided video codec.
This means that there is no need to configure the audio codec yourself. The option is still
available, though.
Here is the existing mapping between video and audio codecs:
- av1, vp8, and vp9 use opus by default
- h264 and h265 use aac by default
Available codecs for configuration are
- opus
- vorbis
- aac
Setting a preferred codec can be done with the following configurations:
```yaml
# docker-compose.yml
services:
pictrs:
# ...
environment:
- PICTRS__MEDIA__AUDIO_CODEC=vorbis
```
```toml
# pict-rs.toml
[media]
audio_codec = "opus"
```
### 404 Image
pict-rs 0.4 adds support for setting a "Not Found" image. This image will be returned for any
request for an image that does not exist on the server. This is done through an internal endpoint,
which means it requires pict-rs' `PICTRS__SERVER__API_KEY` to be set.
After uploading an image to pict-rs, it's name can be submitted to the `/internal/set_not_found`
endpoint with the following JSON:
```json
{
"alias": "79087a4f-ea31-4878-9e8a-0498dada597b.webp"
}
```
If the alias exists on the server, it will mark that image for use as the "404 image"
### DB Export
Part of administering a server means taking regular backups. Many backup systems rely on simply
copying files around, which doesn't work well for pict-rs' sled database, which may change while
being read by the backup tool.
One solution to this is to use a snapshotting filesystem like btrfs, bcachefs, or zfs. A filesystem
snapshot is "atomic", and will represent the database at an exact moment in time.
For admins running on more common filesystems such as ext4 or xfs, there is now a configuration
option and internal endpoint for producing immutable copies of the current database state.
```toml
# pict-rs.toml
[repo]
export_path = "./data/exports"
```
```yaml
# docker-compose.yml
services:
pictrs:
# ...
environment:
- PICTRS__REPO__EXPORT_PATH=/opt/pict-rs/exports
```
When this value is set, the `/internal/export` endpoint becomes available. Hitting that endpoint
will result in pict-rs exporting its current state to a new timestamped directory inside the
`export_path`. This directory contains the same structure as the `sled-repo` directory, and in the
event of data loss, can be restored by simply moving it into place.
Here's an example process for restoring from an export:
1. Stop pict-rs
2. Move your current `sled-repo` directory to a safe location (e.g. `sled-repo.bak`)
```bash
$ mv sled-repo sled-repo.bak
```
3. Copy an exported database to `sled-repo`
```bash
$ cp -r exports/2023-07-08T22:26:21.194126713Z sled-repo
```
4. Start pict-rs
### Identifier Endpoint
pict-rs 0.4 introduces a new endpoint for retrieving an alias' "True Name", the Identifier. An
Identifier represents the real path to the file in filesystem storage or in object storage.
Currently these paths are generated using the same algorithm, but pict-rs retains the right to
optimize identifiers differently for filesystem and object storage in the future.
### Variant Cleanup Endpoint
This is one of the less exciting features, and won't be very commonly used. A new endpoint at
`/internal/variants` accepts the DELETE verb. This will spawn a background task that iterates over
all files in the database, removing any generated variants of an image, keeping only the Original.
All variants can be generated again on request.
### Client Pool Configuration
Another less exciting feature, the number of connections pict-rs' internal HTTP Client maintains in
its pool can now be configured. This value is 100 by default, and shouldn't need to be tweaked. If
you are experiencing heavy traffic and are using object storage, you could increase the value. If
you are constrained on allowed open file descriptors, you can decrease the value.
Note that this value is multiplied by the number of threads pict-rs spawns, which depends on the
number of cores allowed to the application. Limiting pict-rs to 2 CPUs with default pool settings
will result in an effective limit of 200 connections.
### GIF/webm Transparency
In 0.3, pict-rs removed transparency from uploaded gifs and videos. 0.4 supports preserving the
transparency of GIF and webm files on upload.
### Image Orientation
In 0.3, pict-rs would strip metadata from all images. This would remove metadata about the intended
orientation of an image, and lead to uploaded media being sideways or upside down. pict-rs 0.4 still
strips all metadata from uploaded images, but it now applies a rotation to images that require it as
well. Images uploaded in version 0.4 should no longer be sideways or upside down.
### io_uring File Write
Due to a conflict between the `AsyncReadExt` trait and `StreamReader`, actix-web's `Payload` stream
would be polled after completion when pict-rs' `io-uring` feature was enabled. This has been fixed
by using the `read_buf` API rather than the `read_to_end` API in the io_uring file implementation.
Note that io_uring is still considered an experimental feature, and probably should not be used in
production. YMMV.
### Blur Sigma
pict-rs' `blur` filter would incorrectly apply the provided `sigma` value as the `radius` for a
gaussian blur. This has been fixed in 0.4 to properly set the `sigma` value with a default `radius`
of 0. This tells imagemagick to calculate a reasonable radius for the given image and sigma.
### Purge by disk name
Due to changes in how pict-rs stores files in 0.4 (introducing a pluggable store backend), the
`/internal/purge` endpoint can no longer be provided with a `filename` query, and must be given an
`alias` query instead.
### Aliases by disk name
Due to changes in how pict-rs stores files in 0.4 (introducing a pluggable store backend), the
`/internal/aliases` endpoint can no longer be provided with a `filename` query, and must be given an
`alias` query instead.
### Filename Endpoint
This was replaced by the Identifier endpoint, since files are no longer required to be stored on the
filesystem.