Some work on documentation towards v0.8

This commit is contained in:
Alex Auvolat 2022-09-14 19:31:13 +02:00
parent 89b8087ba8
commit f6aebefcc9
No known key found for this signature in database
GPG key ID: 0E496D15096376BE
15 changed files with 151 additions and 71 deletions

View file

@ -1,6 +1,6 @@
+++
title = "Benchmarks"
weight = 10
weight = 40
+++
With Garage, we wanted to build a software defined storage service that follow the [KISS principle](https://en.wikipedia.org/wiki/KISS_principle),

View file

@ -1,13 +1,13 @@
+++
title = "Goals and use cases"
weight = 5
weight = 10
+++
## Goals and non-goals
Garage is a lightweight geo-distributed data store that implements the
[Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html)
object storage protocole. It enables applications to store large blobs such
object storage protocol. It enables applications to store large blobs such
as pictures, video, images, documents, etc., in a redundant multi-node
setting. S3 is versatile enough to also be used to publish a static
website.

View file

@ -20,6 +20,49 @@ In the meantime, you can find some information at the following links:
- [an old design draft](@/documentation/working-documents/design-draft.md)
## Request routing logic
Data retrieval requests to Garage endpoints (S3 API and websites) are resolved
to an individual object in a bucket. Since objects are replicated to multiple nodes
Garage must ensure consistency before answering the request.
### Using quorum to ensure consistency
Garage ensures consistency by attempting to establish a quorum with the
data nodes responsible for the object. When a majority of the data nodes
have provided metadata on a object Garage can then answer the request.
When a request arrives Garage will, assuming the recommended 3 replicas, perform the following actions:
- Make a request to the two preferred nodes for object metadata
- Try the third node if one of the two initial requests fail
- Check that the metadata from at least 2 nodes match
- Check that the object hasn't been marked deleted
- Answer the request with inline data from metadata if object is small enough
- Or get data blocks from the preferred nodes and answer using the assembled object
Garage dynamically determines which nodes to query based on health, preference, and
which nodes actually host a given data. Garage has no concept of "primary" so any
healthy node with the data can be used as long as a quorum is reached for the metadata.
### Node health
Garage keeps a TCP session open to each node in the cluster and periodically pings them. If a connection
cannot be established, or a node fails to answer a number of pings, the target node is marked as failed.
Failed nodes are not used for quorum or other internal requests.
### Node preference
Garage prioritizes which nodes to query according to a few criteria:
- A node always prefers itself if it can answer the request
- Then the node prioritizes nodes in the same zone
- Finally the nodes with the lowest latency are prioritized
For further reading on the cluster structure look at the [gateway](@/documentation/cookbook/gateways.md)
and [cluster layout management](@/documentation/reference-manual/layout.md) pages.
## Garbage collection
A faulty garbage collection procedure has been the cause of

View file

@ -1,6 +1,6 @@
+++
title = "Related work"
weight = 15
weight = 50
+++
## Context

View file

@ -9,6 +9,15 @@ Let's start your Garage journey!
In this chapter, we explain how to deploy Garage as a single-node server
and how to interact with it.
## What is Garage?
Before jumping in, you might be interested in reading the following pages:
- [Goals and use cases](@/documentation/design/goals.md)
- [List of features](@/documentation/reference-manual/features.md)
## Scope of this tutorial
Our goal is to introduce you to Garage's workflows.
Following this guide is recommended before moving on to
[configuring a multi-node cluster](@/documentation/cookbook/real-world.md).

View file

@ -1,6 +1,6 @@
+++
title = "Administration API"
weight = 16
weight = 60
+++
The Garage administration API is accessible through a dedicated server whose

View file

@ -1,6 +1,6 @@
+++
title = "Garage CLI"
weight = 15
weight = 30
+++
The Garage CLI is mostly self-documented. Make use of the `help` subcommand

View file

@ -1,6 +1,6 @@
+++
title = "Configuration file format"
weight = 5
weight = 20
+++
Here is an example `garage.toml` configuration file that illustrates all of the possible options:
@ -10,7 +10,6 @@ metadata_dir = "/var/lib/garage/meta"
data_dir = "/var/lib/garage/data"
block_size = 1048576
block_manager_background_tranquility = 2
replication_mode = "3"
@ -87,17 +86,6 @@ files will remain available. This however means that chunks from existing files
will not be deduplicated with chunks from newly uploaded files, meaning you
might use more storage space that is optimally possible.
### `block_manager_background_tranquility`
This parameter tunes the activity of the background worker responsible for
resyncing data blocks between nodes. The higher the tranquility value is set,
the more the background worker will wait between iterations, meaning the load
on the system (including network usage between nodes) will be reduced. The
minimal value for this parameter is `0`, where the background worker will
allways work at maximal throughput to resynchronize blocks. The default value
is `2`, where the background worker will try to spend at most 1/3 of its time
working, and 2/3 sleeping in order to reduce system load.
### `replication_mode`
Garage supports the following replication modes:

View file

@ -0,0 +1,85 @@
+++
title = "List of Garage features"
weight = 10
+++
### S3 API
The main goal of Garage is to provide an object storage service that is compatible with the
[S3 API](https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html) from Amazon Web Services.
We try to adhere as strictly as possible to the semantics of the API as implemented by Amazon
and other vendors such as Minio or CEPH.
Of course Garage does not implement the full span of API endpoints that AWS S3 does;
the exact list of S3 features implemented by Garage can be found [on our S3 compatibility page](@/documentation/reference-manual/s3-compatibility.md).
### Geo-distribution
Garage allows you to store copies of your data in multiple geographical locations in order to maximize resilience
to adverse events, such as network/power outages or hardware failures.
This allows Garage to run very well even at home, using consumer-grade Internet connectivity
(such as FTTH) and power, as long as cluster nodes can be spawned at several physical locations.
Garage exploits knowledge of the capacity and physical location of each storage node to design
a storage plan that best exploits the available storage capacity while satisfying the geo-distributed replication constraint.
To learn more about geo-distributed Garage clusters,
read our documentation on [setting up a real-world deployment](@/documentation/cookbook/real-world.md).
### Flexible topology
A Garage cluster can very easily evolve over time, as storage nodes are added or removed.
Garage will automatically rebalance data between nodes as needed to ensure the desired number of copies.
Read about cluster layout management [here](@/documentation/reference-manual/layout.md).
### No RAFT slowing you down
It might seem strange to tout the absence of something as a desirable feature,
but this is in fact a very important point! Garage does not use RAFT or another
consensus algorithm internally to order incoming requests: this means that all requests
directed to a Garage cluster can be handled independently of one another instead
of going through a central bottleneck (the leader node).
As a consequence, requests can be handled much faster, even in cases where latency
between cluster nodes is important (see our [benchmarks](@/documentation/design/benchmarks/index.md) for data on this).
This is particularly usefull when nodes are far from one another and talk to one other through standard Internet connections.
### Several replication modes
Garage supports a variety of replication modes, with 1 copy, 2 copies or 3 copies of your data,
and with various levels of consistency.
Read our reference page on [supported replication modes](@/documentation/reference-manual/configuration.md#replication-mode)
to select the replication mode best suited to your use case (hint: in most cases, `replication_mode = "3"` is what you want).
### Web server for static websites
A storage bucket can easily be configured to be served directly by Garage as a static web site.
Domain names for multiple websites directly map to bucket names, making it easy to build
a platform for your user's to autonomously build and host their websites over Garage.
Surprisingly, none of the other alternative S3 implementations we surveyed (such as Minio
or CEPH) support publishing static websites from S3 buckets, a feature that is however
directly inherited from S3 on AWS.
### Bucket names as aliases
- the same bucket may have multiple names (useful when exposing websites for example)
- bucket renaming is possible
- Scoped buckets: 2 users can have a different bucket with the same name -> avoid collision. Helpful if you want to write an application that creates per-user bucket always with the same name.
### Standalone/self contained
### Integration with Kubernetes and Nomad
Many node discovery methods: Kubernetes integration, Nomad integration through Consul
### Support for changing IP addresses
(as long as all nodes don't change their IP at the same time)
### Cluster administration API
### Metrics and traces
### (experimental) K2V API

View file

@ -1,6 +1,6 @@
+++
title = "K2V"
weight = 30
weight = 70
+++
Starting with version 0.7.2, Garage introduces an optionnal feature, K2V,

View file

@ -1,6 +1,6 @@
+++
title = "Cluster layout management"
weight = 10
weight = 50
+++
The cluster layout in Garage is a table that assigns to each node a role in

View file

@ -1,45 +0,0 @@
+++
title = "Request routing logic"
weight = 10
+++
Data retrieval requests to Garage endpoints (S3 API and websites) are resolved
to an individual object in a bucket. Since objects are replicated to multiple nodes
Garage must ensure consistency before answering the request.
## Using quorum to ensure consistency
Garage ensures consistency by attempting to establish a quorum with the
data nodes responsible for the object. When a majority of the data nodes
have provided metadata on a object Garage can then answer the request.
When a request arrives Garage will, assuming the recommended 3 replicas, perform the following actions:
- Make a request to the two preferred nodes for object metadata
- Try the third node if one of the two initial requests fail
- Check that the metadata from at least 2 nodes match
- Check that the object hasn't been marked deleted
- Answer the request with inline data from metadata if object is small enough
- Or get data blocks from the preferred nodes and answer using the assembled object
Garage dynamically determines which nodes to query based on health, preference, and
which nodes actually host a given data. Garage has no concept of "primary" so any
healthy node with the data can be used as long as a quorum is reached for the metadata.
## Node health
Garage keeps a TCP session open to each node in the cluster and periodically pings them. If a connection
cannot be established, or a node fails to answer a number of pings, the target node is marked as failed.
Failed nodes are not used for quorum or other internal requests.
## Node preference
Garage prioritizes which nodes to query according to a few criteria:
- A node always prefers itself if it can answer the request
- Then the node prioritizes nodes in the same zone
- Finally the nodes with the lowest latency are prioritized
For further reading on the cluster structure look at the [gateway](@/documentation/cookbook/gateways.md)
and [cluster layout management](@/documentation/reference-manual/layout.md) pages.

View file

@ -1,6 +1,6 @@
+++
title = "S3 Compatibility status"
weight = 20
weight = 40
+++
## DISCLAIMER

View file

@ -1,6 +1,6 @@
+++
title = "Design draft"
weight = 25
title = "Design draft (obsolete)"
weight = 50
+++
**WARNING: this documentation is a design draft which was written before Garage's actual implementation.

View file

@ -1,6 +1,6 @@
+++
title = "Load balancing data"
weight = 10
title = "Load balancing data (obsolete)"
weight = 60
+++
**This is being yet improved in release 0.5. The working document has not been updated yet, it still only applies to Garage 0.2 through 0.4.**