Add precisions

Alex Auvolat 2021-02-25 11:23:22 +01:00
parent 2b4b69938f
commit fdf908e845


@@ -2,30 +2,37 @@ I have conducted a quick study of different methods to load-balance data over different nodes
### Requirements
- *good balancing*: two nodes that have the same announced capacity should receive close to the same number of items
- *multi-datacenter*: the replicas of a partition should be distributed over as many datacenters as possible
- *minimal disruption*: when adding or removing a node, as few partitions as possible should have to move around
- *order-agnostic*: the same set of nodes (associated with a datacenter name
and a capacity) should always return the same distribution of partition
replicas, independently of the order in which nodes were added/removed (this
is to keep the implementation simple)
### Methods
#### Naive multi-DC ring walking strategy
This strategy can be used with any ring-like algorithm to make it aware of the *multi-datacenter* requirement:
- the ring is a list of positions, each associated with a single node in the cluster
- look up position of item on ring
- select the node for that position
- go clockwise, skipping nodes that:
  - we have already selected
  - are in a datacenter of a node we have selected, except if we already have nodes from all possible datacenters
In this way the selected nodes will always be distributed over
`min(n_datacenters, n_replicas)` different datacenters, which is the best we
can do.
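
To make the walk concrete, here is a minimal sketch in Rust; the `RingEntry` type and the `walk_ring` function are hypothetical names for this illustration, not Garage's actual implementation:

```rust
use std::collections::HashSet;

/// One position on the ring (hypothetical type for this sketch).
struct RingEntry {
    node: String,
    datacenter: String,
}

/// Walk the ring clockwise from `start` (the position the item hashes to)
/// and select `n_replicas` nodes, skipping nodes we have already selected
/// and datacenters we have already covered, except once nodes from all
/// possible datacenters have been selected.
fn walk_ring(ring: &[RingEntry], start: usize, n_replicas: usize) -> Vec<&str> {
    let n_datacenters = ring
        .iter()
        .map(|e| e.datacenter.as_str())
        .collect::<HashSet<_>>()
        .len();
    let mut selected: Vec<&str> = Vec::new();
    let mut covered_dcs: HashSet<&str> = HashSet::new();
    for i in 0..ring.len() {
        let entry = &ring[(start + i) % ring.len()];
        // Skip nodes we have already selected.
        if selected.contains(&entry.node.as_str()) {
            continue;
        }
        // Skip already-covered datacenters, except if we already have
        // nodes from all possible datacenters.
        if covered_dcs.contains(entry.datacenter.as_str())
            && covered_dcs.len() < n_datacenters
        {
            continue;
        }
        selected.push(&entry.node);
        covered_dcs.insert(&entry.datacenter);
        if selected.len() == n_replicas {
            break;
        }
    }
    selected
}
```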
This method was implemented in the first version of Garage, with the basic
ring construction from the Dynamo paper that consists of associating `n_token`
random positions with each node (I know it's not optimal; the Dynamo paper
already studies this).
#### Better rings
@@ -43,7 +50,7 @@ To solve this, we want to apply a second method for partitioning our dataset:
I have studied two ways to do the attribution, in a way that is deterministic:
- Min-hash: for each partition, select the node that minimizes `hash(node, partition_number)` (a sketch is given after this list)
- MagLev: see [here](https://blog.acolyer.org/2016/03/21/maglev-a-fast-and-reliable-software-network-load-balancer/)
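
As an illustration, here is a minimal sketch of the min-hash attribution in Rust; the function name is hypothetical, and `DefaultHasher` stands in for whatever stable hash function would actually be used:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Assign a partition to the node that minimizes
/// hash(node, partition_number). The result depends only on the
/// (node, partition) pairs, so it is independent of the order in
/// which nodes were added or removed.
fn assign_partition<'a>(nodes: &[&'a str], partition_number: u16) -> &'a str {
    *nodes
        .iter()
        .min_by_key(|node| {
            // NB: DefaultHasher is used for brevity; it is not guaranteed
            // to be stable across Rust versions, so a real implementation
            // would pin a specific hash function.
            let mut h = DefaultHasher::new();
            node.hash(&mut h);
            partition_number.hash(&mut h);
            h.finish()
        })
        .expect("node list must not be empty")
}
```

Because each `(node, partition)` pair hashes independently, removing a node only moves the partitions that node was holding, which matches the *minimal disruption* requirement.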
MagLev provided significantly better balancing, as it guarantees that the exact