Are logical nodes responsible for a continuous range of keys or a random set of keys







I was reading the DynamoDB whitepaper. It explains how keys obtained from a hash function form a (circular) range, and logical nodes are then responsible for contiguous segments of that range.



Dynamo’s partitioning scheme relies on consistent hashing to
distribute the load across multiple storage hosts. In consistent
hashing [10], the output range of a hash function is treated as a
fixed circular space or “ring” (i.e. the largest hash value wraps
around to the smallest hash value). Each node in the system is
assigned a random value within this space which represents its
“position” on the ring. Each data item identified by a key is assigned
to a node by hashing the data item’s key to yield its position on the
ring, and then walking the ring clockwise to find the first node with
a position larger than the item’s position.
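To make the quoted mechanism concrete, here is a minimal Python sketch. The node names and ring size are made up for illustration, and each node's position here comes from hashing its name purely so the example is reproducible; Dynamo itself assigns each node a random value.

```python
import bisect
import hashlib

RING = 2**32  # size of the circular hash space

def position(value: str) -> int:
    """Hash a string to a position on the ring [0, RING)."""
    digest = hashlib.md5(value.encode()).digest()
    return int.from_bytes(digest[:4], "big") % RING

# Illustrative nodes: one position each on the ring.
nodes = {position(f"node-{i}"): f"node-{i}" for i in range(4)}
ring = sorted(nodes)

def coordinator(key: str) -> str:
    """Walk clockwise to the first node with a position larger than the key's."""
    idx = bisect.bisect_right(ring, position(key))
    if idx == len(ring):
        idx = 0  # the largest hash value wraps around to the smallest
    return nodes[ring[idx]]

print(coordinator("some-key"))
```

The `bisect_right` call is the "walking the ring clockwise" step: it finds the first node position strictly larger than the key's position, wrapping around at the end of the ring.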



However, under Uniform Load Distribution, some strategies are detailed:



Strategy 1: T random tokens per node and partition by token value



Strategy 2: T random tokens per node and equal sized partitions
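As I understand it, the difference can be sketched like this (T, Q, and the node names are arbitrary illustrative values, not from the paper): in Strategy 1 the token values themselves are the partition boundaries, so partitions are randomly sized; in Strategy 2 the ring is pre-divided into Q equal partitions, and tokens only decide where each partition is placed. The paper places a partition by walking clockwise from the partition's end; the sketch takes just the first node found, ignoring replication.

```python
import bisect
import random

RING = 2**32
random.seed(0)  # deterministic for the example

# T random tokens per node; tokens are ring positions, not data keys.
T = 4
tokens = sorted((random.randrange(RING), node)
                for node in ("A", "B", "C") for _ in range(T))
positions = [p for p, _ in tokens]

def first_node_clockwise(p: int) -> str:
    """Node owning the first token at or after position p (wrapping)."""
    idx = bisect.bisect_left(positions, p) % len(positions)
    return tokens[idx][1]

# Strategy 1: the token values themselves are the partition boundaries,
# so partitions have random, unequal sizes.
strategy1_boundaries = positions

# Strategy 2: Q equal-sized partitions; tokens only decide placement.
Q = 8
strategy2_owners = [first_node_clockwise(((k + 1) * RING // Q) % RING)
                    for k in range(Q)]
print(strategy2_owners)
```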



So these tokens (which I'm assuming are keys?) are distributed randomly to nodes?



So are logical nodes responsible for a continuous range of keys, or a random set of keys?





Note that this isn't a "DynamoDB" whitepaper. It is a paper from 2007 about a predecessor system, called "Dynamo." "Amazon DynamoDB is based on the principles of Dynamo" but to exactly what extent is not specified.
– Michael - sqlbot
Aug 11 at 20:56





1 Answer



Disclaimer: I've just read the paper; I'm no expert.



I understand tokens and keys both to occupy key space (i.e. positions on the key ring), but they are not the same thing.



Dynamo uses the concept of “virtual nodes”. A virtual node looks like
a single node in the system, but each node can be responsible for more
than one virtual node. Effectively, when a new node is added to the
system, it is assigned multiple positions (henceforth, “tokens”) in
the ring.



So, in the "basic consistent hashing algorithm" approach, each node in your system is randomly assigned a single position on the key ring. Each node is therefore responsible for a single continuous range of key space. A wedge of the circle, if you will.



The authors note this has some problems around uniformity of access. So instead they came up with a "variant of the consistent hashing algorithm".



In the alternative scheme, each node is given a set of tokens. A token is a virtual node. Conceptually, you can imagine lots of small pieces of key space, taken from all around the ring, being assigned to the node. Or, in my head: lots of tiny wedges from all around the circle.



In the scheme they actually went for, each virtual node (token) covers a contiguous range of keys. However, each physical node holds multiple virtual nodes that are not contiguous with one another.



Therefore each node owns many contiguous sections of key space, taken from all over the total key space. Not quite random, and not quite continuous either!
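Here is a rough Python sketch of that idea, assuming T tokens per physical node (Strategy 1 above). The node names, T, and the use of a hash to scatter tokens are placeholders for illustration: each token still owns one contiguous arc, but a physical node's arcs end up scattered all around the ring.

```python
import bisect
import hashlib

RING = 2**32
T = 8  # tokens per physical node

def position(value: str) -> int:
    digest = hashlib.md5(value.encode()).digest()
    return int.from_bytes(digest[:4], "big") % RING

# Each physical node gets T tokens (virtual nodes) scattered around the ring;
# hashing "name#i" just makes the scatter reproducible for the example.
token_owner = {position(f"{name}#{i}"): name
               for name in ("A", "B", "C") for i in range(T)}
tokens = sorted(token_owner)

def owner(key: str) -> str:
    idx = bisect.bisect_right(tokens, position(key)) % len(tokens)
    return token_owner[tokens[idx]]

# A token's arc runs from the previous token (exclusive) to itself
# (inclusive), so each physical node ends up with several disjoint arcs:
arcs = {}
prev = tokens[-1]
for t in tokens:
    arcs.setdefault(token_owner[t], []).append((prev, t))
    prev = t
print(arcs["A"])  # several non-adjacent (start, end] ranges for node A
```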






