What is memcached?
memcached is an in-memory key-value data store.
On Google App Engine, memcache is a shared service accessed via the App Engine APIs.
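To make the key-value idea concrete, here is a toy in-memory cache with memcached-style `set`/`get` semantics and expiration. This is an illustrative sketch in plain Python, not the real memcached protocol or any actual client library; the class and method names are my own.

```python
import time

class MiniCache:
    """A toy in-memory key-value store sketching memcached's basic API."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        # ttl is an optional lifetime in seconds, like memcached's expiration.
        expires_at = time.time() + ttl if ttl else None
        self._store[key] = (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None  # cache miss
        value, expires_at = entry
        if expires_at is not None and time.time() > expires_at:
            del self._store[key]  # lazily evict expired entries
            return None
        return value

cache = MiniCache()
cache.set("user:42", {"name": "Alice"}, ttl=60)
print(cache.get("user:42"))   # cache hit
print(cache.get("user:999"))  # cache miss: None
```

A real client talks to a memcached server over the network, but the contract seen by the application is essentially this: best-effort storage with no guarantee the value is still there on the next `get`.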
memcached structure
Consistent hashing uses a counter that acts like a clock: once it reaches “12”, it wraps around to “1” again. Suppose this counter is 16 bits, so it ranges from 0 to 65535. If we visualize this on a clock face, 0 and 65535 sit at “12”, 32200 is around 6 o’clock, 48000 at 9 o’clock, and so on. We call this clock the continuum.
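One simple way to map a key (or a server label) onto this 16-bit continuum is to hash it and keep 16 bits of the digest. The function name and the choice of MD5 here are my own for illustration; real clients may use CRC32 or other hashes.

```python
import hashlib

def continuum_point(label: str) -> int:
    """Map a string to a point on the 16-bit continuum (0..65535)."""
    digest = hashlib.md5(label.encode()).digest()
    # Take the first two bytes of the digest as an unsigned 16-bit integer.
    return int.from_bytes(digest[:2], "big")

print(continuum_point("k1"))  # some fixed position between 0 and 65535
```

The same input always lands on the same point, which is what lets every client in a pool agree on where a key lives without any coordination.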
On this continuum, we place a relatively large number of “dots” for each server. These are placed randomly, so we end up with a clock covered in dots.
As an example, let’s display a continuum with 3 servers (s1, s2 and s3) and 2 dots per server:
If the continuum is a 16-bit number, the dots for s1 sit at roughly 10 and 29000, s2 has dots at 39000 and 55000, and s3 at 8000 and 48000. Now, when we store a key, we can create a hash that results in a 16-bit number, which can be plotted onto the continuum as well. Suppose we have 4 keys (k1 to k4) that hash to 15000, 52000, 34000, and 38000, visualized as the red dots in the next figure (pardon my drawing skills):
To find the actual server where a key should be stored, the only thing we need to do is follow the continuum clockwise until we hit a “server” dot. For k1 (15000) we follow the continuum until we hit server s1 (its dot at 29000). k2 (52000) hits the s2-dot at 55000, and k3 (34000) and k4 (38000) both hit the s2-dot at 39000. So far, nothing really special is happening. In fact, it looks like a lot of extra work to achieve something we could easily do with our modulus algorithm.
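The clockwise walk above is just a sorted search with wrap-around. A minimal sketch, using the example dot positions from the text (in a real client the dots would come from hashing server labels, not hardcoded numbers):

```python
import bisect

# Server dots from the example: several points per server on the continuum.
dots = {10: "s1", 29000: "s1", 39000: "s2", 55000: "s2", 8000: "s3", 48000: "s3"}
points = sorted(dots)  # [10, 8000, 29000, 39000, 48000, 55000]

def lookup(key_point: int) -> str:
    """Walk clockwise from key_point to the first server dot, wrapping at 65535."""
    i = bisect.bisect_left(points, key_point)
    if i == len(points):  # past the last dot: wrap around to the first one
        i = 0
    return dots[points[i]]

print(lookup(15000))  # k1 -> s1 (first dot clockwise is s1's dot at 29000)
print(lookup(52000))  # k2 -> s2 (first dot clockwise is s2's dot at 55000)
```

Because the dot list is sorted, each lookup is a binary search, which is why lookups stay fast even with a hundred or more dots per server.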
Here is where consistent hashing becomes advantageous: suppose server s2 is removed from the memcache server pool. What happens when we want to fetch key k1? Nothing strange: k1 still plots to the same position on the continuum, and the first server-dot it hits is still s1.
However, when fetching k3, which was stored on s2, we miss the s2-dot (since it has been removed), and k3 moves to server s3. The same goes for k2, which moves from s2 to s1.
In fact, the more server-dots we place on the continuum, the fewer key misses we get when a server is removed (or added). A good number is around 100 to 200 dots per server; many more would slow down lookups on the continuum (which have to be very fast, of course). The more servers you add, the better consistent hashing performs.
Instead of the nearly 100% of key movements you get with a standard modulus algorithm, consistent hashing might invalidate only around 10 to 25% of your keys (and this number drops quickly as you add more servers), so the pressure on your backend system (such as the database) is much lower than with the modulus approach.
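This claim is easy to check with a quick simulation. The sketch below is my own (dot counts, hash choice, and key counts are arbitrary assumptions): it builds a continuum for 10 servers, removes one, and counts how many keys change owner, compared against a plain `hash % N` scheme.

```python
import bisect
import hashlib

def point(label: str) -> int:
    """Hash a label to a 16-bit continuum position."""
    return int.from_bytes(hashlib.md5(label.encode()).digest()[:2], "big")

def build_ring(servers, dots_per_server=150):
    # Each server contributes many dots, derived by hashing "server#i" labels.
    ring = {point(f"{s}#{i}"): s for s in servers for i in range(dots_per_server)}
    return ring, sorted(ring)

def owner(ring, points, key):
    i = bisect.bisect_left(points, point(key))
    return ring[points[i % len(points)]]  # wrap past the last dot

keys = [f"key-{n}" for n in range(10000)]
servers = [f"s{n}" for n in range(10)]

ring_a, pts_a = build_ring(servers)        # 10 servers
ring_b, pts_b = build_ring(servers[:-1])   # one server removed
moved_ch = sum(owner(ring_a, pts_a, k) != owner(ring_b, pts_b, k) for k in keys)

# With a plain modulus scheme, hash(key) % N changes for most keys when N shrinks.
moved_mod = sum(point(k) % 10 != point(k) % 9 for k in keys)

print(f"consistent hashing remapped {moved_ch / len(keys):.0%} of keys")
print(f"modulus remapped {moved_mod / len(keys):.0%} of keys")
```

With these parameters, consistent hashing remaps roughly the 1/10 of keys that belonged to the removed server, while the modulus scheme remaps the vast majority.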
Properties
- no replication
- no major configuration required
- uses a specified amount of RAM; evicts entries via LRU when that RAM is fully utilized
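The LRU eviction mentioned above can be sketched in a few lines with an ordered dictionary. This is a simplification (real memcached manages memory in slab classes, each with its own LRU, and tracks bytes rather than entry counts), and the class name is my own:

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least-recently-used entry once capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")         # touch "a" so it is the most recently used
cache.set("c", 3)      # over capacity: evicts "b", the least recently used
print(cache.get("b"))  # None: "b" was evicted
```

The point of LRU here is that a full cache never refuses writes; it silently drops the entries least likely to be asked for again.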
Pros and cons
Pros:
- reduces database load
- well suited to websites with heavy database load
- significantly reduces the number of retrieval requests to the database
- cuts down I/O access
Cons:
- not a persistent data store
- not a database
- not application-specific
- cannot cache large objects (max key size: 250 bytes; max value size: 1 MB)
See the comparison with Redis here.