Improving the Performance of a Simple Key-Value Store in Go with Sharding
In this article, we will show you how to implement a simple key-value store in Go and then improve its performance by using sharding.
We can implement a simple key-value store that is concurrently safe by using a mutex. However, using a single mutex creates a bottleneck, as all concurrent requests will have to wait for the mutex to be released before they can proceed.
package store
import "sync"
type Store interface {
Get(key string) ([]byte, bool)
Set(key string, value []byte)
}
type store struct {
mu *sync.RWMutex
items map[string][]byte
}
func NewStore() Store {
return &store{
mu: &sync.RWMutex{},
items: make(map[string][]byte),
}
}
func (s *store) Get(key string) ([]byte, bool) {
s.mu.RLock()
defer s.mu.RUnlock()
value, ok := s.items[key]
return value, ok
}
func (s *store) Set(key string, value []byte) {
s.mu.Lock()
defer s.mu.Unlock()
s.items[key] = value
}
We can write an integration test to set and get one million records concurrently.
package store
import (
"fmt"
"reflect"
"testing"
)
func TestIntegration(t *testing.T) {
s := NewStore()
goroutines := 1024
count := 1_000_000
sem := make(chan struct{}, goroutines)
// concurrent sets
for i := 0; i < count; i++ {
sem <- struct{}{}
go func(j int) {
defer func() {
<-sem
}()
key, value := makeKeyValuePair(j)
s.Set(key, value)
}(i)
}
for i := 0; i < goroutines; i++ {
sem <- struct{}{}
}
// concurrent gets
for i := 0; i < count; i++ {
<-sem
go func(j int) {
defer func() {
sem <- struct{}{}
}()
key, expected := makeKeyValuePair(j)
actual, ok := s.Get(key)
if !ok {
t.Errorf("key doesn't exist: '%s'", key)
}
if !reflect.DeepEqual(expected, actual) {
t.Errorf("expected doesn't match actual, expected: '%s' actual: '%s'", expected, actual)
}
}(i)
}
for i := 0; i < goroutines; i++ {
<-sem
}
}
func makeKeyValuePair(i int) (string, []byte) {
key := fmt.Sprintf("%d", i)
value := []byte(fmt.Sprintf("%d", i))
return key, value
}
This test is completed in 2.23 seconds on my machine.
To solve the bottleneck caused by the single mutex, we can divide the data into shards, each with its own mutex. This reduces contention for the mutex, which can improve throughput and latency.
A shard is essentially a smaller key-value store that can be accessed independently.
package store
import (
"crypto/md5"
"encoding/hex"
"sync"
)
const defaultShardKeyLength = 2 // default shard count: 16^2
type shard struct {
mu *sync.RWMutex
items map[string][]byte
}
func makeShardKey(key string) string {
h := md5.New()
h.Write([]byte(key))
sum := h.Sum(nil)
encoded := hex.EncodeToString(sum)
return encoded[:defaultShardKeyLength]
}
func newShard() *shard {
return &shard{
mu: &sync.RWMutex{},
items: make(map[string][]byte),
}
}
func (s *shard) get(key string) ([]byte, bool) {
s.mu.RLock()
defer s.mu.RUnlock()
v, ok := s.items[key]
return v, ok
}
func (s *shard) set(key string, value []byte) {
s.mu.Lock()
defer s.mu.Unlock()
s.items[key] = value
}
We can modify our code to implement sharding.
package store
import "sync"
type Store interface {
Get(key string) ([]byte, bool)
Set(key string, value []byte)
}
type store struct {
mu *sync.RWMutex
shards map[string]*shard
}
func NewStore() Store {
return &store{
mu: &sync.RWMutex{},
shards: make(map[string]*shard),
}
}
func (s *store) Get(key string) ([]byte, bool) {
shardKey := makeShardKey(key)
s.mu.RLock()
shard, ok := s.shards[shardKey]
if !ok {
return nil, false
}
s.mu.RUnlock()
return shard.get(key)
}
func (s *store) Set(key string, value []byte) {
shardKey := makeShardKey(key)
s.mu.Lock()
shard, ok := s.shards[shardKey]
if !ok {
shard = newShard()
s.shards[shardKey] = shard
}
s.mu.Unlock()
shard.set(key, value)
}
If we run the integration test, it completes in 1.78 seconds, which is roughly 20% faster than the simple key-value store. This suggests that sharding has improved performance under the conditions of this test.
In conclusion, sharding can be a useful technique for improving the performance of a key-value store.
Thank you for reading my article. I hope that you found it helpful. If you have any questions or feedback, please feel free to leave a comment below.