eBPF Maps State Synchronization using Go and gRPC

When eBPF started gaining popularity, its initial adoption focused primarily on observability, offering developers new ways to monitor and understand their systems. As the technology evolved, eBPF’s capabilities expanded significantly. Today it is widely used, among other applications, for stateful networking solutions such as load balancing, connection tracking, firewalls, and Carrier-Grade NAT (CGNAT).

Deploying these stateful eBPF applications in clusters is essential to avoid single points of failure and ensure high availability. Unlike stateless applications, which do not require synchronization, stateful applications need to maintain consistent state information across all nodes of a cluster such as Kubernetes. A typical stateful application keeps its state in the application itself or in a centralized database, but an eBPF application keeps its state in eBPF maps, and that state on each node needs to be synchronized across the cluster.

Synchronization, in itself, is not a new problem. Traditionally, netfilter and iptables were used for stateful filtering on Linux, and conntrackd, a purpose-built daemon, was used for synchronization. But there is no known synchronization tool or daemon available for eBPF maps.

State Synchronization

State synchronization, also known as data coherence, is not a new concept, but it is a complex one. It involves not only timing problems (time synchronization and event ordering) but also the prevention and resolution of race conditions, among other issues. While there are many ways to tackle these problems, we will focus on a proof-of-concept solution: how to detect eBPF map changes and exchange them with the other nodes.

Our approach leverages asynchronous eBPF map update notifications. Whenever a specific eBPF map changes, the change is sent to all peers using gRPC, allowing them to synchronize accordingly. Let’s go through this in more detail.

Detecting eBPF Map Updates

To keep things simple, we focus on detecting and triggering events specifically for updates to eBPF hash maps (BPF_MAP_TYPE_HASH); the concept for other types like BPF_MAP_TYPE_ARRAY remains similar. We accomplish this by attaching our programs to eBPF fentry (“function entry”) kernel hook points. Specifically:

fentry/htab_map_update_elem: Triggered on every element update in the hash eBPF map.

SEC("fentry/htab_map_update_elem")
int BPF_PROG(bpf_prog_kern_hmapupdate, struct bpf_map *map, void *key, void *value, u64 map_flags) {
    log_map_update(map, key, value, MAP_UPDATE);
    return 0;
}

fentry/htab_map_delete_elem: Triggered on every element deletion in the hash eBPF map.

SEC("fentry/htab_map_delete_elem")
int BPF_PROG(bpf_prog_kern_hmapdelete, struct bpf_map *map, void *key) {
    log_map_update(map, key, 0, MAP_DELETE);
    return 0;
}

⚠️ Note: These hooks are triggered both from user space, via syscalls that update the eBPF hash map, and from kernel space, through the invocation of eBPF helper functions in your kernel programs.

To funnel these events to our user-space application, we use an eBPF ring buffer. We transfer event data from kernel space to user space using the following function. This function reads the parameters provided by the function entry hook, combines them into a struct, and then sends the struct to user space:

static void __always_inline
log_map_update(struct bpf_map* updated_map, unsigned int *pKey, unsigned int *pValue, enum map_updater update_type)
{
    // This prevents the proxy from proxying itself
    __u32 key = 0;
    struct Config *conf = bpf_map_lookup_elem(&map_config, &key);
    if (!conf) return;
    if ((bpf_get_current_pid_tgid() >> 32) == conf->host_pid) return;

    // Get basic info about the map
    uint32_t map_id = MEM_READ(updated_map->id);
    uint32_t key_size = MEM_READ(updated_map->key_size);
    uint32_t value_size = MEM_READ(updated_map->value_size);

    // memset the whole struct to ensure verifier is happy
    struct MapData out_data;
    __builtin_memset(&out_data, 0, sizeof(out_data));

    bpf_probe_read_str(out_data.name, BPF_NAME_LEN, updated_map->name);
    bpf_probe_read(&out_data.key, sizeof(*pKey), pKey);
    out_data.key_size = key_size;
    if (pValue != 0) {
        bpf_probe_read(&out_data.value, sizeof(*pValue), pValue);
        out_data.value_size = value_size;
    }
    out_data.map_id = map_id;
    out_data.pid = (unsigned int)(bpf_get_current_pid_tgid() >> 32);
    out_data.update_type = update_type;

    // Write data to be processed in userspace
    bpf_ringbuf_output(&map_events, &out_data, sizeof(out_data), 0);
}
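On the Go side, this ring buffer payload is later decoded by casting the raw sample to a struct that mirrors the kernel-side struct MapData (you can see this in the client loop further below). A minimal sketch of what that mirror might look like is shown here; the exact field order, sizes, and BPF_NAME_LEN value must match the C definition in the repository, so treat this as an assumption:

// MapUpdater mirrors the kernel-side enum map_updater.
type MapUpdater uint32

const (
	MapUpdate MapUpdater = iota // MAP_UPDATE
	MapDelete                   // MAP_DELETE
)

func (m MapUpdater) String() string {
	switch m {
	case MapUpdate:
		return "UPDATE"
	case MapDelete:
		return "DELETE"
	default:
		return "UNKNOWN"
	}
}

// MapData mirrors the kernel-side struct MapData written to the ring buffer.
// Field order and the name length (assumed 16 here) must match the C struct exactly.
type MapData struct {
	UpdateType MapUpdater
	Name       [16]byte
	PID        uint32
	MapID      uint32
	Key        uint32
	KeySize    uint32
	Value      uint32
	ValueSize  uint32
}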

State Change Broadcast

Once we receive the eBPF map update struct in user space, we can optionally filter for specific eBPF maps or events we want to synchronize across the cluster. For simplicity, we’ll leave a deeper discussion of filtering for another time.
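That said, a hypothetical filter could be as simple as checking the map name carried in the event before broadcasting anything (this uses the standard library bytes package; the map names listed are assumptions for illustration):

// shouldSync is a hypothetical filter: only events for maps we explicitly
// want to keep in sync across the cluster are forwarded to peers.
func shouldSync(event *MapData) bool {
	// Trim the trailing NUL bytes of the fixed-size name field.
	name := string(bytes.TrimRight(event.Name[:], "\x00"))
	switch name {
	case "hash_map": // assumed map name, for illustration only
		return true
	default:
		return false
	}
}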

We broadcast these events to the rest of the cluster using a reliable messaging framework. For our proof of concept, we’ve chosen gRPC due to its performance benefits, although other frameworks could also be used effectively. In our setup, each node in the cluster functions both as a server and as a client. Specifically, it acts as a server to receive updates from other nodes through the SetValue gRPC call:

func (n *Node) SetValue(ctx context.Context, in *ValueRequest) (*Empty, error) {
	value := in.GetValue()
	key := in.GetKey()
	_type := in.GetType()

	// These operations are atomic by nature (Ref: https://man7.org/linux/man-pages/man2/bpf.2.html)
	if MapUpdater(_type).String() == "UPDATE" {
		if err := n.syncObjs.HashMap.Update(key, value, ebpf.UpdateAny); err != nil {
			log.Printf("Failed to update key %d: %v", key, err)
			return nil, err
		}
		log.Printf("Client updated key %d to value %d", key, value)
	} else if MapUpdater(_type).String() == "DELETE" {
		if err := n.syncObjs.HashMap.Delete(key); err != nil {
			log.Printf("Failed to delete key %d: %v", key, err)
			return nil, err
		}
		log.Printf("Client deleted key %d", key)
	}

	return &Empty{}, nil
}

⚠️ Note: Sync messages are passed as parameters to the SetValue RPC call, triggered by the client on the remote node.

And as a client to send sync messages (received through the eBPF ring buffer) to the rest of the cluster:

for {
	record, err := rd.Read()
	if err != nil {
		panic(err)
	}

	// Print data for the sake of observing the messages
	Event := (*MapData)(unsafe.Pointer(&record.RawSample[0]))
	log.Printf("Map ID: %d", Event.MapID)
	log.Printf("Name: %s", string(Event.Name[:]))
	log.Printf("PID: %d", Event.PID)
	log.Printf("Update Type: %s", Event.UpdateType.String())
	log.Printf("Key: %d", Event.Key)
	log.Printf("Key Size: %d", Event.KeySize)
	log.Printf("Value: %d", Event.Value)
	log.Printf("Value Size: %d", Event.ValueSize)

	conn, err := grpc.NewClient(address, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Printf("Failed to connect to peer: %v", err)
		continue
	}
	client := NewSyncServiceClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	_, err = client.SetValue(ctx, &ValueRequest{Key: int32(Event.Key), Value: int32(Event.Value), Type: int32(Event.UpdateType), Mapid: int32(Event.MapID)})
	cancel()
	conn.Close()
	if err != nil {
		log.Printf("Could not set value on peer: %v", err)
	} else {
		log.Printf("Successfully sent sync message")
	}
	log.Println("=====================================")
}
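Both the server handler and the client loop above lean on a small set of protoc-generated types. The sketch below shows roughly what that generated surface looks like; the actual .proto file in the repository is the source of truth, so treat the names and fields as assumptions inferred from the calls used in this post:

// Rough sketch of the gRPC surface assumed above; in practice these types and
// constructors are generated by protoc, so names and fields are assumptions.
type ValueRequest struct {
	Key   int32
	Value int32
	Type  int32
	Mapid int32
}

type Empty struct{}

// Server side: each node registers an implementation of this interface.
// The generated request struct also exposes the GetKey/GetValue/GetType
// accessors used in the SetValue handler.
type SyncServiceServer interface {
	SetValue(context.Context, *ValueRequest) (*Empty, error)
}

// Client side: NewSyncServiceClient(conn) returns an implementation of this.
type SyncServiceClient interface {
	SetValue(ctx context.Context, in *ValueRequest, opts ...grpc.CallOption) (*Empty, error)
}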

This dual role ensures that all nodes stay synchronized with the latest changes to the eBPF maps.
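To tie the two roles together, each node has to load the eBPF objects, record its own PID in the config map (so the kernel program skips events caused by the sync daemon applying a peer’s update), start the gRPC server, and run the ring buffer reader loop. A rough sketch of that wiring, with generated object and service names assumed for illustration, could look like this:

func run() error {
	// Load the compiled eBPF objects (bpf2go-style names are assumptions).
	var objs syncObjects
	if err := loadSyncObjects(&objs, nil); err != nil {
		return err
	}
	defer objs.Close()

	// Record our PID in the config map so the kernel program ignores map
	// writes made by this daemon itself (see the host_pid check in log_map_update).
	conf := Config{HostPID: uint32(os.Getpid())} // Config mirrors struct Config; an assumption
	if err := objs.MapConfig.Update(uint32(0), conf, ebpf.UpdateAny); err != nil {
		return err
	}

	// Server role: receive SetValue calls from peers.
	lis, err := net.Listen("tcp", ":50051") // port is an assumption
	if err != nil {
		return err
	}
	grpcServer := grpc.NewServer()
	RegisterSyncServiceServer(grpcServer, &Node{syncObjs: objs})
	go grpcServer.Serve(lis)

	// Client role: read map-change events from the ring buffer and broadcast
	// them to peers using the loop shown earlier.
	rd, err := ringbuf.NewReader(objs.MapEvents)
	if err != nil {
		return err
	}
	defer rd.Close()

	broadcastLoop(rd) // hypothetical wrapper around the for { rd.Read() ... } loop above
	return nil
}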

⚠️ Note: This is not production code. For this demo, the program was simplified to work with only two nodes, but it could be further improved.

Complete source code can be found on my GitHub repository:

GitHub – dorkamotorka/ebpf-multihost-map-sync: Multihost eBPF Map Sync Setup

Conclusion

In conclusion, synchronizing eBPF map states across a multi-node Kubernetes cluster is a complex but essential task for maintaining high availability and avoiding single points of failure in stateful applications. By leveraging eBPF fentry hook points and gRPC, we can effectively broadcast map updates to all nodes in the cluster, ensuring consistent state across the system. This proof of concept demonstrates a viable approach to state synchronization, though further refinements and optimizations are possible.

To stay up-to-date with the latest cloud tech updates, subscribe to my monthly newsletter, Cloud Chirp. 🚀
