Taming Multi-Cloud Kubernetes Networking with Topology Aware Routing
Behrouz Hassanbeygi
Oct. 27, 2025
Building a multi-cloud Kubernetes cluster is a fascinating challenge. The goal is a single, unified control plane spanning multiple providers (like AWS, Azure, and GCP), but the real hurdle is networking. How do you make pods in different clouds talk to each other securely and efficiently?
I recently tackled this for a presentation, and the journey was... educational. I started with a simple, lightweight stack but quickly ran into deep networking issues, specifically with Kubernetes Services. Here’s how I built it, what broke, and how I ultimately fixed it.
My plan was to keep things simple: run a lightweight K3s cluster and join nodes from each cloud over a WireGuard overlay, so every node could reach every other node on a private address.
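Every node, regardless of cloud, gets a WireGuard interface on a private overlay range and one peer entry per remote node. A minimal sketch of a node's config (keys, endpoints, and the peer address are placeholders; the overlay address matches the API server IP used later):

# /etc/wireguard/wg0.conf on the control-plane node
[Interface]
Address = 10.200.0.1/24        # overlay address; the K3s API server is reached on this IP
PrivateKey = <this-node-private-key>
ListenPort = 51820

[Peer]
# a worker node running in another cloud
PublicKey = <peer-public-key>
Endpoint = <peer-public-ip>:51820
AllowedIPs = 10.200.0.2/32
PersistentKeepalive = 25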
With the cluster up, it was time to test. I used the k8s-netperf benchmark to see how well the inter-cluster communication performed.
The initial k8s-netperf results were a mixed bag: direct pod-to-pod traffic across clouds performed well, but anything going through a Service swung wildly. This meant the overlay network (WireGuard) was working, but Kubernetes's service discovery and load balancing (managed by kube-proxy) was not playing nice.
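You can see the split with a quick manual check: hitting a remote pod IP directly is consistently fast, while hitting the same backend through its Service is where things fall apart (the pod name, pod IP, Service name, and port below are placeholders):

# direct pod-to-pod across the WireGuard overlay: fine
kubectl exec client-pod -- curl -s http://10.42.2.15:8080/
# through the ClusterIP Service (kube-proxy's territory): erratic
kubectl exec client-pod -- curl -s http://my-service.default.svc.cluster.local:8080/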
My first instinct was to fix kube-proxy's behavior.
I tried enabling Kubernetes's built-in Topology-Aware Hints by patching my Service. The patch amounts to the standard topology annotation (the Service name below is a placeholder; on clusters older than 1.27 the key is service.kubernetes.io/topology-aware-hints):
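kubectl patch service my-service -p \
  '{"metadata":{"annotations":{"service.kubernetes.io/topology-mode":"Auto"}}}'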
The goal was to tell kube-proxy to prioritize endpoints in the same zone (i.e., the same cloud). This did nothing. It seemed kube-proxy was either ignoring my changes or was incapable of implementing them in this setup.
K3s, like many distros, defaults to iptables mode for kube-proxy. I thought switching to ipvs mode might be more intelligent.
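On K3s the switch is a kube-proxy argument, which can go in the server config file (a sketch; the equivalent install-time flag works too):

# /etc/rancher/k3s/config.yaml
kube-proxy-arg:
  - "proxy-mode=ipvs"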
This was a disaster. As soon as I enabled ipvs, it immediately conflicted with my WireGuard network and completely disrupted the tunnel. Back to square one.
At this point, I was convinced kube-proxy was the problem. The solution? Get rid of it.
This created a new problem: K3s's default CNI, Flannel, doesn't support running without kube-proxy. I needed to replace both. The two main candidates were Calico and Cilium.
I decided to go all-in on Cilium.
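That means telling K3s to stop shipping Flannel and kube-proxy altogether. These are ordinary K3s server options, shown here as a config-file sketch:

# /etc/rancher/k3s/config.yaml
flannel-backend: "none"
disable-network-policy: true
disable-kube-proxy: true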
I removed Flannel and installed Cilium using its Helm chart, making sure to enable the kube-proxy replacement:
# Define our K3s API server info
API_SERVER_IP=10.200.0.1
API_SERVER_PORT=6443
helm upgrade --install cilium cilium/cilium --version 1.18.3 \
--namespace kube-system \
--set kubeProxyReplacement=true \
--set k8sServiceHost=${API_SERVER_IP} \
--set k8sServicePort=${API_SERVER_PORT}
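A quick sanity check that the replacement is actually active is to ask the agent directly (the in-pod binary is named cilium-dbg in recent releases; older releases call it cilium):

kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep KubeProxyReplacement
# expected output is something like: KubeProxyReplacement:   True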
The result? A massive improvement! The service traffic was stable. The wild performance swings were gone.
However, it still wasn't perfect. The k8s-netperf benchmark showed that traffic was still being routed to pods in other zones (clouds), even when a local pod was available. Cilium was balancing the traffic fairly, but not smartly.
After digging through the Cilium documentation, I found the magic flag I was missing. I ran a cilium upgrade command to enable its native service topology awareness:
cilium upgrade --set loadBalancer.serviceTopology=true
This was it. 🚀
The result was flawless. The k8s-netperf benchmarks now showed exactly what I wanted: traffic stayed inside its own zone whenever a local endpoint was available, and the numbers were finally stable.
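One detail worth spelling out: both kube-proxy's hints and Cilium's serviceTopology key off the standard zone label on each node, so in a multi-cloud cluster each cloud needs to be labelled as its own zone (node names below are placeholders):

kubectl label node aws-worker-1 topology.kubernetes.io/zone=aws
kubectl label node azure-worker-1 topology.kubernetes.io/zone=azure
kubectl label node gcp-worker-1 topology.kubernetes.io/zone=gcp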
Setting up a multi-cloud overlay with WireGuard is surprisingly straightforward. The real complexity lies in making Kubernetes's internal networking (specifically Service routing) aware of your underlying topology.
While kube-proxy struggled, Cilium's kubeProxyReplacement mode combined with its loadBalancer.serviceTopology feature proved to be the perfect solution. It intelligently routes traffic, respects network zones, and finally made my multi-cloud cluster performant and predictable.