notes on kernel TLS for streaming proxies
kTLS keeps showing up in answers to “why is this proxy slow.” Quick refresher: once
you’ve done the userspace handshake, you can hand the cipher state to the kernel and let
splice() carry the bytes. Saves copying every record through userspace. The
catch is the kernel only handles application data records. Anything else (alerts,
key updates, post-handshake auth) it won’t process for you: a plain read() or splice()
errors out, and you have to pull the record up with recvmsg() plus a cmsg buffer that
carries the record type. So anything that flows mid-stream needs handling on the control
side or you get desync. Boring point but it bites.
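For the control-side handling, a minimal sketch of reading one record together with its type, following the recvmsg()/cmsg pattern in the kernel's TLS documentation. The function name recv_tls_record and the "no cmsg means application data" default are my own choices for illustration, and the SOL_TLS fallback define is only there for older libc headers.

```c
#include <linux/tls.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef SOL_TLS
#define SOL_TLS 282 /* kernel's value; some older libc headers lack it */
#endif

#define TLS_RT_APPLICATION_DATA 23 /* record type number from the TLS spec */

/* Read one record's payload from fd (hypothetical helper).
 * Returns the byte count, or -1 with errno set by recvmsg().
 * *record_type gets the TLS record type; if the kernel attached no
 * TLS_GET_RECORD_TYPE cmsg, the payload is treated as application data. */
ssize_t recv_tls_record(int fd, void *buf, size_t len,
                        unsigned char *record_type)
{
    union {
        char buf[CMSG_SPACE(sizeof(unsigned char))];
        struct cmsghdr align; /* forces correct alignment of buf */
    } control;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;

    memset(&control, 0, sizeof(control));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    ssize_t n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return n;

    *record_type = TLS_RT_APPLICATION_DATA;
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
         c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_TLS && c->cmsg_type == TLS_GET_RECORD_TYPE)
            *record_type = *(unsigned char *)CMSG_DATA(c);
    }
    return n;
}
```

The idea is that the relay loop splices until the data path errors, then calls something like this once to see what kind of record is sitting at the head of the queue.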
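Installing the cipher takes two setsockopt calls: attach the ULP with TCP_ULP set to
"tls", then pass the crypto info with TLS_TX / TLS_RX at level SOL_TLS, with the
record-layer state derived from the handshake. linux/tls.h has the layouts; for AES-GCM
the kernel wants the key, the salt, the IV, and the record sequence number. Easy to mess
up the seq if any records have already gone out or come in under those keys (post-handshake
messages count, not just application data), so you have to count carefully.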
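A rough sketch of the install step for one direction, assuming TLS 1.3 with AES-128-GCM. struct direction_keys and the three function names are hypothetical; the struct stands in for whatever your TLS library exports. The 4+8 split of the 12-byte IV into salt and iv is how I understand the TLS 1.3 layout, so check it against your library before trusting it. The sequence number goes in big-endian.

```c
#include <linux/tls.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_TLS
#define SOL_TLS 282 /* kernel's value; some older libc headers lack it */
#endif
#ifndef TCP_ULP
#define TCP_ULP 31
#endif

/* Hypothetical container for one direction's state after the handshake. */
struct direction_keys {
    unsigned char key[16];  /* AES-128 traffic key */
    unsigned char iv[12];   /* per-direction static IV */
    uint64_t records_seen;  /* records already processed under this key */
};

/* Translate the exported state into the layout from linux/tls.h. */
void fill_crypto_info(struct tls12_crypto_info_aes_gcm_128 *ci,
                      const struct direction_keys *k)
{
    memset(ci, 0, sizeof(*ci));
    ci->info.version = TLS_1_3_VERSION;
    ci->info.cipher_type = TLS_CIPHER_AES_GCM_128;
    memcpy(ci->key, k->key, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
    /* Assumed split: first 4 IV bytes are the salt, remaining 8 the iv. */
    memcpy(ci->salt, k->iv, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
    memcpy(ci->iv, k->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
           TLS_CIPHER_AES_GCM_128_IV_SIZE);
    /* Record sequence number, big-endian. */
    for (int i = 0; i < TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE; i++)
        ci->rec_seq[i] = (unsigned char)(k->records_seen >>
            (8 * (TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE - 1 - i)));
}

/* Attach the "tls" ULP. Do this once per socket, before TLS_TX / TLS_RX. */
int attach_tls_ulp(int fd)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls"));
}

/* direction is TLS_TX or TLS_RX. Returns 0 on success, -1 with errno set. */
int install_direction(int fd, int direction, const struct direction_keys *k)
{
    struct tls12_crypto_info_aes_gcm_128 ci;

    fill_crypto_info(&ci, k);
    return setsockopt(fd, SOL_TLS, direction, &ci, sizeof(ci));
}
```

Usage would be attach_tls_ulp(fd) once, then install_direction(fd, TLS_TX, &tx) and install_direction(fd, TLS_RX, &rx), each with its own key, IV, and count.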
Two practical things that bit me. First, key updates: on a long connection a TLS 1.3 peer is allowed to rekey at any time, and once that happens the kernel-side cipher state is stale. KeyUpdate is a handshake message, not an alert, so it shows up as a non-data record. Need to either intercept it in userspace (defeats most of the kTLS win) or just not rekey in that direction (RFC-permitted but nonstandard, and it only covers the direction you send on; a KeyUpdate from the peer still has to be handled). Second, splice from a kTLS fd to another kTLS fd (through a pipe, since splice() needs one on one side) technically works, and it is documented to require recent enough kernels. I haven’t found a kernel old enough to refuse it, so I can’t pin the version.
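If you go the intercept route, spotting a KeyUpdate in a record the kernel handed up is cheap. A sketch, using the handshake record type (22) and key_update message type (24) from RFC 8446. The names are made up for illustration. It assumes the payload is the decrypted handshake message and that the record holds exactly one message, which a real peer is not obliged to do.

```c
#include <stddef.h>

#define TLS_RT_HANDSHAKE  22 /* record content type, RFC 8446 */
#define TLS_HS_KEY_UPDATE 24 /* handshake message type, RFC 8446 */

enum key_update_kind {
    NOT_KEY_UPDATE = -1,
    KEY_UPDATE_NOT_REQUESTED = 0, /* peer rekeyed its sending side only */
    KEY_UPDATE_REQUESTED = 1      /* peer also wants us to rekey ours */
};

/* Classify a plaintext record payload (hypothetical helper).
 * Layout checked: msg_type(1) | length(3) = 1 | request_update(1). */
enum key_update_kind classify_key_update(unsigned char record_type,
                                         const unsigned char *p, size_t len)
{
    if (record_type != TLS_RT_HANDSHAKE || len != 5)
        return NOT_KEY_UPDATE;
    if (p[0] != TLS_HS_KEY_UPDATE)
        return NOT_KEY_UPDATE;
    if (p[1] != 0 || p[2] != 0 || p[3] != 1)
        return NOT_KEY_UPDATE;
    if (p[4] > 1)
        return NOT_KEY_UPDATE; /* illegal request_update value */
    return p[4] ? KEY_UPDATE_REQUESTED : KEY_UPDATE_NOT_REQUESTED;
}
```

Either positive result means the receive-side keys in the kernel are stale from the next record on. KEY_UPDATE_REQUESTED additionally means your own sending side has to rekey before its next application data record.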
Worth it for the “just relay bytes” case. Not worth it if you do anything to the plaintext mid-stream — you end up paying the userspace cost anyway plus the bookkeeping overhead of swapping cipher state in and out.