Using DPI to distinguish QUIC, HTTP/3, and HTTP/2

posted in Network

Deep packet inspection for modern web traffic is less about reading application payloads and more about making fast decisions from the few bytes that are still visible. That is especially true for QUIC. After the handshake, QUIC encrypts almost everything a middlebox would like to inspect. If the DPI engine misses the first few packets, later packets often only say: “this is probably a QUIC connection that I should have classified earlier.”

The goal is not to decrypt users’ web traffic. The practical goal is cheaper:

  1. Quickly screen out packets that are definitely not QUIC.
  2. Identify likely QUIC connections from early UDP packets.
  3. Distinguish HTTP/3 over QUIC from other QUIC applications when the handshake exposes enough information.
  4. Distinguish HTTP/2 from HTTP/3 by remembering that normal HTTP/2 is usually h2 over TLS/TCP, while HTTP/3 is h3 over QUIC/UDP.

The terminology matters. HTTP/3 is the standardized HTTP mapping over QUIC. HTTP/2 is normally carried over TCP, with h2 selected by TLS ALPN. Early drafts and old Google QUIC experiments used different names and version tags, but for current protocol classification the useful split is:

HTTP/2:  TCP + TLS + ALPN "h2"
HTTP/3:  UDP + QUIC + TLS-in-QUIC + ALPN "h3"

What DPI can still see

A DPI classifier sees packet metadata first:

  • IP version, addresses, and packet length
  • L4 protocol: TCP or UDP
  • source and destination ports
  • timing, direction, packet sizes, and flow shape
  • early handshake bytes

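For TLS over TCP, the ClientHello is sent in cleartext, so a DPI engine can usually read its ALPN extension and classify h2, http/1.1, or other protocols. The main exception is Encrypted Client Hello (ECH). ECH moves SNI and the real ALPN list into an encrypted inner ClientHello, so an observer sees only the outer ClientHello.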

For QUIC, the first packets are UDP datagrams that carry QUIC long-header packets. QUIC Initial packets carry TLS handshake data inside QUIC CRYPTO frames. Initial packets are protected, but that protection is not meant to provide confidentiality from a passive observer. The keys are derived from the Destination Connection ID in the client's first Initial and a published, version-specific salt (RFC 9001, Section 5.2). A DPI engine can therefore remove Initial protection, recover the TLS ClientHello, and inspect ALPN.

After the handshake, the situation changes. QUIC 1-RTT packets use short headers and encrypted payloads. A DPI engine should not expect to classify a connection from arbitrary mid-flow QUIC packets with the same confidence as from the first client Initial.

Fast path before deep parsing

Most packets are not QUIC. The first optimization is to avoid expensive parsing for them.

A cheap packet classifier should run in stages:

packet
  -> L3/L4 screen
  -> cheap QUIC invariant checks
  -> bounded QUIC long-header parser
  -> optional QUIC Initial unprotection
  -> TLS ClientHello / ALPN parser
  -> flow cache

The cost model is:

$$ E[C] = C_0 + p_1 C_1 + p_2 C_2 + p_3 C_3 $$

where C0 is the cheap work done for every packet, C1 is the UDP/port screen, C2 is the QUIC header parser, and C3 is the expensive Initial/TLS parser. Each p_i is the fraction of packets that reach stage i, so p1 ≥ p2 ≥ p3. The design goal is to make each probability much smaller than the one before it.

That is the entire performance trick. Do not run the expensive parser on every packet. Do not even run it on every UDP packet. Only run it on packets that pass cheap structural checks.
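The example below uses purely hypothetical costs and pass rates to show why the last probability matters most. The costs are in nanoseconds: C0 = 5, C1 = 10, C2 = 50, C3 = 5000. The pass rates are p1 = 0.3, p2 = 0.05, p3 = 0.01. These numbers are illustrative, not measurements:

$$ E[C] = 5 + (0.3)(10) + (0.05)(50) + (0.01)(5000) = 5 + 3 + 2.5 + 50 = 60.5 \text{ ns} $$

Under the same assumptions, running every stage on every packet (p1 = p2 = p3 = 1) would cost 5 + 10 + 50 + 5000 = 5065 ns per packet. Even at a 1% pass rate, the p3 C3 term is still most of the expected cost, so it is the term worth pushing down first.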

QUIC header signals

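QUIC has a few header properties that are useful for DPI. Some are version-independent invariants (RFC 8999): the header form bit, and the version and connection ID fields of the long header. Others, like the fixed bit, are specific to QUIC version 1.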

The first byte gives two important bits:

0x80: header form bit
      1 = long header
      0 = short header

0x40: fixed bit
      set to 1 in QUIC v1 packets, unless the peers
      negotiated greasing it (RFC 9287)

For the first byte b0:

$$ \operatorname{longHeader} = \operatorname{bitand}(b_0, 0x80) \ne 0 $$

$$ \operatorname{fixedBitSet} = \operatorname{bitand}(b_0, 0x40) \ne 0 $$

For new connections, the client speaks first and the early packets use long headers. A fast QUIC candidate check can start with:

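#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* First-stage filter: could this UDP payload start with a QUIC long header? */
bool maybe_quic_long_header(const uint8_t *p, size_t n) {
    /* Smallest long header: flags (1) + version (4)
       + DCID length (1) + SCID length (1), with empty connection IDs. */
    if (n < 7) return false;

    uint8_t b0 = p[0];
    bool long_header = (b0 & 0x80) != 0;  /* header form bit */
    bool fixed_bit = (b0 & 0x40) != 0;    /* fixed bit (QUIC v1) */

    if (!long_header || !fixed_bit) return false;

    /* Version: 32-bit big-endian field right after the first byte. */
    uint32_t version =
        ((uint32_t)p[1] << 24) |
        ((uint32_t)p[2] << 16) |
        ((uint32_t)p[3] << 8)  |
        ((uint32_t)p[4]);

    /* Version 0 is Version Negotiation, which only servers send. */
    return version != 0;
}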

This is not a complete QUIC parser. It is only a first-stage rejection filter. It is useful because it is cheap: one length check, two bit tests, and one version read.

Version 0x00000000 is special: it indicates QUIC Version Negotiation. That is also a useful signal, but it is server-to-client evidence rather than a normal client Initial.
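The version field can also feed a plausibility check (x4 in the feature list later). The sketch below buckets a version number using these known values:

- QUIC v1 (0x00000001, RFC 9000)
- QUIC v2 (0x6b3343cf, RFC 9369)
- IETF draft versions (0xff0000NN)
- the reserved 0x?a?a?a?a pattern used to exercise version negotiation

The Google QUIC check ("Q" followed by three ASCII digits, as in Q046) is only a heuristic. The enum and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    QV_NEGOTIATION,  /* 0x00000000: Version Negotiation (server to client) */
    QV_V1,           /* RFC 9000 */
    QV_V2,           /* RFC 9369 */
    QV_IETF_DRAFT,   /* 0xff0000NN: draft-NN */
    QV_RESERVED,     /* 0x?a?a?a?a: reserved to exercise version negotiation */
    QV_GQUIC_TAG,    /* heuristic: "Q" plus three ASCII digits, e.g. Q046 */
    QV_UNKNOWN
} quic_version_class;

static int is_ascii_digit(uint8_t c) { return c >= '0' && c <= '9'; }

quic_version_class classify_quic_version(uint32_t v) {
    if (v == 0x00000000u) return QV_NEGOTIATION;
    if (v == 0x00000001u) return QV_V1;
    if (v == 0x6b3343cfu) return QV_V2;
    if ((v & 0xffffff00u) == 0xff000000u) return QV_IETF_DRAFT;
    if ((v & 0x0f0f0f0fu) == 0x0a0a0a0au) return QV_RESERVED;
    /* Hypothetical gQUIC heuristic: big-endian ASCII tag such as "Q046". */
    if ((v >> 24) == 'Q' && is_ascii_digit((uint8_t)(v >> 16)) &&
        is_ascii_digit((uint8_t)(v >> 8)) && is_ascii_digit((uint8_t)v))
        return QV_GQUIC_TAG;
    return QV_UNKNOWN;
}
```

A classifier might treat QV_V1 and QV_V2 as plausible, QV_RESERVED as a client probing version negotiation, and QV_UNKNOWN as weak negative evidence rather than a hard reject.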

Parsing enough of the long header

A better candidate parser checks the long-header structure:

first byte
version
destination connection ID length
destination connection ID
source connection ID length
source connection ID
packet-type-specific fields

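For an Initial packet in QUIC v1, those type-specific fields are:

token length        (variable-length integer)
token
length              (variable-length integer; covers packet number and payload)
packet number       (1 to 4 bytes, hidden by header protection)
protected payload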

The parser should reject impossible lengths early. For example:

  • UDP payload is too short for a long header
  • destination connection ID length exceeds 20 bytes (the QUIC v1 limit) or points beyond the datagram
  • source connection ID length exceeds 20 bytes or points beyond the datagram
  • variable-length integer encoding is truncated
  • Initial length field is larger than the remaining datagram

This stage is still cheap. It does not decrypt or decompress anything. It only verifies that the packet looks structurally like QUIC.
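The checks above can be sketched as a bounded parser. This is a minimal sketch that assumes QUIC v1 only, where long packet type 0 is Initial (QUIC v2 uses a different type encoding). The struct and function names are hypothetical. The varint decoder follows RFC 9000, Section 16: the two high bits of the first byte select a 1, 2, 4, or 8 byte encoding.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fields a DPI engine can read from a client Initial before decryption. */
typedef struct {
    uint32_t version;
    const uint8_t *dcid;
    uint8_t dcid_len;
    const uint8_t *scid;
    uint8_t scid_len;
    uint64_t token_len;
    uint64_t length;    /* Length field: packet number + protected payload */
    size_t pn_offset;   /* offset of the still header-protected packet number */
} quic_initial_hdr;

/* Decode a QUIC variable-length integer without reading past n. */
static bool read_varint(const uint8_t *p, size_t n, size_t *off, uint64_t *out) {
    if (*off >= n) return false;
    size_t len = (size_t)1 << (p[*off] >> 6);
    if (n - *off < len) return false;              /* truncated varint */
    uint64_t v = p[*off] & 0x3f;
    for (size_t i = 1; i < len; i++) v = (v << 8) | p[*off + i];
    *off += len;
    *out = v;
    return true;
}

/* Structural parse of a QUIC v1 Initial; rejects impossible lengths early. */
bool parse_quic_v1_initial(const uint8_t *p, size_t n, quic_initial_hdr *h) {
    if (n < 7) return false;
    if ((p[0] & 0xc0) != 0xc0) return false;       /* long header + fixed bit */
    if (((p[0] >> 4) & 0x03) != 0) return false;   /* v1 long packet type 0 */
    h->version = ((uint32_t)p[1] << 24) | ((uint32_t)p[2] << 16) |
                 ((uint32_t)p[3] << 8) | (uint32_t)p[4];
    if (h->version != 0x00000001u) return false;   /* sketch handles v1 only */

    size_t off = 5;
    h->dcid_len = p[off++];
    if (h->dcid_len > 20 || n - off < h->dcid_len) return false;
    h->dcid = p + off;
    off += h->dcid_len;

    if (off >= n) return false;
    h->scid_len = p[off++];
    if (h->scid_len > 20 || n - off < h->scid_len) return false;
    h->scid = p + off;
    off += h->scid_len;

    if (!read_varint(p, n, &off, &h->token_len)) return false;
    if (h->token_len > n - off) return false;      /* token past datagram */
    off += (size_t)h->token_len;

    if (!read_varint(p, n, &off, &h->length)) return false;
    if (h->length > n - off) return false;         /* Length past datagram */
    h->pn_offset = off;
    return true;
}
```

The `length` field can be smaller than the rest of the datagram. In that case, the bytes after `pn_offset + length` may be another coalesced QUIC packet rather than garbage.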

Initial packets and ALPN

The strongest HTTP/3 signal is ALPN h3 in the TLS handshake carried by QUIC.

For QUIC v1, a DPI engine can derive the client's Initial keys from the Destination Connection ID in the client's first Initial packet, using HKDF and the fixed v1 salt. It then removes header protection, decrypts the payload with AES-128-GCM, extracts CRYPTO frames, and parses the TLS ClientHello. This is more work than the header checks, so it should only run after the packet passes the cheap filters.
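The key schedule uses TLS 1.3's HKDF-Expand-Label with fixed labels. The sketch below shows the QUIC v1 Initial salt (RFC 9001, Section 5.2) and a helper that builds the HkdfLabel "info" input for each label. It assumes HKDF-SHA256 itself comes from a crypto library, so that step appears only in comments. The helper name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* QUIC v1 Initial salt, RFC 9001 Section 5.2. */
const uint8_t QUIC_V1_INITIAL_SALT[20] = {
    0x38, 0x76, 0x2c, 0xf7, 0xf5, 0x59, 0x34, 0xb3, 0x4d, 0x17,
    0x9a, 0xe6, 0xa4, 0xc8, 0x0c, 0xad, 0xcc, 0xbb, 0x7f, 0x0a
};

/*
 * Client Initial key schedule (RFC 9001, Section 5), HKDF assumed external:
 *   initial_secret = HKDF-Extract(salt, client DCID)
 *   client_secret  = HKDF-Expand-Label(initial_secret, "client in", "", 32)
 *   key = HKDF-Expand-Label(client_secret, "quic key", "", 16)  AES-128-GCM
 *   iv  = HKDF-Expand-Label(client_secret, "quic iv",  "", 12)
 *   hp  = HKDF-Expand-Label(client_secret, "quic hp",  "", 16)  header protection
 *
 * Build the TLS 1.3 HkdfLabel: uint16 length, "tls13 " + label with a
 * 1-byte length prefix, then an empty context. Returns 0 if buf is too small.
 */
size_t quic_hkdf_label(uint16_t out_len, const char *label,
                       uint8_t *buf, size_t cap) {
    static const char prefix[] = "tls13 ";
    size_t plen = sizeof prefix - 1, llen = strlen(label);
    size_t total = 2 + 1 + plen + llen + 1;
    if (plen + llen > 255 || total > cap) return 0;
    buf[0] = (uint8_t)(out_len >> 8);
    buf[1] = (uint8_t)out_len;
    buf[2] = (uint8_t)(plen + llen);
    memcpy(buf + 3, prefix, plen);
    memcpy(buf + 3 + plen, label, llen);
    buf[3 + plen + llen] = 0;  /* zero-length context */
    return total;
}
```

RFC 9001, Appendix A lists the expected info bytes and derived secrets for a sample DCID. That makes it a convenient test vector when wiring in a real HKDF implementation.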

The classification result should look like this:

UDP packet
  -> valid QUIC Initial
  -> TLS ClientHello recovered
  -> ALPN contains "h3"
  -> classify flow as HTTP/3

If the ClientHello is fragmented across multiple CRYPTO frames or multiple datagrams, the DPI engine needs a small bounded reassembly buffer. Keep this buffer per flow and cap it aggressively. The engine only needs enough bytes to parse the ClientHello extensions, not the entire connection.

A practical cap might be:

first 4 to 8 UDP datagrams per direction
first 8 to 16 KiB of QUIC CRYPTO stream data
short timeout, such as 2 to 5 seconds

Those numbers are policy choices, not protocol requirements. The point is to bound CPU and memory even under hostile or broken traffic.
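The bounded reassembly buffer can be sketched per flow as follows. The sketch assumes a hypothetical 16 KiB cap (the top of the range above) and tracks which CRYPTO stream bytes have arrived, since frames can come out of order or be retransmitted. It reports "complete" once the first TLS handshake message, which must be a ClientHello, is fully contiguous. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CRYPTO_CAP 16384   /* hypothetical per-flow policy cap (16 KiB) */

typedef struct {
    uint8_t data[CRYPTO_CAP];
    uint8_t present[CRYPTO_CAP / 8];  /* one bit per buffered byte */
    size_t contiguous;                /* gap-free bytes from stream offset 0 */
} crypto_reasm;

typedef enum { RA_NEED_MORE, RA_COMPLETE, RA_GIVE_UP } ra_status;

void crypto_reasm_init(crypto_reasm *r) { memset(r, 0, sizeof *r); }

/* Add one CRYPTO frame (stream offset + data). Anything past the cap
 * ends the attempt instead of growing the buffer. */
ra_status crypto_reasm_add(crypto_reasm *r, uint64_t off,
                           const uint8_t *src, size_t len) {
    if (off > CRYPTO_CAP || len > CRYPTO_CAP - off) return RA_GIVE_UP;
    for (size_t i = 0; i < len; i++) {
        size_t pos = (size_t)off + i;
        r->data[pos] = src[i];
        r->present[pos / 8] |= (uint8_t)(1u << (pos % 8));
    }
    /* Extend the contiguous prefix over any newly filled gap. */
    while (r->contiguous < CRYPTO_CAP &&
           ((r->present[r->contiguous / 8] >> (r->contiguous % 8)) & 1u))
        r->contiguous++;

    if (r->contiguous < 4) return RA_NEED_MORE;     /* no handshake header yet */
    if (r->data[0] != 0x01) return RA_GIVE_UP;      /* not a ClientHello */
    size_t body = ((size_t)r->data[1] << 16) | ((size_t)r->data[2] << 8) |
                  r->data[3];
    if (4 + body > CRYPTO_CAP) return RA_GIVE_UP;   /* would exceed the cap */
    return r->contiguous >= 4 + body ? RA_COMPLETE : RA_NEED_MORE;
}
```

RA_GIVE_UP is a signal to stop spending CPU on the flow and record a lower-confidence label, such as probable QUIC. It is not proof that the traffic is malicious.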

HTTP/2 detection

HTTP/2 detection is different because normal HTTP/2 is not QUIC.

For HTTPS, the best signal is TLS ALPN:

TCP packet
  -> TLS ClientHello
  -> ALPN contains "h2"
  -> classify flow as HTTP/2
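The HTTP/3 and HTTP/2 paths end in the same step: finding ALPN in a ClientHello. This is a minimal sketch of that step, assuming the input starts at the 1-byte handshake msg_type and holds the whole ClientHello. For TCP, that means after the 5-byte TLS record header. For QUIC, it means the reassembled CRYPTO stream from offset 0. It walks the ClientHello layout from RFC 8446 and the ALPN extension (type 16) from RFC 7301. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static size_t be16(const uint8_t *p) { return ((size_t)p[0] << 8) | p[1]; }

/* Advance *off by k bytes only if that stays within n. */
static bool skip(size_t *off, size_t n, size_t k) {
    if (n - *off < k) return false;
    *off += k;
    return true;
}

/* Does this ClientHello offer `proto` (e.g. "h2" or "h3") in ALPN? */
bool tls_clienthello_has_alpn(const uint8_t *p, size_t n, const char *proto) {
    size_t plen = strlen(proto);
    if (n < 4 || p[0] != 0x01) return false;          /* msg_type ClientHello */
    size_t body = ((size_t)p[1] << 16) | be16(p + 2);
    if (body > n - 4) return false;                   /* need the whole message */
    n = 4 + body;
    size_t off = 4;

    if (!skip(&off, n, 2 + 32)) return false;         /* legacy_version, random */
    if (off >= n || !skip(&off, n, 1 + (size_t)p[off])) return false;  /* session ID */
    if (n - off < 2 || !skip(&off, n, 2 + be16(p + off))) return false; /* ciphers */
    if (off >= n || !skip(&off, n, 1 + (size_t)p[off])) return false;  /* compression */
    if (n - off < 2) return false;
    size_t end = off + 2 + be16(p + off);             /* end of extensions block */
    off += 2;
    if (end > n) return false;

    while (end - off >= 4) {
        size_t type = be16(p + off), len = be16(p + off + 2);
        off += 4;
        if (len > end - off) return false;
        if (type == 0x0010) {                         /* ALPN extension */
            const uint8_t *e = p + off;
            if (len < 2 || be16(e) > len - 2) return false;
            size_t stop = 2 + be16(e);                /* ProtocolNameList end */
            for (size_t i = 2; i < stop; ) {
                size_t name_len = e[i++];
                if (name_len > stop - i) return false;
                if (name_len == plen && memcmp(e + i, proto, plen) == 0) return true;
                i += name_len;
            }
            return false;
        }
        off += len;
    }
    return false;
}
```

With ECH in use, this parser only sees the outer ClientHello, so a missing h2 or h3 there should lower confidence rather than rule out HTTP.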

Cleartext HTTP/2 is rare on the public Internet, but it has a visible client connection preface:

PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n

That is a direct payload signature on TCP. It is useful in lab traffic, internal networks, and protocol tests, but it should not be the primary detector for normal web traffic because browsers generally negotiate HTTP/2 over TLS.
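A preface check is a fixed 24-byte comparison (RFC 9113, Section 3.4). The sketch below also reports a partial match, so a first TCP segment shorter than the preface is not misclassified. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HTTP/2 client connection preface: 24 octets. */
static const char H2_PREFACE[] = "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n";

typedef enum { PREFACE_NO, PREFACE_PARTIAL, PREFACE_YES } preface_match;

/* Compare the start of the client's TCP byte stream with the preface.
 * PREFACE_PARTIAL means "consistent so far, wait for more bytes". */
preface_match h2c_preface_match(const uint8_t *p, size_t n) {
    const size_t want = sizeof H2_PREFACE - 1;  /* exclude the NUL */
    size_t k = n < want ? n : want;
    if (memcmp(p, H2_PREFACE, k) != 0) return PREFACE_NO;
    return n >= want ? PREFACE_YES : PREFACE_PARTIAL;
}
```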

The important rule is:

UDP + QUIC + ALPN h3  -> HTTP/3
TCP + TLS  + ALPN h2  -> HTTP/2

If someone says “HTTP/2 QUIC”, treat that as ambiguous. They might mean HTTP/3, old pre-standard HTTP-over-QUIC drafts, old gQUIC, or just “web traffic over QUIC”.

Pattern recognition instead of one signature

A single signature is brittle. A better DPI engine assigns a score from multiple features.

For flow f, define binary or numeric features:

$$ x(f) = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} $$

Examples:

x1 = UDP destination port is 443 or 8443
x2 = first observed client packet has QUIC long-header bit
x3 = fixed bit is set
x4 = version field is known or plausible
x5 = destination connection ID length is sane
x6 = source connection ID length is sane
x7 = Initial packet varints parse correctly
x8 = first client datagram is at least 1200 bytes (QUIC v1 clients pad datagrams carrying Initial packets to at least 1200 bytes)
x9 = server replies with long-header QUIC packet
x10 = Initial decrypt succeeds
x11 = TLS ClientHello contains ALPN h3

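A simple linear score per candidate class k works well as an engineering model:

$$ S_k(f) = \sum_{i=1}^{n} w_{k,i} \, x_i $$

Then pick the highest-scoring class, but only if it clears that class's threshold T_k:

$$ k^*(f) = \operatorname*{argmax}_{k} S_k(f), \qquad \text{class}(f) = \begin{cases} k^*(f) & \text{if } S_{k^*}(f) \ge T_{k^*} \\ \text{unknown} & \text{otherwise} \end{cases} $$

In a rule-based system, this reduces to ordered thresholds: return HTTP/3 when S_h3 >= T_h3, otherwise return QUIC when S_quic >= T_quic, and otherwise keep the flow unknown.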

This does not have to be machine learning. The weights can be hand-tuned:

ALPN h3 visible:                 very strong signal
Initial decrypt succeeds:        strong signal
valid long-header structure:     medium signal
UDP/443 only:                    weak signal
packet size pattern only:        weak signal

The classifier should also use negative evidence:

long-header bit missing on first client packet:     lower confidence
fixed bit missing:                                  lower confidence
invalid connection ID lengths:                      reject
invalid QUIC varints:                               reject
TCP flow with ALPN h2:                              HTTP/2, not HTTP/3

This is where pattern recognition helps. UDP/443 alone is not QUIC. A 1200-byte UDP packet alone is not QUIC. But UDP/443 plus valid QUIC long header plus sane connection IDs plus a decryptable Initial plus ALPN h3 is a strong classification.
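The scoring idea can be sketched with one score and ordered thresholds, which is the rule-based reading described earlier. The weights, penalties, and thresholds are hypothetical hand-tuned values chosen to mirror the strong/medium/weak ordering above. They are not calibrated against real traffic. Invalid connection IDs or varints on a long header act as a hard reject, and a missing long-header or fixed bit subtracts points.

```c
#include <stdbool.h>

/* Features x1..x11 from the list above, as booleans. */
typedef struct {
    bool udp_port_quic;       /* x1: UDP port 443 or 8443 */
    bool long_header;         /* x2 */
    bool fixed_bit;           /* x3 */
    bool version_plausible;   /* x4 */
    bool dcid_len_sane;       /* x5 */
    bool scid_len_sane;       /* x6 */
    bool varints_ok;          /* x7 */
    bool size_ge_1200;        /* x8 */
    bool server_long_header;  /* x9 */
    bool initial_decrypted;   /* x10 */
    bool alpn_h3;             /* x11 */
} quic_features;

typedef enum {
    UNKNOWN_UDP, POSSIBLE_QUIC, PROBABLE_QUIC, CONFIRMED_HTTP3
} udp_verdict;

/* Hypothetical thresholds. Without x11 the score tops out at 19,
 * so only a visible ALPN h3 can reach T_HTTP3. */
enum { T_POSSIBLE = 3, T_PROBABLE = 10, T_HTTP3 = 25 };

udp_verdict score_udp_flow(const quic_features *x) {
    /* Hard negative evidence: a long header that does not parse. */
    if (x->long_header &&
        (!x->dcid_len_sane || !x->scid_len_sane || !x->varints_ok))
        return UNKNOWN_UDP;

    int s = 0;                                   /* S(f) = sum of w_i * x_i */
    s += x->udp_port_quic      ? 1  : 0;         /* weak */
    s += x->long_header        ? 2  : -3;        /* missing: lower confidence */
    s += x->fixed_bit          ? 1  : -3;        /* missing: lower confidence */
    s += x->version_plausible  ? 2  : 0;
    s += x->dcid_len_sane      ? 1  : 0;
    s += x->scid_len_sane      ? 1  : 0;
    s += x->varints_ok         ? 2  : 0;
    s += x->size_ge_1200       ? 1  : 0;         /* weak */
    s += x->server_long_header ? 2  : 0;
    s += x->initial_decrypted  ? 6  : 0;         /* strong */
    s += x->alpn_h3            ? 10 : 0;         /* very strong */

    if (s >= T_HTTP3)    return CONFIRMED_HTTP3;
    if (s >= T_PROBABLE) return PROBABLE_QUIC;
    if (s >= T_POSSIBLE) return POSSIBLE_QUIC;
    return UNKNOWN_UDP;
}
```

With these numbers, UDP/443 alone scores 1 - 3 - 3 = -5 and stays unknown. A valid long header on UDP/443 with a 1200-byte datagram but no decryption reaches probable QUIC.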

Flow cache: do the work once

The DPI engine should classify flows, not packets.

Use a table keyed by the 5-tuple:

source IP
destination IP
source port
destination port
transport protocol

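For QUIC, also record observed connection IDs when possible. QUIC connection migration can change the 5-tuple, and connection IDs can help endpoints and some network devices associate packets with the same connection. After migration, however, endpoints switch to fresh connection IDs, so a passive observer cannot always link the old and new paths.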
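A flow table keyed by the 5-tuple can be sketched as follows. It makes several assumptions:

- addresses are stored as 16 bytes, with IPv4 assumed to be IPv4-mapped IPv6
- the two endpoints are put in a canonical order so both directions share one entry
- FNV-1a is used as the hash, which is a swapped-in choice
- the table uses linear probing with a hypothetical 1024-slot size

Connection ID tracking, expiry, and eviction are left out. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t src_ip[16], dst_ip[16];  /* IPv4 as IPv4-mapped IPv6 (assumption) */
    uint16_t src_port, dst_port;
    uint8_t proto;                   /* 6 = TCP, 17 = UDP */
} flow_key;

typedef struct {
    bool used;
    flow_key key;
    int state;                       /* 0 = UNKNOWN; other values caller-defined */
} flow_entry;

#define FLOW_SLOTS 1024              /* power of two; hypothetical size */
typedef struct { flow_entry slots[FLOW_SLOTS]; } flow_table;

/* Order endpoints so client-to-server and server-to-client share a key. */
static flow_key canonical(flow_key k) {
    int c = memcmp(k.src_ip, k.dst_ip, 16);
    if (c > 0 || (c == 0 && k.src_port > k.dst_port)) {
        uint8_t ip[16];
        memcpy(ip, k.src_ip, 16);
        memcpy(k.src_ip, k.dst_ip, 16);
        memcpy(k.dst_ip, ip, 16);
        uint16_t port = k.src_port;
        k.src_port = k.dst_port;
        k.dst_port = port;
    }
    return k;
}

/* FNV-1a over the fields (not raw struct bytes, which may include padding). */
static uint64_t flow_hash(const flow_key *k) {
    uint8_t b[37];
    memcpy(b, k->src_ip, 16);
    memcpy(b + 16, k->dst_ip, 16);
    b[32] = (uint8_t)(k->src_port >> 8); b[33] = (uint8_t)k->src_port;
    b[34] = (uint8_t)(k->dst_port >> 8); b[35] = (uint8_t)k->dst_port;
    b[36] = k->proto;
    uint64_t h = 0xcbf29ce484222325ull;          /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < sizeof b; i++) { h ^= b[i]; h *= 0x100000001b3ull; }
    return h;
}

static bool key_eq(const flow_key *a, const flow_key *b) {
    return memcmp(a->src_ip, b->src_ip, 16) == 0 &&
           memcmp(a->dst_ip, b->dst_ip, 16) == 0 &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->proto == b->proto;
}

/* Linear probing; returns NULL when the table is full (no eviction here). */
flow_entry *flow_lookup_or_insert(flow_table *t, flow_key k) {
    k = canonical(k);
    size_t i = (size_t)(flow_hash(&k) & (FLOW_SLOTS - 1));
    for (size_t probe = 0; probe < FLOW_SLOTS; probe++) {
        flow_entry *e = &t->slots[(i + probe) & (FLOW_SLOTS - 1)];
        if (!e->used) { e->used = true; e->key = k; e->state = 0; return e; }
        if (key_eq(&e->key, &k)) return e;
    }
    return NULL;
}
```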

A simple state machine:

UNKNOWN
  -> UDP_CANDIDATE
  -> QUIC_LONG_HEADER_SEEN
  -> QUIC_INITIAL_PARSED
  -> HTTP3_CONFIRMED

UNKNOWN
  -> TCP_TLS_CANDIDATE
  -> TLS_CLIENT_HELLO_PARSED
  -> HTTP2_CONFIRMED
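The two state chains above can be encoded as a small transition function. In this sketch, events that do not match the current state leave it unchanged, and confirmed states are terminal. The extra ST_REJECTED state and EV_INVALID event are additions that cache a negative result, as the implementation notes recommend. The enum and function names are hypothetical.

```c
#include <stdbool.h>

typedef enum {
    ST_UNKNOWN,
    ST_UDP_CANDIDATE, ST_QUIC_LONG_HEADER_SEEN, ST_QUIC_INITIAL_PARSED,
    ST_HTTP3_CONFIRMED,
    ST_TCP_TLS_CANDIDATE, ST_TLS_CLIENT_HELLO_PARSED, ST_HTTP2_CONFIRMED,
    ST_REJECTED                      /* addition: cached negative result */
} flow_state;

typedef enum {
    EV_UDP_CANDIDATE, EV_QUIC_LONG_HEADER, EV_QUIC_INITIAL_PARSED, EV_ALPN_H3,
    EV_TCP_TLS_CANDIDATE, EV_TLS_CLIENT_HELLO, EV_ALPN_H2,
    EV_INVALID                       /* addition: structural check failed */
} flow_event;

static bool is_terminal(flow_state s) {
    return s == ST_HTTP3_CONFIRMED || s == ST_HTTP2_CONFIRMED || s == ST_REJECTED;
}

/* Advance one step; non-matching events leave the state unchanged. */
flow_state flow_step(flow_state s, flow_event ev) {
    if (is_terminal(s)) return s;            /* classified: no more deep parsing */
    if (ev == EV_INVALID) return ST_REJECTED;
    switch (s) {
    case ST_UNKNOWN:
        if (ev == EV_UDP_CANDIDATE) return ST_UDP_CANDIDATE;
        if (ev == EV_TCP_TLS_CANDIDATE) return ST_TCP_TLS_CANDIDATE;
        break;
    case ST_UDP_CANDIDATE:
        if (ev == EV_QUIC_LONG_HEADER) return ST_QUIC_LONG_HEADER_SEEN;
        break;
    case ST_QUIC_LONG_HEADER_SEEN:
        if (ev == EV_QUIC_INITIAL_PARSED) return ST_QUIC_INITIAL_PARSED;
        break;
    case ST_QUIC_INITIAL_PARSED:
        if (ev == EV_ALPN_H3) return ST_HTTP3_CONFIRMED;
        break;
    case ST_TCP_TLS_CANDIDATE:
        if (ev == EV_TLS_CLIENT_HELLO) return ST_TLS_CLIENT_HELLO_PARSED;
        break;
    case ST_TLS_CLIENT_HELLO_PARSED:
        if (ev == EV_ALPN_H2) return ST_HTTP2_CONFIRMED;
        break;
    default:
        break;
    }
    return s;
}

/* Steady state: only non-terminal flows pay for deep parsing. */
bool needs_deep_parse(flow_state s) { return !is_terminal(s); }
```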

Once the flow is classified, stop deep parsing its packets. Future packets only need a cache lookup:

$$ C_{\text{steady}} \approx C_{\text{hash lookup}} + C_{\text{policy action}} $$

That is the performance target. The expensive work belongs to the first few packets, not the whole connection.

Fast screening pipeline

A production DPI path can be split into three tiers.

Tier 0: packet metadata

if protocol is not UDP or TCP:
    return unknown

if UDP and neither port is commonly used for QUIC:
    return unknown_or_low_priority

if TCP and neither port is commonly used for TLS:
    return unknown_or_low_priority

Tier 1: cheap protocol shape

if UDP:
    check QUIC first byte
    check fixed bit
    check version field
    check connection ID lengths
    check Initial varints

if TCP:
    look for TLS record header
    parse ClientHello if present

Tier 2: expensive confirmation

if QUIC Initial candidate:
    derive Initial keys
    remove header protection
    decrypt Initial payload
    extract CRYPTO frames
    parse TLS ClientHello
    inspect ALPN

if TLS ClientHello over TCP:
    inspect ALPN

This gives a good balance. Most traffic exits at Tier 0 or Tier 1. Only likely QUIC Initial packets reach Tier 2.
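Tier 0 and the TCP half of Tier 1 translate directly into a few comparisons. This sketch assumes a hypothetical port policy of 443 and 8443 for both QUIC and TLS. The UDP half of Tier 1 is the long-header check from earlier. The TLS check looks for a handshake record (content type 0x16, version 0x03xx, plausible length) whose first handshake message is a ClientHello. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    T0_UNKNOWN, T0_LOW_PRIORITY, T0_UDP_CANDIDATE, T0_TCP_CANDIDATE
} tier0_result;

/* Hypothetical port policy; a port match is a weak hint, never proof. */
static bool common_quic_port(uint16_t p) { return p == 443 || p == 8443; }
static bool common_tls_port(uint16_t p)  { return p == 443 || p == 8443; }

/* Tier 0: metadata only, no payload bytes touched. */
tier0_result tier0_screen(uint8_t proto, uint16_t sport, uint16_t dport) {
    if (proto == 17)
        return (common_quic_port(sport) || common_quic_port(dport))
                   ? T0_UDP_CANDIDATE : T0_LOW_PRIORITY;
    if (proto == 6)
        return (common_tls_port(sport) || common_tls_port(dport))
                   ? T0_TCP_CANDIDATE : T0_LOW_PRIORITY;
    return T0_UNKNOWN;
}

/* Tier 1 (TCP): does the segment start with a TLS handshake record
 * carrying a ClientHello? */
bool tier1_tls_clienthello_start(const uint8_t *p, size_t n) {
    if (n < 6) return false;
    size_t rec_len = ((size_t)p[3] << 8) | p[4];
    return p[0] == 0x16 && p[1] == 0x03 && p[2] <= 0x04 &&
           rec_len >= 4 && rec_len <= 16384 &&   /* 2^14 plaintext limit */
           p[5] == 0x01;                         /* handshake type ClientHello */
}
```

Only packets that pass these checks move on to the ClientHello or QUIC Initial parsers, which is what keeps p_candidate and p_initial small.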

What can go wrong

There are several ways a DPI classifier can lose confidence:

  • It starts observing after the handshake.
  • The first Initial packet is dropped before the monitor sees it.
  • The ClientHello is split across packets and the DPI buffer is too small.
  • The traffic uses QUIC on an unexpected port.
  • A non-HTTP/3 application uses QUIC.
  • ECH or future TLS changes hide more handshake metadata.
  • Hardware offload, GRO, or capture placement changes packet boundaries.
  • QUIC connection migration changes the 5-tuple.

These are reasons to return “unknown” or “probable QUIC”, not reasons to overclaim. A good DPI system should expose confidence levels:

confirmed_http3
probable_quic
possible_quic
confirmed_http2
unknown_tls
unknown_udp

Example decision table

| Observation | Classification | Confidence |
|---|---|---|
| TCP ClientHello, ALPN contains h2 | HTTP/2 | high |
| TCP cleartext stream starts with the PRI * HTTP/2.0 preface | HTTP/2 cleartext | high |
| UDP long-header QUIC Initial, ALPN contains h3 | HTTP/3 | high |
| UDP valid QUIC long header, Initial decrypt not attempted | QUIC | medium |
| UDP/443 with no valid QUIC structure | unknown UDP | low |
| QUIC short-header packets only, no cached state | probable QUIC | low to medium |
| UDP QUIC with ALPN other than h3 | non-HTTP/3 QUIC or unknown QUIC app | medium to high |

Why this is faster

The naive DPI approach asks every packet a hard question:

What exact application is this?

The faster design asks a sequence of cheaper questions:

Can this packet be ignored?
Can this packet be QUIC?
Can this packet be an Initial?
Can I recover ALPN?
Have I already classified this flow?

The expected cost falls because most packets never reach the expensive work:

$$ E[C_{\text{fast}}] = C_{\text{metadata}} + p_{\text{candidate}} C_{\text{header}} + p_{\text{initial}} C_{\text{crypto}} + p_{\text{new-flow}} C_{\text{state}} $$

For steady-state traffic in already-classified flows:

$$ p_{\text{initial}} \rightarrow 0 $$

So the DPI engine mostly does flow-cache lookups and policy actions. That is how it stays fast under load.

Practical implementation notes

Keep the hot path small:

  • Parse fixed offsets before variable-length structures.
  • Reject impossible lengths before allocating memory.
  • Cap per-flow reassembly buffers.
  • Cache positive and negative classifications.
  • Expire unconfirmed candidates quickly.
  • Track confidence, not just a boolean protocol label.
  • Treat UDP/443 as a weak hint, not proof.
  • Treat ALPN as the strongest signal when it is visible.
  • Stop deep parsing after classification.

If the deployment uses eBPF, XDP, or a kernel fast path, put only Tier 0 and maybe Tier 1 there. Keep QUIC Initial deprotection and TLS extension parsing in a safer bounded userspace path unless there is a strong reason to do otherwise.

The main engineering principle is simple: use DPI to classify the beginning of the flow, then use pattern recognition and flow state to avoid doing DPI forever.
