Recognizing encrypted P2P and BitTorrent traffic from packet patterns

posted in Network

Encrypted P2P detection is not magic string matching. Once the peer wire payload is encrypted, a DPI engine cannot honestly say “I read BitTorrent from the payload.” What it can say is weaker and more useful:

this flow behaves like encrypted P2P with probability high enough to classify.

That is the right mental model. For plain BitTorrent, the early TCP stream is easy: the peer wire handshake starts with a one-byte length followed by the string BitTorrent protocol. For encrypted BitTorrent, that obvious header is gone. BEP 8 explicitly points out that Message Stream Encryption makes peer connections harder to identify because bytes are encrypted from the beginning of the peer connection.

So the detector has to move from exact signatures to bounded inference.

The classifier should combine four layers:

  1. Cheap packet metadata.
  2. Transport and protocol structure when it is visible.
  3. Flow behavior over the first few packets or seconds.
  4. Host behavior across many peers.

The important engineering goal is not maximum cleverness. It is to avoid spending work on normal packets while still catching the flows that deserve deeper inspection.

DPI, DFI, and encrypted traffic

Classic DPI looks inside packet payloads. Deep Flow Inspection, or DFI, looks at flow-level behavior: packet sizes, directions, timing, fan-out, connection churn, and session shape.

Encrypted P2P detection needs both.

Call it DPI if the packet classifier still owns the early packet path. But the useful signal is often DFI-like. The engine might parse DHT packets when they are visible, recognize uTP structure when UDP looks like uTP, and then fall back to flow statistics when the payload is encrypted.

The detector should never depend on one observation. A good implementation accumulates evidence:

$$ S = b + \sum_i w_i f_i $$

where \(f_i\) is a feature, \(w_i\) is its weight, and \(b\) is the base bias. Convert the score into a probability-like value:

$$ P = \frac{1}{1 + e^{-S}} $$

Then classify only when the evidence is strong enough:

$$ \operatorname{class}(flow) = \begin{cases} \operatorname{p2p}, & P \ge T_{\mathrm{high}} \\\\ \operatorname{unknown}, & T_{\mathrm{low}} < P < T_{\mathrm{high}} \\\\ \operatorname{normal}, & P \le T_{\mathrm{low}} \end{cases} $$

The gap between \(T_{\mathrm{low}}\) and \(T_{\mathrm{high}}\) is not wasted. It is hysteresis. It prevents a single odd packet from flipping the flow label.

What plain BitTorrent teaches us

Before discussing encrypted traffic, start from the unencrypted protocol. Plain BitTorrent has several strong structural signals.

The peer wire protocol begins with a handshake:

0x13 "BitTorrent protocol" reserved[8] info_hash[20] peer_id[20]

After that, peer messages are length-prefixed:

uint32_be length
uint8     message_id
payload   length - 1 bytes

The common message IDs are small integers: choke, unchoke, interested, not interested, have, bitfield, request, piece, cancel, and port. With plaintext traffic this is a simple parser.

The problem is that this parser is not useful after encryption hides the stream. But it still gives us the shape we are looking for:

  • long-lived peer sessions
  • many concurrent peers
  • small control messages mixed with larger data blocks
  • bidirectional transfer
  • bursts of requests followed by bursts of pieces
  • peer discovery through trackers, DHT, or PEX

The encrypted classifier should look for that shape, not the literal string.

Visible signals that survive encryption

Encryption hides bytes, not everything.

A passive classifier can still see:

  • IP protocol: TCP or UDP
  • source and destination addresses
  • source and destination ports
  • packet length
  • direction
  • timing
  • connection lifetime
  • number of peers contacted by one host
  • number of flows sharing similar behavior
  • UDP payload length and coarse structure
  • TCP packetization and burst shape

Some BitTorrent-adjacent traffic is still structured in visible ways.

DHT uses UDP and bencoded KRPC messages. BEP 5 defines queries with a y value of q, a query name in q, and arguments in a. Common query names include ping, find_node, get_peers, and announce_peer.

uTP is also UDP. BEP 29 defines a compact header with a packet type, version, extension field, connection ID, timestamps, receive window, sequence number, and ACK number. A uTP packet can be checked cheaply without parsing arbitrary payload.

PEX is exchanged after the BitTorrent extension protocol handshake, so it is not always visible when encryption is active. But if the traffic is plaintext or only partially protected, PEX has a very specific semantic role: it spreads peer addresses inside a swarm.

The classifier should treat each visible protocol as one signal, not final proof.

A layered detector

A practical detector should have a pipeline like this:

flowchart LR
  A[packet] --> B[constant-time L3/L4 screen]
  B --> C[flow cache lookup]
  C --> D{known flow?}
  D -->|yes| E[apply cached label]
  D -->|no| F[bounded first-packet parsers]
  F --> G[flow feature update]
  G --> H[host feature update]
  H --> I[score and threshold]
  I --> J[cache label]

The first stage should be boring. It should answer questions that are cheap:

Is this TCP or UDP?
Is the payload long enough for any parser?
Has this 5-tuple already been classified?
Is this a known local service that should be bypassed?
Has this flow exceeded the inspection budget?

Only flows that pass those checks should reach the expensive path.

Small exact examples

Exact parsing is still useful when traffic is not encrypted or when a related discovery protocol is visible.

Plain peer wire handshake

For TCP, the first bytes may be enough:

13 42 69 74 54 6f 72 72 65 6e 74 20 70 72 6f 74 6f 63 6f 6c

That is:

0x13 + "BitTorrent protocol"

Pseudocode:

function match_plain_bt_handshake(bytes):
    if len(bytes) < 20:
        return false
    if bytes[0] != 19:
        return false
    return bytes[1:20] == "BitTorrent protocol"

This is a strong positive signal, but it only works for plaintext peer wire traffic.

DHT KRPC over UDP

A BEP 5 DHT query is bencoded. A simplified ping query looks like:

d1:ad2:id20:<20-byte-node-id>e1:q4:ping1:t2:aa1:y1:qe

Do not parse arbitrary bencode deeply in the hot path. Use a bounded recognizer:

function score_dht_krpc(payload):
    if len(payload) < 20 or len(payload) > 1500:
        return 0
    if payload[0] != 'd' or payload[-1] != 'e':
        return 0

    score = 0
    if contains_bounded(payload, "1:y1:q"):
        score += 3
    if contains_bounded(payload, "1:q4:ping"):
        score += 2
    if contains_bounded(payload, "1:q9:get_peers"):
        score += 4
    if contains_bounded(payload, "13:announce_peer"):
        score += 4
    if contains_bounded(payload, "2:id20:"):
        score += 2

    return score

This is not a general bencode parser. It is a cheap detector for a few structural markers. A slow path can validate the full dictionary later if needed.

uTP-shaped UDP packets

The first byte of a uTP packet packs type and version:

type    = first_byte >> 4
version = first_byte & 0x0f

For BEP 29 uTP, the version is currently 1, and valid packet types include data, fin, state, reset, and syn.

Cheap recognizer:

function score_utp_shape(payload):
    if len(payload) < 20:
        return 0

    type = payload[0] >> 4
    version = payload[0] & 0x0f
    extension = payload[1]

    if version != 1:
        return 0
    if type > 4:
        return 0

    score = 2

    if extension == 0 or extension == 1:
        score += 1

    connection_id = u16be(payload[2:4])
    seq_nr = u16be(payload[16:18])
    ack_nr = u16be(payload[18:20])

    if plausible_sequence(seq_nr, ack_nr):
        score += 1
    if endpoint_has_utp_state(connection_id):
        score += 3

    return score

One packet shaped like uTP is weak evidence. A sequence of bidirectional packets with coherent connection IDs, sequence numbers, ACK numbers, and timestamps is much stronger.

Encrypted peer traffic: infer from behavior

When the peer stream is encrypted from the first byte, the payload may look random. The detector should stop pretending it can decode it.

Useful flow features include:

packet_count
byte_count
duration
client_to_server_packets
server_to_client_packets
mean_packet_size_by_direction
small_packet_ratio
large_packet_ratio
inter_arrival_mean
inter_arrival_variance
burst_count
flow_start_rate_per_host
distinct_remote_ip_count
distinct_remote_port_count
udp_dht_score_nearby
utp_score_nearby

The key is that encrypted BitTorrent is rarely just one flow. It is usually an ecosystem of flows:

  • a host contacts many peers
  • some peers are short-lived
  • some flows become long-lived
  • peer discovery runs near data transfer
  • TCP and UDP may appear together
  • upload and download are both meaningful

That host-level pattern is harder to fake accidentally than one packet-size pattern.

A scoring model

For each flow, keep a feature vector:

$$ \mathbf{x}_{\mathrm{flow}} = \left[ x_{\mathrm{plain}}, x_{\mathrm{dht}}, x_{\mathrm{utp}}, x_{\mathrm{fanout}}, x_{\mathrm{symmetry}}, x_{\mathrm{burst}}, x_{\mathrm{lifetime}} \right] $$

Then compute:

$$ S_{\mathrm{flow}} = \mathbf{w}_{\mathrm{flow}} \cdot \mathbf{x}_{\mathrm{flow}} $$

For each host, keep another vector:

$$ \mathbf{x}_{\mathrm{host}} = \left[ h_{\mathrm{peers}}, h_{\mathrm{newflow}}, h_{\mathrm{udp}}, h_{\mathrm{tcp}}, h_{\mathrm{dht}}, h_{\mathrm{ports}} \right] $$

and:

$$ S_{\mathrm{host}} = \mathbf{w}_{\mathrm{host}} \cdot \mathbf{x}_{\mathrm{host}} $$

The final score can combine both:

$$ S = \alpha S_{\mathrm{flow}} + (1 - \alpha) S_{\mathrm{host}} $$

where \(\alpha\) controls how much to trust the current flow compared with the host context.

This matters because a single encrypted flow may look like many things. A host that opens 80 short encrypted connections to random residential peers while also emitting DHT get_peers queries is much easier to classify.

Packet-level pseudocode

The hot path should be small.

function classify_packet(pkt):
    meta = parse_l3_l4_headers(pkt)
    if meta.invalid:
        return PASS

    flow_key = five_tuple(meta)
    state = flow_table.lookup(flow_key)

    if state != null and state.label_is_final:
        return state.label

    if should_bypass(meta):
        return PASS

    if state == null:
        state = flow_table.create(flow_key)

    if state.inspection_budget_exhausted:
        return state.current_label_or_pass()

    features = cheap_packet_features(pkt, meta)
    state.update(features)

    if state.packet_count <= FIRST_PACKET_LIMIT:
        state.score += bounded_protocol_scores(pkt.payload, meta)

    host = host_table.get(meta.local_host)
    host.update(meta, features, state.score)

    probability = score_to_probability(state, host)

    if probability >= HIGH_THRESHOLD:
        state.set_final_label(P2P_LIKELY)
    else if probability <= LOW_THRESHOLD and state.old_enough:
        state.set_final_label(NORMAL_LIKELY)
    else:
        state.set_temporary_label(UNKNOWN)

    return state.current_label_or_pass()

The most important line is not the scoring function. It is this:

if state.label_is_final:
    return state.label

Most packets in an established flow must not be reclassified from scratch.

Host-level pseudocode

Encrypted P2P is often clearer at host level than flow level.

function update_host_model(host, meta, flow_state):
    now = current_time()
    bucket = host.window(now)

    bucket.total_packets += 1
    bucket.total_bytes += meta.payload_len

    if flow_state.is_new:
        bucket.new_flows += 1
        bucket.remote_ips.add(meta.remote_ip)
        bucket.remote_ports.add(meta.remote_port)

    if flow_state.dht_score > 0:
        bucket.dht_flows += 1

    if flow_state.utp_score > 0:
        bucket.utp_flows += 1

    bucket.p2p_score =
        W1 * log1p(bucket.remote_ips.count()) +
        W2 * log1p(bucket.new_flows) +
        W3 * bucket.dht_flows +
        W4 * bucket.utp_flows +
        W5 * bidirectional_ratio(bucket)

Use sliding windows, not unbounded counters. For example:

short window:  10 seconds
medium window: 2 minutes
long window:   30 minutes

The short window catches bursts. The medium window catches swarm behavior. The long window prevents a host from being forgotten too quickly, but it should use approximate data structures so memory does not grow with every peer.

Feature examples

Fan-out

A web browser opens many connections too, so fan-out alone is not enough. But the shape is different. Browser fan-out usually clusters around CDNs, a small number of domains, and common service ports. P2P fan-out often touches many unrelated IPs and ports.

Define:

$$ F_{\mathrm{remote}} = \log(1 + N_{\mathrm{remote}}) $$

where \(N_{\mathrm{remote}}\) is the number of distinct remote IPs in a time window.

Use logarithms because the difference between 2 and 20 peers is more important than the difference between 202 and 220 peers.

Directional symmetry

P2P flows often have meaningful traffic in both directions. A simple symmetry feature is:

$$ R = \frac{\min(B_{\mathrm{up}}, B_{\mathrm{down}})}{\max(B_{\mathrm{up}}, B_{\mathrm{down}}) + 1} $$

where \(B_{\mathrm{up}}\) and \(B_{\mathrm{down}}\) are bytes by direction. This value is near 0 for mostly one-way flows and near 1 for symmetric flows.

This is not a BitTorrent detector by itself. Video calls and tunnels can also be symmetric. It is one feature.

Packet-size mixture

Many encrypted protocols have high-entropy payload. Packet sizes still leak structure. A BitTorrent-like flow often has a mixture of:

  • small control packets
  • medium ACK/control packets
  • larger data packets

A cheap feature:

$$ M = \frac{N_{\mathrm{small}}}{N_{\mathrm{total}}} + \frac{N_{\mathrm{large}}}{N_{\mathrm{total}}} $$

where small and large are local thresholds chosen from measurement. Do not hard-code universal constants and expect them to survive every access network.

Discovery near transfer

The strongest signal is correlation.

If a host emits DHT get_peers or announce_peer messages, then soon opens many encrypted peer-like flows, the combined signal is stronger than either signal alone:

$$ C = S_{\mathrm{dht}}(t - \Delta, t) \cdot F_{\mathrm{newflows}}(t, t + \Delta) $$

The window \(\Delta\) might be 10 to 120 seconds depending on how quickly the network should react.

Avoiding false positives

False positives are expensive. Mislabeling a backup agent, game launcher, software update system, VPN, WebRTC application, or private sync tool as P2P is worse than missing one weak P2P flow.

Use conservative rules:

  1. Never classify encrypted payload from one packet.
  2. Require multiple independent feature families.
  3. Use host context only inside a time window.
  4. Decay old evidence.
  5. Keep an unknown state.
  6. Log feature reasons, not just final labels.

A good decision record looks like:

flow=10.0.0.5:49152 -> 198.51.100.9:51413/tcp
label=p2p_likely
reasons:
  host_remote_fanout=high
  nearby_dht_get_peers=true
  bidirectional_ratio=0.62
  small_large_packet_mix=true
  flow_duration=long

That is debuggable. A label with no reasons is not.

Performance model

The packet path has a brutal budget. At high packet rates, even a few cache misses per packet matter.

Model the expected cost as:

$$ \begin{aligned} E[C] ={}& C_{\mathrm{base}} \\ &{}+ p_{\mathrm{new}} C_{\mathrm{flow}} \\ &{}+ p_{\mathrm{deep}} C_{\mathrm{deep}} \\ &{}+ p_{\mathrm{score}} C_{\mathrm{score}} \end{aligned} $$

where:

  • \(C_{\mathrm{base}}\) is work done for every packet
  • \(C_{\mathrm{flow}}\) is work done for new or uncached flows
  • \(C_{\mathrm{deep}}\) is bounded parsing work
  • \(C_{\mathrm{score}}\) is scoring and host-model work
  • \(p_{\mathrm{new}}\), \(p_{\mathrm{deep}}\), and \(p_{\mathrm{score}}\) are probabilities that the packet reaches each stage

The design target is:

$$ C_{\mathrm{base}} \ll C_{\mathrm{flow}} \ll C_{\mathrm{deep}} $$

and:

$$ p_{\mathrm{deep}} \ll p_{\mathrm{new}} \ll 1 $$

In plain English: the expensive parser should almost never run.

The normal-packet fast path

Most packets are boring. Keep them boring.

The normal fast path should be:

parse fixed headers
compute flow hash
lookup flow state
if known normal: pass
if known p2p: apply cached action
if unknown but budget exhausted: pass or sample
otherwise update a few counters

Do not:

  • parse every payload
  • allocate memory per packet
  • lock a global table per packet
  • run regex on payload
  • run a model that touches cold memory
  • update large host structures for every packet

Normal packets should pay only the base tax:

L3/L4 parse + hash + cache lookup + branch

If a flow is known normal, the classifier should not inspect the 200th packet differently from the 20th packet.

Parallel implementation

The classifier parallelizes well if ownership is clear.

Shard by flow hash:

worker_id = hash(five_tuple) % worker_count

Each worker owns:

  • its flow table shard
  • its recent-flow LRU
  • packet counters
  • temporary parser buffers
  • local host-feature deltas

Avoid shared writes on the hot path. Periodically merge host deltas into a global host table:

every 100 ms:
    for each worker:
        drain local host deltas
    merge into global host windows
    publish read-only snapshot

Workers classify packets using the newest read-only host snapshot. That snapshot can be slightly stale. A stale host score is better than a lock on every packet.

For very high speed paths, split the work:

fast path:
  per-packet cache lookup
  fixed header parsing
  counters

slow path:
  first packets only
  bounded protocol parsers
  host correlation
  model updates

The fast path can run in kernel space, XDP, eBPF, DPDK, or a router fast path. The slow path can run in user space. The point is not the exact framework. The point is that packet forwarding should not wait for model evaluation.

Memory layout matters

A flow table entry should be small. Avoid storing full packet history.

A practical entry:

struct FlowState {
    five_tuple key
    uint32 first_seen
    uint32 last_seen
    uint16 packet_count_c2s
    uint16 packet_count_s2c
    uint64 bytes_c2s
    uint64 bytes_s2c
    int16  score
    uint8  label
    uint8  budget
    uint8  flags
}

Do not keep an array of all packet sizes unless the link is small and memory is abundant. Use streaming statistics:

count
sum
sum_of_squares
min
max
small_count
large_count
last_timestamp
burst_count

For host fan-out, exact sets can become expensive. Use approximate structures when needed:

  • HyperLogLog for distinct remote counts
  • count-min sketch for heavy peers
  • Bloom filter for recent remote IP membership
  • ring buffers for sliding windows

Approximation is fine because the final output is already probabilistic.

Inspection budgets

Every flow should have a budget:

max_payload_bytes_to_scan = 256
max_packets_to_deep_parse = 8
max_flow_age_for_deep_parse = 10 seconds

After that, only counters are updated. If the detector failed to find strong evidence early, it should not punish every later packet.

This is especially important for normal encrypted traffic. Long TLS downloads, video streams, VPN sessions, and software updates can be large. A classifier that keeps inspecting them will become the bottleneck.

What encrypted detection cannot promise

Encrypted P2P detection is inherently probabilistic. There is no universal packet pattern that proves BitTorrent when payload and peer discovery are both hidden.

The honest goals are:

  • identify likely P2P hosts
  • classify obvious plaintext or partially visible BitTorrent quickly
  • correlate DHT/uTP/peer-like behavior
  • avoid harming normal traffic
  • explain why a flow was classified

The dishonest goal is:

detect all encrypted BitTorrent from a single packet

That is not DPI. That is wishful thinking.

Practical policy

I would implement the detector like this:

  1. Exact match plaintext BitTorrent handshake when visible.
  2. Add bounded recognizers for DHT KRPC and uTP structure.
  3. Keep per-flow statistics for only the first few packets and streaming counters after that.
  4. Keep per-host fan-out and discovery windows.
  5. Score with conservative thresholds.
  6. Cache final labels per flow.
  7. Send uncertain cases to telemetry instead of forcing a label.
  8. Continuously measure false positives against known normal applications.

The performance principle is simple:

Spend CPU only when uncertainty is high and the packet is early enough to matter.

The theory principle is just as important:

Encrypted traffic classification is evidence aggregation, not payload recognition.

If those two principles are respected, DPI for encrypted P2P can be useful without turning the router into a slow and fragile packet microscope.

References