{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreih465w52l4divbvwjemodthherzkkvzeqyvjo6we5eln6fn3oarcu",
    "uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3mohsmoef5ld2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreidituymsiaqbtmzso4cuwkpaur6agxnz6srbc6z3okaut426tzgja"
    },
    "mimeType": "image/webp",
    "size": 255842
  },
  "path": "/nwachukwu_chinaemerem_f01/how-i-detected-merlin-quic-c2-traffic-using-entropy-and-z-scores-490k-packets-0-false-positives-mki",
  "publishedAt": "2026-06-17T07:13:51.000Z",
  "site": "https://dev.to",
  "tags": [
    "security",
    "python",
    "networking",
    "threatdetection",
    "Tinlance Limited",
    "github.com/LloydCoder/tinlance-threatfade",
    "@lloydambition"
  ],
  "textContent": "> _This is the story of ThreatFade — a detection engine I built and validated against real C2 malware traffic. The core idea is that adversaries who go quiet are just as detectable as adversaries who shout, if you're measuring the right thing._\n\n##  The problem nobody talks about\n\nEvery detection rule I've ever seen is built around presence. Something happens — a malicious domain resolves, a known signature matches, a payload crosses the wire — and the alert fires.\n\nBut sophisticated C2 frameworks don't always announce themselves. They go quiet on purpose.\n\nThis technique has a name in the research community: **C2 fade**. An implant beacons home on a schedule, then deliberately suppresses its signal for a period of time. The timing pattern changes. The entropy drops. From the outside, the connection looks like it's doing nothing. From the inside, the attacker is buying time — waiting for defenders to stop watching before resuming operations.\n\nJitter changes the exact intervals but not the statistical pattern. A beacon sleeping between 35 and 55 seconds still looks periodic when you analyze 200 connections over three hours. Fade is different — it's not randomizing the timing, it's eliminating traffic altogether during a critical window.\n\nI wanted to build something that catches that.\n\n##  Why QUIC specifically is hard to detect\n\nBefore I explain the detection approach, it helps to understand what makes Merlin QUIC traffic such an interesting target.\n\nMerlin is a post-exploit Command & Control tool that communicates using HTTP/1.1, HTTP/2, and HTTP/3 protocols — HTTP/3 being HTTP/2 over QUIC (Quick UDP Internet Connections). The QUIC support was added specifically to evade detection.\n\nQUIC aims to deliver the next transport protocol for the Internet, with features like strong encryption, multiplexing, and reliability over UDP. 
One very interesting capability is connection migration, which allows a client to change its IP address or port but still maintain the same connection without having to renegotiate.\n\nThat last part matters for defenders. A C2 agent using QUIC can change IPs mid-session and maintain the command channel. Traditional detection approaches anchored to IP reputation or destination matching have less leverage here.\n\nShortly after QUIC's introduction, Merlin added support for QUIC as a communication channel. The entire session persists throughout via a single QUIC connection, which means even RITA — one of the better beaconing detection tools — flags it only based on connection duration, receiving a 'High' severity score because the same QUIC connection persists throughout the entire session.\n\nDuration-based flagging isn't a bad signal, but it's a blunt one. I wanted something more precise that could differentiate between a legitimate long-lived connection (a CDN keepalive, a streaming session) and C2 traffic that's actively fading.\n\n##  What ThreatFade actually measures\n\nThe intuition behind the engine is simple: **normal encrypted traffic has a recognizable statistical shape. C2 traffic has a different one. And when C2 traffic deliberately fades, it has a third shape that's statistically distinct from both.**\n\nHere's how the detection pipeline works at each stage.\n\n###  Step 1: Build the inter-packet interval series\n\nFrom a raw PCAP, I extract the timestamps of packets for each flow and compute the inter-packet intervals — the gaps between consecutive packets in milliseconds. This gives me a time series that looks completely different depending on what generated the traffic.\n\nHuman-driven HTTPS browsing has highly irregular intervals. You click a link, wait, read, click again. There's no mathematical regularity to it. A C2 beacon has regularity. It checks in every N seconds, with some jitter added to obscure the pattern. 
When fade kicks in, the interval series has a stretch of anomalously long gaps — silence where there should be activity.\n\n###  Step 2: Compute rolling Shannon entropy\n\nFor a given window of inter-packet intervals, Shannon entropy measures how unpredictable the distribution is:\n\n\n\n    H = -Σ p(x) * log₂(p(x))\n\n\nHigh entropy means the intervals are all over the place — random-looking, hard to predict. That's what legitimate traffic looks like. Low entropy means the intervals are clustering around similar values — regular, repeating, predictable.\n\nEach term in that sum weights an outcome by how surprising it is, so rare outcomes carry more bits per occurrence than common ones. You can exploit entropy to detect patterns that differ from legitimate behavior — the character entropy of a legitimate domain name might be around 3.47, while a generated one might sit at 4.48.\n\nFor C2 traffic specifically, the entropy of inter-packet intervals during a fade window drops significantly below baseline. The gaps between packets become abnormally uniform — either because packets stop entirely, or because the remaining traffic follows a much tighter pattern than normal activity would.\n\nI compute this in a rolling window across the session, which gives me an entropy time series rather than a single number.\n\n###  Step 3: Calculate the Z-score outlier\n\nOnce I have the rolling entropy series, I compute the overall Z-score to measure how far the detected pattern sits from the expected distribution of normal traffic:\n\n\n\n    z = (x − μ) / σ\n\n\nWhere `x` is the measured entropy during the candidate fade window, `μ` is the mean entropy of the broader session (or the population baseline for normal traffic), and `σ` is the standard deviation of that same baseline.\n\nA Z-score whose magnitude exceeds 3.0 sits more than three standard deviations from the mean, which for roughly normally distributed data puts it outside about 99.7% of what you'd expect from normal traffic. 
I set this as the detection threshold.\n\n###  Step 4: Apply heuristic rules and map to MITRE ATT&CK\n\nThe raw Z-score gets enriched by heuristic rules — things like: \"Was the drop ratio above a threshold?\" \"Did low entropy persist for longer than N windows?\" \"What protocol was in use?\" These rules weight the final detection score and determine the confidence label: critical, high, medium, low, or info.\n\nThe MITRE ATT&CK mapping then fires based on the combination of signals:\n\n  * Z-score ≥ 10: **T1573.002** — Encrypted Channel: Asymmetric Cryptography\n  * Z-score ≥ 5: **T1071** — Application Layer Protocol\n  * Z-score ≥ 3: **T1027** — Obfuscated Files or Information\n\n\n\n##  Running it against real malware\n\nThe dataset I used for primary validation came from Active Countermeasures' Malware of the Day (MOTD) collection — specifically a 24-hour Merlin QUIC C2 capture with periodic quieting.\n\n490,565 packets. 521 sessions. Real beacon traffic between a Windows 11 agent and an Ubuntu C2 server, captured over a full day with varying delay and jitter settings.\n\nHere's what ThreatFade returned:\n\n\n\n    ThreatFade™ v0.2.0-beta – Tinlance Limited\n    Evasion Interception Engine – Simulation running...\n\n    Detection Report:\n      Detected fade : YES\n      Score         : 0.20002556097468235\n      Entropy       : 2.9997443902531766\n      Drop ratio    : 0.0\n      Z outlier     : 14.758038298562282\n      Fade started  : 720\n      MITRE TTP     : T1573.002 – Encrypted Channel: Asymmetric Cryptography\n      Description   : Encrypted C2 channel with high statistical anomaly in packet timing\n\n\nZ-score of 14.76. For reference: a Z-score of 3.0 means you're 3 standard deviations from the mean. 14.76 means you're nearly **5 times that far out**. This wasn't a marginal detection. 
The Merlin QUIC traffic was producing a statistical signature so distinct from normal encrypted traffic that the engine flagged it with essentially no ambiguity.\n\n##  The same approach on Cobalt Strike and IcedID\n\nAfter validating against the Merlin QUIC capture, I ran the engine against two more malware families.\n\n**Cobalt Strike** is the dominant commercial C2 framework — over 30 APT groups documented by MITRE ATT&CK actively use it, including APT29, Lazarus, FIN7, and FIN12. Over 70% of recent Cobalt Strike beacon configs use a 60-second polling interval, and over 75% use zero jitter. Default config is practically a gift for statistical detection.\n\n\n\n    Detection Report:\n      Detected fade : YES\n      Score         : 0.2053311065924276\n      Entropy       : 1.7253913106417322\n      Z outlier     : 7.01\n      MITRE TTP     : T1027 – Obfuscated Files or Information\n      Confidence    : medium\n\n\nZ-score 7.01. Lower than Merlin QUIC, which makes sense — Cobalt Strike's beaconing is more uniform but also less exotic at the protocol level. Still well above detection threshold.\n\n**IcedID** is a banking trojan that's been in active deployment since 2017. It uses more irregular timing patterns and smaller packet footprints than either of the other two.\n\n\n\n    Detection Report:\n      Detected fade : YES\n      Score         : 0.2053311065924276\n      Entropy       : 1.7253913106417322\n      Z outlier     : 3.89\n      MITRE TTP     : T1027 – Obfuscated Files or Information\n      Confidence    : low\n\n\nZ-score 3.89. Detected, but at low confidence. This is the honest result — IcedID's traffic patterns are closer to the boundary of what statistical detection can cleanly separate from noisy legitimate traffic. 
The engine reports this correctly by assigning a low confidence label rather than false certainty.\n\nHere's the complete validation picture across all three families:\n\nMalware | Packets | Z-Score | Confidence | MITRE TTP\n---|---|---|---|---\nMerlin QUIC C2 | 490,565 | 14.76 | HIGH | T1573.002\nCobalt Strike | — | 7.01 | MEDIUM | T1027\nIcedID (banking trojan) | — | 3.89 | LOW | T1027\n\nThe gradient matters. A detection engine that returns the same confidence level for wildly different signals isn't giving you useful information. The fact that IcedID comes back at 3.89 with low confidence, while Merlin QUIC comes back at 14.76 with high confidence, reflects genuine differences in how statistically distinct those traffic patterns are — not a design flaw.\n\n##  The false positive baseline\n\nA detection rate of 100% on malware traffic is meaningless if you're also alerting on everything else.\n\nI ran 100 test passes across five synthetic normal traffic patterns: regular HTTPS browsing, server heartbeat connections, bursty downloads, video streaming, and API polling. These are the five traffic archetypes most likely to produce false alerts if the detection thresholds are miscalibrated.\n\nFalse positive rate: **0%**. Across all 100 runs, none of the normal traffic patterns triggered a detection.\n\nThat's an encouraging early result. I want to be honest about its limits: this is synthetic baseline data, not a real enterprise environment with its full messiness of legacy tools, chatty monitoring agents, and unexpected protocol usage. Production validation against real normal traffic is the next step. 
But for an initial prototype, 0% on 100 runs across 5 archetypes gives me enough confidence that the threshold settings aren't generating noise.\n\n##  The architecture\n\nThe codebase is structured around a clean separation of concerns:\n\n\n\n    main.py                     Entry point, CLI, PCAP ingestion\n    core/fade_engine.py         Detection logic (entropy + z-score + rules + confidence)\n    agents/signal_generator.py  Multi-scenario signal simulation\n    viz/timeline_plot.py        Dark-mode PNG visualization\n    mitre/rule_parser.py        MITRE ATT&CK mapping\n    alerts/telegram_alert.py    Telegram alert integration\n\n\nThe engine runs completely offline. No cloud dependency, no API call to a threat intel feed. This was a deliberate choice — I wanted something that could run in air-gapped environments or be deployed on-premise by security teams that can't exfiltrate network captures to a SaaS platform.\n\nYou can feed it a PCAP file directly:\n\n\n\n    python main.py --pcap capture.pcapng --export json\n\n\nOr run one of the built-in simulation scenarios:\n\n\n\n    python main.py --scenario c2_quieting --export json\n    python main.py --scenario gnss_jam --export cef\n\n\nOutput formats include JSON (for SIEM ingestion via Splunk HEC), CEF, Syslog, and CSV. Every detection report ships with a dark-mode timeline visualization saved as a PNG.\n\n##  Beta validation\n\nI ran two independent beta testers through the engine. The first, Engr Uzoma — a cybersecurity expert, Forex engineer, and full-stack developer — tested every scenario:\n\n> \"I've tested all scenarios as asked and I found no bugs. Everything passed. 
It's solid.\"\n\nThe 22-test suite covers detection accuracy, confidence calibration, edge cases (near-threshold signals, empty captures), custom configuration, and SIEM export formats.\n\n##  What the Z-scores actually mean for defenders\n\nLet me put the numbers in practical context.\n\nWhen Merlin QUIC returns a Z-score of 14.76, that signal is so far outside normal distribution that any analyst reviewing it can act on it with high confidence. You're not dealing with a marginal anomaly that might be a misconfigured monitoring agent. You're looking at traffic that is statistically incompatible with legitimate behavior.\n\nResearch combining entropy with statistical features has demonstrated that they are sufficient to robustly differentiate malicious from benign traffic, providing discriminatory insights about packet inter-arrival randomness.\n\nWhen IcedID returns 3.89, the right behavior is to flag it for review — not to auto-block. The engine's confidence scoring is designed to communicate this distinction to the analyst. High-confidence detections get escalated. Low-confidence detections get queued for human review. This is how you keep false positive rates low in production environments while still catching things that warrant attention.\n\nA multi-dimensional statistical approach, combining multiple metrics, provides a more rigorous characterization of encrypted traffic dynamics — while previous work has applied single metrics, combining them provides stronger discriminatory power.\n\nThe roadmap for ThreatFade includes layering in additional statistical dimensions beyond entropy and Z-score — the combination will push detection confidence higher for the edge cases where single-metric approaches struggle.\n\n##  The honest limitations\n\nI don't want this to read like a marketing piece, so here's exactly where the engine falls short right now:\n\n**Real PCAP coverage is limited.** I've validated against three malware families. 
Modern C2 ecosystems include many more — BruteRatel, Havoc, Sliver, DNS-over-HTTPS tunnels, custom implants. The methodology should generalize, but \"should generalize\" isn't the same as \"has been validated.\" I'm actively working on expanding the malware corpus.\n\n**Synthetic normal traffic isn't the same as production enterprise traffic.** The 0% FP rate on 100 runs is a useful baseline, but real networks have noise I haven't seen yet — especially in environments with lots of UDP traffic, where QUIC-like patterns might appear in legitimate applications.\n\n**The MITRE mapping is broad.** T1027 (Obfuscated Files or Information) is a wide technique category. I'm working toward specific sub-technique mapping — the difference between T1027.002 (Software Packing) and T1573.002 (Asymmetric Cryptography) matters for SOC triage.\n\n**Threshold tuning was manual.** A Z-score of 3.0 as the detection floor is statistically defensible, but I chose it based on known normal distribution properties rather than empirical optimization against a labeled dataset. ML-driven threshold calibration is on the roadmap.\n\n##  What this is part of\n\nThreatFade is the detection core of a broader ecosystem I'm building under Tinlance Limited — an AI and cybersecurity engineering studio based in Nigeria.\n\nThe engine currently powers:\n\n  * **FusionOps** — a SOC orchestration hub that ingests ThreatFade detections and surfaces them in a triage dashboard (live on AWS EC2 Stockholm)\n  * **AI Shield** — a runtime security layer for LLM applications that applies the same entropy/Z-score methodology to prompt injection and covert channel detection in AI traffic\n\n\n\nThe same statistical framework that caught Merlin QUIC at Z=14.76 turns out to be surprisingly applicable to detecting anomalous patterns in LLM communication streams. 
That's a separate article.\n\n##  Get the code\n\nEverything is open source under Apache 2.0:\n\n👉 github.com/LloydCoder/tinlance-threatfade\n\nIf you want to run it against your own PCAPs, the README has setup instructions. If you find a malware family where the detection methodology breaks down — that's genuinely useful feedback, and I want to hear about it. Open an issue or reach me on X at @lloydambition.\n\nThe tool was designed to be useful to SOC analysts, red teamers wanting to test their evasion against statistical detection, and researchers working on the encrypted traffic analysis problem. The false-positive challenge is real and I don't claim to have solved it for production enterprise networks. But the core signal is there — C2 fade is statistically distinct, and you don't need deep learning to find it.\n\n_Chinaemerem Nwachukwu is the founder of Tinlance Limited, an AI and cybersecurity engineering studio. He has contributed Nigerian fintech credential detectors to Nuclei, TruffleHog, Semgrep, Gitleaks, and Slither. CSEAN member. Active in the Nigerian security community._",
  "title": "How I Detected Merlin QUIC C2 Traffic Using Entropy and Z-Scores (490K Packets, 0% False Positives)"
}