VanatorX | Adversary Emulation & Detection Engineering Platform

Detection is an inference game. If your data feed stutters or lies, the game is unwinnable. The most expensive detection rule is the one fed by missing or malformed logs.

A Taxonomy of Log Collection Failures:#

Source darkouts: agents crash, tokens expire, or a VLAN blocks egress—no events at all.
Latency and jitter: events arrive minutes late; correlation windows miss.
Timestamp drift: NTP skew scrambles ordering and creates false negatives.
Parse failures: vendor format changes or unhandled variants zero out key fields.
Cost‑driven sampling: intentional gaps erase exactly the low‑frequency, high‑value signals.

Symptoms in the SOC:#

“It fired yesterday” with identical inputs today—except the log came 90s late.
Duplicate events collapse; the unique one carrying the decisive field didn’t parse.
Analysts pivot to an entity and realize last‑seen timestamps are days old.

Quick Diagnostics You Can Automate:#

Time‑since‑last‑log: alert if a critical source is silent beyond an SLO (e.g., 5 minutes).
On‑time rate: % of events arriving within an acceptable latency budget.
Parse success: % of logs that populate decisive fields per source and version.
Field completeness: distribution of non‑null values for entities you correlate on.

Minimum Viable SLOs:#

Critical source on‑time ≥ 99% within 60s; parse success ≥ 98%.
Clock skew < 250ms across collectors and aggregators.
Backlog depth < 1 minute during peak hours.

Architectural Fixes That Actually Work:#

Agents as code: declarative rollout, health probes, and auto‑remediation.
Buffered ingestion: local queues; broker tiers to ride through bursts.
Schema evolution: versioned contracts and compatibility tests for parsers.
Tiered routing: send high‑value slices to hot storage; archive the rest cheaply.

1) Establish health dashboards tied to paging for darkouts and parse drops. 2) Run “tracer bullets” (synthetic events) every 10–15 minutes; verify end‑to‑end. 3) Add chaos drills: inject delay and drop on a canary path; measure detection resilience. 4) Quarterly parser fire drills: flip optional fields and validate survivability.

Case Sketch: HealthCorp discovered 5% of critical servers were dark due to agent drift and blocked VLANs. Coverage for lateral movement was effectively 0% on those nodes. After instrumenting health checks, buffered pipelines, and tiered routing, blind spots closed and audit trails improved.

Where tools can help: Any stack can implement synthetic tracing and chaos. If you prefer an integrated harness, VanatorX includes them—but the principles stand on their own.

← Back to Blog

Log Collection Issues and Their Impact on Detection: Common Pitfalls and Fixes

A Taxonomy of Log Collection Failures:#

Symptoms in the SOC:#

Quick Diagnostics You Can Automate:#

Minimum Viable SLOs:#

Architectural Fixes That Actually Work:#

Operational Playbook:#