A complete technical reference for the Guardian detection pipeline — architecture choices, training data, feature vectors, anomaly thresholds, and inference design.
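Guardian runs two independent anomaly detection models at different levels of abstraction. They observe non-overlapping signals (static file structure versus system-wide runtime behavior), so they catch different attack classes and are deliberately designed not to duplicate each other.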
Fires the instant a new executable lands on disk. Encodes 17 structural features from the PE header and compares reconstruction error against a global baseline trained on 10,233 legitimate Windows programs. A file whose structure cannot be well-reconstructed is flagged before it ever runs.
Runs continuously in the background. Aggregates 12 Windows ETW signals over consecutive 5-minute windows and compares each window against a personal baseline specific to your machine. Any window containing behavior your machine has never produced before is flagged as soon as that window closes.
A polymorphic file that rewrites its PE structure to pass Layer 1a still leaves a behavioral fingerprint that Layer 1b detects. A script-based attack with no PE on disk is caught by Layer 1b alone. The two models cover the full attack chain together.
Both models are trained entirely on benign data. They learn what normal looks like — not what malware looks like. Any structural or behavioral deviation produces high reconstruction error, whether or not it has ever been seen before. Zero-day coverage is inherent.
Every PE file is analyzed structurally before a single line of it executes. Reconstruction error is the anomaly score.
The model compresses the file's structural fingerprint into an 8-dimensional bottleneck, then reconstructs all 17 original features. The mean squared error (MSE) between the input and its reconstruction is the anomaly score: a low score means the file looks like normal software, and a high score means its structure is unlike anything in the benign training distribution. Because the bottleneck is narrower than the input, the network cannot simply copy its input and must learn a compressed representation of benign structure; the decoder mirrors the encoder. All hidden layers use ReLU; the output layer uses a linear activation so reconstructions are unconstrained in range. Training loss: MSE.
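Written out, the Layer 1a anomaly score for an input vector x (the 17 scaled structural features) and its reconstruction x̂ is the standard mean squared error, compared against the model's stored threshold τ (0.163211 for the shipped static model):

```latex
\mathrm{score}(x) = \frac{1}{17} \sum_{i=1}^{17} \left( x_i - \hat{x}_i \right)^2,
\qquad \text{flag the file if } \mathrm{score}(x) > \tau
```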
All features are extracted from the PE binary before execution. No runtime behavior is involved at this layer.
| Feature | Description |
|---|---|
| file_size | Raw byte count of the PE file |
| overall_entropy | Shannon entropy of the full file |
| num_sections | Count of PE sections in the header |
| max_section_entropy | Entropy of the highest-entropy section |
| mean_section_entropy | Mean entropy across all sections |
| section_entropy_std | Std deviation of section entropies |
| num_imports | Count of imported functions |
| num_import_dlls | Distinct DLLs imported |
| num_exports | Count of exported symbols (typically non-zero in DLLs) |
| compile_timestamp | PE header timestamp (a value of 0 or a future date is anomalous) |
| is_dll | 1 if the IMAGE_FILE_DLL flag is set |
| has_debug | 1 if a debug directory is present |
| is_signed | 1 if a valid Authenticode signature is present |
| is_packed_heuristic | 1 if entropy > 7.0 or non-standard sections are present |
| file_ext_enc | Encoded extension (.exe / .dll / .sys / other) |
| architecture_enc | Target machine type (x64 / x86 / ARM64) |
| subsystem | PE subsystem (GUI / console / driver / native) |
section_entropy_std ≈ 0). A single high-entropy section in an otherwise normal PE is consistent with a packed payload inside a dropper.
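The entropy features in the table can be illustrated with a short Python sketch. It is not Guardian's Rust extractor: it assumes the raw bytes of each section have already been sliced out of the PE, uses population standard deviation for section_entropy_std, applies the > 7.0 cutoff to the highest section entropy, and uses a hypothetical allowlist of "standard" section names for is_packed_heuristic.

```python
import math
from collections import Counter
from statistics import mean, pstdev

# Hypothetical allowlist; Guardian's actual definition of "standard" sections is not documented here.
STANDARD_SECTION_NAMES = {".text", ".rdata", ".data", ".pdata", ".rsrc",
                          ".reloc", ".idata", ".tls", ".bss"}


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for empty or constant data, at most 8.0."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def section_entropy_features(sections: list) -> dict:
    """Summarise per-section entropies into the three section-level features."""
    entropies = [shannon_entropy(s) for s in sections]
    if not entropies:
        return {"max_section_entropy": 0.0, "mean_section_entropy": 0.0,
                "section_entropy_std": 0.0}
    return {
        "max_section_entropy": max(entropies),
        "mean_section_entropy": mean(entropies),
        # Population std (assumption; a sample std would also be reasonable).
        "section_entropy_std": pstdev(entropies),
    }


def is_packed_heuristic(max_entropy: float, section_names: list) -> int:
    """1 if any section is very high entropy or any section name is non-standard."""
    nonstandard = any(name not in STANDARD_SECTION_NAMES for name in section_names)
    return int(max_entropy > 7.0 or nonstandard)
```

Near-random data (compressed or encrypted) approaches 8 bits per byte, which is why a single section close to that ceiling stands out in an otherwise ordinary binary.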
Watches what your computer does in aggregate every 5 minutes. Flags any window that deviates from your personal behavioral normal.
Every 5 minutes, Guardian collapses all system event activity into 12 normalized rate features from Event Tracing for Windows (ETW). The behavioral autoencoder tries to reconstruct each new window using what it learned from thousands of benign windows. A high reconstruction error means the window looks like nothing in the benign distribution: it is behaviorally anomalous.
Unlike Layer 1a, Layer 1b is personal. Before monitoring activates, Guardian collects hundreds of 5-minute windows from your own normal usage and fine-tunes the effective threshold for your specific machine. An engineer running WMI-heavy build tooling will see those patterns learned as normal, eliminating false positives that a generic model would generate.
Each feature is an event count divided by the window duration (events/sec). Each of the 12 features is clipped at its 99th percentile, so rare bursts cannot dominate the anomaly score, and then scaled to [0, 1] independently using MinMax normalization.
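The clip-then-MinMax step can be sketched per feature as follows. This is an illustration, not Guardian's code: it assumes the 99th percentile uses linear interpolation (NumPy's default convention), that the min and max are taken after clipping, and that values falling outside the training range are clamped into [0, 1]. One FeatureScaler would be fitted for each of the 12 features.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 300.0  # 5-minute windows


def to_rate(event_count: int, window_seconds: float = WINDOW_SECONDS) -> float:
    """Convert a raw event count for one window into events/sec."""
    return event_count / window_seconds


def percentile(values: list, q: float) -> float:
    """Linear-interpolation percentile of a non-empty list (q in 0..100)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


@dataclass
class FeatureScaler:
    clip_hi: float  # 99th percentile of the training rates for this feature
    lo: float       # minimum after clipping
    hi: float       # maximum after clipping

    @classmethod
    def fit(cls, rates: list, q: float = 99.0) -> "FeatureScaler":
        clip_hi = percentile(rates, q)
        clipped = [min(r, clip_hi) for r in rates]
        return cls(clip_hi, min(clipped), max(clipped))

    def transform(self, rate: float) -> float:
        r = min(rate, self.clip_hi)  # clip first so bursts saturate at the 99th percentile
        span = self.hi - self.lo
        if span == 0:
            return 0.0               # constant feature in training: map to 0
        # MinMax to [0, 1]; clamping out-of-range values is an assumption.
        return min(max((r - self.lo) / span, 0.0), 1.0)
```

Clipping before scaling matters: without it, a single extreme window in the training data would set the MinMax maximum and squeeze every ordinary window toward 0.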
| Feature | ETW Source | What It Captures |
|---|---|---|
| wmi_query_rate | WMI-Activity / 5857, 5858 | Rate of WMI queries. Malware heavily abuses WMI for reconnaissance, lateral movement, and execution. |
| task_register_rate | TaskScheduler / 106, 107 | Rate of scheduled task registrations. Attackers register persistence tasks; high rates outside maintenance windows are suspicious. |
| task_run_rate | TaskScheduler / 200, 201, 202 | Rate of task executions. A spike after a registration event strongly suggests a newly placed persistence mechanism firing. |
| bits_event_rate | Bits-Client / any EventID | BITS is commonly abused for stealthy C2 downloads via system-trusted network channels. |
| fw_rule_change_rate | Firewall / 2097, 2052, 2002 | Malware modifies firewall rules to open inbound ports or allow C2 traffic to pass through. |
| rdp_session_rate | TermSvc-LocalSessionMgr | Expected near-zero on single-user workstations. A non-zero rate on a machine without RDP enabled is always anomalous. |
| smb_access_rate | SMBClient / any EventID | Near-zero on isolated workstations. Elevated rates suggest lateral movement scanning or unexpected share access. |
| error_event_rate | EventLog / Level = Error, Critical | Rate of error and critical events system-wide. Malware causes cascading errors through injection failures or service crashes. |
| total_event_rate | All ETW channels | Overall system event rate. Captures gross activity spikes that do not fit any specific category. |
| unique_channel_count | Distinct provider channels | Number of distinct event channels fired in the window. Unusual proliferation suggests broad system manipulation. |
| group_policy_rate | GroupPolicy / any EventID | Legitimate GP activity is periodic and predictable. Manipulation attempts generate irregular bursts. |
| shell_core_rate | Shell-Core / any EventID | Shell lifecycle events. Naturally sparse; spikes correlate with software installation activity. |
preprocessor_config.json alongside scaler parameters so the on-device ONNX runtime applies identical preprocessing without needing a scikit-learn object.
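The point of storing preprocessing as plain numbers is that the runtime only needs arithmetic, not a scikit-learn object. The sketch below shows that idea in Python; the JSON keys (features, clip_hi, min, max) are a hypothetical schema, since the actual layout of preprocessor_config.json is not documented on this page.

```python
import json


def dump_preprocessor_config(names: list, clip_hi: list,
                             data_min: list, data_max: list) -> str:
    """Serialise per-feature clip and MinMax parameters (hypothetical schema)."""
    return json.dumps(
        {"features": names, "clip_hi": clip_hi, "min": data_min, "max": data_max},
        indent=2,
    )


def apply_preprocessor_config(config_text: str, raw_rates: dict) -> list:
    """Clip, then MinMax-scale each feature using only numbers read from JSON."""
    cfg = json.loads(config_text)
    out = []
    for name, clip, lo, hi in zip(cfg["features"], cfg["clip_hi"], cfg["min"], cfg["max"]):
        r = min(raw_rates[name], clip)
        if hi == lo:
            out.append(0.0)  # constant feature in training
        else:
            out.append(min(max((r - lo) / (hi - lo), 0.0), 1.0))
    return out
```

Writing the string to disk with pathlib.Path(...).write_text and reading it back on the device gives the same vector the training pipeline produced, as long as both sides apply the steps in the same order.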
Both models are exported to ONNX and run entirely on your machine via the ONNX Runtime library. No internet connection required — no data ever leaves the device.
A PE file identified by the monitor thread is parsed in Rust to extract 17 structural features. The vector is StandardScaler-normalized, then fed forward through the autoencoder. If the MSE between input and reconstruction exceeds the stored threshold (0.163211), the file is handed to the alert pipeline.
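The Layer 1a decision path (scale, reconstruct, score, compare) can be sketched in Python. This is a minimal sketch under stated assumptions: the real pipeline runs in Rust against an ONNX session, so reconstruct here is a hypothetical stand-in for the autoencoder's forward pass, and the mean and scale lists stand in for the stored StandardScaler parameters. Only the threshold value comes from this page.

```python
from typing import Callable, List, Tuple

STATIC_THRESHOLD = 0.163211  # Layer 1a threshold stated above


def standard_scale(x: List[float], mean: List[float], scale: List[float]) -> List[float]:
    """StandardScaler transform: z = (x - mean) / scale, per feature."""
    return [(xi - m) / s for xi, m, s in zip(x, mean, scale)]


def mse(a: List[float], b: List[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def score_pe(features: List[float], mean: List[float], scale: List[float],
             reconstruct: Callable[[List[float]], List[float]],
             threshold: float = STATIC_THRESHOLD) -> Tuple[float, bool]:
    """Return (anomaly score, should_alert) for one 17-feature PE vector."""
    z = standard_scale(features, mean, scale)
    z_hat = reconstruct(z)  # hypothetical stand-in for the ONNX forward pass
    score = mse(z, z_hat)   # error is measured in the scaled space the model saw
    return score, score > threshold
```

A perfect reconstruction scores 0 and is never flagged; the further the decoder's output drifts from the scaled input, the higher the score.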
The ETW recorder accumulates event counts in a rolling in-memory buffer. At each 5-minute boundary, counts are converted to 12 rate features, preprocessing is applied (clip → MinMax), and the vector is passed through the behavioral autoencoder. An MSE above 0.000507 generates a behavioral window alert.
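A simplified version of the recorder's buffer is sketched below. It is hypothetical in several ways: each incoming event is assumed to be already mapped to one feature name, window boundaries are detected when the next event arrives rather than by a timer, empty windows in between are skipped, and features that are not simple per-event counts (total_event_rate, unique_channel_count) are left out.

```python
from collections import Counter

WINDOW_SECONDS = 300.0  # 5-minute windows


class WindowAccumulator:
    """Hypothetical in-memory buffer that turns event counts into per-window rates."""

    def __init__(self, feature_names, window_start: float):
        self.feature_names = list(feature_names)
        self.window_start = window_start
        self.counts = Counter()

    def record(self, feature_name: str, timestamp: float):
        """Count one event; return the finished window's rate vector if a boundary was crossed."""
        finished = None
        if timestamp >= self.window_start + WINDOW_SECONDS:
            finished = self._close()
            # Jump to the window that contains this timestamp (empty windows are skipped).
            elapsed = int((timestamp - self.window_start) // WINDOW_SECONDS)
            self.window_start += elapsed * WINDOW_SECONDS
        self.counts[feature_name] += 1
        return finished

    def _close(self):
        """Convert counts to events/sec in a fixed feature order, then reset."""
        rates = [self.counts[name] / WINDOW_SECONDS for name in self.feature_names]
        self.counts = Counter()
        return rates
```

The returned vector would then go through the clip and MinMax step before reaching the behavioral autoencoder.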
Running the PyTorch or TensorFlow models directly would mean embedding a Python runtime or those frameworks' much larger native libraries. ONNX Runtime ships as a single native library with Rust bindings (the ort crate), adding roughly 30 MB to the binary. Inference latency on both models is sub-millisecond on modern hardware.
Each model ships alongside a model_meta.json recording the model name, layer, threshold value, preprocessing parameters, and training date. Updates are applied atomically: the new ONNX file is written to a staging path, validated, and renamed into place — the monitor thread never reads a half-written model.
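The stage, validate, rename pattern can be sketched in Python. The validate callback is a hypothetical hook (a real check might load the file into an inference session and compare against model_meta.json); the staging file is created in the destination directory so that os.replace is a same-filesystem rename, which is atomic on POSIX filesystems.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def install_model_atomically(model_bytes: bytes, dest: Path,
                             validate: Callable[[Path], bool]) -> bool:
    """Write to a staging file, validate it, then rename it over dest in one step."""
    dest = Path(dest)
    # Same directory as dest, so the final rename never crosses filesystems.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".staging")
    tmp = Path(tmp_name)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(model_bytes)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the rename
        if not validate(tmp):     # hypothetical validation hook
            tmp.unlink()
            return False          # the old model stays in place untouched
        os.replace(tmp, dest)     # readers see either the old file or the new one
        return True
    except BaseException:
        if tmp.exists():
            tmp.unlink()
        raise
```

A reader that opens dest at any moment gets either the complete old model or the complete new one, never a partially written file.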
Not all anomalies are equally urgent. Alert generation is deferred until the process knowledge base (1,000+ known-good processes) has been consulted — a high-MSE score on a process with a known-benign signature is suppressed or downgraded before it reaches the UI.
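The suppress-or-downgrade step can be illustrated with a small triage function. Everything here is hypothetical: the knowledge-base shape (process name mapped to expected signer), the severity levels, and the policy of suppressing marginal scores below twice the threshold while downgrading larger ones are assumptions chosen to show the idea, not Guardian's documented rules.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Severity(Enum):
    SUPPRESSED = 0
    LOW = 1
    HIGH = 2


@dataclass(frozen=True)
class Detection:
    process_name: str
    signer: Optional[str]  # Authenticode signer, if any
    mse: float
    threshold: float


# Hypothetical knowledge-base entry: process name -> expected signer.
KNOWN_GOOD: Dict[str, str] = {"msbuild.exe": "Microsoft Corporation"}


def triage(d: Detection, kb: Dict[str, str] = KNOWN_GOOD) -> Severity:
    if d.mse <= d.threshold:
        return Severity.SUPPRESSED  # below threshold: not an anomaly
    expected = kb.get(d.process_name.lower())
    if expected is not None and d.signer == expected:
        # Known-good process with the expected signature (assumed policy):
        # suppress marginal scores, downgrade large ones.
        return Severity.SUPPRESSED if d.mse < 2 * d.threshold else Severity.LOW
    return Severity.HIGH  # unknown process or signature mismatch
```

Matching on the signer as well as the name keeps a malicious binary that merely borrows a trusted file name from inheriting its suppression.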
The behavioral model learns what normal looks like specifically on your machine before monitoring begins.
The model shipped with Guardian was pre-trained on thousands of benign 5-minute ETW windows from real machines. On installation, Guardian begins recording your machine’s own behavioral windows in the same 12-dimensional feature space. As your personal baseline accumulates — Guardian targets several hundred windows across varied sessions — these windows are used to fine-tune the anomaly threshold for your specific machine.
A developer running WMI-heavy build tooling will naturally generate wmi_query_rate values that the global model scores as unusual. Once enough of those windows are in the personal baseline, the machine's effective threshold is recalibrated upward so that windows dominated by that tooling no longer cross it, eliminating false positives from legitimate tools without any action required from the user.
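This page does not specify how the personal threshold is computed. One common approach, sketched here purely as an assumption, is to take a high quantile of the reconstruction errors observed on the personal baseline windows and never let the result fall below the global threshold, which matches the "recalibrated upward" behavior described above. The 300-window minimum and the 0.99 quantile are illustrative values.

```python
import math
from typing import List


def calibrate_threshold(baseline_errors: List[float], global_threshold: float,
                        quantile: float = 0.99, min_windows: int = 300) -> float:
    """Hypothetical calibration: nearest-rank quantile of personal-baseline MSEs,
    never below the global threshold."""
    if len(baseline_errors) < min_windows:
        return global_threshold  # baseline still accumulating: keep the shipped value
    s = sorted(baseline_errors)
    idx = min(len(s) - 1, math.ceil(quantile * len(s)) - 1)  # nearest-rank index
    return max(global_threshold, s[idx])
```

With 0.000507 as the global value, a machine whose normal windows already score well above it ends up with a higher effective threshold, while a quiet machine keeps the shipped one.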
shell_core_rate, task_run_rate, and total_event_rate while Windows initializes startup tasks. Guardian detects post-boot windows (gap > 15 min from previous) and excludes them from false-positive rate calculations.
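The post-boot rule (a gap of more than 15 minutes since the previous window) can be sketched as follows. Assumptions: window timestamps are in seconds, the very first window is treated as post-boot because it has no predecessor, and on a benign baseline every alert counts as a false positive.

```python
from typing import List

POST_BOOT_GAP_SECONDS = 15 * 60


def post_boot_flags(window_end_times: List[float]) -> List[bool]:
    """True for each window that follows a gap of more than 15 minutes."""
    flags = []
    prev = None
    for t in window_end_times:
        # First window has no predecessor; treating it as post-boot is an assumption.
        flags.append(prev is None or (t - prev) > POST_BOOT_GAP_SECONDS)
        prev = t
    return flags


def false_positive_rate(alert_flags: List[bool], boot_flags: List[bool]) -> float:
    """Fraction of benign windows that alerted, excluding post-boot windows."""
    kept = [alert for alert, boot in zip(alert_flags, boot_flags) if not boot]
    return sum(kept) / len(kept) if kept else 0.0
```

Excluding these windows keeps the predictable startup burst from inflating the measured false-positive rate.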