VaryOn Cascade / Systemic Impact

Ecosystem Layer

Quantifying systemic fragility in interconnected AI agent networks through Monte Carlo simulation

5 Dimensions
Monte Carlo Aggregation
Batch Processing
0-100 Score Scale

Purpose

Cascade quantifies systemic fragility in interconnected AI agent networks through five dimensions: Algebraic Connectivity via spectral analysis, Cascade Probability via Monte Carlo simulation, Behavioral Correlation via excess-over-chance detection, Recovery Time via mean-time-to-restore analysis, and Concentration via Herfindahl-Hirschman Index.

The system identifies when a single agent failure can propagate through the network, potentially affecting 87% of downstream decision-making within 4 hours. By combining network topology analysis with behavioral correlation detection and infrastructure concentration measurement, Cascade provides the first comprehensive systemic risk assessment for AI agent ecosystems.

Using pre-computed blast radius maps and GPU-accelerated Monte Carlo simulations, Cascade enables real-time risk monitoring while supporting regulatory compliance with EU AI Act Article 15 and US Executive Order 14110. The framework gives network architects actionable insights for preventing cascading failures before they occur.

Core Formula

\text{CASCADE} = 100 \times \left( I_a^{\alpha} \times C_p^{\beta} \times B_c^{\gamma} \times R_t^{\delta} \times K^{\varepsilon} \right)^{\frac{1}{\alpha+\beta+\gamma+\delta+\varepsilon}}

Where I_a = algebraic connectivity, C_p = cascade probability, B_c = behavioral correlation, R_t = recovery time, and K = concentration risk, each scaled to [0, 1], with weights α=0.15, β=0.30, γ=0.20, δ=0.15, ε=0.20. The weights sum to 1, so the outer exponent equals 1.
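
The core formula can be sketched in a few lines of Python. This is a minimal illustration, not the framework's implementation: the function name `cascade_score`, the dictionary keys, and the example sub-scores are all hypothetical, and each sub-score is assumed to already lie in [0, 1].

```python
import math

# Weights as stated in the text (alpha, beta, gamma, delta, epsilon).
WEIGHTS = {"I_a": 0.15, "C_p": 0.30, "B_c": 0.20, "R_t": 0.15, "K": 0.20}

def cascade_score(subscores, weights=WEIGHTS):
    """Weighted geometric mean of sub-scores in [0, 1], scaled to 0-100."""
    total_weight = sum(weights.values())
    log_sum = 0.0
    for name, weight in weights.items():
        value = subscores[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
        if value == 0.0:
            return 0.0  # any zero factor collapses the whole product
        log_sum += weight * math.log(value)
    # Summing logs avoids underflow and matches the product form exactly.
    return 100.0 * math.exp(log_sum / total_weight)

# Hypothetical sub-scores, chosen only for illustration.
example = {"I_a": 0.80, "C_p": 0.60, "B_c": 0.70, "R_t": 0.50, "K": 0.90}
print(round(cascade_score(example), 1))
```

Working in log space is a design choice of this sketch; it is algebraically identical to multiplying the powered terms directly.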

Aggregation Rationale

The weighted geometric mean ensures multiplicative compounding of independent failure channels. No single healthy dimension can compensate for a critically fragile dimension, reflecting the reality that systemic risk emerges from the weakest link in the network.

Cascade Probability (C_p) receives the highest weight (30%) as it directly measures the likelihood of failure propagation - the core systemic risk. This dimension, computed through Monte Carlo simulation, captures complex network dynamics that simple topology metrics miss.

The multiplicative structure means that a network with perfect connectivity but high behavioral correlation (hidden dependencies) will still show high systemic risk. This non-compensatory property is essential for capturing true fragility in interconnected systems.
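
To make the non-compensatory property concrete, the sketch below compares the weighted geometric mean with a weighted arithmetic mean on a hypothetical profile in which every dimension is perfect except B_c. The function names and the profile are illustrative assumptions. With these inputs the geometric form lands near 55 while the arithmetic form stays at about 81, and a B_c of exactly 0 drives the geometric score to 0.

```python
import math

WEIGHTS = [0.15, 0.30, 0.20, 0.15, 0.20]  # order: I_a, C_p, B_c, R_t, K

def geometric_score(values, weights=WEIGHTS):
    """Weighted geometric mean on a 0-100 scale (the aggregation used above)."""
    if min(values) <= 0.0:
        return 0.0
    exponent = sum(w * math.log(v) for v, w in zip(values, weights)) / sum(weights)
    return 100.0 * math.exp(exponent)

def arithmetic_score(values, weights=WEIGHTS):
    """Weighted arithmetic mean, shown only as a compensatory contrast."""
    return 100.0 * sum(w * v for v, w in zip(values, weights)) / sum(weights)

# Hypothetical profile: four perfect dimensions, one critically fragile one.
profile = [1.0, 1.0, 0.05, 1.0, 1.0]
print(round(geometric_score(profile), 1), round(arithmetic_score(profile), 1))
```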

Scoring Dimensions

1. Algebraic Connectivity (weight: 15%)

Spectral robustness via Fiedler value - measures network partition vulnerability through graph Laplacian analysis.

I_a = \lambda_2 / \lambda_2^{\max}

Where λ₂ = second smallest eigenvalue of graph Laplacian, measuring network cohesion.

  • λ₂ = 0 indicates disconnected network (critical fragility)
  • Higher values indicate stronger network cohesion
  • Polynomial-time computable via Lanczos iteration
  • Incremental updates via first-order perturbation
  • Symmetrized for directed graphs: W_sym = (W + W^T)/2
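
A dependency-free sketch of this dimension follows. It symmetrizes the weight matrix as described above, then estimates λ₂ by shifted power iteration with the constant vector projected out. Power iteration is swapped in here for brevity; it is not the Lanczos method named above, and it is only practical for small graphs. The normalizer λ₂^max = n (the Fiedler value of a complete graph with unit weights) is an assumption, as are the function names.

```python
import math
import random

def fiedler_value(weights, iters=500, seed=0):
    """Estimate lambda_2 of the Laplacian of the symmetrized weight matrix."""
    n = len(weights)
    if n < 2:
        return 0.0
    # Symmetrize directed weights: W_sym = (W + W^T) / 2, ignoring self-loops.
    w = [[0.0 if i == j else (weights[i][j] + weights[j][i]) / 2.0
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    shift = 2.1 * max(deg)  # exceeds the largest Laplacian eigenvalue
    if shift == 0.0:
        return 0.0  # no edges at all

    def laplacian_times(x):
        return [deg[i] * x[i] - sum(w[i][j] * x[j] for j in range(n))
                for i in range(n)]

    def project_and_normalize(x):
        # Remove the constant-vector component (eigenvalue 0), then rescale.
        mean = sum(x) / n
        x = [xi - mean for xi in x]
        norm = math.sqrt(sum(xi * xi for xi in x))
        return [xi / norm for xi in x]

    rng = random.Random(seed)
    v = project_and_normalize([rng.uniform(-1.0, 1.0) for _ in range(n)])
    for _ in range(iters):
        lv = laplacian_times(v)
        # Iterating on (shift*I - L) converges to the smallest remaining eigenvalue.
        v = project_and_normalize([shift * v[i] - lv[i] for i in range(n)])
    lv = laplacian_times(v)
    return sum(v[i] * lv[i] for i in range(n))  # Rayleigh quotient

def algebraic_connectivity_score(weights):
    """I_a = lambda_2 / lambda_2_max, assuming lambda_2_max = n."""
    n = len(weights)
    return min(1.0, fiedler_value(weights) / n) if n else 0.0

# Hypothetical 4-agent chain: 0 - 1 - 2 - 3
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(algebraic_connectivity_score(path))
```

For production-size networks a sparse eigensolver would be the natural replacement for the loop above.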

2. Cascade Probability (weight: 30%)

Monte Carlo simulation of failure propagation using Independent Cascade Model with concentration bounds.

C_p = \frac{1}{N_{sim}} \sum_{i=1}^{N_{sim}} \mathbb{1}\{cascade_i > threshold\}

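Probability that a single agent failure cascades beyond the threshold. With N_sim = 10,000, the Hoeffding bound listed below gives 99.7% confidence that the estimate lies within about ±0.018 of the true value.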

  • Independent Cascade Model simulates failure propagation
  • Pre-computed blast radius maps enable O(1) runtime lookup
  • Hoeffding bound: P(|C_p_est - C_p_true| > ε) ≤ 2exp(-2N_sim·ε²)
  • GPU acceleration reduces 10K simulations to ~60 seconds
  • Incremental recomputation for topology changes
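
The estimator and its Hoeffding margin can be sketched as follows. This is a plain CPU illustration under stated assumptions: the graph is a dict mapping each agent to `{neighbor: propagation_probability}`, the initial failure is drawn uniformly at random, and a cascade "exceeds the threshold" when the failed fraction of the network is above `threshold`. None of these choices, nor the function names, come from the source.

```python
import math
import random

def simulate_cascade(graph, seed_node, rng):
    """One Independent Cascade run; returns the set of failed agents."""
    failed = {seed_node}
    frontier = [seed_node]
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor, prob in graph.get(node, {}).items():
                # A newly failed agent gets exactly one chance per neighbor.
                if neighbor not in failed and rng.random() < prob:
                    failed.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return failed

def cascade_probability(graph, threshold=0.5, n_sim=10_000, seed=0):
    """Fraction of runs whose failed share of the network exceeds threshold."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    hits = 0
    for _ in range(n_sim):
        start = rng.choice(nodes)
        failed = simulate_cascade(graph, start, rng)
        if len(failed) / len(nodes) > threshold:
            hits += 1
    return hits / n_sim

def hoeffding_margin(n_sim, confidence):
    """Smallest eps with 2*exp(-2*n_sim*eps^2) <= 1 - confidence."""
    delta = 1.0 - confidence
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n_sim))

# Hypothetical 3-agent ring with a 40% propagation chance per edge.
ring = {0: {1: 0.4}, 1: {2: 0.4}, 2: {0: 0.4}}
print(cascade_probability(ring), round(hoeffding_margin(10_000, 0.997), 4))
```

Fixing the random seed keeps runs reproducible, which helps when comparing scores before and after a topology change.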

3. Behavioral Correlation (weight: 20%)

Detects hidden dependencies through excess-over-chance stress response correlation analysis.

B_c = 1 - \frac{|\{(i,j) : \rho_{ij} > \rho_{random} + 2\sigma\}|}{|pairs|}

Fraction of agent pairs showing independent behavior under stress conditions.

  • Spearman correlation of stress responses between agents
  • Detects shared foundation models, infrastructure, training data
  • ρ > ρ_random + 2σ indicates concerning correlation
  • Invisible to topology analysis alone
  • Sliding window update for streaming telemetry
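
A small pure-Python sketch of the B_c computation is shown below. It assumes each agent's stress response is available as an equal-length numeric series, and that the chance baseline `rho_random` and its spread `sigma` are supplied by the caller (how they are estimated is not specified in the text). All names and data are hypothetical.

```python
from itertools import combinations

def _ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no rank information
    return cov / (var_x * var_y) ** 0.5

def behavioral_independence(responses, rho_random, sigma):
    """B_c = 1 - (pairs with rho > rho_random + 2*sigma) / (all pairs)."""
    pairs = list(combinations(sorted(responses), 2))
    if not pairs:
        return 1.0
    cutoff = rho_random + 2 * sigma
    flagged = sum(1 for a, b in pairs
                  if spearman(responses[a], responses[b]) > cutoff)
    return 1.0 - flagged / len(pairs)

# Hypothetical stress-response series for three agents.
responses = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [3, 1, 4, 2, 5]}
print(behavioral_independence(responses, rho_random=0.0, sigma=0.3))
```

In this toy data only the pair (a, b) moves in lockstep, so one of the three pairs is flagged.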

4. Recovery Time (weight: 15%)

Mean-time-to-restore assessment with cascading complexity modeling.

R_t = \exp(-\lambda \times t_{recovery} / t_{baseline})

Exponential decay based on recovery time relative to SLA baseline.

  • Accounts for direct restart and downstream cleanup
  • State resynchronization complexity grows with depth
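  • λ = ln(2) gives R_t = 0.5 when t_recovery = t_baseline (half-life at the SLA limit); this is a decay rate of ln(2)/t_baseline per unit of recovery time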
  • Historical MTTR tracking with drift detection
  • Super-linear complexity with cascade depth
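
The recovery score is a one-line formula; the sketch below assumes λ = ln 2 in the ratio form shown above, so the score halves each time recovery takes one more SLA baseline. The function name and the 30-minute baseline are illustrative assumptions.

```python
import math

def recovery_score(t_recovery, t_baseline, decay=math.log(2)):
    """R_t = exp(-decay * t_recovery / t_baseline); both times in the same unit."""
    if t_baseline <= 0:
        raise ValueError("t_baseline must be positive")
    if t_recovery < 0:
        raise ValueError("t_recovery must be non-negative")
    return math.exp(-decay * t_recovery / t_baseline)

# Hypothetical SLA baseline of 30 minutes.
for minutes in (0, 30, 60):
    print(minutes, round(recovery_score(minutes, 30), 3))
```

Under this assumption an R_t of 0.30 corresponds to a recovery time of roughly 1.7 times the baseline, since -ln(0.30)/ln(2) ≈ 1.74.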

5. Concentration Risk (weight: 20%)

Infrastructure concentration via Herfindahl-Hirschman Index across multiple layers.

K = 1 - \max_{layer}(HHI_{layer}) \text{ where } HHI = \sum_i s_i^2

Worst-layer dominance: single point of failure in any layer creates systemic risk.

  • Multi-layer analysis: models, cloud, embeddings, tools
  • HHI > 0.25 indicates high concentration (equivalent to fewer than 4 equal-share providers)
  • Worst-layer selection prevents gaming through diversification theater
  • Verified through latency fingerprinting and SSL analysis
  • K approaches 0 with single provider dominance
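
The worst-layer rule can be sketched as below. Provider usage per layer is assumed to be available as raw counts or volumes, which the code normalizes into shares s_i; the layer names follow the list above, while the numbers and function names are hypothetical.

```python
def hhi(usage):
    """Herfindahl-Hirschman Index on shares derived from raw usage figures."""
    total = sum(usage)
    if total <= 0:
        raise ValueError("a layer needs positive total usage")
    return sum((u / total) ** 2 for u in usage)

def concentration_score(layers):
    """K = 1 - max over layers of HHI; the most concentrated layer decides."""
    return 1.0 - max(hhi(usage) for usage in layers.values())

# Hypothetical provider usage per infrastructure layer.
layers = {
    "models": [50, 30, 20],
    "cloud": [70, 30],
    "embeddings": [40, 40, 20],
    "tools": [25, 25, 25, 25],
}
for name, usage in layers.items():
    print(name, round(hhi(usage), 3))
print("K =", round(concentration_score(layers), 3))
```

Because only the maximum HHI counts, adding providers to an already diverse layer does not raise K while another layer remains dominated by one provider.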

Tier System

Critical: 0-19
High Risk: 20-39
Elevated: 40-59
Moderate: 60-79
Contained: 80-100
51.2 / Elevated Risk
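
The tier boundaries can be expressed as a simple lookup. The sketch assumes each tier is a half-open interval starting at its lower bound (so 19.9 is still Critical and 20.0 is High Risk), which the integer ranges above do not spell out; the names `TIERS` and `tier_for` are hypothetical.

```python
# (lower bound, tier name), checked from the highest tier down.
TIERS = [
    (80, "Contained"),
    (60, "Moderate"),
    (40, "Elevated"),
    (20, "High Risk"),
    (0, "Critical"),
]

def tier_for(score):
    """Map a 0-100 composite score to its tier name."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie in [0, 100]")
    for lower, name in TIERS:
        if score >= lower:
            return name

print(tier_for(51.2))
```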

Production Tier: Assessment-Grade

Latency: ~80 seconds for 1000 agents (GPU-accelerated)

Gaming Resistance

Attack Vector | Description | Countermeasure
Topology Concealment | Hiding agent dependencies to reduce apparent risk | Infer from API patterns, transaction flows, and behavioral correlation
Artificial Diversification | Creating fake infrastructure diversity to improve concentration score | Verify through latency fingerprinting, IP geolocation, SSL certificate analysis
Correlation Masking | Adding noise to hide behavioral coupling between agents | Spearman correlation robust to outliers; extended observation windows
Recovery Theater | Faking fast recovery without fixing root cause | Verify actual functionality restoration, not just service restart
Simulation Gaming | Optimizing for specific Monte Carlo parameters | Robustness envelope computation across parameter sweep

Edge Cases

Disconnected Networks

  • Compute per-component scores independently
  • Weight by component size (agent count)
  • Report as "fragmented network" with component breakdown
  • Algebraic connectivity = 0 triggers special handling
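
One way to sketch the fragmented-network handling: find connected components with a breadth-first search, score each one, and combine the scores weighted by agent count. The text does not say how the weighted scores are combined, so the arithmetic size-weighted mean here is an assumption, as are the function names and the `score_component` callback.

```python
from collections import deque

def connected_components(adjacency):
    """Components of an undirected graph given as {node: [neighbors]}."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components

def fragmented_score(adjacency, score_component):
    """Size-weighted mean of per-component scores, plus the breakdown."""
    if not adjacency:
        raise ValueError("network has no agents")
    components = connected_components(adjacency)
    breakdown = [(component, score_component(component)) for component in components]
    total_agents = sum(len(component) for component in components)
    overall = sum(len(c) * s for c, s in breakdown) / total_agents
    return overall, breakdown

# Hypothetical network: a 3-agent chain plus one isolated agent.
network = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
# Placeholder scorer; a real one would run the full five-dimension assessment.
overall, breakdown = fragmented_score(network, lambda c: 80.0 if len(c) == 3 else 40.0)
print(overall, breakdown)
```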

Complete Graphs

  • Algebraic connectivity saturates at maximum
  • Focus shifts to behavioral correlation and concentration
  • High cascade probability expected and acceptable
  • Emphasis on recovery and diversity dimensions

Single Infrastructure

  • K approaches 0, capping composite score
  • Triggers "critical concentration" alert
  • Recommend immediate diversification
  • Historical examples: AWS us-east-1 outages

Sparse Networks

  • Low cascade probability but high partition risk
  • Algebraic connectivity becomes dominant factor
  • Bridge nodes identified as critical points
  • Targeted redundancy recommendations

Worked Example

Dense Trading Network

Algebraic Connectivity (I_a): 0.72 (well-connected topology, 500 agents)
Cascade Probability (C_p): 0.89 (high propagation risk detected)
Behavioral Correlation (B_c): 0.45 (moderate hidden dependencies)
Recovery Time (R_t): 0.30 (slow due to position unwinding)
Concentration Risk (K): 0.65 (three major providers)
Overall tier: Elevated Risk

A densely connected trading network of 500 agents shows elevated systemic risk. While the topology is robust (I_a=0.72), the 89% cascade probability indicates that a single agent failure could trigger widespread contagion. Slow recovery, caused by the complexity of unwinding positions, amplifies the risk. Immediate intervention is recommended: implement circuit breakers and increase infrastructure diversity.

Use Cases

Cascade could identify systemic fragility across 15 critical infrastructure networks where single agent failures can trigger cascading collapses affecting millions of downstream decisions.

$8.7T At Risk
15 Use Cases
45+ Companies

Critical Systemic Risk

Networks where single failures can trigger market-wide collapse

High-Frequency Trading Networks

Financial Services

Networks of algorithmic trading agents where flash crashes can cascade through interconnected strategies in milliseconds

Systemic Risk: 2010 Flash Crash: $1 trillion vanished in 36 minutes

DeFi Protocol Networks

Decentralized Finance

Interconnected lending protocols where liquidation cascades can trigger systemic collapse across the ecosystem

Systemic Risk: Terra/Luna collapse: $60B destroyed in 48 hours

Global Supply Chain Networks

Logistics

AI-driven supply chain agents where disruptions cascade through just-in-time manufacturing networks

Systemic Risk: Suez Canal blockage: $400M/hour in delayed goods

Multi-Region Cloud Orchestration

Cloud Computing

Interdependent cloud services where regional failures cascade through availability zones

Systemic Risk: AWS us-east-1: 30% of internet services affected

Real-Time Payment Networks

Payments

Instant payment systems where fraud or failures propagate before detection

Systemic Risk: FedNow processes $5T daily with sub-second finality

High Contagion Potential

Systems with demonstrated cascade propagation affecting millions

Social Media Moderation Networks

Social Media

Cascading content decisions across platforms affecting billions of users

Systemic Risk: Coordinated deplatforming affects 3B+ users globally

Mobility Network Orchestration

Transportation

Ride-sharing and delivery networks where surge pricing cascades affect entire cities

Systemic Risk: NYC surge cascade: 10x pricing in 15 minutes

Smart Grid Management

Energy

Distributed energy systems where demand response cascades can trigger blackouts

Systemic Risk: Texas 2021: cascading failures, 246 deaths

Hospital Network Coordination

Healthcare

Medical AI systems where diagnostic errors cascade through referral networks

Systemic Risk: Misdiagnosis propagation affects treatment chains

Programmatic Ad Networks

Advertising

Real-time bidding systems where fraud cascades through the ecosystem

Systemic Risk: $35B annual ad fraud through cascade attacks

Emerging Cascade Risks

Emerging networks showing early signs of systemic fragility

Foundation Model Ecosystems

AI Infrastructure

Chains of fine-tuned models where errors compound through the stack

Systemic Risk: Model collapse: degradation across 5 generations

Industrial IoT Systems

Manufacturing

Connected factory systems where sensor failures cascade through production

Systemic Risk: Single sensor failure can halt $10M/day production

Virtual Economy Networks

Gaming

In-game economies where currency crashes cascade across servers

Systemic Risk: EVE Online: $300K+ destroyed in virtual battles

EdTech Learning Networks

Education

Adaptive learning systems where curriculum errors cascade through cohorts

Systemic Risk: Incorrect learning paths affect thousands of students

Climate Prediction Networks

Environmental

Interconnected climate models where errors cascade through forecasts

Systemic Risk: Cascade errors affect trillion-dollar climate policies