From: "Probabilistic Graphical Models" by Daphne Koller and Nir Friedman
Business decisions are made with incomplete information. Will the product launch succeed? Is this customer about to churn? Will the supply chain disruption last 2 weeks or 2 months?
Traditional models give you point estimates: "Expected revenue: $500K." But what you really need is a probability distribution: "60% chance of landing between $400K and $600K, 20% chance below $400K, 20% chance above $600K."
Bayesian networks provide that framework.
A Bayesian network is a directed acyclic graph (DAG) where:

- Variables: each node represents a quantity you care about (churn, feature usage, support tickets).
- Dependencies: each directed edge means the child variable's probability distribution depends on its parent.
The network encodes: users who log in frequently tend to use more features, and high feature usage reduces churn probability. Support tickets also directly influence churn risk.
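That structure can be written down directly. Here is a minimal sketch of the churn network in plain Python, with every probability value invented for illustration (a real network would learn these from data):

```python
# Toy churn network: Login -> FeatureUsage -> Churn <- Tickets
# All probability values below are hypothetical.

P_login = {"high": 0.7, "low": 0.3}                 # P(Login)
P_usage_given_login = {                             # P(FeatureUsage | Login)
    "high": {"high": 0.8, "low": 0.2},
    "low":  {"high": 0.3, "low": 0.7},
}
P_tickets = {"high": 0.2, "low": 0.8}               # P(Tickets)
P_churn_given = {                                   # P(Churn=yes | Usage, Tickets)
    ("high", "high"): 0.30,
    ("high", "low"):  0.05,
    ("low",  "high"): 0.60,
    ("low",  "low"):  0.20,
}

def joint(login, usage, tickets, churn):
    """P(Login, Usage, Tickets, Churn): the chain rule for DAGs —
    each variable is conditioned only on its parents."""
    p_yes = P_churn_given[(usage, tickets)]
    p_churn = p_yes if churn == "yes" else 1 - p_yes
    return (P_login[login]
            * P_usage_given_login[login][usage]
            * P_tickets[tickets]
            * p_churn)

print(joint("low", "low", "high", "yes"))  # 0.3 * 0.7 * 0.2 * 0.6 ≈ 0.0252
```

The point of the DAG: the full joint over four variables needs only these small local tables, because each variable is conditioned on its parents alone.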
Not all variables are observed. Maybe you know a customer has high support tickets but don't know their feature usage. The network can infer likely feature usage patterns based on observed variables and historical correlations.
Edges can represent causal relationships, not just correlations. If you know "rain causes wet ground" (not the other way around), the network structure reflects reality. This matters for intervention analysis: "If we increase login frequency, will churn decrease?"
As new evidence arrives, the network updates all probability distributions. You learn a customer opened 5 support tickets today—instantly, their churn probability updates based on the conditional probabilities encoded in the network.
Given observed variables (evidence), compute probability distributions over unobserved variables. Two main types:
P(Churn | LoginFrequency=Low) — If we observe low logins, what's the probability of churn? This flows forward through the network.
P(FeatureUsage | Churn=Yes) — If a customer churned, what was their likely feature usage? This flows backward, reasoning from effects to causes.
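Both query types can be answered by brute-force enumeration on a toy network: sum the joint over every assignment consistent with the evidence, then normalize. All numbers here are hypothetical, and enumeration is only viable for tiny networks; larger ones need the algorithms discussed later.

```python
from itertools import product

# Toy churn network (hypothetical numbers), same structure as before:
# Login -> FeatureUsage -> Churn <- Tickets
P_login = {"high": 0.7, "low": 0.3}
P_usage = {"high": {"high": 0.8, "low": 0.2},
           "low":  {"high": 0.3, "low": 0.7}}        # P(Usage | Login)
P_tickets = {"high": 0.2, "low": 0.8}
P_churn = {("high", "high"): 0.30, ("high", "low"): 0.05,
           ("low",  "high"): 0.60, ("low",  "low"): 0.20}  # P(Churn=yes | Usage, Tickets)

def joint(lo, us, ti, ch):
    p = P_churn[(us, ti)]
    return (P_login[lo] * P_usage[lo][us] * P_tickets[ti]
            * (p if ch == "yes" else 1 - p))

def query(target, evidence):
    """P(target | evidence) by summing the joint over all assignments."""
    dist = {}
    for lo, us, ti, ch in product(*[["high", "low"]] * 3, ["yes", "no"]):
        a = {"login": lo, "usage": us, "tickets": ti, "churn": ch}
        if all(a[k] == v for k, v in evidence.items()):
            dist[a[target]] = dist.get(a[target], 0.0) + joint(lo, us, ti, ch)
    z = sum(dist.values())          # normalize by P(evidence)
    return {k: v / z for k, v in dist.items()}

# Forward (predictive): effect given an observed cause
print(query("churn", {"login": "low"}))
# Backward (diagnostic): likely cause given an observed effect
print(query("usage", {"churn": "yes"}))
```

The same `query` function answers both directions; the network structure, not the code, determines which way information flows.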
"Bayesian networks make the implicit explicit. Your mental model of how things relate is now a queryable, testable structure."
Diseases → Symptoms, Diseases → Test Results: diseases cause both what the patient reports and what the tests show. Given observed symptoms, the network infers likely diagnoses and recommends the tests that would most reduce uncertainty.
Transaction patterns, account history, geolocation, device fingerprints—all feeding into fraud probability. As each new signal arrives, fraud score updates in real-time.
Project delays depend on: supplier reliability, team experience, technical complexity, external dependencies. Model these relationships, then ask: "If Supplier A delays shipment, what's the probability we miss the launch date?"
What factors influence your outcome of interest? Start broad, then prune. Too many variables make the network intractable; too few miss important dependencies.
Which variables directly influence others? Draw edges. Avoid cycles—if A causes B and B causes C, don't add C → A (that's feedback, requires dynamic Bayesian networks).
For each variable, define: P(Variable | Parents). This requires data, expert judgment, or both. Start with rough estimates, refine with learning algorithms as data accumulates.
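One simple way to blend a rough expert estimate with accumulating data is a Beta-Bernoulli posterior: treat the expert's number as pseudo-counts, then fold in observed counts. The function name and all numbers below are illustrative.

```python
# Refining one CPT entry, e.g. P(Churn=yes | Usage=low, Tickets=high),
# as data accumulates. The expert prior is expressed as pseudo-counts.

def refined_estimate(prior_p, prior_strength, churned, total):
    """Posterior mean of a Beta-Bernoulli model: the expert's prior_p,
    weighted like prior_strength observations, blended with real counts."""
    alpha = prior_p * prior_strength + churned
    beta = (1 - prior_p) * prior_strength + (total - churned)
    return alpha / (alpha + beta)

# Expert says 0.6, weighted like 10 observations; then 200 customers
# in this bucket are observed and 90 of them churn.
print(refined_estimate(0.6, 10, churned=90, total=200))  # (6+90)/(10+200) ≈ 0.457
```

With no data the estimate is exactly the expert's prior; as counts grow, the data dominates. This is the "start with rough estimates, refine as data accumulates" loop in miniature.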
Test the network on known scenarios. Does P(Outcome | Evidence) match reality? Are predictions calibrated? Adjust structure and probabilities iteratively.
Bayesian networks don't eliminate uncertainty—they quantify it. Instead of guessing "this customer might churn," you say "72% probability of churn given current behavior." That's actionable.
Bayesian networks exploit conditional independence: variables that are independent given their parents. But if you miss a dependency, the model will give wrong answers. Validate your independence assumptions.
Estimating P(X | Y, Z) requires observing all combinations of Y and Z. With many variables, you need exponentially more data. Solution: impose structure (e.g., noisy-OR models) or use expert priors.
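A sketch of the noisy-OR idea: each active parent independently "fires" the effect with its own strength, so a k-parent CPT needs k parameters instead of 2^k rows. Parent names and strengths below are hypothetical.

```python
# Noisy-OR: P(effect | active parents) = 1 - (1 - leak) * prod(1 - p_i)
# over the active parents, where p_i is each parent's individual strength.

def noisy_or(strengths, active, leak=0.0):
    """strengths: {parent: P(effect | only that parent active)}.
    leak: probability the effect occurs with no active parent."""
    p_none = 1 - leak
    for parent in active:
        p_none *= 1 - strengths[parent]
    return 1 - p_none

strengths = {"low_usage": 0.25, "many_tickets": 0.40, "price_hike": 0.30}
print(noisy_or(strengths, ["low_usage", "many_tickets"]))  # 1 - 0.75*0.6 = 0.55
```

Three parameters here stand in for what would otherwise be a 2^3-row table, which is exactly the data-requirement relief the text describes.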
Edges suggest directionality, but correlation doesn't imply causation without further justification. Observational data can't distinguish "A causes B" from "B causes A" or "hidden C causes both." Causal inference requires careful reasoning or experimental data.
Variable elimination, junction tree algorithms. Guaranteed correct probabilities but computationally expensive for large networks.
Monte Carlo sampling, loopy belief propagation. Faster, but the answers are estimates rather than exact probabilities. Good enough for most practical applications.
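The simplest Monte Carlo approach is rejection sampling, sketched here on the toy churn network (all numbers hypothetical): draw full samples from the joint, discard those inconsistent with the evidence, and read the answer off the survivors.

```python
import random

# Toy churn network: Login -> Usage -> Churn <- Tickets (numbers illustrative)
P_login = {"high": 0.7, "low": 0.3}
P_usage = {"high": 0.8, "low": 0.3}       # P(Usage=high | Login)
P_TICKETS_HIGH = 0.2
P_churn = {("high", "high"): 0.30, ("high", "low"): 0.05,
           ("low",  "high"): 0.60, ("low",  "low"): 0.20}

def sample():
    """Draw one full assignment by sampling each node given its parents."""
    lo = "high" if random.random() < P_login["high"] else "low"
    us = "high" if random.random() < P_usage[lo] else "low"
    ti = "high" if random.random() < P_TICKETS_HIGH else "low"
    ch = "yes" if random.random() < P_churn[(us, ti)] else "no"
    return lo, us, ti, ch

def estimate_churn_given_login(login, n=50_000):
    kept = churned = 0
    for _ in range(n):
        lo, us, ti, ch = sample()
        if lo != login:                   # reject: inconsistent with evidence
            continue
        kept += 1
        churned += (ch == "yes")
    return churned / kept

random.seed(0)
print(estimate_churn_given_login("low"))  # should land near the exact 0.226
```

Rejection sampling wastes every rejected sample, so it degrades badly as evidence gets rare; likelihood weighting and MCMC exist precisely to avoid that waste.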
Standard Bayesian networks are static snapshots. Dynamic Bayesian Networks (DBNs) model how variables evolve over time. Think Kalman filters or Hidden Markov Models, but more expressive.
Machine state at time t depends on state at t-1 and sensor readings. As new sensor data streams in, the DBN continuously updates failure probability. Maintenance is triggered when probability exceeds threshold.
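That predict-then-update loop can be sketched as a two-state filter (the simplest possible DBN, equivalent to an HMM forward pass). All transition, sensor, and threshold numbers below are hypothetical.

```python
# Hidden state: machine is "ok" or "failing". Each step: predict the state
# forward through the transition model, then update on the sensor reading.

TRANSITION = {"ok":      {"ok": 0.95, "failing": 0.05},
              "failing": {"ok": 0.00, "failing": 1.00}}   # failing is absorbing
EMISSION = {"ok":      {"normal": 0.9, "hot": 0.1},       # P(reading | state)
            "failing": {"normal": 0.3, "hot": 0.7}}

def filter_step(belief, reading):
    """One predict-then-update step of the forward algorithm."""
    predicted = {s: sum(belief[p] * TRANSITION[p][s] for p in belief)
                 for s in belief}
    unnorm = {s: predicted[s] * EMISSION[s][reading] for s in predicted}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}

belief = {"ok": 0.99, "failing": 0.01}
for t, reading in enumerate(["normal", "hot", "hot", "hot"], start=1):
    belief = filter_step(belief, reading)
    if belief["failing"] > 0.5:
        print(f"t={t}: schedule maintenance (risk {belief['failing']:.2f})")
```

A run of "hot" readings pushes the failure belief up step by step until it crosses the maintenance threshold, which is exactly the streaming behavior described above.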
✅ Good fit when:

- You (or your experts) can articulate how variables depend on each other.
- Important variables are unobserved and must be inferred from the ones you can see.
- You need calibrated probabilities that update as new evidence arrives.
- You want to reason about interventions, not just make predictions.

❌ Consider alternatives when:

- You only need point predictions and have abundant labeled data; standard supervised ML is simpler.
- Relationships involve feedback loops, which a plain DAG cannot express; consider dynamic models.
- The variable count makes inference intractable even with approximation.