Evidence

Mathematically proven, validated in simulation, tested in the world

CF's support comes in three distinct kinds, and they aren't interchangeable. Some results are mathematically proven: exact, independent of any dataset. Some are validated in controlled simulation, where the true coupling is known by construction. And some are tested on real-world public data, where effects are smaller, noisier, and vary by domain. The framework holds these apart rather than summing them into one number. A proof, a simulation, and a correlation aren't the same kind of evidence. Every chart below is drawn from laboratory data.

Mathematically Proven (exact)

Closed-form identities and the complete Boolean manifold (all 16 two-input functions). Strength is proof, independent of sample size. STQA Class 10 / 7.

Controlled Simulations (~34,000 runs)

Synthetic systems where the true coupling is known by construction (SVF, HLM, three-layer, IC-PSIS, calibration). IC predicts mean-field variational inference (MFVI) failure with r = 0.86.

Real-World Datasets

Public data across domains: neuroimaging (ABIDE, HCP), linguistic iconicity, causal ladders. Effects are smaller and vary by domain; each carries its own evidential class.

The headline figures above are reported by kind, not pooled into one total. The full per-dataset inventory (with type and key result) appears at the foot of this page, and each domain's evidential status is tracked individually on the domain network.

The negative control: when factorization is adequate

A diagnostic that only raises alarms is useless. CF also has to identify correctly when factorization works: when a system really is just its parts, and decomposing it loses nothing essential. This is validated directly:

IC < 0.25: no cost

In SVF simulations at low coupling, the MSE ratio is 1.0 within noise. Factorization is free. The mean-field approximation is essentially exact. CF correctly says: proceed, reductionism is adequate here.

Pooling at high IC: safe

In HLM (pooling archetype), high IC means data is plentiful and each group can estimate independently. CF correctly reverses its verdict: factorization succeeds when coupling signals redundancy rather than structural necessity.

Boolean constants: IC = 0

The constant functions (TRUE, FALSE) have IC₂ = 0, IC_int = 0, no relational structure at all. CF correctly reports: nothing to preserve, factorization discards nothing.

Negative controls are built into every validation study: the left edge of each chart (low coupling) confirms that CF doesn't raise false alarms. The framework's value isn't in universally detecting problems. It's in precisely distinguishing where relational structure is load-bearing from where it's safely ignorable.

Filtering regime: factorization cost grows with coupling

In the Single-Variate Factorization (SVF) study (8,000 simulations), two correlated Gaussian variables are modeled with mean-field factorization (q(x)q(z) instead of q(x,z)). The MSE ratio measures how much worse the factorized model performs than an oracle that respects the coupling. As coupling strength (κ) increases, the mean-field approximation deteriorates steadily: the mean MSE ratio climbs from 1.0 (factorization is free) to roughly 9.7 at the strongest coupling.

8,000 simulations (400 replications × 20 coupling levels). Error bars: ±1 SD. MSE ratio = 1 means factorization costs nothing; higher values indicate increasing information loss.
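A minimal sketch of the mechanism behind this curve, under assumptions the page does not state: coupling is taken to be the correlation ρ of a standard bivariate Gaussian (the study's mapping from κ to ρ is not given), the oracle predicts x from z with the true conditional mean E[x | z] = ρz, and the factorized model q(x)q(z) carries no information about z, so it predicts the marginal mean 0. Under these assumptions the MSE ratio has the closed form 1/(1 − ρ²): exactly 1 at ρ = 0 and unbounded as ρ → 1. The function `mse_ratio` is hypothetical and illustrates the idea; it is not the study's estimator.

```python
import numpy as np

def mse_ratio(rho: float, n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo MSE ratio (factorized / oracle) for predicting x from z.

    Assumption: (x, z) is a standard bivariate Gaussian with correlation rho.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    x, z = rng.multivariate_normal(np.zeros(2), cov, size=n).T
    oracle_pred = rho * z                 # E[x | z]: uses the coupling
    factorized_pred = np.zeros_like(x)    # under q(x)q(z), z says nothing about x
    mse_oracle = np.mean((x - oracle_pred) ** 2)
    mse_factorized = np.mean((x - factorized_pred) ** 2)
    return float(mse_factorized / mse_oracle)

for rho in (0.0, 0.5, 0.9, 0.95):
    print(f"rho={rho:.2f}  simulated={mse_ratio(rho):.2f}  closed form={1 / (1 - rho**2):.2f}")
```

The left edge of the sweep (ρ = 0) is the negative control: the two predictors coincide and the ratio is exactly 1.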

Pooling regime: high IC means hierarchy is redundant

In Hierarchical Linear Models (8,000 simulations), the dependency archetype is pooling: relationships carry signal from shared structure. Here the IC direction reverses: high IC means data is reliable enough that each group can estimate well on its own, making the hierarchical pooling layer redundant. The MSE ratio (no-pooling / partial-pooling) drops as IC increases.

8,000 simulations (400 replications × 20 τ levels). The negative correlation (r = −0.88 aggregated) confirms the pooling archetype: high IC = hierarchy unnecessary. This is the opposite direction from filtering, demonstrating Dependency Asymmetry empirically.
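The reversal can be sketched in the simplest normal-normal hierarchy. The assumptions are: group effects θ_j ~ N(0, τ²), one observation per group y_j ~ N(θ_j, s²), and both τ and s known. No pooling uses y_j directly. Partial pooling uses the posterior mean λ·y_j, where λ = τ²/(τ² + s²) is the reliability of a group's own data. The page does not define IC for hierarchical models, so λ serves here only as a hypothetical stand-in. Under these assumptions the MSE ratio (no-pooling / partial-pooling) works out to 1/λ, which falls toward 1 as reliability rises: the pooling layer becomes redundant.

```python
import numpy as np

def pooling_mse_ratio(tau: float, s: float = 1.0, groups: int = 200_000, seed: int = 1) -> float:
    """MSE(no pooling) / MSE(partial pooling) in a normal-normal model with known tau and s."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, tau, size=groups)      # true group effects
    y = theta + rng.normal(0.0, s, size=groups)    # one noisy measurement per group
    reliability = tau**2 / (tau**2 + s**2)         # hypothetical stand-in for IC
    no_pool = y                                    # each group on its own
    partial_pool = reliability * y                 # posterior mean, shrunk toward 0
    mse_no = np.mean((no_pool - theta) ** 2)
    mse_partial = np.mean((partial_pool - theta) ** 2)
    return float(mse_no / mse_partial)

for tau in (0.25, 1.0, 4.0):
    lam = tau**2 / (tau**2 + 1.0)
    print(f"reliability={lam:.2f}  simulated={pooling_mse_ratio(tau):.2f}  closed form={1 / lam:.2f}")
```

Compared with the filtering sketch, the ratio moves in the opposite direction as the reliability proxy grows. That opposite direction is the Dependency Asymmetry the caption describes.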

Proximal dominance: the nearest layer wins

In deep three-level hierarchies (16,000 simulations), which layer's coupling matters more? The answer: the proximal layer (nearest to the data). Proximal IC₂₁ correlates with MSE ratio at r = 0.31; distal IC₃₂ at only r = 0.13, a 2.3× difference. The coupling closest to the observation point dominates inference quality regardless of distal structure.

16,000 simulations across all combinations of proximal and distal coupling. Solid: proximal IC (layer 2→1, closest to data). Dashed: distal IC (layer 3→2). The proximal layer's coupling dominates.

Boolean manifold: complete classification of 2-input functions

All 16 possible two-input Boolean functions are classified by their IC structure. The x-axis shows pairwise coupling (IC₂); the y-axis shows interaction strength (IC_int). The classification is exhaustive and algebraically exact: XOR sits at IC₂ = 0, IC_int = 1: pure higher-order structure, invisible to pairwise methods. AND/OR sit at IC₂ = 0.82, IC_int = 0.58: mixed structure visible to both pairwise and interaction methods.

Complete manifold of all 16 two-input Boolean functions (dot size = multiplicity at each position). Source: Walsh-Hadamard spectral decomposition. XOR/XNOR (green) have zero pairwise coupling but maximal interaction, the archetype of higher-order structure. 16 functions collapse to exactly 4 positions by IC structure.
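A minimal sketch of the Walsh-Hadamard step. The page gives the coordinates but not the formulas for IC₂ and IC_int. The normalization below is an assumption, chosen because it reproduces the quoted values. Outputs are encoded as ±1 and the Fourier coefficients are computed. IC₂ is then the square root of the degree-1 share of the non-constant spectral weight, and IC_int the square root of the degree-2 share. Constants, which have no non-constant weight, are placed at (0, 0). Under this assumption AND lands at (√(2/3), √(1/3)) ≈ (0.82, 0.58), XOR at (0, 1), and the 16 functions occupy 4 points. `ic_coordinates` is a hypothetical name.

```python
import itertools
import math
from collections import Counter

INPUTS = list(itertools.product((0, 1), repeat=2))  # (x, y) in order 00, 01, 10, 11

def walsh_coefficients(truth_table):
    """Fourier-Walsh coefficients of f: {0,1}^2 -> {0,1}, using the ±1 encoding."""
    f = [1 - 2 * out for out in truth_table]          # 0 -> +1, 1 -> -1
    coeffs = {}
    for subset in ((), (0,), (1,), (0, 1)):
        total = 0
        for inp, val in zip(INPUTS, f):
            chi = 1
            for i in subset:
                chi *= 1 - 2 * inp[i]                 # character chi_S evaluated at inp
            total += val * chi
        coeffs[subset] = total / 4
    return coeffs

def ic_coordinates(truth_table):
    """Hypothetical (IC_2, IC_int): sqrt of degree-1 and degree-2 shares of non-constant weight."""
    c = walsh_coefficients(truth_table)
    w1 = c[(0,)] ** 2 + c[(1,)] ** 2                  # pairwise (input-output) weight
    w2 = c[(0, 1)] ** 2                               # interaction weight
    nonconstant = w1 + w2
    if nonconstant == 0:                              # TRUE / FALSE
        return (0.0, 0.0)
    return (round(math.sqrt(w1 / nonconstant), 2), round(math.sqrt(w2 / nonconstant), 2))

# All 16 functions: each 4-bit pattern is one truth table over INPUTS.
positions = Counter(ic_coordinates(tt) for tt in itertools.product((0, 1), repeat=4))
for point, count in sorted(positions.items()):
    print(point, count)
# Prints:
# (0.0, 0.0) 2
# (0.0, 1.0) 2
# (0.82, 0.58) 8
# (1.0, 0.0) 4
```

Under this assumed normalization IC₂² + IC_int² = 1 for every non-constant function. The 14 non-constant functions therefore lie on a quarter circle: the four projections/negations at (1, 0), XOR/XNOR at (0, 1), and the eight AND/OR-like functions in between.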

SAT: relational frustration across the constraint range

In random 3-SAT instances (350 problems across 7 clause ratios from α = 2 to 5), spectral frustration (ρ_f), a measure of relational constraint density, rises to a maximum near α ≈ 3 and then falls as the formulas become over-constrained. Meanwhile, the satisfiable fraction declines steadily across the sampled range, from 1.0 to 0.66 at α = 5. Frustration is highest in the intermediate regime where constraints compete most. The classic 3-SAT satisfiability threshold (α ≈ 4.27, its large-instance limit) lies within the sampled range, above the frustration peak; the range does not extend far enough to show the satisfiable fraction reach zero.

350 random 3-SAT instances (50 per α level). Solid: fraction satisfiable. Dashed: mean spectral frustration (ρ_f). Frustration peaks in the intermediate-constraint regime (α ≈ 3).
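A sketch of the experimental setup, not the study's code. It draws random 3-SAT formulas at clause ratio α = m/n, with three distinct variables per clause and each negated with probability 1/2. Satisfiability is checked by exhaustive search, using one Python integer per variable as a bitset over all 2ⁿ assignments. The instance size n = 12, the 50 trials per level, and the seed are assumptions. Spectral frustration ρ_f is not defined on this page, so it is left out. At such small n the transition is smeared rather than sharp.

```python
import random

def literal_masks(n):
    """For each variable v, a bitset over all 2**n assignments in which v is True."""
    full = (1 << (1 << n)) - 1                     # every assignment still possible
    masks = []
    for v in range(n):
        m = 0
        for a in range(1 << n):
            if (a >> v) & 1:
                m |= 1 << a
        masks.append(m)
    return masks, full

def random_3sat(n, m, rng):
    """m clauses, each over 3 distinct variables; (var, negated) pairs."""
    return [[(v, rng.random() < 0.5) for v in rng.sample(range(n), 3)] for _ in range(m)]

def satisfiable(clauses, masks, full):
    alive = full                                   # assignments not yet ruled out
    for clause in clauses:
        clause_sat = 0
        for v, negated in clause:
            clause_sat |= (full ^ masks[v]) if negated else masks[v]
        alive &= clause_sat                        # keep assignments satisfying this clause
        if not alive:
            return False
    return True

def sat_fraction(alpha, n=12, trials=50, seed=0):
    rng = random.Random(seed)
    masks, full = literal_masks(n)
    m = round(alpha * n)
    hits = sum(satisfiable(random_3sat(n, m, rng), masks, full) for _ in range(trials))
    return hits / trials

for alpha in (2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0):
    print(f"alpha={alpha:.1f}  satisfiable fraction={sat_fraction(alpha):.2f}")
```

The bitset trick keeps exhaustive search cheap for small n. The cost still doubles with every added variable, so larger instances would need a real SAT solver.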

IC-PSIS convergence: coupling predicts model fit

IC was independently validated against Pareto-Smoothed Importance Sampling (PSIS), the standard Bayesian diagnostic for posterior approximation quality. Across 900 simulations with varied coupling strengths, IC predicts the log-likelihood gap between the factorized and true posteriors with r = 0.86. The two diagnostics, one geometric (IC) and one sampling-based (PSIS), converge on the same answer through entirely different computational paths.

900 simulations. Each point is one model configuration. IC (x-axis) predicts the log-likelihood gap (y-axis) with r = 0.86. Points colored by PSIS k̂ diagnostic. This validates IC as a computationally cheaper alternative to PSIS for detecting factorization failure.
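A minimal sketch of the simplest case where both diagnostics can be computed by hand. The assumptions: the target p is a bivariate Gaussian with unit variances and correlation ρ, and the factorized q(x)q(z) is fitted by coordinate-ascent variational inference (CAVI). The gap is measured as KL(q‖p), estimated by Monte Carlo from draws of q and checked against the closed form −½ log(1 − ρ²). The sampling-based side is represented only by the plain importance-sampling effective sample size of the weights p/q. That is a simpler stand-in, not PSIS k̂, which additionally fits a generalized Pareto tail to the largest weights. `mean_field_gap` is a hypothetical name.

```python
import numpy as np

def mean_field_gap(rho, mu=(1.0, -1.0), n=400_000, sweeps=200, seed=2):
    """Fit q(x)q(z) to a bivariate Gaussian p by CAVI.

    Returns (Monte Carlo KL(q||p), closed-form KL, importance-sampling ESS / n).
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.array([[1.0, rho], [rho, 1.0]])     # unit variances, correlation rho
    lam = np.linalg.inv(sigma)                     # precision matrix of p
    var_q = 1.0 / np.diag(lam)                     # optimal factor variances: 1 - rho**2
    m = np.zeros(2)                                # initial factor means
    for _ in range(sweeps):                        # CAVI: update each factor's mean in turn
        m[0] = mu[0] - lam[0, 1] / lam[0, 0] * (m[1] - mu[1])
        m[1] = mu[1] - lam[1, 0] / lam[1, 1] * (m[0] - mu[0])

    rng = np.random.default_rng(seed)
    s = m + np.sqrt(var_q) * rng.standard_normal((n, 2))   # draws from q

    d_p = s - mu
    log_p = (-0.5 * np.einsum("ni,ij,nj->n", d_p, lam, d_p)
             - np.log(2 * np.pi) - 0.5 * np.log(np.linalg.det(sigma)))
    d_q = s - m
    log_q = np.sum(-0.5 * d_q**2 / var_q - 0.5 * np.log(2 * np.pi * var_q), axis=1)

    log_w = log_p - log_q                          # log importance weights p/q
    kl_mc = float(-np.mean(log_w))                 # KL(q||p) = E_q[log q - log p]
    w = np.exp(log_w - log_w.max())                # rescaled for numerical stability
    ess_fraction = float(w.sum() ** 2 / np.sum(w**2) / n)  # plain IS ESS / n, not PSIS k-hat
    return kl_mc, float(-0.5 * np.log(1.0 - rho**2)), ess_fraction

for rho in (0.0, 0.5, 0.9):
    kl_mc, kl_exact, ess = mean_field_gap(rho)
    print(f"rho={rho:.1f}  KL(MC)={kl_mc:.3f}  KL(exact)={kl_exact:.3f}  ESS/n={ess:.2f}")
```

In this toy, the log weights reduce to ρ·u·v plus a constant, with u and v independent standard normals. The weights therefore have infinite variance once ρ > 0.5, and raw ESS estimates become unreliable. Heavy-tail diagnostics such as PSIS k̂ are designed to flag exactly this situation, whereas the closed-form KL stays finite and smooth in ρ.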

Neuroimaging: IC in human brain networks

CF metrics applied to real-world neuroimaging data from the ABIDE consortium (871 subjects, 20 sites) and the Human Connectome Project (70 subjects, task fMRI). IC computed from functional connectivity matrices differs between clinical populations (small but consistent effects) and tracks cognitive load. The CCRP study (heart-brain coupling) demonstrates cardiac phase modulation of neural IC across task states.

ABIDE: ASD vs Control on 3 IC-derived metrics. All significant (p < 0.025). Effect sizes (Cohen's d) are small but consistent across 20 independent acquisition sites.

CCRP: Mean IC profile across 8 cardiac phase bins for 4 HCP task conditions. Neural coupling is modulated by the cardiac cycle, and IC tracks heart-brain interaction.
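The two generic ingredients behind these comparisons can be sketched as follows. A functional connectivity matrix is taken as Pearson correlations between region-of-interest (ROI) time series, and a group difference is summarized as Cohen's d with a pooled standard deviation. The page does not say how IC is derived from a connectivity matrix, so `mean_abs_coupling` is a hypothetical placeholder metric. The demo data are synthetic, generated by a made-up `synthetic_subject` helper, and do not represent ABIDE or HCP.

```python
import numpy as np

def fc_matrix(timeseries):
    """Functional connectivity: Pearson correlation between ROI columns (rows = time points)."""
    return np.corrcoef(timeseries, rowvar=False)

def mean_abs_coupling(fc):
    """Hypothetical placeholder for an IC-derived metric: mean |r| over off-diagonal pairs."""
    off_diagonal = ~np.eye(fc.shape[0], dtype=bool)
    return float(np.abs(fc[off_diagonal]).mean())

def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled standard deviation (ddof=1)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))

# Synthetic demo only: each subject's ROIs share one latent signal of a given strength.
rng = np.random.default_rng(3)

def synthetic_subject(shared, time_points=200, rois=10):
    latent = rng.standard_normal((time_points, 1))
    return shared * latent + rng.standard_normal((time_points, rois))

group_a = [mean_abs_coupling(fc_matrix(synthetic_subject(0.6))) for _ in range(40)]
group_b = [mean_abs_coupling(fc_matrix(synthetic_subject(0.5))) for _ in range(40)]
print(f"Cohen's d on synthetic groups: {cohens_d(group_a, group_b):.2f}")
```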

Beyond neuroscience: The cognitive science domain also integrates linguistic iconicity data (14,776 words from Winter et al. 2017), measuring sound-meaning coupling, the degree to which a word's phonological form is non-arbitrarily related to its meaning. High iconicity = high IC between phonology and semantics; the Saussurean arbitrariness assumption is the factorization q(sound)·q(meaning).
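To make the factorization reading concrete, consider discrete sound and meaning categories. The information lost by replacing the joint p(s, m) with the product p(s)·p(m) is KL(p(s, m) ‖ p(s)p(m)), which is the mutual information between the two. The sketch uses that standard quantity as a hypothetical stand-in, since the page does not give CF's IC formula for iconicity. The count tables are invented toy values, not data from Winter et al. 2017.

```python
import numpy as np

def factorization_cost(counts):
    """KL(p(s, m) || p(s) p(m)) in bits (mutual information) from a sound x meaning count table."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    ps = p.sum(axis=1, keepdims=True)     # marginal over sound classes (column vector)
    pm = p.sum(axis=0, keepdims=True)     # marginal over meaning classes (row vector)
    product = ps @ pm                     # the factorized model p(s) p(m)
    nz = p > 0                            # 0 * log 0 is taken as 0
    return float(np.sum(p[nz] * np.log2(p[nz] / product[nz])))

arbitrary = [[25, 25], [25, 25]]   # toy: sound class says nothing about meaning class
iconic = [[45, 5], [5, 45]]        # toy: sound class strongly predicts meaning class
print(factorization_cost(arbitrary), factorization_cost(iconic))
```

For the fully arbitrary table the cost is zero: factorization discards nothing, the same verdict the negative controls expect.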

Complete dataset inventory

All validation data, simulation code, and analysis scripts are maintained in the project repository. Each dataset below has been used in at least one published validation result.

Dataset	N	Type	Key Result
SVF Validation	8,000	Synthetic	MSE ratio grows with coupling
HLM Validation	8,000	Synthetic	IC predicts pooling benefit (r = −0.88)
Three-Layer Validation	16,000	Synthetic	Proximal dominance (2.3× stronger than distal)
IC-PSIS Comparison	900	Synthetic	IC predicts log-lik gap (r = 0.86)
Threshold Calibration	1,050	Synthetic	Three-zone diagnostic system validated
Boolean Manifold	16	Algebraic	Exhaustive classification, Walsh-Hadamard exact
SAT Phase Transition	350	Combinatorial	Spectral frustration peaks at intermediate constraint (α ≈ 3)
ABIDE Neuroimaging	871	Empirical	ASD vs Control FC: IC distinguishes clinical populations (p = 0.002)
HCP Working Memory	70	Empirical	IC contrasts under cognitive load: 4 Bonferroni-significant pairs
HCP CCRP (Heart-Brain)	208	Empirical	Cardiac phase modulation of neural IC across 4 task conditions
Iconicity Ratings	14,776	Empirical	Sound-meaning IC: words with high iconicity ratings show non-arbitrary phonology-semantics coupling (Winter et al. 2017)
CC Diagnostic Ladder	54	Real-world	Wine, Framingham, Air Quality: mediation structure
Literary Corpus	90+	Annotated	Narrative structural coupling in literature