Technical Appendix
Target population
This analysis applies to adults aged 30-70 years (the primary analysis uses a 40-year-old baseline) drawn from the general population rather than specifically high-risk groups, with estimates derived primarily from US and European cohorts. Individuals with nut allergies (~1-2% of the population) are excluded. The model does not cover secondary prevention (existing CVD), the very elderly (80+), or non-Western populations.
Monte Carlo uncertainty propagation algorithm
The analysis uses a hierarchical forward sampling model with standardized prior draws. This is numerically equivalent to the “non-centered parameterization” used in Bayesian MCMC, but it carries no inference interpretation: because there is no outcome likelihood, the model samples directly from the priors by Monte Carlo rather than running MCMC.
Priors: Nutrient effects follow \beta_{nutrient,pathway} \sim \text{Normal}(\mu_{meta}, \sigma_{meta}) from Table 2; the hierarchical shrinkage scale follows \tau_{pathway} \sim \text{HalfNormal}(0.015); standardized deviations follow z_{nut,pathway} \sim \text{Normal}(0, 1); and the confounding fraction follows c \sim \text{Beta}(1.5, 6.0).
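A minimal NumPy sketch of these prior draws follows. The variable names are illustrative, and folding a Normal draw with `np.abs` is one standard way to obtain a HalfNormal sample; the text does not say how the package does it. The nutrient-effect priors are left out because their means and SDs come from Table 2.

```python
import numpy as np

rng = np.random.default_rng(42)   # fixed seed, as in the analysis
n = 10_000

# tau ~ HalfNormal(0.015): absolute value of a zero-mean Normal draw.
tau = np.abs(rng.normal(0.0, 0.015, size=n))
# z ~ Normal(0, 1): standardized deviation, scaled by tau afterwards.
z = rng.normal(0.0, 1.0, size=n)
# c ~ Beta(1.5, 6.0): multiplies the log-RR in the confounding step.
c = rng.beta(1.5, 6.0, size=n)

# Standardized ("non-centered") form of a Normal(0, tau) deviation.
deviation = tau * z
```

Drawing z separately and scaling by tau gives the same distribution as drawing the deviation from Normal(0, tau) directly, which is the sense in which the standardized draws match a non-centered parameterization.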
Justification for τ ~ HalfNormal(0.015): Earlier versions let nut-specific residual adjustments do too much work. The smaller scale parameter 0.015 constrains those deviations to remain modest after nutrient composition is already accounted for. This still allows walnuts to retain a small CVD-specific edge without letting food-specific bonuses dominate the model.
Model:

1. Compute nutrient-predicted effect: \theta_{nutrients} = \sum_{n} \beta_n \cdot \text{composition}_{nut,n}
2. Add hierarchical deviation via standardized draws: \theta_{true} = \theta_{nutrients} + \tau \cdot z
3. Apply HR-centered Jensen correction: \theta_{hr} = \theta_{true} - \tfrac{1}{2}\left(\sum_n \sigma_n^2 x_{nut,n}^2 + \tau^2\right), where x_{nut,n} denotes \text{composition}_{nut,n}, so E[\exp(\theta_{hr})] = \exp(\sum_n \mu_n x_{nut,n}).
4. Shrink nut-specific adjustment toward the null by the evidence tier: a_{\text{shrunk}} = 1 + (a - 1)(1 - s_{\text{tier}}), with s_{\text{strong}} = 0.15, s_{\text{moderate}} = 0.30, s_{\text{limited}} = 0.50.
5. Apply nut-specific adjustment: RR_{adjusted} = RR_{hr}^{a_{\text{shrunk}}}
   - On log scale: \theta_{adjusted} = a_{\text{shrunk}} \times \theta_{hr}
   - For protective effects (RR < 1, equivalently \theta < 0), adjustments a > 1 amplify the effect (make RR smaller).
   - Worked example (walnut CVD, strong evidence): the nutrient-predicted effect is \theta_{hr} \approx -0.05 and the nominal adjustment is a = 1.10. After 15% strong-tier shrinkage, a_{\text{shrunk}} = 1 + 0.10 \times 0.85 = 1.085. The adjusted effect is \theta_{adjusted} = 1.085 \times (-0.05) \approx -0.054, so RR_{\text{adjusted}} = \exp(-0.054) \approx 0.947 (slightly stronger CVD protection than the nutrient model predicts alone). After the 20% confounding shrinkage the sample-mean RR lands near 0.96, consistent with Table 4.
6. Apply confounding: \theta_{causal} = c \cdot \theta_{adjusted}
7. Convert to RR: RR_{pathway} = \exp(\theta_{causal})
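The walnut arithmetic in steps 4 and 5 can be checked with a few lines of Python. This is only a sketch of the stated numbers: the variable names are illustrative, and \theta_{hr} = -0.05 is the approximate value quoted in the text rather than a model output.

```python
import math

theta_hr = -0.05      # nutrient-predicted log-RR (approximate, from the text)
a_nominal = 1.10      # nominal walnut CVD adjustment
s_strong = 0.15       # strong-tier shrinkage fraction

a_shrunk = 1 + (a_nominal - 1) * (1 - s_strong)   # step 4
theta_adjusted = a_shrunk * theta_hr              # step 5, log scale
rr_adjusted = math.exp(theta_adjusted)

# Raising the RR to the power a_shrunk gives the same number as the log-scale product.
rr_via_power = math.exp(theta_hr) ** a_shrunk

print(round(a_shrunk, 3), round(theta_adjusted, 3), round(rr_adjusted, 3))
# 1.085 -0.054 0.947
```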
Tiered publication-bias shrinkage: Steps 3–4 import two layers from the newer Optiqal framework. HR-centering ensures the aggregate pre-adjustment RR has the expected mean \exp(\sum_n \mu_n x_{nut,n}) rather than one inflated by the half-variance of the log-RR. After the nut-specific adjustment multiplication in step 5, a residual Jensen gap of order \tfrac{1}{2}\mathrm{Var}(a_{\text{shrunk}})(\sum_n \mu_n x_{nut,n})^2 remains, which the base case leaves uncorrected. The numerical effect is small (under 0.15 percentage points on RR, worst case walnut CVD), but the gap is noted for completeness.

Tiered shrinkage addresses the well-documented gap between initial effect sizes and replicated effects as evidence quality falls: strong-evidence nuts retain 85% of the nominal central estimate for the nut-specific residual, moderate 70%, limited 50%. Following Optiqal’s convention, only the central estimate is shrunk; the adjustment SD is left intact so that uncertainty reflects replication risk rather than the attenuation factor itself. Nutrient priors (Table 2) are already drawn from meta-analyses, so they are assumed pre-shrunk and this layer only touches the nut-specific residual.
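The order of the residual Jensen gap can be reconstructed from the Gaussian moment-generating function. The sketch below assumes the adjustment a is Normal with mean \bar{a} = a_{\text{shrunk}} and variance s_a^2, independent of \theta_{hr}; the text gives a central value and an SD for each adjustment but does not state the distributional form, so this is an assumption.

```latex
% Assumption: a ~ Normal(\bar{a}, s_a^2), independent of \theta_{hr}
\mathbb{E}_a\!\left[\exp(a\,\theta_{hr})\right]
  = \exp\!\left(\bar{a}\,\theta_{hr} + \tfrac{1}{2}\, s_a^2\, \theta_{hr}^2\right)
  \approx \exp(\bar{a}\,\theta_{hr})\left(1 + \tfrac{1}{2}\, s_a^2\, \theta_{hr}^2\right)
```

Evaluating \theta_{hr} at its central value \sum_n \mu_n x_{nut,n} gives the relative gap \tfrac{1}{2} s_a^2 (\sum_n \mu_n x_{nut,n})^2 quoted above.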
Monte Carlo sampling: The model draws 10,000 forward samples from priors (no MCMC is needed since there is no likelihood). A fixed seed of 42 ensures reproducibility; runtime is well under one second using vectorized NumPy with no external inference library required.
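The forward sampling run can be sketched as one vectorized NumPy function covering the seven model steps. This is a minimal illustration, not the package's implementation: the function name `pathway_rr`, its argument layout, and the two-nutrient inputs in the usage lines are hypothetical. The prior shapes, tier shrinkage values, seed, and sample count follow the text. One further assumption is that the adjustment a is drawn from a Normal centered on the shrunk value with the tabulated SD.

```python
import numpy as np

# Tier shrinkage fractions s_tier from step 4.
TIER_SHRINK = {"strong": 0.15, "moderate": 0.30, "limited": 0.50}

def pathway_rr(mu, sigma, x, a_nominal, a_sd, tier, n=10_000, seed=42):
    """Forward-sample one nut/pathway RR following steps 1-7.

    mu, sigma : per-nutrient prior means and SDs on the log-RR scale
    x         : nutrient composition of the nut (same order as mu)
    Returns (final pathway RR draws, pre-adjustment RR draws).
    """
    rng = np.random.default_rng(seed)
    mu, sigma, x = (np.asarray(v, dtype=float) for v in (mu, sigma, x))

    beta = rng.normal(mu, sigma, size=(n, mu.size))   # nutrient effects
    tau = np.abs(rng.normal(0.0, 0.015, size=n))      # HalfNormal(0.015)
    z = rng.normal(0.0, 1.0, size=n)                  # standardized deviation
    c = rng.beta(1.5, 6.0, size=n)                    # Beta(1.5, 6.0) fraction

    theta_nutrients = beta @ x                        # step 1
    theta_true = theta_nutrients + tau * z            # step 2
    jensen = 0.5 * (np.sum(sigma**2 * x**2) + tau**2)
    theta_hr = theta_true - jensen                    # step 3
    a_shrunk = 1.0 + (a_nominal - 1.0) * (1.0 - TIER_SHRINK[tier])  # step 4
    a = rng.normal(a_shrunk, a_sd, size=n)            # assumed Normal prior on a
    theta_adjusted = a * theta_hr                     # step 5
    theta_causal = c * theta_adjusted                 # step 6
    return np.exp(theta_causal), np.exp(theta_hr)     # step 7

# Hypothetical two-nutrient inputs, for illustration only.
rr, rr_hr = pathway_rr(mu=[-0.04, -0.02], sigma=[0.02, 0.01], x=[1.0, 0.5],
                       a_nominal=1.10, a_sd=0.12, tier="strong")
```

With these toy inputs the mean of `rr_hr` should sit close to exp(-0.05), which is the HR-centering property from step 3; the final `rr` is pulled much closer to 1 by the Beta-distributed fraction.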
Lifecycle integration: For each of the 10,000 samples, the model extracts pathway-specific RRs (already confounding-adjusted), computes age-weighted mortality reduction using CDC life tables, applies the smoothed EQ-5D quality-weight trajectory by age, and computes undiscounted QALYs alongside lifetime costs discounted at 3% per year.
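A toy sketch of the life-table step for a single Monte Carlo sample is given below. Every numeric input is hypothetical: a Gompertz-style mortality curve stands in for the CDC life table, a flat quality weight of 0.85 stands in for the EQ-5D trajectory, and one all-cause RR of 0.99 is applied at every age, whereas the model uses pathway-specific RRs. Multiplying annual death probabilities by the RR is also a simplification.

```python
import numpy as np

def life_years_and_qalys(qx, quality):
    """Expected remaining life years and QALYs from annual death probabilities.

    qx[i] is the probability of dying during year i, given survival to its start.
    quality[i] is the quality weight for year i.
    """
    qx = np.clip(np.asarray(qx, dtype=float), 0.0, 1.0)
    # Survival to the start of each year: 1, (1-q0), (1-q0)(1-q1), ...
    alive_start = np.concatenate(([1.0], np.cumprod(1.0 - qx)[:-1]))
    # Person-years lived in each year, assuming deaths fall mid-year.
    person_years = alive_start * (1.0 - 0.5 * qx)
    return person_years.sum(), (person_years * np.asarray(quality)).sum()

ages = np.arange(40, 110)
# Hypothetical Gompertz-style mortality, NOT the CDC life table.
qx_base = np.minimum(1.0, 0.0015 * np.exp(0.09 * (ages - 40)))
quality = np.full(ages.size, 0.85)       # hypothetical flat quality weight

rr = 0.99                                # one sampled all-cause RR (illustrative)
ly0, q0 = life_years_and_qalys(qx_base, quality)
ly1, q1 = life_years_and_qalys(qx_base * rr, quality)
gain_ly, gain_qaly = ly1 - ly0, q1 - q0
```

Repeating this for each sampled RR yields a distribution of life-year and QALY gains rather than a single point estimate.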
Nut-specific adjustment priors
These adjustment factors are priors used in the hierarchical model. The adjustment is applied as an exponent on the RR scale: RR_{adjusted} = RR_{nutrients}^{a}. On the log-RR scale, this is multiplicative: \log(RR_{adjusted}) = a \times \log(RR_{nutrients}). In the current version, these adjustments are deliberately small. They encode modest residual evidence beyond nutrient composition rather than large food-specific bonuses.
Derivation of adjustment values
Adjustments capture residual effects from nut-specific RCTs after accounting for nutrient composition. The derivation for walnut’s CVD adjustment illustrates the method:
In the PREDIMED trial (2008 pilot biomarker study: Ros et al. 2008; primary CVD-endpoint results: Estruch et al. 2018), the nut intervention arm, which included 15g walnuts, 7.5g almonds, and 7.5g hazelnuts daily rather than walnuts alone, showed an approximately 30% reduction in major cardiovascular events. Attributing much of that residual to walnuts specifically is too aggressive, because the mixed nut arm does not separately identify walnut-specific causal effects. The revised model therefore treats walnut’s residual edge as modest, using a CVD adjustment of 1.10 with a wide SD (0.12) rather than the much larger multiplier used previously.
Almonds serve as the reference nut (adjustment = 1.00) because their RCT effects are well-explained by nutrient composition (vitamin E, fiber, MUFA). This ensures adjustments represent genuine “beyond-nutrient” effects rather than artifacts.
Independence from nutrient priors: The nutrient priors (Table 2) use effect estimates from studies that pool across food sources (e.g., Naghshi 2021 for ALA includes fish and plant sources). The nut-specific adjustments use residual effects from nut-only RCTs, avoiding double-counting.
| Nut | CVD Adj | Cancer Adj | Other Adj | Evidence | Rationale |
|---|---|---|---|---|---|
| Walnut | 1.10 (0.12) | 1.00 (0.08) | 1.00 (0.08) | Strong | Modest residual CVD edge beyond nutrients |
| Pistachio | 1.04 (0.07) | 1.00 (0.08) | 1.00 (0.08) | Moderate | Small lipid edge, otherwise neutral |
| Almond | 1.00 (0.05) | 1.00 (0.06) | 1.00 (0.05) | Strong | Reference nut |
| Pecan | 1.03 (0.08) | 1.00 (0.10) | 1.00 (0.10) | Moderate | Small residual CVD effect only |
| Macadamia | 1.02 (0.08) | 1.00 (0.10) | 1.00 (0.10) | Moderate | MUFA profile, but weak residual evidence |
| Peanut | 0.98 (0.06) | 1.00 (0.06) | 1.00 (0.06) | Strong | Slightly below tree nuts on CVD |
| Hazelnut | 1.03 (0.07) | 1.00 (0.08) | 1.00 (0.08) | Moderate | Part of PREDIMED nut arm (Estruch et al. 2018); lipid improvements in Orem et al. (2013) |
| Cashew | 0.97 (0.10) | 1.00 (0.10) | 1.00 (0.10) | Limited | Mixed RCT evidence, near-neutral |
Note on cancer adjustments: Previous versions applied a 10% cancer penalty to peanuts based on aflatoxin concerns. However, US FDA regulations limit aflatoxin to <20 ppb, and epidemiological studies show no excess cancer risk in US peanut consumers (Wu and Khlangwiset 2010). The cancer adjustment is now set to 1.00 (neutral). Similarly, macadamia and pecan cancer adjustments are set to 1.00 given insufficient evidence for deviation from nutrient predictions.
Nuts with the thinnest evidence (cashew in the Limited tier, plus macadamia and pecan in the Moderate tier) receive higher SD values to reflect greater uncertainty.
Confounding prior derivation
I adopt a skeptical Beta(1.5, 6.0) prior with mean 0.2 and 95% interval 0.02-0.53 (roughly 2-53%). This prior reflects healthy-user bias in nutrition cohorts, weak Mendelian-randomization support, and the gap between biomarker changes and hard outcomes.
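The stated mean and interval can be checked numerically. The sketch below uses the closed-form Beta mean and a Monte Carlo estimate of the quantiles; the sample size and seed are arbitrary choices, not values from the analysis.

```python
import numpy as np

a, b = 1.5, 6.0
mean_analytic = a / (a + b)                      # closed-form Beta mean: 0.2

rng = np.random.default_rng(42)
draws = rng.beta(a, b, size=200_000)
lo, hi = np.quantile(draws, [0.025, 0.975])      # Monte Carlo 95% interval
```

The estimated bounds should land close to the 0.02 and 0.53 quoted above, up to sampling noise.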
Three evidence sources inform this choice:
| Source | Implied Causal % | Interpretation |
|---|---|---|
| LDL pathway calibration | Low double digits | Mechanistic floor |
| Mendelian randomization | Near zero to small | Mostly null, but weak instruments |
| Substitution / Golestan evidence | Small to moderate | Prevents the prior from collapsing to zero |
The LDL pathway still provides a floor rather than a ceiling, but the model no longer treats broader pathway stories as strong enough to justify a 50% causal prior. Sensitivity analysis across 10-33% causal priors is presented in the main text.
Cost-effectiveness model
Data sources
The cost-effectiveness model draws on CDC National Vital Statistics (2021) life tables for age-specific mortality (NVSR Vol 72 No 12), cause-of-death fractions from Table 6 of Xu et al. (2024), age-varying health-related quality of life weights derived from Sullivan & Ghushchyan (2006) US EQ-5D index, a 3% annual discount rate for costs, and per-nut retail prices retrieved from nuts.com on 2026-04-19 (Ghenis 2026). The raw price snapshot (one row per nut, with product URL, package size, and price) lives at src/whatnut/data/raw/retail_prices/retail_prices.csv; whatnut.data_build.retail_prices reads that CSV, validates the row-level math, and writes the per-nut median to nuts.yaml.
Lifecycle model
For a 40-year-old beginning daily nut consumption, the current model estimates 0.03-0.15 additional life years (0.4-1.8 months) across nut types, corresponding to 0.02-0.10 QALYs under 0% health discounting. ICERs range from approximately $92,106 to $453,297 per QALY across nut types.
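The ICER is the discounted incremental cost divided by the incremental QALYs. The sketch below shows that arithmetic with hypothetical inputs: the price per kilogram, the 40-year horizon, and the QALY gain are placeholders rather than values from nuts.yaml or the model output, and the cost stream ignores survival for brevity.

```python
def discounted_cost(annual_cost, years, rate=0.03):
    """Present value of a constant annual cost paid at the start of each year."""
    return sum(annual_cost / (1 + rate) ** t for t in range(years))

def icer(incremental_cost, incremental_qalys):
    """Incremental cost-effectiveness ratio in dollars per QALY."""
    return incremental_cost / incremental_qalys

# Hypothetical inputs, not values from nuts.yaml or the model output.
price_per_kg = 20.0                       # USD
annual_cost = price_per_kg * 0.028 * 365  # 28 g/day
lifetime_cost = discounted_cost(annual_cost, years=40)
ratio = icer(lifetime_cost, incremental_qalys=0.05)

# Unit check for the life-year range quoted above: years to months.
months = [round(y * 12, 1) for y in (0.03, 0.15)]   # [0.4, 1.8]
```

Because costs are discounted at 3% while QALYs are undiscounted, the ratio is sensitive to the time horizon as well as to the price.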
E-value analysis
Per VanderWeele and Ding (2017), the E-value quantifies the minimum strength of association an unmeasured confounder would need with both exposure and outcome to fully explain an observed association.
For a protective exposure with hazard ratio HR, I first convert to relative risk RR = 1/HR, then calculate:
E\text{-value} = RR + \sqrt{RR \times (RR - 1)}
For HR = 0.78:
- RR = 1/0.78 = 1.28
- E\text{-value} = 1.28 + \sqrt{1.28 \times 0.28} = 1.28 + 0.60 = 1.88
An unmeasured confounder would need RR ≥ 1.88 with both nut consumption and mortality to fully explain the observed effect.
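The calculation above is small enough to express as a helper function. The name `e_value` is illustrative; the formula and the inversion of protective estimates follow the steps just described.

```python
import math

def e_value(hr: float) -> float:
    """E-value for a point estimate, following VanderWeele and Ding (2017)."""
    rr = 1.0 / hr if hr < 1.0 else hr   # invert protective estimates
    return rr + math.sqrt(rr * (rr - 1.0))

print(round(e_value(0.78), 2))   # 1.88
```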
Pathway-specific mortality effects
From Aune et al. (2016):
| Cause of Death | Relative Risk | 95% CI | Deaths in Meta-Analysis |
|---|---|---|---|
| CHD | 0.71 | 0.63-0.80 | 20,381 |
| CVD | 0.79 | 0.70-0.88 | — |
| Cancer | 0.87 | 0.80-0.93 | 21,353 |
| Other | 0.90 | 0.85-0.95 | Assumed |
Note: CHD = coronary heart disease; CVD = cardiovascular disease (broader category). The model’s “CVD pathway” incorporates both CHD and broader cardiovascular effects.
Age-varying cause fractions
Cause-of-death proportions vary by age, extracted from Table 6 of Xu et al. (2024). whatnut.data_build.cdc_cause_fractions parses the 2021 rows for All causes, Diseases of heart, Cerebrovascular diseases, and Malignant neoplasms. CVD below is defined as heart + cerebrovascular (ICD-10 I00–I09 + I11 + I13 + I20–I51 + I60–I69), which is narrower than the full ICD-10 I00–I99 block by ~3–5 pp. Rows below report fractions at the lower bound of each NVSR 10-year age group (e.g., “35” is the 35–44 band):
| Age (group start) | CVD | Cancer | Other |
|---|---|---|---|
| 25 | 5.8% | 4.4% | 89.8% |
| 35 | 11.9% | 9.0% | 79.1% |
| 45 | 18.6% | 15.5% | 65.8% |
| 55 | 21.7% | 22.6% | 55.7% |
| 65 | 23.3% | 24.7% | 52.0% |
| 75 | 25.8% | 19.9% | 54.4% |
| 85+ | 33.0% | 10.9% | 56.2% |
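In the Aune et al. (2016) meta-analytic estimates in the preceding section, CVD carries the lowest (most protective) prior RR (~0.75, which lies between the tabulated CHD estimate of 0.71 and the broader CVD estimate of 0.79); the post-shrinkage model RRs in Table 4 are much closer to null.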
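One way to use the cause-fraction table is a step lookup by age band, followed by a fraction-weighted average of the pathway RRs. The sketch below is an assumption about how the pieces combine, not the package's code: the function names are illustrative, the text does not say whether the model interpolates between bands, and the pathway RRs in the usage line are hypothetical.

```python
import numpy as np

# Lower bounds of the age groups and cause fractions from the table above.
AGE_START = np.array([25, 35, 45, 55, 65, 75, 85])
FRACTIONS = np.array([            # columns: CVD, cancer, other
    [0.058, 0.044, 0.898],
    [0.119, 0.090, 0.791],
    [0.186, 0.155, 0.658],
    [0.217, 0.226, 0.557],
    [0.233, 0.247, 0.520],
    [0.258, 0.199, 0.544],
    [0.330, 0.109, 0.562],
])

def cause_fractions(age):
    """Step lookup: return the row of the age band containing `age`."""
    idx = np.searchsorted(AGE_START, age, side="right") - 1
    return FRACTIONS[np.clip(idx, 0, len(AGE_START) - 1)]

def all_cause_rr(age, rr_cvd, rr_cancer, rr_other):
    """Cause-fraction-weighted average of pathway RRs at a given age."""
    return float(cause_fractions(age) @ np.array([rr_cvd, rr_cancer, rr_other]))

# Hypothetical pathway RRs, for illustration only.
rr_40 = all_cause_rr(40, rr_cvd=0.96, rr_cancer=0.99, rr_other=0.99)
```

Ages below 25 fall back to the first row and ages of 85 and above use the last row, so the lookup never indexes outside the table.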
Comparison with direct meta-analysis sampling
An alternative approach samples cause-specific relative risks directly from meta-analysis estimates (e.g., log-normal distributions based on Aune et al. (2016)) rather than deriving them from nutrient composition. This simpler approach yields broadly comparable results because the nutrient-derived priors are calibrated to match meta-analysis estimates.
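A minimal sketch of the direct-sampling alternative: fit a log-normal to a published point estimate and 95% CI, then draw from it. The function name is illustrative, and treating the CI as symmetric on the log scale is a common convention assumed here rather than something the text specifies.

```python
import numpy as np

def lognormal_rr_draws(rr, ci_low, ci_high, n=10_000, seed=42):
    """Sample RRs from a log-normal matched to a point estimate and 95% CI."""
    mu = np.log(rr)
    # A 95% CI spans 2 * 1.96 standard errors on the log scale.
    sd = (np.log(ci_high) - np.log(ci_low)) / (2 * 1.96)
    return np.exp(np.random.default_rng(seed).normal(mu, sd, size=n))

# CVD row from the Aune et al. (2016) table: 0.79 (0.70-0.88).
cvd_draws = lognormal_rr_draws(0.79, 0.70, 0.88)
```

The median of the draws should sit near 0.79, with the 2.5% and 97.5% quantiles close to the published interval.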
The nutrient-derived approach used in this analysis provides several advantages over direct meta-analysis sampling. First, it offers mechanistic interpretability by attributing effects to specific nutrients (ALA, fiber, magnesium). Second, poorly evidenced nuts shrink toward nutrient-predicted effects through principled hierarchical shrinkage. Third, each prior is traceable to independent meta-analyses, ensuring transparency. Fourth, compositional differences drive differential estimates across nut types: for example, the model can distinguish walnuts from almonds based on ALA content rather than relying on a single pooled nut estimate.
Both approaches use forward Monte Carlo sampling (no MCMC is needed since there is no likelihood function). The nutrient-derived approach is preferred for its mechanistic transparency and ability to differentiate nuts based on composition.
Limitations
The causal fraction estimate remains uncertain even after shrinkage, and the true value could be lower or modestly higher than the base case. Most source studies come from Western populations (US, Europe, Australia), limiting generalizability. The model still assumes sustained 28g/day intake and does not explicitly model calorie substitution, adherence decay, or personalized baseline risk.