Discussion

This discussion focuses on a comparative and integrated analysis of parts 1 and 2 of the project to answer the third question posed in the introduction: ‘if there are any biomarker signals that are concordant across both profiles, what do they suggest about the plausibility of a mechanistic link between C677T and FLD risk?’ To answer this question, the cumulative conclusions for each biomarker from parts 1 and 2 are summarized and compared to establish concordance where present. These findings are then mapped against the reorganization system proposed in the section 1.4 synthesis to evaluate mechanistic plausibility. It is important to note that even if the biomarker profile is consistent across both parts of the analysis, that consistency does not constitute proof of a causal relationship, because the split approach of this project means genotype and FLD outcome were never observed in the same individuals. However, the analysis does add evidence of plausible mechanistic overlap and provides a logical basis for future research.

3.1: Biomarker Concordance Evaluation

This section systematically evaluates the findings for each biomarker present in both parts of the analysis. While initial project plans included a near-identical list of profile components for both MTHFR C677T and FLD, literature scarcity and dataset missingness both shaped the final biomarker selection for each part of the project. B6/PLP, betaine, and methionine were not present in the NHANES cohort data, and other cohorts lacked essential HSI components and other important biomarkers, such as homocysteine, so their absence from cross-examination is notable but ultimately unavoidable given dataset constraints. Similarly, GGT and ferritin had strong indications of mechanistic linkage to MTHFR, but too little of the existing literature met the inclusion and exclusion criteria to support their inclusion in the part 1 profile. The remaining biomarkers evaluated across both parts 1 and 2 are homocysteine, serum folate, RBC folate, vitamin B12, MMA, and CRP.

3.1.1: Homocysteine

Homocysteine functions as the anchor of the C677T profile and shows a clear stepwise dose-response with genotype (TT > CT > CC). However, homocysteine’s notable reorganization feature, incomplete penetrance, is exposed by the 80% of TT carriers included in the GWAS who did not cross the hyperhomocysteinemia threshold (Shane et al., 2018). Since C677T is the primary genetic determinant of hyperhomocysteinemia, the absence of a stable phenotype indicates that compensatory pathways are likely driving this incomplete penetrance. This matters for integration with the FLD results because we would expect weak individual-level signaling even if the population-mean direction aligns with expectations.

The modeling in part 2 did not reflect this pattern: it assigned homocysteine a coefficient of -0.189, which contradicts expectations in both direction and magnitude. Multicollinearity was investigated and dismissed as an explanation, though further analysis in R revealed a substantial sex skew and a female-specific healthy skew in the data. Those structural features may have caused improper model conditioning, as suggested by the coefficients assigned to other features that the R analysis showed to run counter to expected biological mechanisms (discussed further in section 2.3.4). This pattern is particularly noteworthy for the female feature, which carries a +2 value in traditional HSI scoring but was assigned a value of -0.37 by this model. After stratification by sex and HSI category (healthy / borderline / nafld-like) in R, visualization revealed relatively stable central tendencies across sexes, with a diluted signal showing a slightly positive trend in both groups.
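For reference, the +2 female term mentioned above comes from the additive form of the HSI. The following is a minimal sketch, assuming the commonly published formula (8 × ALT/AST + BMI, plus 2 if female and plus 2 if diabetic) and the commonly cited cutoffs of 30 and 36. The function names, the cutoffs, and the mapping of those cutoffs onto the healthy / borderline / nafld-like labels are assumptions for illustration, not taken from the project code.

```python
def hsi_score(alt: float, ast: float, bmi: float,
              female: bool, diabetes: bool) -> float:
    """Hepatic Steatosis Index in its commonly published additive form.

    HSI = 8 * (ALT / AST) + BMI + 2 (if female) + 2 (if diabetic)
    """
    score = 8.0 * (alt / ast) + bmi
    if female:
        score += 2.0
    if diabetes:
        score += 2.0
    return score


def hsi_category(score: float, low: float = 30.0, high: float = 36.0) -> str:
    """Map a score onto three labels.

    ASSUMPTION: the 30 / 36 cutoffs and the label names mirror common usage;
    the thresholds used in part 2 are not stated in this section.
    """
    if score < low:
        return "healthy"
    if score > high:
        return "nafld-like"
    return "borderline"


if __name__ == "__main__":
    # Same labs and BMI, differing only in sex: the scores differ by 2 points.
    print(hsi_score(20, 20, 25, female=True, diabetes=False))
    print(hsi_score(20, 20, 25, female=False, diabetes=False))
```

With all other inputs equal, the female term raises the score by exactly 2 points, which is the baseline against which the model’s -0.37 coefficient for the female feature reads as counterintuitive.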

Both aspects of that visualization are consistent with the incomplete penetrance conclusions of part 1, even though the logistic regression coefficient is not. Given the incomplete penetrance of the C677T signal, we would expect even further dilution of that signal in data from a general population that is not stratified by genotype, like NHANES. The stability of the homocysteine distributions across and within sexes and their slight positive association with NAFLD are mechanistically consistent with the diluted signal of incomplete penetrance, though this relationship would require further testing in a genotype-stratified sample to fully confirm or refute the mechanism.
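The dilution argument can be made concrete with simple arithmetic. The sketch below is illustrative only: the 20% penetrance follows from the 80% of TT carriers reported above as not crossing the threshold, while the TT population share and the mean homocysteine excess are hypothetical placeholders, and CT carriers are ignored for simplicity.

```python
# Share of TT carriers who cross the hyperhomocysteinemia threshold, derived
# from the 80% of TT carriers reported as NOT crossing it.
PENETRANCE = 0.20

# HYPOTHETICAL values, chosen only to make the arithmetic concrete.
TT_FRACTION = 0.10    # assumed share of TT carriers in the population
TT_MEAN_SHIFT = 2.0   # assumed mean homocysteine excess in TT carriers (umol/L)


def share_tt_above_threshold(tt_fraction: float, penetrance: float) -> float:
    """Share of the whole population that is both TT and above threshold."""
    return tt_fraction * penetrance


def population_mean_shift(tt_fraction: float, tt_mean_shift: float) -> float:
    """Shift in the pooled mean when only TT carriers are shifted.

    Pooled mean = (1 - f) * m0 + f * (m0 + d) = m0 + f * d,
    so the pooled shift is f * d.
    """
    return tt_fraction * tt_mean_shift


if __name__ == "__main__":
    print(share_tt_above_threshold(TT_FRACTION, PENETRANCE))
    print(population_mean_shift(TT_FRACTION, TT_MEAN_SHIFT))
```

Under these placeholder values, the genotype-attributable signal in a pooled sample is a small fraction of the within-carrier effect, which is the sense in which an unstratified cohort is expected to show only a weak trend.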

3.1.2: Folate (RBC + Serum)

Like homocysteine, both RBC and serum folate show a consistent stepwise pattern with genotype, here as concentration reductions in TT carriers (TT < CT < CC) that persist even with folate supplementation, which is a symptom of the enzymatic constraint of the C677T profile (detailed further in section 1.4.1). In modeling, both metrics showed weak negative coefficients (serum folate: -0.0220 nmol/L; RBC folate: -0.0020 ng/mL), which align with biological expectations, though confidence in this congruency is limited by inconsistent coefficient assignment across other features, which points to significant data structure issues. Initial visual investigation shows serum folate correlating slightly negatively with severity through decreasing floor values across severity classes rather than shifts in overall distribution shape, while RBC folate did not show any readily apparent trend. These features could not be explored further via HSI/sex-stratified visualization due to project timeline constraints, but they warrant further investigation in relation to FLD significance and overlap with C677T profiling.

While both folate metrics support directional concordance at the population level, the structural basis established in part 1, namely folate’s enzyme-specific involvement across multiple features (i.e. substrate sensitivity, deficiency vulnerability, and an enzymatic gap despite supplementation), is not directly testable in the NHANES dataset. Therefore, despite directional consistency, establishing the overlap between folate metrics, FLD, and MTHFR mutations mechanistically requires additional research in genotype-stratified data.

3.1.3: Vitamin B12

Vitamin B12 is a significant cofactor for methionine synthase and provides the strongest example of the stress vulnerability feature in C677T reorganization and compensation machinery. B12’s relationship is more indirect than that of the previously mentioned biomarkers, appearing as a conditional vulnerability that emerges only under deficient conditions: at baseline, no genotypic effects were observed, but under deficiency, TT carriers exhibited ~4x higher deficiency rates and ~2x higher homocysteine responses compared to other groups (Zittan et al., 2007). Clarifying B12’s role as a conditional modifier rather than a predictive element is critical for interpreting its signaling in population-level data.

Viewed through this conditional lens, the part 2 results are consistent with the proposed mechanism. Section 2.2 assigned B12 a near-zero coefficient and showed near-zero correlations with other one-carbon biomarkers, which is what we would expect for a conditional modifier of a specific genetic mutation whose signal is diluted even further in unstratified, population-level data. Additionally, B12 is one of the only variables for which the visualization in R is consistent with both part 1 and section 2.2, showing a slightly negative distribution trend across HSI severity and placing B12 within the 6/8 group of biomarkers that were mechanistically consistent with established biology.

Read together, the near-zero coefficient, weak correlations with other biomarkers, and slightly negative distribution shift form a consistent pattern that aligns with the conditional framing of B12 in part 1. However, because NHANES is unstratified for both genotype and deficiency status, the exact mechanism cannot be verified within the framework of this project. While these findings align with a conditional modifier, they could also be explained by other systems affected by B12 deficiency that are relevant to NAFLD, or they could indicate a weak or absent predictor relationship within modeling contexts specifically. Further investigation using data stratified by both genotype and deficiency status is required for causal assertions.

3.1.4: MMA

MMA’s role in the C677T profile is structurally distinct from that of the other included one-carbon metabolites, since it establishes a mechanistic pathway exclusion boundary. This determination was built on multiple streams of evidence, including a GWAS that determined MTHFR and MMA were genetically unrelated, and pathway testing that showed discordance with a B12-deficiency-driven pattern, indicating that homocysteine elevation is folate-pathway dependent and unrelated to MMA. In contrast, the regression model assigned MMA a strong negative coefficient, which contradicts both the established MMA-fibrosis association and the r = 0.27 value determined during multicollinearity assessment. Visualization did not clarify these findings, showing consistent distribution density, range, and direction across HSI categories.

When integrated, the absence of a clear MMA signal in the computational analysis may support the part 1 assertion that MMA-related pathways fall outside the scope of the reorganized system. However, this support rests purely on null consistency and cannot be confirmed with the current materials, because data structure issues prevent investigation of alternative causes.

3.1.5: CRP

Part 1 frames CRP as evidence of downstream homocysteine signaling that correlates with homocysteine elevation rather than genotype. Combined with established CRP-NAFLD associations, this finding indicates that CRP behavior in the unstratified NHANES data should reflect general inflammatory biology, since CRP functions independently of genotype (Yeniova et al., 2014).

Part 2 showed a strong CRP-NAFLD signal across multiple levels, including coefficient assignment, the feature effect plots, and the distribution visualization panels, which matches that prediction. This is likely due in part to the high coverage (~93%) present in both cohorts, which left CRP largely unaffected by the missingness issues permeating other biomarker samples. The same pattern appears in the sex- and HSI-stratified figure in section 2.3.4.3, which shows similar visual distributions with only subtle sex-specific features. The high coverage and the stability of these trends despite significant skews are consistent with the CRP-NAFLD signal expected given the profile context.
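To illustrate why per-biomarker coverage matters here, the sketch below computes coverage for a single biomarker and the complete-case fraction across several. It is a minimal example under stated assumptions: the function names, the use of `None` for a missing value, and the toy rows are hypothetical and do not reproduce the NHANES cohorts.

```python
from typing import Optional

Row = dict[str, Optional[float]]


def coverage(rows: list[Row], biomarker: str) -> float:
    """Fraction of rows with a non-missing value for one biomarker."""
    if not rows:
        return 0.0
    present = sum(1 for row in rows if row.get(biomarker) is not None)
    return present / len(rows)


def complete_case_fraction(rows: list[Row], biomarkers: list[str]) -> float:
    """Fraction of rows with every listed biomarker present."""
    if not rows:
        return 0.0
    complete = sum(
        1 for row in rows
        if all(row.get(name) is not None for name in biomarkers)
    )
    return complete / len(rows)


# HYPOTHETICAL toy rows; None marks a missing measurement.
TOY_ROWS: list[Row] = [
    {"crp": 1.2, "homocysteine": 8.1, "mma": 0.15},
    {"crp": 0.8, "homocysteine": None, "mma": 0.20},
    {"crp": 3.4, "homocysteine": 9.7, "mma": None},
    {"crp": None, "homocysteine": None, "mma": None},
]

if __name__ == "__main__":
    print(coverage(TOY_ROWS, "crp"))
    print(complete_case_fraction(TOY_ROWS, ["crp", "homocysteine", "mma"]))
```

A biomarker with high individual coverage can still sit in a dataset whose complete-case subset is small, because a row is dropped as soon as any one of the required biomarkers is missing.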

While this signal is more strongly established across project components and existing literature, it ultimately cannot be confirmed through this study, as with the other metabolites. Additionally, the sex-specific features noted in section 2.3.4.3 should be interpreted with caution given the small number of males in the sample.

3.2: Mechanistic Plausibility of Biomarker Concordance

GGT and ferritin could not be included in part 1 due to insufficient C677T-stratified literature, and they were consequently excluded from section 3.1. However, their results in part 2 are worth considering when evaluating mechanistic plausibility. GGT showed a strong positive signal at both the modeling and visualization levels, through coefficient assignment and distribution shapes that trend with HSI severity. Ferritin also showed strong signaling in visualization but had a near-zero coefficient at the modeling level. These results align with both the framework of part 1 and established NAFLD biology, since GGT and ferritin were both predicted in section 1.3.3.3 to behave similarly to CRP, given that all three function as downstream markers of homocysteine-driven hepatic inflammation and oxidative stress. However, although the ferritin and GGT results align mechanistically with the established framework, they cannot be evaluated for concordance and viability in this study and should be considered candidate biomarkers for future genotype-stratified exploration.

When fully integrated, this profile shows mechanistic plausibility for the link between MTHFR C677T and FLD through concordant distributions across multiple biomarkers that are consistent with the reorganization framework established in part 1. Signals expected to be visible in population-level data, like homocysteine, folate forms, and CRP, appeared at the distribution level even when model-derived coefficients diverged from those expectations. In addition, signals expected to be diluted or null, like vitamin B12 and MMA, respectively, appeared consistent with predicted patterns across both analysis components. The fact that predictions mapped correctly across visible, diluted, and null signals adds a layer of consistency in support of the integrated findings beyond what any single biomarker concordance or cross-cohort analysis can provide independently. However, this association cannot be proven under the current framework, and more research is required to establish a mechanistic link beyond plausibility.
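The prediction mapping described above can be tabulated explicitly. The sketch below is one possible coarse encoding of the qualitative readings reported in sections 3.1.1 to 3.1.5; the symbols, the function name, and in particular the choice to treat a slightly negative B12 trend as compatible with a diluted signal are assumptions made for illustration, not results of the analysis.

```python
# Coarse qualitative readings transcribed from sections 3.1.1 to 3.1.5.
# "+" = rises with HSI severity, "-" = falls, "0" = flat or near zero.
# "expected" holds the readings treated here as consistent with part 1.
# ASSUMPTION: a slightly negative B12 trend counts as compatible with a
# diluted (near-null) signal.
PROFILE = {
    "homocysteine": {"expected": {"+"}, "model": "-", "distribution": "+"},
    "serum folate": {"expected": {"-"}, "model": "-", "distribution": "-"},
    "RBC folate": {"expected": {"-"}, "model": "-", "distribution": "0"},
    "vitamin B12": {"expected": {"0", "-"}, "model": "0", "distribution": "-"},
    "MMA": {"expected": {"0"}, "model": "-", "distribution": "0"},
    "CRP": {"expected": {"+"}, "model": "+", "distribution": "+"},
}


def concordant(profile: dict, level: str) -> list[str]:
    """Biomarkers whose reading at `level` falls within the expected set."""
    return [
        name for name, reading in profile.items()
        if reading[level] in reading["expected"]
    ]


if __name__ == "__main__":
    for level in ("model", "distribution"):
        names = concordant(PROFILE, level)
        print(f"{level}: {len(names)}/{len(PROFILE)} concordant -> {names}")
```

Laying the readings side by side makes it easier to see where the model and distribution levels diverge for a given biomarker, though the encoding is deliberately crude and carries none of the uncertainty discussed in the individual sections.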

3.3: Limitations

While overarching patterns and relationships may have mechanistic backing, this integrated analysis approach has several limitations that have not been addressed elsewhere. One of the largest constraints of this framework is the inability to directly test most features of the asserted profile reorganization; part 2 can only test the first feature, which consists of homocysteine anchoring and its incomplete penetrance. The features of compensation pathways, substrate sensitivity, irrecoverable enzymatic gaps, and conditional stress-based destabilization all rely on biomarkers or features that are absent in NHANES and therefore cannot be directly investigated. Additionally, although this was noted in the individual biomarker concordance analyses, it bears repeating that consistency and coherence across concordance findings do not elevate confidence in the proposed profiles to the level of definitive corroboration; they provide a basis for investigating plausibility, but further research would be required for any causal claims.

3.4: Implications & Future Directions

Next steps for this research will require data that support direct testing, specifically a single dataset that includes genotype, biomarker, and FLD outcome data from the same individuals. Such a dataset is not currently publicly available but may be obtainable through institutional access with a formal application. Future work investigating C677T and FLD risk overlap may benefit from focusing on the betaine-choline axis, the CRP-mediated inflammation pathway, and/or conditional deficiency-based status vulnerability, as each of those patterns emerged across the integrated profile and plausibly links methylation pathways to hepatic function through established mechanisms. Due to dataset and project timeline constraints, these relationships could not be explored further at this time.

The construction of the C677T profile and its mechanistically plausible link to FLD demonstrate the value of an integrated and interdisciplinary approach. The systems-level framing detected a profile that would otherwise have remained unseen across seemingly disconnected biomarkers. C677T may not be unique in this respect; other gene-disease relationships likely carry profile-level patterns that single-biomarker work misses but integrative framing could recover. These undetected patterns could bear weight for clinical treatment or population health protocols, especially in conditions with common variants that affect large population segments, which would compound the importance of standardizing integration in research approaches. However, this kind of work is rare in the existing research structure, and that shortage affected this study’s design on several levels. It first became apparent during initial data searches, where the lack of genotype-stratified datasets required a split-analysis workaround. Even after the project framework was adapted to address this, scarcity in genotype-stratified literature made the inclusion of certain biomarkers, like ferritin and GGT, impossible. Finally, even within cohorts selected to maximize biomarker coverage in a national CDC survey, complete-case subsets represented 7.4% and 31.3% of cohorts 1 and 2, respectively, and demonstrated significant bias and data structure consequences. This scarcity may reflect cultural practices and realities within the research industry that prioritize small-scope, definitive, disciplinary research over the risk and size associated with larger systems investigations. Nevertheless, better infrastructure for intersectional research would not only improve confidence in answering questions related to this investigation but would also make the discovery of profile-level patterns more common in a scientific landscape where they currently remain largely undetected.