MTHFR C677T & Fatty Liver Disease: A Biomarker Analysis
Overview
This project investigates whether biomarkers associated with the MTHFR C677T genetic variant remain relevant, and show correlated patterns, in a population-level analysis of Fatty Liver Disease (FLD). FLD is a liver condition characterized by the accumulation of lipid droplets in the liver, affecting up to 42.2% of adults in the US (Jones et al., 2022). The MTHFR C677T mutation is one of the most common functional genetic variants in the human genome, carried by approximately 25% of the US population (Graydon et al., 2019). Because the liver is the central hub for both lipid regulation and folate-dependent one-carbon metabolism, it provides a biologically plausible context in which to examine potential downstream effects of MTHFR mutations on metabolic dysfunction such as FLD. If an association between the two can be found, it could inform protocols and recommendations for preventative and reactive care that would affect a significant portion of the population. Current research reports that the C677T variant of the MTHFR gene affects liver-related biomarkers (Christianson et al., 2025), so this project investigates how the two factors are related.
Directly testing this relationship would require a single dataset that contains both MTHFR genotype data and FLD biomarker data from the same individuals, so that conclusions drawn from separate populations are not conflated. However, no such combined dataset is publicly available, and access to the sources that do exist requires extensive applications and vetting. This project therefore uses a split analytical approach that focuses instead on overlapping biomarker trends as potential indications of mechanistic links.
It is important to note that even if the biomarker profile is consistent across both parts of the analysis, that does not constitute proof of a causal relationship. Linking the two directly would require further research, specifically using individuals whose biomarkers and genotype are recorded in the same study. However, the analysis adds evidence of plausible mechanistic overlap and provides a logical basis for future research.
Project Aims/Research Questions
This project aims to answer the following questions:
What, if any, directional biomarker profile trends are associated with the MTHFR C677T genotype, and how consistent are they across peer-reviewed literature?
Do any biomarkers associated with the MTHFR C677T profile carry independent predictive signal for FLD classification in the NHANES population cohort?
If there are any biomarker signals that are concordant across both profiles, what do they suggest about the plausibility of a mechanistic link between C677T and FLD risk?
Project Structure
This project is organized as a Quarto website with the following components:
Part 1 - MTHFR Biomarker Profile
This part establishes a biomarker profile associated with the MTHFR C677T mutation through a PRISMA-informed mini meta-analysis focused on mechanisms and biomarker trend exploration. It compiles peer-reviewed literature that investigates how specific biomarkers change across C677T genotypes (CC/CT/TT) in order to determine whether genotype has a directional impact on biomarker levels.
While not a full meta-analysis, PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) techniques, including an explicit search strategy, predefined inclusion and exclusion criteria, and a study characteristics table, are used to ensure conclusions are grounded in systematically reviewed literature.
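One simple way to summarize directional consistency across studies is a vote-counting tally, sketched below. This is only an illustration and not necessarily the method used in Part 1: the function name, the direction labels ("up", "down", "none"), and the record layout are all assumptions, and vote counting is a coarser technique than pooling effect sizes.

```python
from collections import Counter


def direction_consistency(findings):
    """Tally reported directions per biomarker (hypothetical helper).

    `findings` is an iterable of (biomarker, direction) pairs, one per
    study-level result, where direction is "up", "down", or "none"
    relative to the CC reference genotype (an assumed convention).
    Returns {biomarker: (most_common_direction, share_of_studies)}.
    """
    counts_by_marker = {}
    for biomarker, direction in findings:
        counts_by_marker.setdefault(biomarker, Counter())[direction] += 1

    summary = {}
    for biomarker, counts in counts_by_marker.items():
        direction, n_agreeing = counts.most_common(1)[0]
        summary[biomarker] = (direction, n_agreeing / sum(counts.values()))
    return summary
```

A share close to 1.0 would indicate that most included studies report the same direction for that biomarker; it says nothing about the size of the effect.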
Part 2 - FLD Classification and Biomarker Analysis
This part is a population-framed data analysis of mechanisms and biomarker trends related to FLD. It evaluates whether the biomarker trends established in Part 1 are present in population-level FLD data, using machine learning concepts and data analysis techniques applied to NHANES 2001–2004 cohort data. The machine learning portion compares two parallel models: a core feature model that reconstructs the validated Hepatic Steatosis Index (HSI) proxy, and a biomarker model that incorporates additional metrics that are associated with FLD status but are not included in HSI calculations. Data extraction and preprocessing were performed in SQL, modeling in Python, and interpretive visualization in R, all contained within a single Quarto document.
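For readers unfamiliar with the HSI, the sketch below shows the commonly cited form of the index: 8 × (ALT/AST) + BMI, plus 2 for female sex and plus 2 for diabetes. The class and field names are hypothetical, and the cutoffs of 30 and 36 are the conventional published thresholds, assumed here rather than taken from this project's code.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical field names; the project's actual NHANES variables may differ.
    alt_u_per_l: float   # alanine aminotransferase, U/L
    ast_u_per_l: float   # aspartate aminotransferase, U/L
    bmi: float           # body mass index, kg/m^2
    is_female: bool
    has_diabetes: bool


def hepatic_steatosis_index(p: Participant) -> float:
    """HSI = 8 * (ALT / AST) + BMI (+2 if female) (+2 if diabetic)."""
    if p.ast_u_per_l <= 0:
        raise ValueError("AST must be positive to form the ALT/AST ratio")
    score = 8.0 * (p.alt_u_per_l / p.ast_u_per_l) + p.bmi
    if p.is_female:
        score += 2.0
    if p.has_diabetes:
        score += 2.0
    return score


def classify_hsi(hsi: float, low: float = 30.0, high: float = 36.0) -> str:
    """Label a score using the assumed conventional cutoffs."""
    if hsi < low:
        return "FLD unlikely"
    if hsi > high:
        return "FLD likely"
    return "indeterminate"
```

Scores between the two cutoffs are left as indeterminate here; how such cases are handled when building a classification label is a design choice this sketch does not settle.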
Discussion
This portion integrates the findings from both parts. It evaluates concordance between the mechanistic profile constructed for MTHFR C677T and the population-level signals from the models, and interprets the compiled evidence. This section also addresses project limitations and directions for future research.
Technical Overview
This project was completed by Colette Rouiller as an IVSP capstone for her B.S. in Bioinformatics and Computational Biology at the University of Maryland, College Park. The two-part structure allows the capstone to demonstrate understanding and analysis of biological and biochemical mechanisms, programming and statistical analysis skills across multiple languages, and the integration of these elements into a realistic bioinformatics inquiry, process, and final deliverable.