Regression Residuals for Total Energy Intake Adjustment

Confounders are extraneous variables that distort the relationship between two variables of interest in a study. For example, in epidemiological research on the effect of a nutrient on a health issue such as diabetes, a participant’s body size or physical activity can confound the relationship between the nutrient and the health issue. It is therefore important to eliminate or mitigate the effect of the confounder, so that the estimated association between the nutrient and the health issue reflects the effect of the nutrient with as little bias as possible.

In observational nutrition studies, total energy intake is usually collected and used as a proxy for a participant’s body size and physical activity. In addition, total energy intake is the sum of the energy from many nutrients, including the nutrient whose effect we want to study. As a result, total energy intake is correlated with the nutrient of interest and may also be correlated with the outcome. If we build a risk model, such as a regression model, the estimate for the nutrient is confounded by total energy intake. We therefore need to remove the effect of total energy intake from the estimate of the nutrient effect.

There are several methods to eliminate or reduce the effect of total energy intake, such as multiple regression, nutrient density scaling, and the nutrient residual model.
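The three approaches can be sketched side by side in R. This is only an illustration under stated assumptions: the data are simulated, the variable names (`totalEnergy`, `sugar`, `outcome`) are hypothetical, and the nutrient density model is shown in its simplest form (nutrient per 1000 kcal), which is one of several variants.

```r
# Hypothetical simulated data; names and values are illustrative only
set.seed(1)
n <- 200
totalEnergy <- rnorm(n, mean = 2000, sd = 300)          # kcal/day
sugar <- 20 + 0.04 * totalEnergy + rnorm(n, sd = 15)    # g/day, depends on energy
outcome <- rbinom(n, size = 1, prob = 0.3)              # 0/1, unrelated to diet here
d <- data.frame(totalEnergy, sugar, outcome)

# 1. Multiple regression: total energy enters the risk model as a covariate
fitMultiple <- glm(outcome ~ sugar + totalEnergy, family = binomial, data = d)

# 2. Nutrient density: nutrient expressed per 1000 kcal of total energy
d$sugarDensity <- d$sugar / d$totalEnergy * 1000
fitDensity <- glm(outcome ~ sugarDensity, family = binomial, data = d)

# 3. Nutrient residual: residuals of nutrient on energy, plus the nutrient mean
d$sugarResid <- resid(lm(sugar ~ totalEnergy, data = d)) + mean(d$sugar)
fitResidual <- glm(outcome ~ sugarResid, family = binomial, data = d)

# Compare the nutrient coefficient across the three models
coef(fitMultiple)["sugar"]
coef(fitDensity)["sugarDensity"]
coef(fitResidual)["sugarResid"]
```

The coefficients are not directly comparable in size, because the density variable is on a different scale (grams per 1000 kcal) from the other two (grams).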

The nutrient residual model is a commonly used statistical method in nutrition research. It uses simple linear regression to regress the nutrient on total energy intake (the nutrient is the dependent variable and total energy is the predictor). The residuals from this regression are taken to represent the part of nutrient intake that is not explained by total energy intake. Because the mean of the residuals is zero, a constant is added to all residuals, which puts them back on the approximate scale of the nutrient. This constant can be the observed mean of the nutrient in the collected data or the predicted value of the nutrient at the mean total energy intake (e.g., 2000 kcal).
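Written out, the method looks roughly as follows. The symbols are introduced here for illustration and do not come from the original text: $N_i$ is the nutrient intake of participant $i$, $E_i$ is total energy intake, $e_i$ is the residual, and $E_{\text{ref}}$ is a chosen reference energy intake.

```latex
% Step 1: regress nutrient intake N_i on total energy intake E_i
N_i = \hat{\beta}_0 + \hat{\beta}_1 E_i + e_i

% Step 2: the energy-adjusted intake is the residual plus a constant c
N_i^{\text{adj}} = e_i + c

% Two choices for the constant c
c = \bar{N}
\qquad \text{or} \qquad
c = \hat{\beta}_0 + \hat{\beta}_1 E_{\text{ref}}
```

When the regression includes an intercept and $E_{\text{ref}}$ is set to the sample mean $\bar{E}$, the two constants coincide, because the fitted line passes through $(\bar{E}, \bar{N})$. They differ only when a fixed reference value such as 2000 kcal is used in place of the sample mean.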

The nutrient residual model can be implemented in any statistical software that can run a regression model, such as R, SAS, SPSS, or Excel. The steps are as follows:

  1. Fit a simple regression model in which the nutrient is the dependent variable and total energy is the predictor.
  2. Calculate and save the residuals (each residual is the observed value of the nutrient minus the value predicted by the model).
  3. Calculate the mean of the nutrient in the data and add it to every residual.

Listing 1 shows R code that applies the nutrient residual method to control for the confounding effect of total energy intake. The nutrient of interest is daily sugar intake. The data are hypothetical and must not be construed as medical fact.

Listing 1: R code to implement nutrient residual adjustment for total energy intake.
# Read in the data
dfNutrientResid <- read.csv("dfNutrientResid.csv")

# Step 1: regress nutrient intake (sugar) on total energy
model <- lm(sugar ~ totalEnergy, data = dfNutrientResid)

# Step 2: extract residuals (variation in nutrient intake not explained by energy)
sugarResiduals <- resid(model)

# Step 3: add back the arithmetic mean of the nutrient intake
meanSugar <- mean(dfNutrientResid$sugar)
energyAdjustedSugar <- sugarResiduals + meanSugar

# Step 4: add the energy-adjusted sugar variable to the dataset
dfNutrientResid$sugarAdj <- energyAdjustedSugar
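
A quick way to sanity-check the result is to confirm that the adjusted variable is uncorrelated with total energy, and to see how the choice of constant affects it. The sketch below uses a small simulated data frame (`toy`, with hypothetical values) as a stand-in for the real data, and an assumed reference intake of 2000 kcal.

```r
# Small simulated stand-in for the data set (hypothetical values)
set.seed(42)
toy <- data.frame(totalEnergy = rnorm(100, mean = 2000, sd = 300))
toy$sugar <- 20 + 0.04 * toy$totalEnergy + rnorm(100, sd = 15)

fit <- lm(sugar ~ totalEnergy, data = toy)

# Constant option 1: observed mean of the nutrient
adjMean <- resid(fit) + mean(toy$sugar)

# Constant option 2: predicted nutrient intake at a reference energy intake
refEnergy <- 2000  # kcal, an assumed reference value
predAtRef <- unname(predict(fit, newdata = data.frame(totalEnergy = refEnergy)))
adjRef <- resid(fit) + predAtRef

# Both versions should be uncorrelated with total energy (up to rounding error)
cor(adjMean, toy$totalEnergy)
cor(adjRef, toy$totalEnergy)

# The two versions differ only by a constant shift
range(adjRef - adjMean)
```

Because the two versions differ only by a constant, they give the same slope in a later risk model; only the intercept changes.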

Now let’s compare the effect of sugar intake without and with adjustment for total energy intake in a logistic regression that predicts diabetes status. Listing 2 shows the R code and the abbreviated output for both models.

Listing 2: R code to compare sugar without and with adjustment for total energy intake.
# Sugar WITHOUT adjustment for total energy intake
noAdjModel <- glm(diabetese ~ sugar, family = binomial, data = dfNutrientResid)
summary(noAdjModel)

# Abbreviated output
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.065597   0.204950   0.320    0.749
sugar       -0.002350   0.004181  -0.562    0.574


# Sugar WITH adjustment for total energy intake
AdjModel <- glm(diabetese ~ sugarAdj, family = binomial, data = dfNutrientResid)
summary(AdjModel)

# Abbreviated output
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.048216   0.205629   0.234    0.815
sugarAdj    -0.001978   0.004196  -0.471    0.637

In Listing 2, the coefficient for sugar unadjusted for total energy (-0.002350) and the coefficient for sugar adjusted for total energy (-0.001978) differ slightly. Their p-values (0.574 and 0.637), both nonsignificant, differ slightly as well. For further details on the nutrient residual method and a published research study, please consult the references below.
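Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios per one-unit increase in sugar intake. The sketch below does this by hand from the estimates and standard errors printed in Listing 2, using Wald-type 95% intervals; the vector names are introduced here for illustration only.

```r
# Estimates and standard errors copied from the abbreviated output in Listing 2
est <- c(sugar = -0.002350, sugarAdj = -0.001978)
se  <- c(sugar = 0.004181, sugarAdj = 0.004196)

# Odds ratios and Wald 95% confidence limits
z <- qnorm(0.975)
oddsRatio <- exp(est)
lower <- exp(est - z * se)
upper <- exp(est + z * se)

round(cbind(oddsRatio, lower, upper), 4)
```

With the fitted model objects available, `exp(coef(AdjModel))` gives the same odds ratios directly.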

References

Haghighatdoost, F., Feizi, A., Esmaillzadeh, A., Keshteli, A. H., Roohafza, H., Afshar, H., & Adibi, P. (2020). The MIND (Mediterranean-DASH Diet Intervention for Neurodegenerative Delay) and Mediterranean Diets are differently associated with psychosomatic complaints profile in adults: Results from SEPAHAN Cross-sectional study. Mediterranean Journal of Nutrition and Metabolism, 13(4), 341-359.

Willett, W. C., Howe, G. R., & Kushi, L. H. (1997). Adjustment for total energy intake in epidemiologic studies. The American Journal of Clinical Nutrition, 65(4), 1220S-1228S.
