LePetitEconomiste
Posted August 30, 2023

Hello everyone, and thank you for your support. I am new to the forum and have the following question.

Suppose we have the following variables:

Height, a continuous variable;
Sport, a dummy equal to 1 if the person did any sports when young;
Vitamins, a dummy equal to 1 if the person took any extra vitamins when young.

We set up the following regression:

Height = a + b•Sport + c•Vitamins + d•SportXVitamins + e (1)

…

Suppose instead we define two new variables:

SportOnly, a dummy equal to 1 if the person did any sports but did not take any vitamins when young;
VitaminsOnly, a dummy equal to 1 if the person took vitamins but did not do any sports.

We set up the following regression:

Height = β1 + β2•SportOnly + β3•VitaminsOnly + β4•SportXVitamins + ε (2)

(By the way, it turns out that the coefficients β1, β2 and β3 in regression (2) are respectively equal to a, b and c from regression (1). In contrast, d is not equal to β4; rather, β4 = b + c + d.)

I am interested in whether turning a two-variable regression model with an interaction (case 1) into what is basically a three-variable linear regression (case 2) makes any sense or can provide any additional insight.

For instance, suppose that we run regression (1) and find that the interaction is not statistically significant. Then we run regression (2) and find that the coefficient associated with "both Sport and Vitamins" is significant. Which of the two regressions can better help me understand what is going on between Sport and Vitamins? The one where the interaction term is insignificant, or the one where the "both" coefficient is significant?

It might be a silly question, but I am getting confused. Thanks a lot!
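The relationship between the coefficients of regressions (1) and (2) noted above can be checked numerically. The following is only a minimal sketch in Python with NumPy: the data are simulated, and the sample size, the "true" effects (2.0, 1.0, 0.5), the noise level and the helper `ols` are all made-up assumptions for illustration, not anything taken from a real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # hypothetical sample size

# Simulated dummies (assumption: independent coin flips)
sport = rng.integers(0, 2, size=n)
vitamins = rng.integers(0, 2, size=n)
both = sport * vitamins  # SportXVitamins: 1 only if both dummies are 1

# Hypothetical data-generating process, chosen only for illustration
height = (170 + 2.0 * sport + 1.0 * vitamins + 0.5 * both
          + rng.normal(0, 5, size=n))


def ols(columns, y):
    """Least-squares coefficients, with an intercept as the first entry."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Regression (1): Sport, Vitamins and their interaction
a, b, c, d = ols([sport, vitamins, both], height)

# Regression (2): SportOnly, VitaminsOnly and the same "both" dummy
sport_only = sport * (1 - vitamins)
vitamins_only = vitamins * (1 - sport)
b1, b2, b3, b4 = ols([sport_only, vitamins_only, both], height)

print("Regression (1): a, b, c, d =", a, b, c, d)
print("Regression (2): b1, b2, b3, b4 =", b1, b2, b3, b4)
print("b + c + d =", b + c + d)
```

Both regressions fit the same four group means (neither, sport only, vitamins only, both), so up to floating-point error the printed values should satisfy b1 = a, b2 = b, b3 = c and b4 = b + c + d. In terms of those means, d is how far the "both" group departs from the sum of the two separate effects, while b4 is the difference between the "both" group and the "neither" group.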