Hello everyone,
I have a linear regression model with a single dependent variable and several independent variables. Among the independent variables are 4 categorical variables that have been converted into dummy variables. However, some of the categorical variables have many levels, so many dummies were created.
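To make the setup concrete, here is a minimal sketch of the kind of dummy encoding I mean, using toy data. The helper `make_dummies` and the variable `city` are hypothetical and only for illustration; they are not from my actual model.

```python
def make_dummies(values, prefix):
    """Build one 0/1 indicator column per distinct level of a categorical variable."""
    levels = sorted(set(values))
    return {
        f"{prefix}_{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }

# Toy data: a hypothetical categorical variable with 4 levels
city = ["A", "B", "C", "A", "D", "B"]
dummies = make_dummies(city, "city")

# Each level becomes its own column, so many levels means many dummies
for name, column in dummies.items():
    print(name, column)
```

With an intercept in the model, one level is normally left out as the reference category; the sketch keeps every level just to show how the number of columns grows with the number of levels.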
I need to fit the model at a 95% confidence level, so I’m running a stepwise selection algorithm on it. The stepwise procedure removed many of the dummies that had been created. For example, a categorical variable that previously had 10 dummies now has only 2. That happened because the removed dummies were not statistically significant at the 95% confidence level.
My question is: should I discard the categorical variables that had some of their dummies excluded by the stepwise procedure and keep only the categorical variables whose dummies were all preserved? Or should I keep the categorical variables with only their surviving dummies? Which of these 2 options is better for a predictive model?
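In case it helps, the two options can be sketched as column selections. Everything here is made up for illustration: the variable names `region` and `color`, the dummy names, and the set of dummies that "survived" are assumptions, not my real results.

```python
# Hypothetical dummies created for each categorical variable
all_dummies = {
    "region": [f"region_{i}" for i in range(1, 11)],  # 10 dummies
    "color": [f"color_{i}" for i in range(1, 4)],     # 3 dummies
}

# Hypothetical set of dummies kept by the stepwise procedure
kept = {"region_3", "region_7", "color_1", "color_2", "color_3"}

def keep_only_complete_variables(all_dummies, kept):
    """Option 1: keep a categorical variable only if all of its dummies survived."""
    columns = []
    for variable, dummies in all_dummies.items():
        if all(d in kept for d in dummies):
            columns.extend(dummies)
    return columns

def keep_surviving_dummies(all_dummies, kept):
    """Option 2: keep whichever dummies survived, even if the variable is incomplete."""
    columns = []
    for variable, dummies in all_dummies.items():
        columns.extend(d for d in dummies if d in kept)
    return columns

option_1 = keep_only_complete_variables(all_dummies, kept)
option_2 = keep_surviving_dummies(all_dummies, kept)
print("Option 1:", option_1)
print("Option 2:", option_2)
```

Under option 1 the `region` variable disappears entirely. Under option 2 it stays, but its 8 removed levels are effectively merged with the reference level.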
I'd be grateful to anyone who can help.
submitted by /u/7inchesdream