What Is The Best Way To Build A Model From A Dataset That Has Many Dummy Variables?

Hello everyone,

I have a linear regression model with a single dependent variable and several independent variables. Among the independent variables, I have 4 categorical variables that have been turned into dummies. However, some of the categorical variables have many levels and consequently many dummies were created…
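To make the setup concrete, here is a minimal sketch of how one categorical variable becomes several dummy columns. The `dummy_encode` helper and the `region` data are hypothetical, invented only for illustration. The sketch assumes the usual convention of dropping one level as the reference, so a variable with k levels yields k - 1 dummies.

```python
def dummy_encode(values, drop_first=True):
    """One-hot encode a list of category labels.

    Returns (column_names, rows), where each row is a list of 0/1 ints.
    With drop_first=True the first level (in sorted order) becomes the
    reference level and gets no column of its own.
    """
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


# Hypothetical categorical variable with 4 levels.
region = ["north", "south", "east", "west", "south", "east"]
columns, rows = dummy_encode(region)

print(columns)
for label, row in zip(region, rows):
    print(label, row)
```

Here "east" is the reference level, so its rows are all zeros. A variable with many levels produces a correspondingly wide block of columns, which is the situation described above.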

I need to fit the model at a 95% confidence level, so I’m running a stepwise selection algorithm on it. The stepwise procedure removed many of the dummies that had been created: for example, a categorical variable that originally had 10 dummies referring to it now has only 2. This happened because the removed dummies were not statistically significant at the 95% confidence level…
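One way to see what the reduced model is doing: when a dummy column is removed, observations at that level are no longer distinguished from the reference level. The sketch below only illustrates that effect. The `effective_levels` helper, the `region` data, and the set of surviving dummies are all hypothetical.

```python
def effective_levels(values, kept_dummies, reference="(reference)"):
    """Map each observation to the level the reduced model still distinguishes.

    Any level whose dummy was removed is indistinguishable from the
    reference level, so it is relabeled with the reference marker.
    """
    return [v if v in kept_dummies else reference for v in values]


# Hypothetical data: 4 original levels, of which only 2 dummies survived selection.
region = ["north", "south", "east", "west", "south", "east"]
kept = {"south", "west"}

collapsed = effective_levels(region, kept)
for original, merged in zip(region, collapsed):
    print(f"{original:>5} -> {merged}")
```

In this toy case "north" and "east" end up pooled together, so the variable effectively has three levels instead of four.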

My question is: should I discard the categorical variables that had some of their dummies excluded by the stepwise algorithm and keep only those whose dummies were all preserved? Or should I keep the categorical variables whose dummies were partially excluded? Which of these two options is better for a predictive model?

Thanks in advance to anyone who can help.

submitted by /u/7inchesdream