In the previous segment, we talked about effect sizes. An effect size is a number that summarizes how the output of a model changes when we change the input.
When we are looking at the effect of a quantitative input X on the output Y, the effect size is a rate, and has units of Y divided by X.
But when the input is categorical, the effect size on an output Y is a difference, and has the same units as Y.
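To make the distinction between a rate and a difference concrete, here is a minimal sketch in Python. The function `model_output`, its inputs `x` and `group`, and its coefficients are all hypothetical, invented for illustration. They do not come from any model in this course.

```python
def model_output(x, group):
    """Hypothetical model function with a numeric output Y.

    The coefficients are made up purely for illustration.
    """
    return 5.0 + 0.5 * x + (2.0 if group == "B" else 0.0)

# Quantitative input: the effect size is a rate, in units of Y per unit of X.
rate = (model_output(12, "A") - model_output(10, "A")) / (12 - 10)

# Categorical input: the effect size is a difference, in the units of Y.
difference = model_output(10, "B") - model_output(10, "A")

print("rate (Y per unit X):", rate)
print("difference (units of Y):", difference)
```

In both cases the other input is held constant, so the change in output can be attributed to the one input being varied.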
What happens when the response variable is categorical, that is, when the output is one of a set of named levels instead of a number? This is more than a technical question. It goes to the heart of what the output of the model function should be for a categorical response variable. It turns out that providing a category as output, while natural, is very limiting. It is better to give a number or a set of numbers: the probability, according to the model, of the class of interest or of each of the classes.
[[3.05B]] As an example, consider a model of the categorical variable married as a function of explanatory variables like age, education, and sex.
As always, we need to have a model from which to calculate the effect size. We'll compare the model output for two different ages.
[[3.06]] As you can see, the output is the same for both ages. Does this mean that the effect size of age on married is zero: no effect of age? Not really.
Changes in categorical outputs are all or nothing: either a change or no change at all. It's as if we were tracking one individual over the years: "no change this year", "no change the next year", "still no change", "finally, a change". But our models are really about groups. For any individual, marriage is all or nothing, but for groups, we can talk about the probability of an individual being married.
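The all-or-nothing behavior can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the model used in this segment: `prob_married` is a hypothetical logistic function of age with made-up coefficients, and `class_output` simply reports whichever level is more likely.

```python
import math

def prob_married(age):
    """Hypothetical model: probability of being married at a given age.

    The logistic form and both coefficients are made up for illustration.
    """
    return 1.0 / (1.0 + math.exp(-(-6.0 + 0.2 * age)))

def class_output(age):
    """Categorical output: report only the more likely level."""
    return "married" if prob_married(age) >= 0.5 else "not married"

age_a, age_b = 20, 25

# The categorical output does not change between the two ages ...
same_class = class_output(age_a) == class_output(age_b)

# ... even though the underlying probability does.
prob_change = prob_married(age_b) - prob_married(age_a)

print(class_output(age_a), "/", class_output(age_b))
print(round(prob_married(age_a), 3), "/", round(prob_married(age_b), 3))
```

The two ages get the same level, so a comparison of categories suggests no effect, while the probabilities behind those categories are clearly different.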
[[3.07]] Many model architectures for categorical outputs do calculate the probability of each possible level of the output.
The model indicates that an extra year of age is associated with a 16 percentage point increase in the probability of being married. Because age is a quantitative input, this effect size is a rate: percentage points per year.
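Here is a minimal sketch, in Python, of how an effect size like this could be computed once the model output is a probability. The function `prob_married` is a hypothetical stand-in with made-up coefficients, so the number it produces is not the 16 percentage points reported by the model in this segment. Only the calculation is the point.

```python
import math

def prob_married(age):
    """Hypothetical model: probability of being married at a given age.

    The logistic form and both coefficients are made up for illustration.
    """
    return 1.0 / (1.0 + math.exp(-(-6.0 + 0.2 * age)))

def effect_size_per_year(age_from, age_to):
    """Change in probability per year of age, in percentage points per year."""
    change = prob_married(age_to) - prob_married(age_from)
    return 100.0 * change / (age_to - age_from)

effect = effect_size_per_year(25, 30)
print(f"{effect:.1f} percentage points per year")
```

Dividing the change in probability by the change in age is what makes this a rate, just as with a numeric response variable.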