Categorical regression mirrors conventional multiple regression, except that it can also accommodate nominal and ordinal variables. In particular, nominal and ordinal variables are effectively transformed into interval variables. Multiple regression analysis is then applied to these transformed variables.
To illustrate categorical regression, suppose a researcher measures the self-esteem, age, extroversion, and religion of a sample of individuals. An extract of the data is displayed below.
The researcher wants to ascertain whether self-esteem is influenced by age, extroversion, or religion. To fulfill this objective, the researcher would prefer to undertake a multiple regression. Unfortunately, two of the variables are ordinal and one of the variables is nominal. Multiple regression, therefore, is sometimes considered unsuitable under these circumstances.
To illustrate this problem, first consider the ordinal variables. Multiple regression assumes that the intervals between consecutive levels of these variables are equal. For instance, the difference between 1 and 2 on self-esteem is regarded as equivalent to the difference between 3 and 4.
These numbers, however, are arbitrary. For instance, the researcher could have justifiably utilized the numbers 3, 7, 100, 430, and 1094 instead of 1, 2, 3, 4, and 5. Unfortunately, each scale will generate an entirely different pattern of results. The question, then, is which scale yields the correct solution: the scale that entails 3, 7, 100, 430, 1094, the scale that entails 1, 2, 3, 4, 5, or another of the endless number of scales that could have been applied?
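To illustrate this arbitrariness concretely, the brief Python sketch below (using entirely hypothetical ratings) applies both scorings to the same ordinal responses. The correlation with an outcome changes, even though the order of the categories is identical.

```python
import numpy as np

# Hypothetical ordinal self-esteem responses (categories 1 to 5)
# and a hypothetical outcome variable.
responses = np.array([1, 2, 2, 3, 3, 4, 4, 5, 5, 5])
outcome = np.array([2.1, 2.5, 3.0, 3.2, 4.8, 5.1, 5.0, 8.9, 9.5, 9.1])

# Two equally defensible scorings of the same five ordered categories.
scale_a = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
scale_b = {1: 3, 2: 7, 3: 100, 4: 430, 5: 1094}

scores_a = np.array([scale_a[r] for r in responses])
scores_b = np.array([scale_b[r] for r in responses])

# Same ordering, different spacing: the correlations diverge.
print(np.corrcoef(scores_a, outcome)[0, 1])
print(np.corrcoef(scores_b, outcome)[0, 1])
```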
As discussed in the previous section, the scale of ordinal variables is arbitrary, which presents an interesting problem to multiple regression analysis. Nominal variables also pose a problem to multiple regression. For instance, consider the variable labeled as religion. Suppose that '1's represent Christians, '2's represent Muslims, and so forth.
This variable, unless modified appropriately, obviously cannot be entered into a multiple regression. Otherwise, the procedure will assume that religion is quantitative, and thus regard Muslims as higher on some trait than Christians, a meaningless assumption.
Fortunately, nominal variables can be entered in traditional regression, provided they are transformed appropriately. In particular, the researcher needs to dummy code the variable. In essence, this process involves creating a separate column for each religion, apart from one religion, which will be denoted as the reference category. The upshot of this process is depicted below. For example, in the column labeled Christians, 1s denote Christians and 0s denote non-Christians. Likewise, in the column labeled Muslims, 1s denote Muslims and 0s denote non-Muslims. Participants coded as 0 on all the religions must pertain to the reference category, that is, the religion that was not coded, such as Buddhists.
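As a rough illustration of this process, the Python sketch below (with hypothetical data and column names) reproduces the dummy coding using pandas; Buddhist, the alphabetically first category, is dropped to serve as the reference.

```python
import pandas as pd

# Hypothetical data: religion is nominal, recorded as text labels.
df = pd.DataFrame({
    "self_esteem": [3, 4, 2, 5, 1, 4],
    "religion": ["Christian", "Muslim", "Buddhist",
                 "Christian", "Buddhist", "Muslim"],
})

# One 0/1 column per religion; drop_first removes the first category
# (Buddhist), which therefore becomes the reference category.
dummies = pd.get_dummies(df["religion"], prefix="religion",
                         drop_first=True).astype(int)
print(pd.concat([df, dummies], axis=1))
```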
When these dummy variables are entered into traditional regression, the output will provide information about the relationship between religion and self esteem. For instance, suppose the variable labeled as 'Christians' attains significance. Roughly speaking, this finding would indicate that Christians differ from Buddhists, the reference category, on self-esteem. Likewise, suppose the variable labeled as 'Muslims' does not attain significance. This finding would indicate that Muslims do not differ from Buddhists on self-esteem.
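Continuing the sketch above, the dummy variables can be entered into a traditional regression, here with statsmodels; each coefficient compares one religion with the Buddhist reference category (the data remain hypothetical).

```python
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "self_esteem": [3.0, 4.0, 2.0, 5.0, 1.0, 4.0],
    "religion": ["Christian", "Muslim", "Buddhist",
                 "Christian", "Buddhist", "Muslim"],
})
dummies = pd.get_dummies(df["religion"], drop_first=True).astype(float)

# Each coefficient estimates how far that religion sits from the
# Buddhist reference category on self-esteem; its p-value tests
# whether the difference is reliable.
model = sm.OLS(df["self_esteem"], sm.add_constant(dummies)).fit()
print(model.summary())
```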
This discussion neglects some of the complexities associated with dummy coding, but it is sufficient to demonstrate one of the shortfalls of this approach: not all of the religions have been compared. For instance, the output does not reveal whether Christians differ significantly from Muslims. To undertake this comparison, a different coding scheme would have to be applied. However, that coding scheme may neglect other vital comparisons. Traditional regression does not permit the researcher to undertake all possible comparisons within a single analysis.
To reiterate, ordinal and nominal variables can undermine traditional regression. For ordinal variables, the scale is arbitrary and yet different scales yield disparate findings. For nominal variables, the output is difficult to interpret and may not provide information about all of the relevant comparisons.
Fortunately, categorical regression analysis, one of the options in SPSS, circumvents these problems. Essentially, categorical regression converts nominal and ordinal variables to interval scales. This conversion is designed to maximize the relationship between each predictor and the dependent variable. To appreciate this transformation, see Overview of Optimal Scaling, an article that is currently under construction.
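As a highly simplified sketch of the idea (the actual CATREG algorithm alternates between estimating quantifications and regression weights), consider a single nominal predictor: quantifying each category with the mean of the dependent variable within that category maximizes the correlation in this one-predictor case. The Python sketch below uses hypothetical data.

```python
import numpy as np
import pandas as pd

# Hypothetical nominal predictor and interval outcome.
religion = pd.Series(["Christian", "Muslim", "Buddhist",
                      "Christian", "Buddhist", "Muslim"])
self_esteem = pd.Series([3.0, 4.0, 2.0, 5.0, 1.0, 4.0])

# Quantify each category with its mean outcome; with one nominal
# predictor, these quantifications maximize the correlation.
quantifications = self_esteem.groupby(religion).mean()
scaled = religion.map(quantifications)

print(quantifications)
print(np.corrcoef(scaled, self_esteem)[0, 1])
```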
To implement this technique, select "Analyze", "Regression", and then "Optimal Scaling (CATREG)". Assign the dependent variable and the independent variables to the appropriate boxes.
The only complication relates to specifying the level of measurement. In general, ordinal variables are specified as ordinal, nominal variables are specified as nominal, and so forth. Nonetheless, exceptions to this principle exist.
For instance, consider two interval variables that are related in a non-linear fashion. To optimize the relationship between these variables, the researcher may designate one of them as nominal or ordinal. As a consequence, SPSS will modify the scale of this variable to optimize the relationship.
In other words, when the researcher does not want to modify the spacing between consecutive levels, the variable should be designated as "Numeric". When the researcher wants to modify the spacing between consecutive levels, without adjusting the order, the variable should be designated as "Ordinal". Otherwise, the variable should be designated as "Nominal".
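To illustrate the non-linear case, the hypothetical Python sketch below constructs an interval predictor with a U-shaped relationship to the outcome. The linear correlation is negligible, but a nominal-style quantification of its levels recovers the relationship.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical interval predictor (five levels) related to the
# outcome in a U-shaped fashion.
x = np.repeat([1, 2, 3, 4, 5], 20)
y = (x - 3) ** 2 + rng.normal(0, 0.3, size=x.size)

# Treated as numeric, the linear correlation is negligible.
print("numeric:", np.corrcoef(x, y)[0, 1])

# Designated as nominal, each level may receive any quantification;
# quantifying levels by their mean outcome captures the curve.
quant = pd.Series(y).groupby(pd.Series(x)).transform("mean")
print("nominal:", np.corrcoef(quant, y)[0, 1])
```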
Traditional regression provides the unstandardized coefficients, standard errors, standardized beta values, t-values, and p-values associated with each predictor. Categorical regression differs in the following respects:
The zero-order correlation is simply the correlation between each predictor and the dependent variable, after these variables have undergone the appropriate transformations.
Importance indicates the relative importance of each predictor, using Pratt's measure. This measure is roughly equivalent to the product of the standardized regression coefficient and the zero-order correlation. This index is primarily used to uncover suppressor variables. That is, suppose a predictor yields a relatively high beta but low importance. This situation suggests the variable may have been suppressed by other predictors.
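As a sketch of the computation (not the exact CATREG implementation), Pratt's measure for each predictor is the product of its standardized coefficient and its zero-order correlation, normalized by R-squared so the values sum to 1. The data below are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: two predictors with known effects on the outcome.
n = 200
X = rng.normal(size=(n, 2))
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n)

# Standardize so the regression weights are standardized betas.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
ys = (y - y.mean()) / y.std()

betas, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
r = np.array([np.corrcoef(Xs[:, j], ys)[0, 1] for j in range(2)])

r_squared = betas @ r               # for standardized data, R^2 = sum(beta * r)
importance = betas * r / r_squared  # Pratt's measure, sums to 1
print(importance)
```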
A clear exposition of partial correlations, part correlations, and tolerance can be found in most multivariate textbooks. In essence, partial and part correlations are like zero-order correlations, except that the effects of all other predictors have been controlled. Tolerance is used to identify multicollinearity.
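As a sketch of how tolerance flags multicollinearity: the tolerance of a predictor is 1 minus the R-squared obtained when that predictor is regressed on all the other predictors, so values near 0 indicate redundancy. The data below are simulated, with the third predictor built to be nearly redundant.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated predictors; the third is almost the sum of the first two,
# so its tolerance should be close to 0.
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + X[:, 1] + rng.normal(0, 0.1, size=100)

for j in range(X.shape[1]):
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])  # intercept + other predictors
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r_squared = 1 - resid.var() / X[:, j].var()
    print(f"tolerance of predictor {j}: {1 - r_squared:.3f}")
```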
To generate some additional tables or execute more complex procedures, press the "Options" button, and tick the appropriate box.
Last Update: 6/2/2016