If we have several groups of independent variables (continuous and categorical) that represent various constructs (i.e., socio-environmental variables, behavioural variables, personal variables) and we want to test their association with a categorical dependent variable, do we (after the bivariate results):
- do logistic regression in three stages, where we first regress the dependent variable on the personal variables, then fit a second model with both personal and behavioural variables, and later a third model including all three variable groups?
- or is it more appropriate to conduct a confirmatory factor analysis, given that all the variables are hypothesized to influence each other and the dependent variable?
And if the second option is correct, what are the best resources for learning how to do it?
I spent a lot of time learning to analyze these data using GEE modelling because I have a cluster sample design, and I ended up creating 3 independent models, one for each group of variables with the dependent variable. Then I realized that I didn't combine them, given that they are all interrelated (hypothetically speaking).
Any feedback is highly appreciated.
I think you should keep two things separate:
1. If the bunch of variables and the DV are expected to be causally related, you should do a logistic regression. Here you should note that one should not compare results from different logistic regressions (sorry, do not have a reference for this).
2. If the bunch of independent variables are thought to be measures of the same construct, or of multiple constructs, you should do an exploratory factor analysis or, if you have good expectations about the factor relationships, a confirmatory factor analysis. If this applies, you should do the factor analysis first and use the results in the logistic regression (if you find good evidence for underlying factors). I do not have a good reference on factor analysis, but you should find a text on this in the sociology or psychology literature.
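On the caution in point 1 about comparing results from different logistic regressions: one commonly given reason (this is an interpretation, not something stated in the reply) is that odds ratios are non-collapsible. The coefficient of a predictor can shrink when another predictor of the outcome is left out of the model, even if the two predictors are unrelated. The sketch below works this out by direct calculation under an assumed, made-up model; `beta_x`, `beta_z` and the two-valued variable `z` are hypothetical names chosen for illustration.

```python
import math


def sigmoid(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1.0 - p))


def marginal_log_odds_ratio(beta_x, beta_z):
    """Log odds ratio for x (1 vs 0) when z is left out of the model.

    Assumed (hypothetical) data-generating model:
        logit P(y=1 | x, z) = beta_x * x + beta_z * z
    where z is -1 or +1 with equal probability and independent of x.
    """
    def p_y_given_x(x):
        # Average the outcome probability over the two values of z.
        return 0.5 * sigmoid(beta_x * x + beta_z) + 0.5 * sigmoid(beta_x * x - beta_z)

    return logit(p_y_given_x(1)) - logit(p_y_given_x(0))


conditional = 1.0  # coefficient of x in the model that includes z
for beta_z in (0.0, 1.0, 2.0, 3.0):
    marginal = marginal_log_odds_ratio(conditional, beta_z)
    print(f"beta_z={beta_z:.1f}  with z: {conditional:.3f}  without z: {marginal:.3f}")
```

The coefficient of `x` is 1.0 in every model that includes `z`, yet the value obtained without `z` moves toward zero as `z` becomes a stronger predictor. So a change in a coefficient between two logistic models with different predictor sets does not by itself show confounding or mediation.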
Thank you so very much for your reply! Yes, I do have a big bunch of independent variables; some are single categorical variables, and some are constructs (represented by summing responses to several scale items). I have seen that the previous research which validated the instrument published a paper with confirmatory factor analysis and structural equation modelling (which, by the way, I am reading about, and every time I feel I grasp the concept, I fail to actually apply it to my data). (I used my survey in a different population.)
I tried to consult with a statistician, who recommended I do a multivariable logistic regression (specifically GEE, because I have a cluster design: students within classrooms). But with the number of variables I have, I am worried about interpretation. He did say to check for multicollinearity first and make decisions about highly correlated variables before modelling, but he also seemed unhappy with my large pool of variables.
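For what it's worth, a first pass at the multicollinearity check the statistician suggested could be as simple as the sketch below: compute pairwise Pearson correlations among the numeric predictors and list the pairs above a cutoff. The variable names, the data, and the 0.8 cutoff are all made up for illustration. Pairwise correlation only suits continuous or scale-type predictors, and it can miss collinearity that involves three or more variables, for which variance inflation factors are the more usual tool.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def flag_correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Made-up scores for six students; the names are hypothetical.
predictors = {
    "self_efficacy": [1, 2, 3, 4, 5, 6],
    "confidence": [2, 4, 6, 8, 10, 12],  # deliberately a multiple of self_efficacy
    "peer_support": [3, 1, 4, 1, 5, 2],
}

for name_a, name_b, r in flag_correlated_pairs(predictors):
    print(f"{name_a} and {name_b} are highly correlated (r = {r})")
```

Pairs that get flagged are the ones to think about before modelling: drop one, combine them, or treat them as indicators of a single construct.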
I asked him about factor analysis, but he wasn't experienced enough in it.
What are the limitations of performing the GEE modelling without a factor analysis?
I do not know much about GEE modelling, but high collinearity among variables can indicate that the variables are measures of the same construct. Whether this is true or not is a matter that you need to consider in light of your theory and conceptual knowledge. If you think you have multiple variables measuring the same concept, you should somehow reduce the number of variables. Otherwise, you effectively overweight the concept that the variables are measuring. Factor analysis is one way to do this; SEM is another, more sophisticated one. So the question is not factor analysis (or something else) vs. GEE, but that you do one before the other. If you have the opportunity, you should consult some quantitative person from psychology or sociology, because they have ample experience with data reduction.
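To make the data-reduction advice above concrete, here is a minimal sketch of the simplest version of it: standardize the items believed to measure one concept and average them into a single composite, so the concept enters the later model once instead of several times. This is a unit-weighted composite, not a factor analysis (a factor analysis would estimate item weights and test whether the items really hang together). The item names and responses are made up.

```python
import statistics


def zscores(values):
    """Standardize one item to mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # an item with no variation (sd == 0) cannot be standardized
    return [(v - mean) / sd for v in values]


def composite_score(items):
    """Average the z-scored items, giving one score per respondent."""
    standardized = [zscores(column) for column in items.values()]
    # zip(*...) regroups the standardized columns into one tuple per respondent.
    return [statistics.fmean(row) for row in zip(*standardized)]


# Hypothetical responses of five students to three items of one scale.
stress_items = {
    "stress_q1": [1, 2, 4, 5, 3],
    "stress_q2": [2, 2, 5, 4, 3],
    "stress_q3": [1, 3, 4, 5, 2],
}

stress = composite_score(stress_items)
print([round(score, 2) for score in stress])
```

The resulting `stress` list would then be used as one predictor in the regression in place of the three separate items.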
OK, now I understand. I will try to find someone. I really appreciate your help.