23rd March 2015 at 10:02 am #646
I have a study on unemployment. The DV has 4 categories: unemployed, employed part-time, employed full-time and self-employed. My sample size is approximately 1400.
For a specific article, I am interested in the unemployed population. I've run a binary logistic regression with 0 = employed (full-time, part-time and self-employed) and 1 = unemployed. I have been told by the article reviewer that this is not the appropriate technique because
1) the dependent variable can be defined as full-time, part-time, self-employed, and unemployed, and subsequently estimated by the multivariate logit model regression, and
2) The dependent variable is very unbalanced (8% unemployed vs 92% employed). The problem is that maximum likelihood estimation of the logistic model suffers from small-sample bias, and the degree of bias depends strongly on the number of cases in the less frequent of the two categories. In order to correct any potential biases, the author should utilise penalised likelihood, also called the Firth method, or exact logistic regression (another estimation method for small samples, but one that can be very computationally intensive).
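To make the reviewer's second point concrete, here is a minimal sketch of Firth-type penalised likelihood for a logistic model with an intercept and a single predictor. The function name `firth_logit` and the toy data are invented for illustration and are not from the study. It uses only the standard library and is not a substitute for a tested statistical package.

```python
import math

def firth_logit(x, y, max_iter=100, tol=1e-10):
    """Firth-penalised logistic regression: intercept plus one predictor.

    Iterates beta <- beta + (X'WX)^{-1} U*(beta), where U* is the
    Firth-modified score with residuals y - p + h * (0.5 - p).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X'WX for the design matrix [1, x] (2 x 2)
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        d = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * d - b * b
        i00, i01, i11 = d / det, -b / det, a / det  # inverse of X'WX
        # Hat-matrix diagonals: h_i = w_i * [1, x_i] (X'WX)^{-1} [1, x_i]'
        h = [wi * (i00 + 2.0 * i01 * xi + i11 * xi * xi)
             for wi, xi in zip(w, x)]
        # Firth-modified score contributions
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        s0 = i00 * u0 + i01 * u1
        s1 = i01 * u0 + i11 * u1
        b0 += s0
        b1 += s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return b0, b1

# Toy data (hypothetical): no events at all in the x = 0 group, so the
# ordinary maximum likelihood intercept would drift towards minus infinity.
x = [0] * 10 + [1] * 10
y = [0] * 10 + [1] * 5 + [0] * 5
b0, b1 = firth_logit(x, y)
print(round(b0, 4), round(b1, 4))
```

With a single binary predictor the penalised estimates coincide with adding 0.5 to each cell of the 2x2 table, which gives a quick sanity check on the sketch. With 8% of roughly 1400 cases, the rarer category has about 112 cases, so whether the correction changes anything noticeably in this study is an empirical question.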
My question is, should I indeed perform multivariate logistic regression and utilise penalised likelihood, or is the technique I used, given my focus of the unemployed sample, sufficiently appropriate?
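For the reviewer's first point, the sketch below shows how a logit model with four outcome categories turns a linear predictor per category into probabilities. It assumes "unemployed" is the reference category. The coefficients, the single covariate and the function name `category_probs` are hypothetical and chosen only to illustrate the formula. Nothing here is estimated from data.

```python
import math

def category_probs(x, coefs, reference="unemployed"):
    """Category probabilities for a logit model with several outcomes.

    coefs maps each non-reference category to [intercept, slope_1, ...].
    The reference category has a linear predictor fixed at 0.
    """
    scores = {reference: 0.0}
    for cat, beta in coefs.items():
        scores[cat] = beta[0] + sum(bj * xj for bj, xj in zip(beta[1:], x))
    top = max(scores.values())  # subtract the maximum for numerical stability
    expd = {cat: math.exp(s - top) for cat, s in scores.items()}
    total = sum(expd.values())
    return {cat: e / total for cat, e in expd.items()}

# Hypothetical coefficients: [intercept, effect of one covariate]
coefs = {
    "part-time": [0.5, 0.2],
    "full-time": [2.0, 0.4],
    "self-employed": [-0.5, 0.1],
}
probs = category_probs([1.0], coefs)
for cat, pr in probs.items():
    print(cat, round(pr, 3))
```

Each coefficient set describes the log odds of one employment type relative to unemployment, so a covariate can have a different effect for each type. The binary model collapses the three employed categories and estimates a single effect.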
Thank you in advance for your assistance,
Kim
28th March 2015 at 10:47 pm #651
Stephen Gorard
Participant
I’ve never seen a successful LR with the dep var so far off 50:50. What exactly are you trying to find out by running this analysis?
30th March 2015 at 6:06 am #650
Thanks for your reply. I’m trying to predict which graduates are most likely to be unemployed. I have used binary logistic regression with 0 = employed (full-time, part-time or self-employed) and 1 = unemployed. I have demographic IVs (race, age, gender, area, first language, parents’ education, socio-economic status) and educational IVs (field of study, level of study, receipt of career guidance, higher education institution type, marks).
Kim
30th March 2015 at 4:46 pm #649
Stephen Gorard
Participant
Well actually you know which kind of graduates are more likely to be unemployed (you cannot strictly ‘predict’ that). You can characterise unemployed graduates in terms of what you know about them. It’s not as neat as LR but given the 92:8 problem it is ok.
1st April 2015 at 11:36 am #648
Ingo Rohlfing
Member
I admit I am not familiar with the Firth method, but you might also consider rare events logistic regression:
King, Gary and Langche C. Zeng (2001): Explaining Rare Events in International Relations. International Organization 55 (3): 693-715.
King, Gary and Langche C. Zeng (2001): Logistic Regression in Rare Events Data. Political Analysis 9 (2): 137-163.
On the dependent variable, it is partly a theoretical matter of how many categories you define. I guess your reviewer is concerned that the group of employed people is heterogeneous and that your covariates have different effects on the three types of employment. If your N allows it, you might contrast the unemployed with only one type of employed people. Or you could run a multinomial logit regression (I guess this is what you mean by “multivariate”).
7th April 2015 at 10:34 am #647
Your reply is very helpful. Thank you.