# Using the Tobit model

Hi all,

Apparently it was Einstein who said, “You do not really understand something unless you can explain it to your grandmother.” I was hoping someone could treat me like their grandmother and explain the Tobit model to me.

Basically, I have 100 participants in each of two groups. I used an assessment of basic functioning ability at 4 time-points. The assessment comprised 6 questions, each scored 0–5. I do not have individual scores for each question, just total scores that range from 0–30. Higher scores indicate better functioning. The problem is that the assessment is very basic and has a significant ceiling effect. Results are very negatively skewed. The majority of participants scored close to 30, especially after the intervention at the 3 follow-up time-points. It is likely that not all of the participants who scored at the upper limit are truly equal in ability: some only just managed to score 30, while others scored 30 with ease and would score much higher if that were possible. In other words, the data are censored from above.

I want to compare the two groups, both with each other and over time, but obviously this is very difficult given the nature of the results. Transformations of any kind make no difference. I have been advised that the Tobit model is the best equipped for this assessment.

However, I have only a basic knowledge of statistics and have found information on the Tobit model to be quite complicated. I need to be able to explain this model in plain language, and I cannot find a plain-language, nuts-and-bolts explanation of what the Tobit model actually does and how. Can anyone explain the Tobit model (as if I were their grandmother, remember!)?

Extremely grateful for any help

A

Katie Metzler
These slides have a pretty simple explanation – see slide 12. I’m no statistician, but I can sort of understand what this means! Good luck!

Katie

Hi, I would have to re-read the problem and see the data, but from the outside it doesn’t look like a problem suited to Tobit. Tobit is a type of constrained regression analysis, often used when the researcher has data on the independent variables but data on the dependent variable are missing. For example, in a population of employed and unemployed workers, data on the number of hours worked are simply not there for the unemployed, by definition. Tobit takes the data for the unemployed into consideration; the beta coefficients are a mix of participation (employment) and the effect of the independent variables (age, education) on the number of hours worked. The problem here seems to be that the measurement is not valid, and there is no technique to cope with that. If the problems appear at the high end of the scale, why not drop those cases from the analysis?

Anonymous
i have a somewhat different opinion from Robert when it comes to Tobit (but we both agree on a central point, which i will elaborate on later). i have seen Tobit regression models being used to handle data with strong floor or ceiling effects. an article i usually recommend, because i find it accessible and relevant to social scientists, is Matthew McBee (2010), “Modeling Outcomes With Floor or Ceiling Effects: An Introduction to the Tobit Model”, which you can find in Gifted Child Quarterly with doi 10.1177/0016986210379095 (i hate, hate APA style so i don’t follow it, but i think i’ve given you enough info to locate it).
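For anyone who finds code easier to follow than formulas, here is a minimal sketch of what a Tobit model for a ceiling effect does. It is not taken from the McBee article. The data are simulated, and the names `tobit_neg_loglik` and `fit_tobit`, the true group effect of 3 points, and the continuous latent scores are all assumptions made for illustration. The idea: a score below 30 is treated as an exact measurement, while a score of 30 is treated only as evidence that the true (latent) score is "30 or more". It also ignores the repeated time-points, so it covers a single time-point only.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

CEILING = 30.0  # maximum possible total score on the assessment


def tobit_neg_loglik(params, X, y, ceiling=CEILING):
    """Negative log-likelihood of a Tobit model censored from above."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)  # optimise log(sigma) so sigma stays positive
    mu = X @ beta
    at_ceiling = y >= ceiling
    # Scores below the ceiling contribute the usual normal density.
    ll_obs = norm.logpdf(y[~at_ceiling], loc=mu[~at_ceiling], scale=sigma)
    # Scores at the ceiling contribute P(latent score >= ceiling).
    ll_cens = norm.logsf(ceiling, loc=mu[at_ceiling], scale=sigma)
    return -(ll_obs.sum() + ll_cens.sum())


def fit_tobit(X, y, ceiling=CEILING):
    """Fit by maximum likelihood, starting from the ordinary regression fit."""
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta0
    start = np.append(beta0, np.log(resid.std() + 1e-6))
    result = minimize(tobit_neg_loglik, start, args=(X, y, ceiling), method="BFGS")
    return result.x[:-1], np.exp(result.x[-1])


# Simulated example (hypothetical numbers, not the poster's data).
rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, size=n)             # 0 = control, 1 = intervention
X = np.column_stack([np.ones(n), group])       # intercept + group indicator
latent = 27.0 + 3.0 * group + rng.normal(0.0, 4.0, size=n)  # true ability
y = np.minimum(latent, CEILING)                # what the test can record

ols_beta, *_ = np.linalg.lstsq(X, y, rcond=None)
tobit_beta, tobit_sigma = fit_tobit(X, y)
print("ordinary regression group effect:", ols_beta[1])
print("Tobit group effect:", tobit_beta[1])
```

In this simulation the ordinary regression should understate the group difference, because everyone above 30 is squashed down to 30, while the Tobit estimate should land closer to the simulated value of 3.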

Robert’s explanation is pretty clear so i don’t really have anything to add to it, but i do have to point out one very, very, VERY big concern: just as with regular OLS regression, regular Tobit regression also assumes independence among the errors. you’ve just mentioned you made your assessments at 4 different time points, which means you’ve already violated that assumption… which means you’ll either have to approach this from a mixed-effects perspective (sometimes i see the name “multilevel linear model” or “hierarchical linear model” being thrown around, in case you’re more familiar with those) or i would handle it using generalized estimating equations, because they make minimal distributional assumptions about the data. i know there are modifications for longitudinal tobit models, so i guess you could also look into those as well.
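To make the independence problem concrete, here is a small simulated sketch. The variance components, sample size, and mean score are invented for illustration and are not estimates from the data in this thread. Each person gets a stable ability level plus occasion-specific noise. If you ignore who each score belongs to, the leftover errors for the same person at different time-points end up correlated, which is exactly what regular OLS or Tobit assumes does not happen.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_times = 200, 4

# Hypothetical variance components: a stable person-level effect plus
# noise that is specific to each measurement occasion.
person_effect = rng.normal(0.0, 3.0, size=(n_people, 1))
noise = rng.normal(0.0, 3.0, size=(n_people, n_times))
scores = 25.0 + person_effect + noise  # latent scores, before any ceiling

# Residuals from a model that only fits an overall mean and ignores
# which person each score came from.
resid = scores - scores.mean()

# Correlation between residuals for every pair of time-points.
corr = np.corrcoef(resid, rowvar=False)
off_diagonal = corr[~np.eye(n_times, dtype=bool)]
mean_corr = off_diagonal.mean()
print("average correlation between time-points:", mean_corr)
```

With independent errors that average correlation would sit near zero. Here it should come out clearly positive, which is why a mixed-effects or GEE approach is needed for repeated measures.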

where do Robert and I agree? perhaps you could handle this much more easily if you either change your measure or get rid of a few cases, because (at least from the way i’m looking at it) i think you’re gonna have to delve deeply into the statistical complexities of these techniques if you want to go ahead and analyze the data the way it is now.

Robert, many thanks for the explanation and suggestion. I think I will be following your advice, especially after Oscar’s response!

