# Question about sample size in multiple regression


#1158

I have a question about sample size calculations. I would like to look at predictors of stage of adoption (participants will be classified into 7 categories: unaware, unengaged by the issue, undecided, decided not to act, decided to act, action, and maintenance). I have 14 variables that I would like to examine as potential predictors. What is the best way to determine the sample size for this type of analysis?

#1177
Dave Collingridge
Participant

Greetings Lisa,

Power and sample size calculations for multiple regression usually require specifying either the squared multiple correlation (R²) or the specific correlations between the predictors and the outcome variable. Either approach assumes that the outcome is a continuous (ratio or interval level) variable. However, unless I’ve misunderstood, it does not look like your outcome (stage of adoption) is a continuous variable. You might consider analyses where the outcome is categorical.

#1176

Thank you, Dave, for your response. I see your point. Would you suggest logistic regression, then, since the outcome is actually categorical (you could argue that it is an ordered category, I suppose)?

#1175
Dave Collingridge
Participant

If you can reduce stage of adoption to two categories, then a logistic regression might work. If your categories cannot be reduced and they do not have an inherent ordering, a multinomial regression may work. If the categories are ordered, you might consider ordinal regression. Beware that ordinal and multinomial regression are quite sophisticated. I have not seen power and sample size tools for these analyses.

#1174

Thank you very much for the direction!  Much appreciated.

#1173
Stephen Gorard
Participant

As already stated, a standard multiple linear regression would not suit here. But how is the classification of adoption stage to be achieved? If it is based on a score, then that score could be the predicted variable instead.

In general, samples should be as large as feasibly possible. If this is not large enough to be convincing then the work cannot be done. A minimum to convince me might be around 50 times the number of predictors (counting each dummy variable as one in its own right). ‘Power’ of course is irrelevant, since it is based on the flawed logic of significance testing.

I might try converting the classes into three (not really aware, not acting, acting, or similar) and then running two sets of binary logistic comparisons: A vs. B and B vs. C. Note that you will need about the same number of cases in each outcome as grouped (else the skew will prevent sizeable variation explained).
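A minimal Python sketch of that recoding may make the idea concrete. The post does not say which of the seven stages belongs to which of the three classes, so the `STAGE_TO_CLASS` mapping below is purely an assumption, as are the function name `binary_subset` and the made-up example data.

```python
from collections import Counter

# ASSUMED grouping of the seven stages into three classes (A, B, C).
# The thread leaves this open; adjust it to your own definitions.
STAGE_TO_CLASS = {
    "unaware": "A",
    "unengaged": "A",
    "undecided": "B",
    "decided not to act": "B",
    "decided to act": "C",
    "action": "C",
    "maintenance": "C",
}


def binary_subset(stages, negative, positive):
    """Keep only cases in the two named classes and code them 0/1."""
    coded = []
    for stage in stages:
        cls = STAGE_TO_CLASS[stage]
        if cls == negative:
            coded.append(0)
        elif cls == positive:
            coded.append(1)
    return coded


# Made-up example data: one stage label per participant.
stages = [
    "unaware", "undecided", "action", "unengaged",
    "maintenance", "decided not to act", "undecided", "decided to act",
]

ab = binary_subset(stages, "A", "B")  # outcome for the A vs. B model
bc = binary_subset(stages, "B", "C")  # outcome for the B vs. C model

# Check the balance of each binary outcome before fitting anything.
print("A vs. B:", Counter(ab))
print("B vs. C:", Counter(bc))
```

Each coded list would then serve as the outcome of one binary logistic regression; the counts show at a glance whether the two groups in each comparison are roughly equal in size.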

#1172

As a rule of thumb: 10-15 cases per predictor.

#1171
Robert Rieg
Member

You can use the following software to calculate the sample size needed to achieve a certain level of power (typically at least 80%, better 90%):

G*Power 3: http://www.gpower.hhu.de/

It is free and I found it very useful.

Select “F tests”, then “Linear multiple regression: Fixed model, R² deviation from zero”.

Parameters: effect size f² = 0.15 (a medium effect), alpha = 0.05, power = 0.90, number of predictors = 14 (your number). You will get 166 as the required sample size!
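For anyone who wants to see what such a calculation involves, here is a rough pure-Python sketch. It assumes the usual setup for this test: a noncentral F distribution with df1 = number of predictors, df2 = N - predictors - 1, and noncentrality λ = f² · N. The function names are illustrative and this is not G*Power's own code, so treat the output as a cross-check; it should land close to the figure quoted above.

```python
import math


def betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (Lentz's method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    delta = 1.0
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction.
        for aa in (
            m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
            -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(
        math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        + a * math.log(x) + b * math.log1p(-x)
    )
    if x < (a + 1.0) / (a + b + 2.0):
        return front * betacf(a, b, x) / a
    return 1.0 - front * betacf(b, a, 1.0 - x) / b


def power_f_test(n, n_pred, f2, alpha=0.05):
    """Power of the overall F test (H0: R^2 = 0) at sample size n."""
    df1, df2 = n_pred, n - n_pred - 1
    # Critical value on the beta scale, found by bisection.
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if reg_inc_beta(df1 / 2.0, df2 / 2.0, mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    x_crit = (lo + hi) / 2.0
    # Noncentral F CDF as a Poisson-weighted sum of incomplete betas.
    half_lam = f2 * n / 2.0  # assumes lambda = f2 * n
    weight = math.exp(-half_lam)
    cdf = 0.0
    for j in range(1000):
        if j > half_lam and weight < 1e-16:
            break
        cdf += weight * reg_inc_beta(df1 / 2.0 + j, df2 / 2.0, x_crit)
        weight *= half_lam / (j + 1)
    return 1.0 - cdf


def required_n(n_pred, f2, alpha=0.05, target=0.90):
    """Smallest n whose power reaches the target."""
    n = n_pred + 2  # smallest n with at least one error degree of freedom
    while power_f_test(n, n_pred, f2, alpha) < target:
        n += 1
    return n


n = required_n(n_pred=14, f2=0.15)
print(n, round(power_f_test(n, 14, 0.15), 3))
```

Changing `f2`, `alpha`, or `target` shows how sensitive the required sample size is to those choices, which is worth seeing before settling on a single number.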

#1170
Stephen Gorard
Participant

The major problem with such software and such an approach is that it is predicated on significance testing. This cannot be used where the sample is not true, complete and random – and which sample ever is? And even if the sample is complete, significance does not provide the relevant answers here. ‘Effect’ sizes do.

#1169
Aizazullah Khan
Participant

I think you should take 5 observations per variable.
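To put the per-predictor rules of thumb mentioned in this thread side by side, here is a small Python sketch for the 14 predictors in the original question. The names are illustrative, and it assumes no predictor has to be expanded into dummy variables (each dummy would count as one more predictor).

```python
def cases_needed(n_predictors, cases_per_predictor):
    """Total sample size implied by a cases-per-predictor rule of thumb."""
    return n_predictors * cases_per_predictor


N_PREDICTORS = 14  # from the original question; dummy variables would add to this

# Cases-per-predictor figures suggested by different posters in the thread.
rules = {
    "5 per predictor": 5,
    "10 per predictor": 10,
    "15 per predictor": 15,
    "50 per predictor": 50,
}

totals = {label: cases_needed(N_PREDICTORS, k) for label, k in rules.items()}
for label, total in totals.items():
    print(f"{label}: {total} cases")
```

The spread, from 70 to 700 cases, shows how much the answer depends on which rule one finds convincing.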

#1168
Robert Rieg
Member

The point of this software is to calculate the sample size for a pre-defined power, not statistical significance. Power is basically the probability of detecting an effect if there really is one. I work with effect sizes and confidence intervals, so I do not see why this procedure should rely (solely) on statistical significance. One parameter is the proposed alpha value, but you also need that for confidence intervals (1 - alpha).

#1167
Stephen Gorard
Participant

Sorry Robert, but the calculations you describe ARE predicated on significance testing. The alpha used in power calculations is the same as the significance alpha, and means exactly the same. The algorithm for the software works out how many cases would be needed to have a certain probability (often 80%) of detecting a difference (pattern or trend) at the alpha level of significance. Since significance testing does not work, nor do power calculations.

Exactly the same problems arise with confidence intervals (which are just another form of significance, for a range rather than a point). They would also require a full, true random sample even if they worked as intended. They do not work, for the same reasons as significance tests: they do not provide the probability we want, but are often misinterpreted as doing so. They give P(D|H), not P(H|D).

Effect sizes have no such assumptions. An effect size can be quoted validly for a population, convenience sample, incomplete random sample etc.

Sample size is important, and is about how many cases would be needed to be convincing (to warrant a claim to knowledge).

#1166

Dear Lisa Carter,

There is software called a sample size calculator; you can use that to get an accurate sample size.

You can also use a random sampling procedure, such as multistage or stratified sampling.