Question about sample size in multiple regression
- This topic has 19 replies, 10 voices, and was last updated 6 years, 7 months ago by
Dave Collingridge.
19th March 2014 at 10:49 am #1158
Lisa Carter-Harris
Member
I have a question about sample size calculations. I would like to look at predictors of stage of adoption (participants will be classified into 7 categories – unaware, unengaged by the issue, undecided, decided not to act, decided to act, action, and maintenance). I have 14 variables that I would like to examine as potential predictors. What is the best way specifically to figure out sample size for this type of analysis?
19th March 2014 at 4:15 pm #1177
Dave Collingridge
Participant
Greetings Lisa,
Power and sample size calculations for multiple regression usually require specifying the multiple correlation coefficient or the specific correlations between the predictors and the outcome variable. Either approach assumes that the outcome is a continuous (ratio or interval level) variable. However, unless I’ve misunderstood, it does not look like your outcome (stage of adoption) is a continuous variable. You might consider analyses where the outcome is categorical.
19th March 2014 at 4:47 pm #1176
Lisa Carter-Harris
Member
Thank you Dave for your response. I see your point. Would you suggest logistic regression then since it is actually categorical (you could make the argument that it is an ordered category I suppose)?
19th March 2014 at 8:32 pm #1175
Dave Collingridge
Participant
If you can reduce stage of adoption to two categories then a logistic regression might work. If your categories cannot be reduced and they do not have an inherent ordering, a multinomial regression may work. If the categories are ordered you might consider ordinal regression. Beware that ordinal and multinomial regression are quite sophisticated. I have not seen power and sample size tools for these analyses.
19th March 2014 at 8:39 pm #1174
Lisa Carter-Harris
Member
Thank you very much for the direction! Much appreciated.
28th March 2014 at 3:46 pm #1173
Stephen Gorard
Participant
As already stated, a standard multiple linear regression would not suit here. But how is the classification of adoption stage to be achieved? If it is based on a score, then that score could be the predicted variable instead.
In general, samples should be as large as feasibly possible. If this is not large enough to be convincing then the work cannot be done. A minimum to convince me might be around 50 times the number of predictors (counting each dummy variable as one in its own right). ‘Power’ of course is irrelevant, since it is based on the flawed logic of significance testing.
I might try converting the classes into three (not really aware, not acting, acting, or similar) and then running two sets of binary logistic comparisons: A versus B, and B versus C. Note that you will need about the same number of cases in each outcome as grouped (else the skew will prevent a sizeable amount of variation from being explained).
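A minimal sketch of the regrouping described above, in Python. The assignment of the seven stages to three groups (A, B, C) and the example `stages` list are assumptions made purely for illustration; the thread does not say which stage belongs where.

```python
from collections import Counter

# Hypothetical mapping of the seven adoption stages onto three groups.
# Which stage goes in which group is an assumption, not from the thread.
STAGE_TO_GROUP = {
    "unaware": "A",
    "unengaged": "A",
    "undecided": "B",
    "decided not to act": "B",
    "decided to act": "B",
    "action": "C",
    "maintenance": "C",
}


def binary_comparison(stages, first, second):
    """Keep only cases in the two named groups and code them 0/1.

    Returns (case_index, outcome) pairs: 0 for `first`, 1 for `second`.
    """
    pairs = []
    for i, stage in enumerate(stages):
        group = STAGE_TO_GROUP[stage]
        if group == first:
            pairs.append((i, 0))
        elif group == second:
            pairs.append((i, 1))
    return pairs


def balance(pairs):
    """Count cases per outcome so a lopsided split is easy to spot."""
    counts = Counter(outcome for _, outcome in pairs)
    return counts[0], counts[1]


# Made-up example data: one stage label per participant.
stages = ["unaware", "undecided", "action", "unengaged",
          "decided to act", "maintenance", "decided not to act", "action"]

ab = binary_comparison(stages, "A", "B")
bc = binary_comparison(stages, "B", "C")
print("A vs B counts:", balance(ab))
print("B vs C counts:", balance(bc))
```

Each list of pairs could then feed a separate binary logistic regression; checking `balance` first shows whether the two outcomes are roughly equal in size.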
28th March 2014 at 8:21 pm #1172
abdolghani abdollahi mohammad
Participant
As a rule of thumb: 10-15 observations per predictor.
29th March 2014 at 8:09 pm #1171
Robert Rieg
Member
You can use the following software to calculate the sample size needed to achieve a certain level of power (typically at least 80%, better 90%):
G*Power 3: http://www.gpower.hhu.de/
It is free and I found it very useful.
Select “F tests” and then “Linear multiple regression: Fixed model, R² deviation from zero”.
Parameters: effect size f² (0.15 is a medium effect), alpha = 0.05, power = 0.90, number of predictors (your number is 14). You will get 166 as the sample size needed!
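For readers who want to see roughly what such a calculation involves, here is a sketch in plain Python. It assumes the usual setup for this test: numerator df equal to the number of predictors, denominator df equal to N minus predictors minus 1, and noncentrality equal to f² times N. The function names are made up for this example, and this is not G*Power's own code; the result should land close to the figure quoted above, but treat it as an approximation.

```python
import math


def _betacf(a, b, x, max_iter=500, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step of the recurrence.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, df1, df2, ncp=0.0):
    """CDF of the (noncentral) F distribution as a Poisson mixture of betas."""
    if f <= 0.0:
        return 0.0
    x = df1 * f / (df1 * f + df2)
    if ncp == 0.0:
        return reg_inc_beta(df1 / 2.0, df2 / 2.0, x)
    mu = ncp / 2.0
    total = 0.0
    for j in range(2000):
        weight = math.exp(-mu + j * math.log(mu) - math.lgamma(j + 1))
        total += weight * reg_inc_beta(df1 / 2.0 + j, df2 / 2.0, x)
        if j > mu and weight < 1e-15:
            break
    return total


def f_critical(alpha, df1, df2):
    """Upper-alpha critical value of the central F distribution, by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df1, df2) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f_cdf(mid, df1, df2) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return hi


def regression_power(n, n_predictors, f2, alpha=0.05):
    """Power of the overall F test with n cases (assumed df and noncentrality)."""
    df1, df2 = n_predictors, n - n_predictors - 1
    crit = f_critical(alpha, df1, df2)
    return 1.0 - f_cdf(crit, df1, df2, ncp=f2 * n)


def required_n(n_predictors, f2, alpha=0.05, power=0.90):
    """Smallest n whose power reaches the target, by linear search."""
    n = n_predictors + 2  # smallest n that leaves one error degree of freedom
    while regression_power(n, n_predictors, f2, alpha) < power:
        n += 1
    return n


n_needed = required_n(n_predictors=14, f2=0.15, alpha=0.05, power=0.90)
print(n_needed)
```

Note that this, like the G*Power procedure, applies to a linear regression with a continuous outcome, so it does not directly answer the question for a seven-category outcome.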
29th March 2014 at 9:08 pm #1170
Stephen Gorard
Participant
The major problem with such software and such an approach is that it is predicated on significance testing. This cannot be used where the sample is not true, complete and random – and which sample ever is? And even if the sample is complete, significance does not provide the relevant answers here. ‘Effect’ sizes do.
3rd April 2014 at 5:37 am #1169
Aizazullah Khan
Participant
I think you should take 5 observations per variable.
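To put the rules of thumb suggested in this thread so far side by side, here is a small Python sketch. The ratios (5, 10-15, and 50 cases per predictor) are the ones quoted by the posters; the helper names and the assumption that a categorical variable with k levels is coded as k - 1 dummy variables are added here for illustration.

```python
def count_predictors(n_continuous, categorical_levels):
    """Count predictors, treating each dummy variable as its own predictor.

    Assumes a categorical variable with k levels is coded as k - 1 dummies.
    """
    return n_continuous + sum(k - 1 for k in categorical_levels)


# Cases-per-predictor ratios quoted in the thread, as (low, high) pairs.
RULES = {
    "5 per variable": (5, 5),
    "10-15 per predictor": (10, 15),
    "50 per predictor": (50, 50),
}


def rule_of_thumb_sizes(n_predictors):
    """Return the (low, high) sample size implied by each rule."""
    return {name: (lo * n_predictors, hi * n_predictors)
            for name, (lo, hi) in RULES.items()}


sizes = rule_of_thumb_sizes(14)
for name, (lo, hi) in sizes.items():
    print(name, lo if lo == hi else f"{lo}-{hi}")

# Hypothetical mix: 10 continuous predictors plus a 3-level and a 4-level factor.
n_with_dummies = count_predictors(10, [3, 4])
print(n_with_dummies)
```

For the 14 predictors in the original question the rules range from 70 to 700 cases, which shows how far apart they are.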
3rd April 2014 at 7:04 am #1168
Robert Rieg
Member
The point of this software is to calculate sample size given pre-defined power, not statistical significance. Power is basically the probability of detecting an effect if there really is an effect. In that sense, I work with effect sizes and confidence intervals, and I do not see why this procedure should (solely) rely on statistical significance. One parameter is the proposed alpha value, but you also need that for confidence intervals (1-alpha).
3rd April 2014 at 9:05 am #1167
Stephen Gorard
Participant
Sorry Robert – but the calculations you describe ARE predicated on significance testing. The alpha used in power calculations is the same as the significance alpha, and means exactly the same. The algorithm for the software works out how many cases would be needed to have a certain probability (often 80%) of detecting a difference (pattern or trend) at the alpha level of significance. Since significance testing does not work, neither do power calculations.
Exactly the same problems arise with confidence intervals (which are just another form of significance for a range, not a point). They would also require a full true random sample even if they worked as intended. They do not work for the same reasons as significance tests (they do not provide the probability we want, but are often misinterpreted as doing so). They give P(D|H), not P(H|D).
Effect sizes have no such assumptions. An effect size can be quoted validly for a population, convenience sample, incomplete random sample etc.
Sample size is important, and is about how many cases would be needed to be convincing (to warrant a claim to knowledge).
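The P(D|H) versus P(H|D) distinction can be made concrete with Bayes' theorem. The sketch below is only an illustration: the prior and both likelihoods are invented numbers, and the function name is hypothetical.

```python
def posterior_null(prior_null, p_data_given_null, p_data_given_alt):
    """P(H0 | D) from Bayes' theorem for two competing hypotheses."""
    joint_null = prior_null * p_data_given_null
    joint_alt = (1.0 - prior_null) * p_data_given_alt
    return joint_null / (joint_null + joint_alt)


# All three inputs are invented for illustration.
p = posterior_null(prior_null=0.9, p_data_given_null=0.05, p_data_given_alt=0.8)
print(round(p, 2))  # 0.36
```

With these made-up inputs, data that has only a 5% probability under the null hypothesis still leaves the null with a 36% probability, so the two quantities cannot be read as the same thing.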
9th April 2014 at 4:42 am #1166
Dear Lisa Carter,
There is software called a sample size calculator; you can use that to get an accurate sample size.
You can also use one of the random sampling procedures, such as a multi-stage or stratified sampling technique.
Prof. Dr. Subhadra Iyengar, PhD
HOD, Research Department, PSG, Coimbatore, Tamil Nadu, India
9th April 2014 at 3:00 pm #1165
Stephen Gorard
Participant
Again sorry Subhadra – the same applies as above. Do not use significance for sample size calculations (remember Meehl and all that). Just consider what would be convincing in relation to the number of variables.
10th April 2014 at 2:44 am #1164
Thank you Stephen, I will reconsider. I thought Lisa was concerned about the calculation of sample size, so I suggested it.