For anyone interested in or applying mixed methods research
Categories: Mixed methods
McNemar/Bowker references
 This topic has 11 replies, 3 voices, and was last updated 5 years, 1 month ago by Stephen Gorard.


28th September 2014 at 8:19 am #831 Tessa Mearns (Member)
Following a lot of searching through books and journals to find an alternative to Chi squared for in-sample testing, I eventually resorted to the internet and found a lot of recommendations for McNemar or Bowker. I ended up using these (well, Bowker, as I always have more than 2 variables), and this seemed to work well.
My problem now is that I don’t have a decent reference to defend my use of this test in my thesis. I can’t find either test in any of my books, and searching for papers hasn’t done much good either. From the information I’ve found, I’m fairly certain I’ve chosen the right test, but I don’t think that the sources I’ve used would look great in my references.
I tried writing to my stats contact at my university (I’m a distance student) but she took 3 months to reply and then didn’t even attempt to answer my questions about my choice of test, so I’m kind of on my own with this. Does anyone know of a book or article that would justify or even just discuss the use of these tests?
My field is education and my research is mixed methods. I’m using the test to measure the significance of differences between responses from the same group regarding different school subjects, and to compare responses from the same respondents during two data collection periods. The data are mostly Likert scale, which I am treating as nominal rather than scale. I am using SPSS and have used Chi squared elsewhere to compare two different groups of respondents.
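For readers unfamiliar with the test being discussed, here is a minimal sketch of the Bowker symmetry statistic for a square table of paired responses. The counts are hypothetical, not data from this thread, and the function name `bowker_statistic` is an assumption introduced for illustration. The sketch stops at the statistic and degrees of freedom, and it counts only non-empty off-diagonal pairs towards the degrees of freedom, which is one convention among several.

```python
from itertools import combinations


def bowker_statistic(table):
    """Return (statistic, degrees of freedom) for Bowker's test of symmetry.

    table[i][j] counts respondents who chose category i on the first
    occasion and category j on the second occasion.
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("table must be square")
    stat = 0.0
    df = 0
    for i, j in combinations(range(k), 2):
        changed = table[i][j] + table[j][i]
        if changed == 0:
            # Nobody moved between categories i and j, so this pair is skipped
            # (an assumption of this sketch; some software keeps it in the df).
            continue
        stat += (table[i][j] - table[j][i]) ** 2 / changed
        df += 1
    return stat, df


# Hypothetical counts: rows = Round 1 response, columns = Round 2 response
counts = [
    [20, 10, 5],
    [4, 30, 8],
    [1, 2, 20],
]
stat, df = bowker_statistic(counts)
print(f"Bowker statistic = {stat:.3f} on {df} degrees of freedom")
```

With a 2x2 table the same formula reduces to the McNemar statistic without continuity correction, (b - c)^2 / (b + c).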
28th September 2014 at 8:38 am #842 Stephen Gorard (Participant)
I have not heard much about this. You could try: Computational Statistics & Data Analysis, 51, 9, pp.4124–4142
But please do not do this. It does not make sense (why on earth would you want the probability that this ‘test’ generates?). Just convert the difference to ‘effect’ sizes for between subject or between episodes comparisons.
28th September 2014 at 9:40 am #841 Tessa Mearns (Member)
Thanks for this very quick response. I had actually found this article too (it was the only one I found!) but I am interested in the rest of your comment.
Could you explain a bit more of what you mean about converting effect sizes? I’m completely self-taught when it comes to stats and SPSS so there may be quite a few gaps in my knowledge that I’m not aware of. Often it’s just a question of terminology.
Thanks again,
Tessa
28th September 2014 at 9:53 am #840 Stephen Gorard (Participant)
Well, what is it you are trying to decide here? You have your ordinal responses for each subject and at each time period. You can look at them as scatters, or compute the percentage at or above a certain response level in each group, or a range of options. You could present V or more usefully observed-expected.
Take the percentage option. One subject group either has a higher percentage than the other or not. The difference in points is the result. What else would you need to know? Or put another way, what would chi-squared or any other sig test tell you additionally?
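The percentage option described above can be sketched in a few lines. The responses below are hypothetical 5-point Likert values, not data from this thread, and the helper name `percent_at_or_above` and the threshold of 4 are assumptions chosen for illustration.

```python
def percent_at_or_above(responses, threshold):
    """Percentage of ordinal responses at or above `threshold`."""
    if not responses:
        raise ValueError("responses must not be empty")
    return 100.0 * sum(r >= threshold for r in responses) / len(responses)


# Hypothetical responses (1 = strongly disagree, 5 = strongly agree)
bilingual = [5, 4, 4, 3, 2, 5, 4, 1, 4, 3]
mainstream = [3, 2, 4, 1, 2, 5, 3, 2, 4, 3]

pct_bilingual = percent_at_or_above(bilingual, 4)
pct_mainstream = percent_at_or_above(mainstream, 4)

# The result is simply the difference in percentage points.
difference = pct_bilingual - pct_mainstream
print(f"{pct_bilingual:.0f}% vs {pct_mainstream:.0f}%: {difference:.0f} points")
```

The same function works for a within-group comparison by passing the Round 1 and Round 2 responses of one group instead of two different groups.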
28th September 2014 at 10:09 am #839 Tessa Mearns (Member)
That was my original approach but one of my supervisors told me that wasn’t good enough. He wanted me to use t-tests and treat the data as continuous but I was dead against that. Chi squared was my compromise, but it can’t be used for comparisons within one group of respondents. I found lots of recommendations for McNemar-Bowker online, but nothing in the actual literature.
I have found it quite useful to use these tests as I have a lot of data from a large sample. I identified the general trends based on percentages at first and then went back to see what Chi squared showed to be significant or not in order to narrow my findings slightly. Mostly, the differences I had identified were statistically significant, and if they weren’t it usually just confirmed my own doubts about whether I should include them. I’ve also kept in a couple of differences that showed up in the percentages and weren’t significant according to Chi squared, so I wasn’t completely reliant on tests.
Does that make any sense as a justification?
I like the idea of using scatters or looking at percentages in the way you suggest. I hadn’t thought of either of those so will need to look into them a bit more.
28th September 2014 at 10:18 am #838 Stephen Gorard (Participant)
OK, good, but that mainly misses my point. Whether you used t, chi or something more obscure, they all try to generate a probability. Why do you want this really rather weird probability and what does it tell you? Put another way, when you say you looked at what was ‘significant’ what does this mean? Remember the probability of being a US senator if one is an American is very small, but the probability of being an American if one is a US senator is 100%.
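The senator example can be put into rough numbers. The population figure below is a round, illustrative assumption rather than an official count; the only fixed fact used is that the US Senate has 100 seats.

```python
# Round, illustrative population figure (an assumption, not an official count)
us_population = 320_000_000
senators = 100  # the US Senate has 100 seats

# P(senator | American): senators as a share of all Americans
p_senator_given_american = senators / us_population

# P(American | senator): every senator is American
p_american_given_senator = senators / senators

print(f"P(senator | American) = {p_senator_given_american:.10f}")
print(f"P(American | senator) = {p_american_given_senator:.0%}")
```

The two conditional probabilities describe the same two groups but differ by many orders of magnitude, which is the point of the analogy.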
28th September 2014 at 11:30 am #837 Tessa Mearns (Member)
It’s really great to actually have a conversation about this! I wish I had posted this question earlier.
What I was trying to find out was whether the differences I had identified were just random or whether they seemed to be connected to the two different educational models (CLIL-type Bilingual Education or ‘normal’) that I had used to group the respondents. In terms of probability, I suppose that means I wanted to know whether it was more probable that a pupil studying bilingually would respond more positively (or negatively) to certain items than a pupil studying only in their native language. From my reading, it looked like Chi squared was the ‘softest’ test I could run for that, as I didn’t want to get into using means to understand Likert scale data as had been suggested to me.
With the other type of test, I wanted to find out whether certain responses were more probable among, for example, the bilingual learners, at either the beginning or end of the school year. I used the same approach when looking at differences between responses from bilingual learners in relation to lessons taught in either English or the first language.
28th September 2014 at 11:38 am #836 Tessa Mearns (Member)
I just realised that I’ve misunderstood the probability thing – I think.
When you say it’s about probability, do you actually mean the difference between what the data says and what the probability is that it will say this? I remember reading that and it making sense to me.
If that’s the case, it’s a lot clearer. I wanted to know whether the differences in the responses given by each group differed from each other to a degree that was more than probable if they had just been two random groups rather than groups separated by a specific criterion.
In the within-group comparisons, I wanted to know whether the differences in the Round 1 and Round 2 responses, or in the responses regarding English-medium or ‘normal’ lessons, differed more than might otherwise be expected.
I hope that makes more sense.
28th September 2014 at 6:27 pm #835 Stephen Gorard (Participant)
But none of the tests mentioned (or indeed any sig test) can give you that probability. How could it? It’s not magic. Just think about it clearly.
They will all give you the probability of obtaining data as different as you obtained if there was really no difference between the two sets of scores (groups). This is not what you ask for (above). You want, quite properly, the probability of there being no difference between the two sets of scores given the data you actually obtained. These probabilities are completely different and the first cannot be used to compute the second. Hence the tests are useless to you. Hence you should stick to what you originally planned. Do real analysis, not push-button pseudo-magical nonsense.
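One way to see why the first probability does not determine the second is Bayes' theorem, which needs two further inputs that a significance test does not supply. The sketch below is an illustration under stated assumptions, not a recommended analysis: "data" means "a difference at least as large as the one observed", and the prior probabilities and the probability of the data under a real difference are made-up values.

```python
def posterior_no_difference(p_data_given_null, p_data_given_alt, prior_null):
    """P(no difference | data) by Bayes' theorem for two exhaustive hypotheses."""
    joint_null = p_data_given_null * prior_null
    joint_alt = p_data_given_alt * (1.0 - prior_null)
    return joint_null / (joint_null + joint_alt)


# The test output is held fixed: P(data | no difference) = 0.04
p_data_given_null = 0.04

# Hypothetical (prior for "no difference", P(data | real difference)) pairs
scenarios = [(0.5, 0.8), (0.9, 0.8), (0.9, 0.05)]

posteriors = [
    posterior_no_difference(p_data_given_null, p_alt, prior)
    for prior, p_alt in scenarios
]
for (prior, p_alt), post in zip(scenarios, posteriors):
    print(f"prior={prior}, P(data | difference)={p_alt}: "
          f"P(no difference | data)={post:.3f}")
```

With the test result held at 0.04, the probability of no difference given the data ranges from under 5% to nearly 88% across these made-up scenarios.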
29th September 2014 at 8:07 pm #834 Tessa Mearns (Member)
Thanks for your input. Lots of food for thought. I’ll talk to my supervisors about it and have a think about my next steps.
Thanks again for your time,
Tessa
1st October 2014 at 9:43 pm #833 Pat Bazeley (Participant)
The point of asking for probabilities, as I understand it, is to make it possible to predict, with some degree of assurance, beyond the current sample to a broader population of which this sample is representative. Exploring actual differences as they exist for this sample will tell you only about this sample. The probability is about confidence in prediction, not how much difference the intervention made – the effect size can tell you something about the latter.
2nd October 2014 at 11:43 am #832 Stephen Gorard (Participant)
If ‘exploring actual differences… for this sample will only tell you about this sample’ then what can be used instead for the kind of generalisation you suggest above? It does not make sense.
To summarise. If the sample achieved is not complete (with no missing cases or values) or not randomly selected, then the kinds of test the supervisor proposed must not be used.
In the very unlikely event that Tessa has an appropriate sample (and I have never seen or read about one in 20 years) then the probability the test will generate is that of finding a difference as large as observed (which is why we focus on the difference in the sample) assuming that the difference is only caused by random sampling. Clearly this probability is of no practical use to anyone. What Tessa would want is the probability of the difference being caused only by random sampling given the size of the observed difference. This is a very different probability and cannot be deduced from the useless test result (even though many people pretend that it can). Hence the US senators.
