Quick bit of background on my question: A study was undertaken at a number of clinics to look at the number of people testing positive for a specific infection. One of the clinics had an unusually high positivity rate for men and an unusually high difference in positivity between men and women.
I now have data on behavioural and socioeconomic variables from a sample taken at that clinic, which I want to analyse for associations with positivity in men and women separately.
The problem I have is understanding how to test whether my sample is representative of the clinic as a whole, in terms of positivity rate, for the year the sample was taken.
I had thought to construct a 2x2 contingency table with the sample and the clinic in the columns and negative and positive in the rows, then perform a chi-squared test to see if they differed significantly (p < 0.05).
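To make that first idea concrete, here is a rough sketch in Python of what I had in mind. The counts are made up for illustration (they are not my real data), the function name `chi_square_2x2` is my own, and I am assuming a plain Pearson chi-squared test with no continuity correction:

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    table = [[a, b], [c, d]]; returns (statistic, p_value) on 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper tail probability is erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts, NOT my real data.
# Rows: negative, positive. Columns: sample, whole clinic.
naive_table = [[160, 1700],
               [40, 300]]

stat, p_value = chi_square_2x2(naive_table)
print(f"chi-squared = {stat:.3f}, p = {p_value:.4f}")
```

Note that in this version the people in the sample column are also included in the clinic column.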
However, a colleague pointed out that this would count people twice, since those in the sample are also in the clinic population, and said that I need to subtract the sample counts from the clinic population counts. This would ensure that the row and column totals do not exceed the whole clinic population.
I guess this would effectively be seeing if those in the sample differed significantly from those not in the sample.
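As I understand it, my colleague's adjustment would look something like the sketch below. Again, the counts are invented for illustration and the variable names are my own:

```python
# Hypothetical counts, NOT my real data.
clinic = {"negative": 1700, "positive": 300}  # everyone tested at the clinic that year
sample = {"negative": 160, "positive": 40}    # the subset of those people in my sample

# Remove the sampled people from the clinic counts so nobody sits in both columns.
not_sampled = {result: clinic[result] - sample[result] for result in clinic}

# Rows: negative, positive. Columns: in sample, not in sample.
adjusted_table = [[sample["negative"], not_sampled["negative"]],
                  [sample["positive"], not_sampled["positive"]]]

# The grand total of the table now equals the whole clinic population.
grand_total = sum(sum(row) for row in adjusted_table)
assert grand_total == sum(clinic.values())

def positivity(counts):
    """Proportion of positive results in a dict of negative/positive counts."""
    return counts["positive"] / (counts["positive"] + counts["negative"])

print(f"sample positivity:      {positivity(sample):.3f}")
print(f"not-sampled positivity: {positivity(not_sampled):.3f}")
```

The chi-squared test would then be run on `adjusted_table`, so the comparison is between people in the sample and people not in the sample.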
So that’s where I am, and any support on the right way to go about this would be very welcome!