Quick bit of background on my question: A study was undertaken at a number of clinics to look at the number of people testing positive for a specific infection. One of the clinics had an unusually high positivity rate for men and an unusually large difference in positivity between men and women.
I now have data on behavioural and socio-economic variables from the sample to analyse, and I want to look for associations with positivity for men and women separately.
The problem I have is understanding how to test whether my sample is representative of the clinic's overall positivity rate in the year the sample was taken.
I had thought to construct a 2x2 contingency table with the sample and clinic in the columns and negative and positive in the rows, then perform a chi-squared test to see if they differed significantly (p<0.05).
However, a colleague pointed out that this would count people twice, since those in the sample are also in the clinic population, and said that I need to subtract the sample counts from the clinic population counts. This would ensure the row and column totals do not exceed the whole clinic population.
I guess this would effectively be seeing if those in the sample differed significantly from those not in the sample.
So that's where I am, and any support on the right way to go about this would be very welcome!
What are your predictor and outcome variables?
For this test, the data source (clinic or sample) would be the independent variable and the infection test result (positive/negative) would be the dependent variable. The actual analysis (should the sample be representative) will use the binary outcome of a positive/negative infection test against a number of demographic and behavioural independent variables. All data are categorical.
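If the clinic population is large relative to the sample, this won't matter much. If it's not, your colleague is correct: you need to subtract the sample counts from the clinic counts so that the grand total of the table equals the number of people in the clinic.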
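To make the subtraction concrete, here is a minimal Python sketch of the "sample vs. rest of clinic" comparison. The counts are made up purely for illustration, and the helper name `chi2_2x2` is hypothetical. It computes the plain Pearson chi-squared statistic for a 2x2 table without a continuity correction, assuming every row and column total is non-zero; a stats package may apply a correction by default and give a slightly different p-value.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, p_value) on 1 degree of freedom.
    Assumes all row and column totals are non-zero."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts, for illustration only.
sample_pos, sample_neg = 30, 170      # people in the sample
clinic_pos, clinic_neg = 240, 1760    # whole clinic, sample included

# Subtract the sample so that nobody is counted twice.
rest_pos = clinic_pos - sample_pos
rest_neg = clinic_neg - sample_neg

# Rows: positive / negative. Columns: sample / rest of clinic.
stat, p_value = chi2_2x2(sample_pos, rest_pos, sample_neg, rest_neg)
print(f"chi-squared = {stat:.3f}, p = {p_value:.3f}")
```

With the table built this way, the grand total equals the clinic population, and the test compares those in the sample with those not in the sample.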