Dealing with Violation of Independence

  • #882
    Henry Tran
    Member

    Greetings all,

    I’m new here, have a dilemma, and hope that you all can help. I want to run a randomized block design, with salary as my outcome variable and different relevant labor markets (i.e., performance-based, geography-based, size-based, revenue-based) as the levels of my independent variable. Essentially, I want to compare salaries across the different labor markets, as you would in an ANOVA. However, because there are only a small number of companies in the industry I am looking at, some of the companies in one labor market (e.g., geography-based) will also show up in another (e.g., revenue-based), thus violating the assumption of independence.

    Does anyone have a good idea for resolving this? One idea that I’m considering is a multilevel model (MLM), but I am not quite sure how I would conceptually frame this problem in an MLM, as I want to see whether salaries differ between different definitions of the labor market. Thank you.

    Sincerely,

    Henry

    #886
    Stephen Gorard
    Participant

    Dear Henry,

    I think you have answered your own question in the last sentence. If you were in a shop buying a tin of soup and one brand was at 45 cents and the other at 47 cents, is there a difference in price, and if so, which is the higher? The answer in real life is obvious. No need for absurd tests or complex nonsense like MLM (yeuk!). If the two scores are different, you can see that. And more importantly, you can see how much they differ by.

    Now imagine that you were in a shop and you examined the trolleys of two shoppers. You add the cost of the items in the first trolley to get 101 dollars. The total in the second trolley is 107 dollars. Again it is clear they are different and by how much. But, you ask, what difference does it make if some of the items in the two trolleys are the same? Shoppers may well have several items in common. So what?

    What difference does that make to how you add them up? None. To the difference between the totals (6 dollars)? None. Etc. 

    The difficulty you face only arises from the confusion of purported methods experts and resources. Ignore them. What I suspect underlies your question is whether the difference that you will find (Meehl’s conjecture and all that) matters. This is the judgement you make as an analyst. There are techniques to help in this, and some are covered in my book:

    Gorard, S. (2013) Research Design: Robust approaches for the social sciences, London: SAGE

    I have attached an extract on ‘how big is a difference?’.
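The trolley comparison above can be sketched in a few lines of Python. This is only an illustration: the item names and prices are invented, chosen so that the totals match the 101 and 107 dollars in the post, and `total` is a hypothetical helper.

```python
# Hypothetical trolleys: item -> price in whole dollars (invented values,
# picked only so the totals come to 101 and 107 as in the post).
trolley_a = {"soup": 1, "bread": 10, "coffee": 40, "cheese": 50}
trolley_b = {"soup": 1, "bread": 10, "wine": 46, "cheese": 50}


def total(trolley):
    """Add up the prices of every item in one trolley."""
    return sum(trolley.values())


# Items that appear in both trolleys.
shared = sorted(set(trolley_a) & set(trolley_b))

# The shared items are counted in both totals; the arithmetic of the
# difference is the same whether or not any items overlap.
difference = total(trolley_b) - total(trolley_a)

print("Trolley A:", total(trolley_a))
print("Trolley B:", total(trolley_b))
print("Difference:", difference)
print("Shared items:", shared)
```

Three of the four items are common to both trolleys here, and the difference between the totals is still computed directly from the two sums.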

    #885
    Henry Tran
    Member

    Hi Stephen,

    Thank you for sharing your input and extract. Since I am using inferential statistics, violation of the independence assumption is a concern I had. Even if I didn’t think it mattered in this particular case, I don’t know if my colleagues who serve as journal reviewers would agree. Because the number of companies in the industry is small, I am considering working with data for the entire population. By doing that, I can bypass the whole issue of inference and just identify the difference as I see it, without worrying about the assumptions of inferential statistics. 

    Henry
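A purely descriptive comparison on population data, as described above, might look like the following minimal sketch. Everything in it is an assumption for illustration: the company labels, the salary figures, the group memberships, and the helper `market_means` are all hypothetical, and only three of the labor-market definitions are shown.

```python
from statistics import mean

# Hypothetical salaries (in thousands of dollars) for every company
# in a small industry, i.e. the whole population rather than a sample.
salaries = {"A": 90, "B": 100, "C": 110, "D": 120, "E": 130}

# Hypothetical labor-market definitions. Companies B, C and D each
# appear under more than one definition, as in the original question.
labor_markets = {
    "geography-based": ["A", "B", "C"],
    "revenue-based": ["B", "C", "D"],
    "size-based": ["C", "D", "E"],
}


def market_means(salaries, labor_markets):
    """Return the mean salary of the companies under each definition."""
    return {
        name: mean(salaries[company] for company in members)
        for name, members in labor_markets.items()
    }


means = market_means(salaries, labor_markets)
for name, value in means.items():
    print(f"{name}: {value}")

# Difference between two definitions, read off directly.
gap = means["size-based"] - means["geography-based"]
print("size-based minus geography-based:", gap)
```

No test statistic or p-value is computed; the sketch only reports each group's mean and the gap between two of them, leaving the judgement of whether that gap matters to the analyst.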

    #884
    Stephen Gorard
    Participant

    But why are you using the absurd inferential statistics? What are you attempting to generalise to? Do you really have a complete random sample (I suspect not)? And why on earth bother with all of that if you have population data? 

    I am a journal and funding agency reviewer (for about 80), and journal editor. I would never allow anyone to publish with inferential statistics for a population or a less than complete random sample (allocation). In practice, of course, this means never. I know there are plenty of mistaken reviewers out there (see issue 38,1 of the Psychology of Education Review, for example). But have some principle. If the reviewers are wrong we tell them so even if it makes it slightly harder to publish while the ‘new’ statistics takes hold. 

    #883
    Henry Tran
    Member

    I had originally planned to generalize to the future based on past data, but as you mentioned, if I can access population data, then there is no need for me to bother with inferential statistics at all. That is what I am working on at the moment. Thanks.
