Cluster Analysis

    Shyh-Mee Tan

    Dear all,


    Could you please share how to run a cluster analysis that categorizes respondents into groups according to their qualifications and ability? Thanks.

    Jeremy Miles

    You need to tell us what software you have access to before we can answer that.  

    There are several good books on cluster analysis, and most multivariate analysis books cover it.  


    However, you should also be aware that cluster analysis is somewhat frowned upon these days.

    Shyh-Mee Tan


    I am using IBM SPSS 18. Yes, I have found a few books about it, but I would like some input from ‘outside’ the reference books. Could you share any research or statements on why ‘cluster analysis is somewhat frowned upon’?


    I appreciate your help.


    shyh mee

    Sunny Bose


    Personally, I don’t know why Jeremy said that cluster analysis is frowned upon, but I can tell you that it is a very good technique for categorising or segmenting respondents or objects based on the proximity of their responses. The major problem (I believe that’s the reason behind Jeremy’s opinion) is that it depends to a large extent on the researcher’s judgement. I am attaching a file … this might help you.

    It would be better if you could let us know the objective behind running the cluster analysis; then we might be able to help you better.


    Dear all,


    If I may chip in: I think there is a perceived problem with most contemporary cluster analytic methods, which is, essentially, the lack of robust ‘validation’ techniques: that is to say, “How do I know that the cluster model I am fitting is the best one?”

    When you don’t know how many clusters you are seeking to find (which I am assuming you don’t), most cluster analyses start with either a hierarchical agglomerative (e.g. Ward’s method) or divisive technique. But both these methods are sensitive to variations in the ordering of your cases (e.g. if I ordered by ‘name’ or by ‘qualifications’ I would get different results) which means that you will often get what are called ‘locally optimised solutions’: i.e. another researcher using your data would get a different result. For instance, two years ago I was horrified to find that SPSS 16.0 and R 2.10 use subtly different start points for hierarchical analyses: the exact same dataset ordered identically will therefore give you completely different results. This does not generally make for good science and leads to a great deal of subjectivity in data interpretation.
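
    One way this kind of order dependence can arise is through tied distances. The following is a minimal sketch, assuming a naive single-linkage routine that merges the first closest pair it finds; the function name and the tie-breaking rule are illustrative assumptions, not how SPSS or R actually implement hierarchical clustering.

```python
def single_linkage(values, n_clusters):
    """Naive agglomerative clustering of 1-D values using single linkage.

    Assumption for illustration: when several pairs of clusters are equally
    close, the first pair encountered is merged.
    """
    clusters = [[v] for v in values]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:  # strict '<': first pair wins ties
                    best = (d, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return sorted(sorted(c) for c in clusters)


# The same three scores, entered in two different case orders.
print(single_linkage([1, 2, 3], n_clusters=2))  # [[1, 2], [3]]
print(single_linkage([3, 2, 1], n_clusters=2))  # [[1], [2, 3]]
```

    In this sketch the case order only matters because the distances 1-2 and 2-3 are tied; real packages may break such ties differently.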

    A lot of recent work has therefore gone into trying to produce ‘global optima’: i.e., methods that will produce the same result on the same data on any computer. There are two important innovations here that not all textbooks will recount:

    i) Bayesian or model-based clustering (e.g. using the ‘mclust’ package in R/S-PLUS by Chris Fraley and Adrian Raftery): these methods calculate a Bayesian statistic, the Bayesian Information Criterion (BIC), to give a ‘best fit’ across a range of cluster solutions, varying not just the number of clusters but also the intra- and inter-cluster dispersion (are your clusters round? ellipsoidal? variable? etc.). This is a great ‘turnkey’ solution and I have seen it published in epidemiology journals, although I have noticed it tends to get very eccentric if you are clustering with a lot of categorical variables: in that case you are probably better off using Latent Class Analysis.

    ii) ‘Optimisation’ of local optima using multiple (1000000+) iterations of k-means clustering techniques: Doug Steinley at UMC has published widely on this and advocates measures such as Silhouette width or the Adjusted Rand Index to validate cluster solutions. See for example Steinley, D (2006) Profiling Local Optima in K-Means Clustering. Psychological Methods 11(2): 178-192.
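
    The multiple-restart idea in (ii) can be sketched roughly as follows. This is a toy illustration in plain Python, not Steinley's procedure: the data, the number of restarts (200 rather than a million) and all function names are assumptions made for the example. It keeps the run with the lowest within-cluster sum of squares and uses the Adjusted Rand Index to count how many random starts ended in that same partition.

```python
import random
from collections import Counter
from math import comb


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, max_iter=100):
    """One k-means run (Lloyd's algorithm) from a random start.

    Returns (within-cluster sum of squares, list of cluster labels).
    """
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    wcss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return wcss, labels


def adjusted_rand_index(a, b):
    """Adjusted Rand Index between two labelings of the same cases."""
    n = len(a)
    sum_ij = sum(comb(v, 2) for v in Counter(zip(a, b)).values())
    sum_a = sum(comb(v, 2) for v in Counter(a).values())
    sum_b = sum(comb(v, 2) for v in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both are one cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Made-up data: three well-separated groups of three cases each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
runs = [kmeans(points, 3, rng) for _ in range(200)]
best_wcss, best_labels = min(runs, key=lambda run: run[0])

# How many starts ended in the same partition as the best run?
agree = sum(adjusted_rand_index(labels, best_labels) > 0.999
            for _, labels in runs)
print(f"best WCSS = {best_wcss:.3f}; "
      f"{agree} of {len(runs)} starts reached the same partition")
```

    The share of starts that agree with the best run gives a rough feel for how many local optima the data have; a low share suggests the solution is fragile.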

    So my recommendation would be to give ‘mclust’ a try if you have access to R (if not, get it! it’s free) or to speak to a local statistician and see what he/she would advocate.
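
    For orientation, the BIC that model-based clustering compares across candidate models can be written as below. The symbols are my own notation, and the sign convention shown is the one I understand mclust to use, so treat this as a sketch rather than a definitive statement.

```latex
% \hat{L}_M : maximised likelihood of candidate model M
% k_M       : number of free parameters in model M
% n         : number of cases (respondents)
\mathrm{BIC}_M = 2 \ln \hat{L}_M - k_M \ln n
```

    Under this convention the model with the largest BIC is preferred; many textbooks define BIC with the opposite sign, in which case smaller is better.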

    Good luck!





    Shyh-Mee Tan

    The main purpose is to find out the readiness of the respondents for the implementation of information literacy. I am trying to categorize the respondents into groups (maybe 3) based on their qualifications and self-estimated skills to see which group is ‘ready’ for the information literacy implementation. Therefore, I hope to use cluster analysis for this purpose. Is it possible?


    Thanks for the notes. Do you have the reference?
