    I am currently writing a paper analysing newspaper articles, and to do so I am using a combination of text mining and more ‘classic’ qualitative analysis.

    What is your experience with this mixed approach?

    Thank you in advance


    There are a few promising software packages that can perform automatic text analysis (e.g. Leximancer, T-Lab), but I am concerned about validity and reliability issues. I am using QDA Miner and WordStat for an analysis of newspaper articles. This software provides greater flexibility and many ways to cross-validate findings, but I guess validity and reliability will always be a critical issue. I also find developing categorization dictionaries a somewhat subjective process. It might be a good idea to gather best practices and tips for developing keyword dictionaries that are contextual yet standardized.
    Looking forward to further comments.
    Thank you
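    One common way to put a number on the reliability concern is to code a sample of articles both manually and with the dictionary, then compute chance-corrected agreement between the two. A minimal Python sketch of Cohen's kappa, assuming exactly one nominal code per article; the codes and data are hypothetical:

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equal-length sequences of nominal codes."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(coder_a)
    # Observed agreement: share of units that received the same code.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's code frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1:
        return 1.0  # both coders used one and the same code throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical data: one code per article, assigned manually and by a dictionary.
manual = ["risk", "risk", "benefit", "policy", "risk", "benefit"]
dictionary = ["risk", "benefit", "benefit", "policy", "risk", "risk"]
print(round(cohens_kappa(manual, dictionary), 3))  # 0.455
```

    Here the two codings agree on 4 of 6 articles (0.67), but kappa is only about 0.45 because part of that agreement would be expected by chance.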



    My company, Provalis Research (makers of QDA Miner and WordStat), and I have been promoting the combination of text mining and qualitative analysis (along with quantitative analysis) for quite some time, and we strongly believe that qualitative researchers could benefit in many ways from text mining and quantitative content analysis techniques. Conversely, those who use text mining often need to perform more in-depth analysis of specific cases, and thus could benefit from CAQDAS tools. I am preparing a paper in which I will identify various ways of combining different approaches to text analysis and the benefits of doing so.

    I would be interested to learn more about how you plan to combine them and what benefits you see in such a combination. Do you use text mining as a preliminary exploration tool, as a method of triangulation, as a way to increase coding reliability, or as a way to automate coding? How would you characterize your approach using a variation of Morse’s notation system: QUAL + TM? TM -> QUAL? QUAL + tm? TM -> qual?


    Hi Normand,
    Thanks for your reply. I have used text mining in a QUAL + TM framework, as a method of triangulation and as a coding reliability tool. In particular, I used it as an additional source of information about emerging themes in mass media coverage, combining cluster analysis with traditional thematic analysis.
    The paper I mentioned has been submitted to the journal Public Understanding of Science, and I am awaiting feedback.
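    As a rough illustration of the kind of cluster analysis that can sit alongside thematic analysis, here is a minimal Python sketch that groups keywords by how often they appear in the same articles (Jaccard similarity, average-linkage agglomerative clustering). It is not how any particular package implements clustering; the stopping threshold and the keyword data are hypothetical:

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets of article ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_keywords(occurrences, threshold):
    """Average-linkage agglomerative clustering of keywords.

    occurrences maps keyword -> set of ids of the articles containing it.
    Clusters keep merging while the best average similarity >= threshold.
    """
    clusters = [[kw] for kw in sorted(occurrences)]

    def avg_sim(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard(occurrences[x], occurrences[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the pair of clusters with the highest average similarity.
        (i, j), best = max(
            ((pair, avg_sim(clusters[pair[0]], clusters[pair[1]]))
             for pair in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


occurrences = {
    "vaccine": {1, 2, 3, 4},
    "side effects": {1, 2, 3},
    "government": {5, 6, 7},
    "minister": {5, 6},
    "weather": {8},
}
print(cluster_keywords(occurrences, threshold=0.5))
# [['weather'], ['side effects', 'vaccine'], ['government', 'minister']]
```

    Lowering the threshold merges more keywords into broader clusters, which is roughly what cutting a dendrogram at a different height does.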

    I know QDA Miner, although I am afraid I used different software to carry out my research.

    More interestingly, I am now working on something different: still a text mining technique, and a rather simple one really, but one that (as far as I know) is missing from existing tools and that might interest you (it might even be a potential software feature). Let’s discuss it a bit more confidentially (Giuseppe.Veltri@ec,europa,eu).


    I use QDA Miner/WordStat for tracking online news issues. I had many difficulties preparing the data (cleaning, formatting) before importing it into QDA Miner/WordStat (mostly glitches); utilities for these tasks would be useful.
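    For the cleaning step, even a small utility helps. A minimal Python sketch using only the standard library, assuming the articles were saved as raw HTML pages; it is not a QDA Miner/WordStat feature, and real news pages usually need extra site-specific rules (menus, comments, ads):

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &nbsp;, &ldquo; etc.
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def clean_article(raw_html):
    """Turn one saved web page into a single line of plain text."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    # Joining with spaces keeps paragraphs apart (at the cost of splitting
    # words that inline tags break up, e.g. "Gov<b>ern</b>ment").
    text = " ".join(parser.parts)
    # Normalise curly quotes, then collapse all whitespace (incl. non-breaking).
    text = text.replace("\u2019", "'").replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip()


sample = ("<html><head><style>p{}</style></head><body>"
          "<p>Government  of&nbsp;India</p><script>x()</script>"
          "<p>said &ldquo;no&rdquo;.</p></body></html>")
print(clean_article(sample))  # Government of India said "no".
```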
    I follow these steps:
    1) I closely read a selection of news articles on a particular issue and prepare a list of keywords and phrases that I find interesting/useful, similar to a manual qualitative coding process.
    2) I collect and clean the data into a format suitable for importing (I use several text-cleaning programs to convert the unstructured web-page news articles into a format that QDA Miner can read with its import wizard).
    3) I create a codebook in QDA Miner based on the theory I use, and include codes from the first reading; these are the basic codes.
    4) I run WordStat, initially with the default settings, and create a categorization dictionary.

    5) I explore the keyword list to develop the dictionary: IF a keyword/phrase matches a predefined code, THEN I use that basic code, ELSE I create a new category/word list.
    I find the feature extraction function in WordStat very useful for categorization. The process takes a long time, and critical decisions are made here (e.g. should I group mentions of “government of India” and “government”?).

    6) If in doubt, I leave it as a separate word/phrase and store it in a new “Unsorted” category.

    7) I review and clean my categorization dictionary.

    8) I begin with cluster analysis. I sometimes find interesting groupings in the word list and create cluster-based categories where relevant; tweaking the dendrograms gets me many new categories.
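    Steps 5 and 6 can be sketched in a few lines of Python. This only illustrates the IF/ELSE sorting and one possible answer to the “government of India” vs “government” question (the longest phrase wins, so nested phrases are not double-counted); it is not how WordStat works internally, and the codebook below is hypothetical:

```python
import re
from collections import Counter

# Hypothetical codebook from step 3: code -> terms (lower case).
CODEBOOK = {
    "GOVERNMENT": {"government", "government of india", "ministry"},
    "PROTEST": {"protest", "strike", "rally"},
}


def sort_keywords(candidates, codebook):
    """IF a candidate matches a predefined code THEN file it there,
    ELSE keep it in an UNSORTED list for later review."""
    sorted_kw = {code: [] for code in codebook}
    sorted_kw["UNSORTED"] = []
    for kw in candidates:
        code = next((c for c, terms in codebook.items() if kw.lower() in terms), "UNSORTED")
        sorted_kw[code].append(kw)
    return sorted_kw


def count_codes(text, codebook):
    """Count dictionary hits per code in one article.

    All terms go into one regex ordered longest-first, so "government of
    india" is counted once rather than also as "government".
    """
    term_to_code = {t: c for c, terms in codebook.items() for t in terms}
    ordered = sorted(term_to_code, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b")
    return Counter(term_to_code[m.group(0)] for m in pattern.finditer(text.lower()))


print(sort_keywords(["Government", "strike", "monsoon"], CODEBOOK))
text = "The Government of India met strikers; the government denied a strike."
print(count_codes(text, CODEBOOK))
```

    Note that “strikers” is not counted: whether to add such variants (or rely on wildcards/stemming) is exactly the kind of judgment call described in step 5.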

    All this takes me a lot of time, and only when I am satisfied with the final categorization dictionary do I proceed to other types of analysis, validation, etc. (The nagging questions here are mostly: Have I left any important words out? Have I included words that could distort the concept being measured?)
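    For the “have I left any important words out?” question, one cheap check is to list the most frequent words that the dictionary does not yet cover. A minimal Python sketch; the stop-word list, the simple tokenizer and the example data are all simplifying assumptions:

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "on", "said"}


def uncovered_terms(articles, dictionary_terms, top_n=10):
    """Most frequent words that no dictionary term accounts for.

    A word counts as covered if it appears in any (possibly multi-word)
    dictionary term; the result is a list of words worth a second look.
    """
    covered = {w for term in dictionary_terms for w in term.lower().split()}
    counts = Counter(
        w
        for text in articles
        for w in re.findall(r"[a-z']+", text.lower())
        if w not in STOPWORDS and w not in covered
    )
    return counts.most_common(top_n)


articles = [
    "The government announced a fuel tax.",
    "Fuel prices rose and the minister said the tax was needed.",
    "Protest against fuel tax in Delhi.",
]
print(uncovered_terms(articles, ["government", "tax", "protest"], top_n=1))  # [('fuel', 3)]
```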

    I guess we could model it as QR => TM => QR:

    Qualitative Data => Qualitative Coding => TM Coding => TM Analysis => Qualitative Analysis
    Thank you


    I’ve learned a lot from this thread and from exchanging emails with Normand.

    Do any of you use part-of-speech (POS) taggers in your research? If so, which ones would you recommend? How do you typically use them?
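    One typical use is to pull candidate multi-word keywords (adjective/noun phrases) out of articles to seed a categorization dictionary. A minimal Python sketch, assuming the text has already been tagged with Penn Treebank tags by some tagger (for example NLTK's `pos_tag`, which needs its model data downloaded separately); the tagged sentence is hand-made for illustration:

```python
# Penn Treebank tags, as produced by many common taggers.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ADJ_TAGS = {"JJ", "JJR", "JJS"}


def candidate_phrases(tagged_tokens):
    """Return maximal runs of adjectives/nouns that end in a noun.

    tagged_tokens is a list of (word, tag) pairs from any POS tagger.
    """
    phrases, current = [], []
    # A sentinel token flushes the last run at the end of the input.
    for word, tag in tagged_tokens + [("", "END")]:
        if tag in NOUN_TAGS or tag in ADJ_TAGS:
            current.append((word, tag))
            continue
        # Run ended: drop trailing adjectives so the phrase ends in a noun.
        while current and current[-1][1] not in NOUN_TAGS:
            current.pop()
        if current:
            phrases.append(" ".join(w for w, _ in current))
        current = []
    return phrases


tagged = [("The", "DT"), ("Indian", "JJ"), ("government", "NN"), ("backed", "VBD"),
          ("nuclear", "JJ"), ("energy", "NN"), ("projects", "NNS"), (".", ".")]
print(candidate_phrases(tagged))  # ['Indian government', 'nuclear energy projects']
```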



    Do you have any experience using T-Lab?

