18th November 2013 at 2:01 pm #1355 Maria Jeppesen (Member)
I could really use some guidance on how to analyze some of my research data. I have searched books, web pages and statistics forums, but I have had no luck finding examples that match my design…
My dataset consists of data from cell culture experiments. I have established primary cultures of cancer cells from 8 different tumors, each tumor sample coming from a different patient. The cells are grown in a 3D culture system, in which they grow as spheres of cells adhering to each other. The purpose of the experiment was to record how much these cell spheres grow over a period of 7 days. When setting up the experiment, I sorted the cell spheres from each patient by size using filtration, in order to see whether baseline size has any influence on growth rate. For each patient, 3 wells were seeded, each containing approx. 15-20 cell spheroids of a defined size (small, medium or large). Microscope images were taken of each well right after seeding (baseline sphere size) and on each of the following seven days. The spheres are embedded in a gel that keeps them in place, which makes it possible to follow the growth of each individual sphere. The area of each individual cell sphere was measured at baseline and on each of the following 7 days.
So basically I have the following data for each of the 8 patients:
Small spheres: N = 10-15 spheres, area measured at 8 time points (baseline + days 1-7) for each sphere
Medium spheres: N = 10-15 spheres, area measured at 8 time points for each sphere
Large spheres: N = 10-15 spheres, area measured at 8 time points for each sphere
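A minimal sketch of how these measurements could be stored in "long" format, one record per sphere per time point. The record type `AreaMeasurement`, the helper functions and the areas in the example are all hypothetical and made up for illustration; the point is only that patient, size group, sphere and day are explicit columns, so the nesting (spheres within wells within patients) stays visible.

```python
from collections import Counter
from dataclasses import dataclass

DAYS = range(8)  # 0 = baseline, 1-7 = the seven follow-up days


@dataclass(frozen=True)
class AreaMeasurement:
    """Hypothetical long-format record: one sphere at one time point."""
    patient: int      # 1-8, one tumor per patient
    size_group: str   # "small", "medium" or "large" (baseline size class)
    sphere: int       # sphere ID, only meaningful within its own well
    day: int          # index into DAYS
    area: float       # measured sphere area


def expected_records(spheres_per_well):
    """Rows expected for {(patient, size_group): number_of_spheres}."""
    return sum(n * len(DAYS) for n in spheres_per_well.values())


def spheres_per_cell(records):
    """Count distinct spheres in each (patient, size_group) well."""
    ids = {(r.patient, r.size_group, r.sphere) for r in records}
    return Counter((patient, group) for patient, group, _ in ids)


# Toy example with made-up areas: 12 small spheres from patient 1.
records = [AreaMeasurement(1, "small", s, d, 100.0 + 10.0 * d)
           for s in range(1, 13) for d in DAYS]
print(len(records))               # 96 = 12 spheres x 8 time points
print(spheres_per_cell(records))  # Counter({(1, 'small'): 12})
```

With 8 patients, 3 wells each and 10-15 spheres per well, this table would hold roughly 1,900 to 2,900 rows.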
My supervisor has suggested that I use a three-way mixed ANOVA for the analysis, with sphere area as the dependent variable and time, sphere size and patient number as the independent variables. Time, the repeated measure, would of course be a within-subject factor, but I have trouble defining what the actual “subject” is in my case, and therefore also whether the other factors are within- or between-subject factors.
So my first question is: what would the actual “subject” be in my case?
I am leaning towards the individual cell sphere as my “subject”, since each sphere is the single entity I have measured repeatedly. Furthermore, a cancerous tumor consists of a heterogeneous population of cells, and each sphere is derived from a limited number of these cells. I therefore expect cell spheres derived from the same patient to grow differently (even though their growth is of course still related to some degree, since they come from the same tumor). I found some examples of repeated measures analysis of data from cell line studies. There it was argued that you should use the average of replicate measurements for the statistical analysis, since these replicates are only included to account for technical variation (e.g. precision of pipetting, assay performance and performance of the detection platform). In other words, unlike in my study, no biological variation across samples is expected. So my suggestion would be to analyze the data with time as the within-subject factor and both patient number and sphere size as between-subject factors.
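The difference between the two units of analysis can be made concrete with a small sketch, assuming growth is summarized as day-7 area divided by baseline area (one possible summary, not the only one). Keeping the sphere as the unit gives 10-15 values per well; averaging the spheres as if they were technical replicates collapses each well to a single value. Function names and the toy areas are hypothetical.

```python
from statistics import mean


def growth_ratio(areas):
    """Day-7 area divided by baseline area for one sphere (areas[0] = baseline)."""
    return areas[-1] / areas[0]


def sphere_level(well):
    """Sphere as the unit: one growth value per sphere in the well."""
    return [growth_ratio(areas) for areas in well.values()]


def well_level(well):
    """Spheres treated like technical replicates: one averaged value per well.

    Note: this is the mean of the per-sphere ratios, not the ratio of mean areas.
    """
    return mean(sphere_level(well))


# Toy well with three spheres (made-up areas, baseline..day 7).
well = {
    1: [100.0] * 7 + [150.0],
    2: [100.0] * 7 + [200.0],
    3: [200.0] * 7 + [220.0],
}
print(sphere_level(well))  # three values: biological spread is kept
print(well_level(well))    # one value: spread within the well is averaged away
```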
My supervisor, on the other hand, argues that each of the 8 patients is the “subject”. This of course makes sense in that all the cancer cells derived from the same patient are related. In that case, I am really in doubt about how to label the independent factors. Time is of course still a within-subject factor, but what about sphere size and patient number? Should both sphere size and patient number be set as within-subject factors, so that all data from the same patient end up in one row? Or should patient number be set as a between-subject factor and only sphere size as a within-subject factor? The result would be that spheres of different sizes from the same patient that have been assigned the same number (numbers assigned randomly) end up in the same row. Will the data for the 3 spheres in the same row then be compared with each other? Spheres assigned the same number are no more closely related than spheres from the same patient assigned another number, so that would be a problem. Or will the whole group labeled patient 1, size small (containing data for all 10-15 spheres measured) be compared as a whole with the other size groups for the same patient?
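The worry about arbitrary sphere numbering can be checked with a toy example (made-up growth ratios; function names are hypothetical). When spheres from two size groups are paired row by row, the mean difference does not depend on the numbering, but the spread of the within-row differences, and therefore a paired test statistic, does. With unequal N (10-15 spheres per well), some spheres would also have no partner at all.

```python
from math import sqrt
from statistics import mean, stdev


def row_differences(small, large_in_row_order):
    """Within-row differences when spheres are paired by their (arbitrary) number.

    zip() silently drops unpaired spheres if the two wells have different N.
    """
    return [s - l for s, l in zip(small, large_in_row_order)]


def paired_t(differences):
    """Paired t statistic: mean difference divided by its standard error."""
    return mean(differences) / (stdev(differences) / sqrt(len(differences)))


small = [1.2, 1.5, 1.9, 2.4]  # toy growth ratios, small spheres
large = [1.0, 1.1, 1.3, 1.6]  # toy growth ratios, large spheres

numbering_a = row_differences(small, large)
numbering_b = row_differences(small, list(reversed(large)))  # same spheres, renumbered

print(mean(numbering_a), mean(numbering_b))          # both about 0.5
print(paired_t(numbering_a), paired_t(numbering_b))  # about 3.87 vs 1.29
```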
And my second question would be: Are we basically over-complicating things here?
Several of the statistics guides I have consulted point out that you should consider whether a mixed ANOVA actually answers the research questions you have set up, or whether another test would be more suitable. There is also the issue of low power when performing mixed ANOVAs with a small sample size… We could of course compare the growth of the 3 sphere sizes for each patient using a repeated measures ANOVA (or even a one-way ANOVA if we just compare the growth of each sphere on day 7 relative to its baseline size). The same analyses could be used to compare the growth of spheres from different patients, by pooling the data for the different sphere sizes (if no significant difference in growth rate is found between sphere sizes). Would that be a more suitable strategy for the analysis? Or do better alternatives exist?
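For the simpler route, a one-way ANOVA on day-7 growth ratio across the three size groups of one patient, here is a from-scratch sketch of the F statistic using only the Python standard library. It treats every sphere as an independent observation (the sphere-as-unit view discussed above) and returns F with its degrees of freedom. The p-value would come from the F distribution, e.g. `scipy.stats.f.sf(F, df_between, df_within)`, and `scipy.stats.f_oneway` should return the same F. The growth ratios below are made up.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent samples."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(values)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Toy day-7 growth ratios for one patient (made-up numbers).
small = [1.0, 2.0, 3.0]
medium = [2.0, 3.0, 4.0]
large = [5.0, 6.0, 7.0]
print(one_way_anova_f([small, medium, large]))  # F close to 13 with (2, 6) df
```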
Sorry for the long post, I hope it makes sense! I would really appreciate any comments on these thoughts, since I am totally stuck with this…
Maria
18th November 2013 at 7:03 pm #1356 Dave Collingridge (Participant)
When appropriate, I prefer a multivariate ANOVA (MANOVA) over a mixed design because the interpretation of MANOVA results is often more straightforward. If I understand your design correctly, you have 3 wells for each of 8 patients (24 wells in total), with one well each for small, medium and large baseline-size spheres. You are measuring each culture at baseline and daily over 7 days to determine whether initial baseline size influences growth, and to evaluate growth rates over time. A repeated measures MANOVA (RMANOVA) will capture all of these measures. It will tell you whether there are differences across time when all sizes (S, M, L) are taken into consideration (multivariate test), which sizes differ across time (univariate tests), and it will run post hoc comparisons for all sizes at each time point (pairwise comparisons).
The RMANOVA can be run by entering “time” (Baseline, Day1, Day2, Day3, Day4, Day5, Day6, Day7) as the within-subjects variable with 8 levels, and entering size (S, M, L) as 3 separate outcome variables (measures). There are no between-subjects factors to enter unless you want to include something like gender, and there are no covariates to enter unless you want to include something like BMI or age.
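To see what the univariate part of this analysis computes, here is a sketch of the sphericity-assumed repeated-measures F test for time on a single measure (say, the "small" measure), with patients as subjects. It assumes one value per patient per day, e.g. the mean sphere area in that patient's well; it does not reproduce the multivariate test across the three measures or any sphericity correction. With 8 patients and 8 time points the degrees of freedom would be 7 and 49. `statsmodels.stats.anova.AnovaRM` implements the same univariate test if you want a cross-check; the toy numbers below are made up.

```python
from statistics import mean


def rm_anova_time(rows):
    """One-way repeated-measures ANOVA for time, sphericity assumed.

    rows: one list per subject (patient) holding that subject's values for a
    single measure at each time point, e.g. mean "small" area on days 0-7.
    Returns (F, df_time, df_error).
    """
    n, k = len(rows), len(rows[0])
    grand_mean = mean(x for row in rows for x in row)
    ss_total = sum((x - grand_mean) ** 2 for row in rows for x in row)
    # Variation between subjects is removed from the error term.
    ss_subjects = k * sum((mean(row) - grand_mean) ** 2 for row in rows)
    time_means = [mean(row[j] for row in rows) for j in range(k)]
    ss_time = n * sum((m - grand_mean) ** 2 for m in time_means)
    ss_error = ss_total - ss_subjects - ss_time  # subject x time interaction
    df_time, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_time / df_time) / (ss_error / df_error)
    return f, df_time, df_error


# Toy data: 3 subjects x 3 time points (made-up numbers).
rows = [[2, 2, 5], [1, 5, 6], [3, 5, 7]]
print(rm_anova_time(rows))  # (12.0, 2, 4)
```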
If you use SPSS, here is how it is done: Analyze -> General Linear Model -> Repeated Measures -> within-subjects factor name = “time” -> number of levels = 8 -> measure names = “small”, “medium”, “large” -> Define -> move the 8 small measures into the first 8 slots in the “Within-Subjects Variables” window, the 8 medium measures into the next 8 slots, and the 8 large measures into the last 8 slots -> Model, select Full factorial -> select other settings as needed. Make sure you display means for “OVERALL” and “time” in the Options tab.
For data entry, each row represents a single patient and the columns hold the area measures, e.g. small_baseline, medium_baseline, large_baseline, small_day1, medium_day1, large_day1, small_day2, medium_day2, large_day2, etc.
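A sketch of this wide layout: it builds the 24 column names in the order of the data-entry example, the order in which the SPSS slots are filled (all small measures, then medium, then large), and pivots long (patient, size, day, area) records into one row per patient. Because the patient is the subject here, each patient × size × day cell needs a single value; averaging the spheres in each well is an assumption made for this sketch, not something stated in the answer. All names and the toy records are hypothetical.

```python
from statistics import mean

SIZES = ["small", "medium", "large"]
DAYS = ["baseline"] + [f"day{d}" for d in range(1, 8)]

# Column order as in the data-entry example: sizes interleaved within each day.
WIDE_COLUMNS = [f"{size}_{day}" for day in DAYS for size in SIZES]

# Order for the SPSS within-subjects slots: all 8 small, then medium, then large.
SPSS_SLOT_ORDER = [f"{size}_{day}" for size in SIZES for day in DAYS]


def to_wide(long_records):
    """Pivot (patient, size, day_index, area) tuples into one row per patient.

    Assumption: several spheres per well are collapsed to their mean area;
    cells with no data are left as None.
    """
    cells = {}
    for patient, size, day_index, area in long_records:
        cells.setdefault((patient, f"{size}_{DAYS[day_index]}"), []).append(area)
    patients = sorted({p for p, _ in cells})
    return {p: [mean(cells[(p, c)]) if (p, c) in cells else None for c in WIDE_COLUMNS]
            for p in patients}


wide = to_wide([(1, "small", 0, 100.0), (1, "small", 0, 120.0), (1, "medium", 1, 50.0)])
print(WIDE_COLUMNS[:4])
print(wide[1][:5])  # [110.0, None, None, None, 50.0]
```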