Why are experiments run in triplicate?

fibonacci · March 20, 2013

I don't understand why many biological experiments in the literature are automatically run in triplicate. What's so special about the number 3? Why is it assumed that a study run in triplicate has enough statistical power to make its observations meaningful? Don't you have to run a power analysis first to determine the proper sample size? Why is "triplicate" a mindless default for the number of times to do an experiment? I even see papers published in Nature and Science with experiments "done in triplicate". So much of the literature talks about the need for power analysis to choose a proper sample size (in fields like clinical medicine), yet many other branches of science simply get away with the de facto default of "triplicate" and conclude that the results are meaningful. Why doesn't all science require a power analysis to design an experiment?

Eigen · March 20, 2013

Think you're confusing different types of error here.

Usually, you run something in triplicate not to get a large enough sample size (which would be the number of samples you run), but to rule out experimental bias or some random error.

3 is nice, because if you have an outlier, you can see that two agree and one's out in the cold.
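As a rough illustration of the "two agree, one's out in the cold" idea, here is a minimal Python sketch. The readings are made up, and `odd_one_out` is a hypothetical helper: it only points at the replicate farthest from the median and does not decide whether that replicate is a genuine outlier.

```python
from statistics import median

def odd_one_out(replicates):
    """Return (index, value) of the replicate farthest from the median."""
    mid = median(replicates)
    index = max(range(len(replicates)), key=lambda i: abs(replicates[i] - mid))
    return index, replicates[index]

# Hypothetical triplicate readings: two agree, one does not.
readings = [10.1, 9.9, 14.2]
index, value = odd_one_out(readings)
print(f"replicate {index} ({value}) sits farthest from the other two")
```

With only two replicates there is no way to tell which of two disagreeing values is the suspect one, which is one informal argument for three.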

If you're talking about cell culture work, then you have to start with a huge sample size: thousands to millions of cells.

fibonacci · March 20, 2013

Right, experiments are done in triplicate so that statistical tests that generate P values can then be used to show "statistical significance". However, if your study is statistically underpowered, you run into all sorts of problems with type II errors and with overestimating the mean effect size from an experiment.
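The two problems named here can be seen in a small simulation. This is a sketch under stated assumptions, not a model of any real assay: two groups of n = 3, normally distributed measurements with a known standard deviation of 1, a true difference of 0.5, and a simple two-sided z-test. All of these numbers are illustrative choices.

```python
import random
from statistics import NormalDist, mean

def simulate(true_effect=0.5, n=3, sigma=1.0, alpha=0.05, runs=20000, seed=1):
    """Simulate many small two-group experiments with a known-sigma z-test.

    Returns (fraction of runs that reach significance,
             mean estimated effect among those significant runs).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n) ** 0.5  # standard error of the difference in means
    significant = []
    for _ in range(runs):
        control = [rng.gauss(0.0, sigma) for _ in range(n)]
        treated = [rng.gauss(true_effect, sigma) for _ in range(n)]
        diff = mean(treated) - mean(control)
        if abs(diff) / se > z_crit:
            significant.append(diff)
    return len(significant) / runs, mean(significant)

power, mean_sig = simulate()
print(f"fraction of runs reaching significance: {power:.3f}")
print(f"mean estimate among significant runs: {mean_sig:.2f} (true effect 0.5)")
```

Most runs miss the real effect (type II errors), and the runs that do reach significance report an effect well above the true value, because only unusually large estimates clear the threshold at this sample size.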

I'm under the impression that in cell culture work, each cell doesn't represent an independent measurement. Anytime you obtain a sample and assay your cells, the sample as a whole only counts as one measurement (technical replicates do not count of course).

Am I confusing something here? I don't understand why many branches of science require an a priori power analysis while other fields can get away with simply doing triplicates.

Edited March 20, 2013 by fibonacci

Eigen · March 20, 2013

You're being very general here. I'm going to suggest you give some more specific examples of what you're talking about, as I'm not really sure what examples you're using of "they just did it in triplicate".

Also, data is data.

If it's not strong enough, that's for the reader to interpret as well. Publication by itself just means it's potentially useful, most of the time.

Also worth noting that I'm from a branch of "science" that doesn't do statistical analysis of our data at all.

fibonacci · March 20, 2013

Ok, for a very easy example:

Let's say I want to test drug X on cancer cells for toxicity. I treat the cells with 0, 50, and 100 µM of the drug and then count them to determine toxicity. The typical way to complete this experiment would be to repeat it 2 more times and then run ANOVA or some other statistical test to determine whether concentration has a statistically significant effect on cell count. This would be acceptable for many journals.
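For concreteness, the analysis described above can be sketched as a one-way ANOVA computed by hand in Python. The cell counts are invented for illustration, the three runs are treated as independent replicates within each concentration, and the critical value 5.14 is the standard table value for F(2, 6) at alpha = 0.05.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups)."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical cell counts (arbitrary units), one value per independent run.
counts = {
    "0 uM": [100, 98, 102],
    "50 uM": [80, 85, 75],
    "100 uM": [40, 45, 35],
}
F_CRIT_2_6 = 5.14  # table value for F(2, 6) at alpha = 0.05

f_stat, df_b, df_w = one_way_anova_f(list(counts.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.1f}, significant at 0.05: {f_stat > F_CRIT_2_6}")
```

With three groups of three there are only 6 within-group degrees of freedom, so the test only reaches significance when the differences between concentrations are large relative to the run-to-run scatter.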

What I'm hung up on is why n=3 is far and away the accepted default number of times to run an experiment like this. Shouldn't one do a power analysis to determine how large a sample is needed to perform the experiment and have enough data points to run a proper ANOVA?
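A minimal sketch of what such an a priori calculation looks like, using the textbook normal approximation for comparing two group means, n per group = 2 * ((z_alpha + z_beta) / d)^2, where d is the expected difference in units of the standard deviation. The function name and the effect sizes are illustrative assumptions, and the approximation is crude at very small n, where a t-based calculation asks for somewhat more samples.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate samples per group for a two-sided, two-group comparison.

    effect_size is the expected difference divided by the standard deviation.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # threshold for the significance level
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

for d in (0.5, 1.0, 2.0):
    print(f"effect size {d}: about {n_per_group(d)} per group")
```

The required n falls with the square of the effect size, so a triplicate is only in the right range when the expected effect is several standard deviations wide.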

However, if I were to test drug X in mice at different concentrations, an institutional board would almost certainly require a power analysis to determine how many mice I'd need, so that I'm neither killing more mice than necessary nor using so few that my data are worthless.

My question I suppose is, what makes cells different than the mice?

One other thing to mention: I don't think 'data is just data', or that it should be up to the reader to determine whether it is useful. The problem with underpowered studies is that they can propagate type II errors. Once a study containing a type II error is published and repeated in the literature, it can gain a foothold and become established as scientific 'fact', when in reality the results of all the underpowered studies that replicated the original are wrong because they made the same type II error for lack of power.

Edited March 20, 2013 by fibonacci

Guest · March 20, 2013

Convention & rule of thumb often come into play.

Similar rule of thumb comes in when asked what is a large, medium and small effect size.

That said, it really does depend on the experiment and question being asked.

There are possible studies where an n of 1 might suffice and others where an n of 100 might not. Other times your n is a small number due to other limiting factors, of course.

Eigen · March 20, 2013

The number.

You're working on millions of samples at once, so individual variations are assumed to be minimized, which means you're primarily doing multiple runs to rule out some experimental error, rather than to separate populations.

In a typical cell culture plate, your n is millions. They may be the same type of cell, but they're growing individually.

Counting them together would be like considering mice housed together to be only one sample.

fibonacci · March 20, 2013

I don't think each cell counts as an individual "n". See:

http://labstats.net/articles/cell_culture_n.html

N should be the number of times you run your experiment independently. For example, let's say I do the tox test above on the cells on day 1 with technical triplicates, repeat it on day 2 with technical replicates, and repeat it again on day 3 with technical replicates. My n is still only 3, even if each sample I tested on those days contained millions of cells.
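The bookkeeping described above can be sketched in a few lines of Python: technical replicates are averaged within each independent run, and only the per-run values count toward n. The counts and the `per_run_means` helper are hypothetical.

```python
from statistics import mean

def per_run_means(runs):
    """Collapse the technical replicates of each independent run to one value."""
    return {run: mean(values) for run, values in runs.items()}

# Hypothetical cell counts: three independent runs, each with technical triplicates.
runs = {
    "day 1": [98, 102, 100],
    "day 2": [91, 95, 93],
    "day 3": [105, 103, 107],
}

means = per_run_means(runs)
n = len(means)  # independent runs, not the nine technical measurements
print(means)
print(f"n = {n}")
```

Any downstream test would then be run on the three per-run values, not on the nine technical readings or on the number of cells in each well.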

I've never seen an example in the literature where an ANOVA is run with an n of millions of cells.

Cell culture cells all come from the same cell line and are all tested at the same time. Mice, however, are separate biological entities that are not the same, and can be tested independently. Testing all cells at the same time means they aren't independent because the same test was done on them all at the same time. That's on top of the fact they're all from the same culture.

Again, in order to properly rule out error, you need to know how many times to run a test, which is "n". That's what power analysis is for. So if that's what power analysis is for, why is triplicate automatically assumed to be acceptable?

Edited March 20, 2013 by fibonacci

fibonacci · March 20, 2013

Good points about convention and rules of thumb. I agree, magnitude of effect and biological importance should be the most important things for a biologist to analyze, but too often journals won't accept a manuscript if they don't see "statistical significance", which in the end is meaningless without effect size. If you know which statistical test you want to run on the data you are going to obtain, I was under the impression that power analysis should be used to determine how many times to run the experiment. Why is it, then, that so many journals accept n=3 without any justification for 3? I guess I don't understand the rule of thumb when there's so much other literature out there describing the importance of adequate statistical power in fields like psychology, clinical medicine, drug trials, etc.

Eigen · March 20, 2013

So, again, I'd say you're conflating sources of error. Data is only as useful as the generalizations you draw from it:

As the page you linked mentions, all cells are from the same population. It's considered to be a relatively homogeneous population. But there are a lot of individuals in that population, so there is a lot of replication to rule out individual variation, should it crop up.

And when you write up results, you're not trying to generalize to human cells as a whole; if you were, you'd need a lot more statistical rigor, if it were even possible. You're stating results from a specific cell line, or several specific cell lines.

That's different from the other fields you're describing, where results are (hopefully) generalized to a much larger population: "mice", "humans", "lung cancer", etc.

You seem to get what power analysis tells you, abstractly, but I think you're placing too much importance on it. It's necessary for drawing specific conclusions; specifically, it tells you how generalizable your results are from your samples to a whole population. In the examples you're citing of people not doing that, they aren't trying to generalize their results to a whole population, so it's not really applicable.

So a triplicate experiment (or duplicate, if you're short on material or space, or 4 or 5 replicates if you have room) is considered a good rule of thumb to rule out experimental errors. It doesn't worry about systematic errors as much, nor does it try to take enough replicates that the results are generalizable to a whole population.

Also, to add, at least in my field, you'd never want to directly compare data taken on different days. There's too much of a chance for other factors (climate, temperature, power surges, instrumental variations) to have an effect on your result. You compare data between samples run together, so that you minimize the possible variables and the primary variables in your experiment are those you can control (i.e., your drug).

Eigen · March 20, 2013

Also wanted to mention:

Don't forget the importance of magnitude of effect. Most biological studies I see are looking for huge changes: you aren't looking to kill a few % of cells, you're looking to kill most of them, or all of them.

As the magnitude of the effect increases (to the point where a "casual observer" could see the difference), the probability of error decreases, which counterbalances the effect of a small sample.
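This trade-off can be put in rough numbers. The sketch below assumes a two-group comparison with n = 3 per group and uses a normal approximation with known variance, so it overstates what a real t-test on three samples would achieve; the effect sizes (difference in units of standard deviation) are illustrative.

```python
from statistics import NormalDist

def approx_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided, two-group z-test with n per group."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n / 2) ** 0.5  # expected test statistic under the effect
    # Probability that the statistic lands beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

for d in (0.5, 1.0, 2.0, 5.0):
    print(f"effect size {d}: power at n=3 is about {approx_power(d, 3):.2f}")
```

Under these assumptions a triplicate has little chance of detecting a subtle effect and a very high chance of detecting one several standard deviations wide, which is the regime described above.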
