Is this ethical and/or commonly done?


qat


I did the same experiment on 60 different datasets.  A few of them worked, most of them were meh.  I'm giving a talk about my work, and my supervisor wanted me to make a slide about the one that worked, even though I had to dig through like 20 experiments to find it.  I didn't want to and she was cool with that, but I was wondering, do most scientists consider this ethical?

 

I was also going to put another chart in my talk and interpret it a certain way (the interpretation was very nice, and supported our hypothesis). But the day before the talk I realized there was another way to present the data that lent itself to the exact opposite interpretation. My supervisor wanted me to tell the first story anyway, and I'm wondering if she is unscrupulous or if people in experimental fields just generally do this.



What field are you in? I'll be answering from a psych perspective.

 

In both scenarios, I can picture times when it would be ethical and when it would not. For the first one, for example, if you had reasons why certain ones worked and others didn't (e.g. sampling, procedure, power issues), then I could see presenting the data from a few that worked with the caveat that it was not replicated in all samples (or whatever). But in general, you wouldn't want to run it across the 20 studies to begin with; you could just run it all together with study as a covariate, or do a meta-analysis across the 20 studies.
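To make the "analyse it all together" suggestion concrete, here is a minimal sketch in Python using simulated data. The column names, effect size, and choice of an ordinary least squares model from statsmodels are my own assumptions for illustration, not anything from the original experiments:

```python
# Sketch: one pooled model with "study" as a categorical covariate,
# instead of 20 (or 60) separate significance tests.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulate 20 small studies stacked into one long dataframe (hypothetical data).
frames = []
for study_id in range(20):
    n = 30
    treatment = rng.integers(0, 2, size=n)            # binary condition
    outcome = 0.2 * treatment + rng.normal(0, 1, n)   # small true effect plus noise
    frames.append(pd.DataFrame({"study": study_id,
                                "treatment": treatment,
                                "outcome": outcome}))
data = pd.concat(frames, ignore_index=True)

# Fit a single model across all studies, adjusting for study membership.
model = smf.ols("outcome ~ treatment + C(study)", data=data).fit()
print("pooled effect:", round(model.params["treatment"], 3),
      "p =", round(model.pvalues["treatment"], 4))
```

The point of the pooled approach is that the effect is estimated once, from all of the data, rather than cherry-picking whichever individual study happened to reach significance.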

 

For the latter, I'm not sure how just putting data on a graph differently could dramatically change the interpretation. But that being said, people generally present their data, give THEIR interpretation of it, and then talk about some other possible interpretations and what that means.

 

So yes, if your adviser is telling you to dig through and mine tons of data and to selectively present and selectively interpret things, then that could be considered an inappropriate research strategy. On the other hand, you may just not understand how she wants you to approach or discuss things, or she might just not be well-versed in statistics. As an advisee, I think it is totally appropriate to say "I don't want to mine through these 20 datasets looking for something significant; I think it would be better research methodology if I conducted a meta-analysis across the 20 datasets" (or whatever statistical approach). Similarly, you can suggest adding the alternate explanations for how to interpret the findings to the discussion section.


That seems kind of off to me. Unless you can justify it with some sort of procedural difference, I'm having a hard time being convinced it's not due to statistical noise. And I can't think of a scenario where withholding data is ethical.
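As a rough illustration of the noise concern (my own back-of-the-envelope numbers, not anything about the actual experiments): even if there were no real effect anywhere, testing 60 datasets at the conventional 0.05 level would still be expected to produce a few "hits" by chance.

```python
# Back-of-the-envelope: false positives expected from 60 independent tests
# at alpha = 0.05, assuming no real effect exists (purely illustrative).
alpha = 0.05
n_tests = 60

expected_false_positives = alpha * n_tests
prob_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"Expected false positives: {expected_false_positives:.1f}")  # ~3.0
print(f"P(at least one false positive): {prob_at_least_one:.2f}")   # ~0.95
```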


I agree that this is tricky, and that it's tough for us to tell as outsiders because we likely don't have the full story.

 

It also depends on the experiment? That is, what is the reason for having 60 different datasets? If all 60 are supposed to return the same result but only a small number did and you only report the small number that agree with your interpretation, then this is likely poor research practice.

 

However, if there are 60 different datasets that are not likely to return the same result then it might make sense for a short talk to only discuss one result.

 

Or, if you are not certain that all 60 datasets have been experimented on or analysed correctly, you can present the one experiment that works as a "proof of concept" that whatever you are trying to show could potentially work. But this means you have to clarify that you are working on 59 other datasets and have not yet replicated this result.

 

If it matters, I'm in a field where talks are not as important as papers. I would say that if you did something like this in a paper though, it would definitely be very fishy!


I agree that it really depends on the details. Is there a reason the others didn't work? Is the result only seen occasionally? Can you replicate it? I can think of plenty of studies where the result in one case would be worth discussing, even if not seen in all, and plenty of others where this wouldn't be the case. Discussing it in more detail with your PI, or, if you don't trust your PI's judgement, with someone else who understands the research, would be helpful.


I can see it being ethical if you fully disclose the overall results, and then talk about the working datasets in the context of why the experiment worked in those datasets but not in the others. Was there something about those datasets that caused the experiment to work? Likewise, what was it about the other datasets that kept it from working?


As others have said, I can envision scenarios in which this is perfectly normal, and those in which it's ethically shaky. 

 

To come at it from a lab sciences perspective: if I have a protocol to couple two chemicals together, and I run it on 60 different pairs of chemicals, I'm not likely to report the 59 that didn't work; I would report the 1 that did, and maybe some standout examples of the others, either as controls or because I learned something significant from them.

 

Now, on the other hand, if I'm testing a novel cytotoxic drug on different cell lines, all of which it should kill, and it only shows a reduction in viability of 1 out of the 60 I tested, then only showing data for that 1 would be fishy. 

 

So a lot of it depends on what the difference in the data sets is, what the experiment was, and what caused it to work/not work on all of the above. 

 

And if you think there's a valid alternate hypothesis, personally, I'd think of a way to test something that might differentiate between the two possible interpretations. 

 

Usually, the act of research is to figure out something (a framework of ideas, a procedure, a bit of code, a theory) that works. When it comes time to publish, you are primarily sharing what *worked* with the larger community in and out of your field. Sometimes it can be helpful to share examples of things that didn't work, and sometimes it can be helpful to publish that nothing worked (aka Journal of Negative Results), but primarily you're sharing something you discovered that works. 



Depends. If there was some uncontrollable variation in the sample prep, and the measurements yielded only a few that worked out of many, then this is valid as long as you can quantify the differences between the samples that were good and the samples that were not. Otherwise, it is bad practice, in my opinion, since you don't know which result is the accident.

