"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

Intentional bias and scientific fraud

Detecting scientific fraud

Effective detection of fraud is an important component of prevention. There are three main ways in which fraud can be detected: by examination of the data produced, by collaborators 'blowing the whistle' on the perpetrators, and by regular data audits. Curiously enough, very little scientific fraud is detected by simple inspection of published work. This is because, in the absence of corroboration, even the most dubious results cannot be distinguished from simple incompetence.

  1. Examination of data

    The same data validation procedures used to detect unintentional errors can sometimes be used to detect fraudulent data, especially if the fraud occurs at the data-gathering stage - for example, by a technician. Data checks should be instituted wherever possible, and data should be checked as soon as possible after collection. Some of the variability in data may result from technicians 'covering up' problems in the data collection process; you should therefore ensure that any problems encountered are always reported.

    The frequency distribution of the data should be compared with the distribution of a 'clean' set of data. It is sometimes said that data should always follow a symmetrical bell-shaped distribution. This is definitely not the case: the expected distribution depends on the type of data gathered. However, you should watch out for truncated data distributions - that is, where there are no unusually large or small values. People fabricating data often concentrate values excessively around the mean, reducing the range and other measures of variability.
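    Such a screen can be sketched in a few lines. The function name, datasets, and cut-offs below are purely illustrative - in practice any threshold would have to be calibrated against clean historical data from the same source:

```python
import random
import statistics

def variability_flags(suspect, reference):
    """Compare the spread of a suspect dataset against trusted
    reference data. Ratios well below 1.0 suggest values are
    over-concentrated around the mean - a common feature of
    fabricated data."""
    return {
        "sd_ratio": statistics.stdev(suspect) / statistics.stdev(reference),
        "range_ratio": (max(suspect) - min(suspect))
                       / (max(reference) - min(reference)),
    }

random.seed(42)  # reproducible illustration
genuine = [random.gauss(100, 15) for _ in range(200)]
# Fabricated values invented "close to the expected mean":
fabricated = [random.gauss(100, 4) for _ in range(200)]

flags = variability_flags(fabricated, genuine)
```

    Here both ratios come out well below 1, flagging the fabricated batch. A screen like this only raises suspicion - a low-variability batch may have an innocent explanation and always warrants investigation rather than accusation.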

    It has been suggested that fraud can be detected in a randomized clinical trial by comparing the means and variances of baseline data between groups: if the type I error rate of one's statistical tests is set at P = 0.05, significant differences should arise in only about one in 20 comparisons. More frequent differences would indicate fraudulent data. This certainly may be the case, but it could also indicate a failure of the randomization process, which can happen for a variety of reasons. Groups can also be compared at baseline for any evidence of digit preference.
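    A simple digit-preference screen can be sketched as a chi-square statistic for the terminal digits of integer measurements against a uniform distribution. The function name and data below are illustrative; 16.92 is the standard critical value for 9 degrees of freedom at P = 0.05:

```python
from collections import Counter

def terminal_digit_chisq(values):
    """Chi-square statistic comparing the final digits of integer
    measurements with a uniform distribution over 0-9. Values above
    about 16.92 (df = 9, alpha = 0.05) suggest digit preference."""
    digits = [abs(v) % 10 for v in values]
    expected = len(digits) / 10          # uniform: 10% per digit
    counts = Counter(digits)
    return sum((counts.get(d, 0) - expected) ** 2 / expected
               for d in range(10))

# Blood pressures recorded only to the nearest 0 or 5 - a classic
# digit-preference pattern (the readings here are invented):
rounded = [120, 125, 130, 135, 140] * 20
chisq = terminal_digit_chisq(rounded)   # far above 16.92
```

    Note that genuine observers also show digit preference when reading analogue instruments, so a positive result distinguishes careless or fabricated recording from careful recording, not honesty from fraud as such.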

    Another form of internal data checking is to look for known relationships within the data - in other words, to ask yourself: is this information internally consistent? If, for example, two of the variables being measured are correlated, any sudden change in or loss of that relationship is suspicious.
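    One such consistency check can be sketched as a batch-by-batch correlation between two variables known to move together. The helper and the figures below are illustrative only; in practice you would use an established statistics package:

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length
    sequences (population formula)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

# Height (cm) and weight (kg) should be strongly positively
# correlated within a batch of subjects (invented figures):
heights = [150, 155, 160, 165, 170, 175, 180, 185]
weights = [52, 56, 59, 64, 68, 73, 77, 82]       # plausible batch
flat    = [65, 66, 65, 66, 65, 66, 65, 66]       # suspect batch

r_ok  = pearson_r(heights, weights)   # strong, as expected
r_bad = pearson_r(heights, flat)      # weak - relationship has vanished
```

    A batch whose correlation collapses, as in the second case, deserves scrutiny: fabricators who invent each variable separately rarely preserve the relationships between them.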

    A number of frauds have been detected simply because there was insufficient time for the data to have been gathered. Lock (1995) gives a good example of this with the case of Malcolm Pearce, a British gynaecologist found to have falsified data. He claimed to have recruited 191 women for a trial over three years - women with a syndrome so uncommon that a major referral centre was seeing only one or two new cases a month.


  2. Whistleblowers

    Fraud is most often first suspected by a scientist working in the same institute, as a result of knowledge of how (or even whether) the research was actually carried out. Many of the more infamous cases of fraud have come to light in this way. The dilemma facing a scientist who discovers fraudulent practices by a colleague is whether or not to 'blow the whistle'.

    The problem is that the consequences for whistleblowers can be very much worse than those for the guilty party. An extreme example of this occurred in Germany, where a young veterinary scientist who had been stripped of his PhD was charged with trying to kill the whistleblower by spiking his tea with digitoxin. More commonly the whistleblower is dismissed or passed over for promotion, especially if the scientist committing fraud is more senior - and this holds even when the misconduct is proven and generally accepted. All of this has important implications for the progress of science, and for one's career within it. Smith (1998) reports on legislation introduced in the late 1990s to provide protection for scientists who reveal fraudulent practices, but Gooderham (2009) suggests that further measures are needed.

    Many cases are not at all clear cut. The Baltimore case is an excellent example. A postdoc, Margot O'Toole, first 'blew the whistle' on data she claimed had been falsified by the immunologist Thereza Imanishi-Kari and published in a paper co-authored by Nobel laureate David Baltimore. A number of investigations supported the allegations, leading to retraction of the paper and the resignation of Baltimore as president of Rockefeller University. Subsequently, however, Kevles (1998) reported that a federal appeals panel had rejected all the charges laid against Imanishi-Kari, exonerating both her and Baltimore.


  3. Data audits

    Data audits are investigations carried out, usually by donor agencies on selected studies, to determine the quality of the research - and to ensure that there have been no fraudulent practices. They have been carried out by the Food and Drug Administration (FDA) in the U.S.A. for the last ten years or so, following an increase in the number of reported cases of fraud. Serious deficiencies in research work were noted in 12% of audits before 1985, although by 1989 this had dropped to about 7% (Shapiro & Charrow, 1989).

    In the case of clinical trials, checking up on the patients named as having taken part has proved to be an effective method of exposing fraud. This highlights the importance of ensuring that such information is kept, and can be made available to any interested party. This option is of course not available for veterinary studies, which may partly explain the paucity of fraud cases in veterinary as compared to medical research.



Reducing scientific fraud

Fraud, like any other crime, is common where it profits the fraudster, is easiest to commit, is unlikely to be detected, and where the penalties are least. We will never eliminate scientific fraud completely, but the following measures would help to reduce it:

  • Acceptance by all the players, including government, that the problem exists, is serious, and must be tackled.

  • Support for an independent research culture, with accepted ethical principles for all scientific research - not just for medical research.

  • Detection mechanisms in place, so that there is a high probability of detecting fraud.

  • Adequate protection for whistle blowers.

  • Severe penalties for those found guilty of falsifying data.

  • More methodological detail given in journals - with improved reporting of both the methods used, and the statistical analysis carried out.

In the real world, unfortunately, very few of these measures are likely in the foreseeable future. Given this, if you do not wish to be defrauded, it is wise to assume that whatever you read may be subject to some form of intentional bias - if only to the extent that it is much easier to publish material that conforms to prevailing opinion and expectations. In particular, do remember that a number of people make a career out of intentionally biasing information - although they call it 'influencing public opinion' - and the interests they serve are not to be underestimated.

For individual scientists reading the work of other scientists, Montori et al. (2005) gave some useful (if somewhat controversial) advice - namely, to read only the methods and results sections of a paper and bypass the discussion section, so as to avoid being influenced by the authors' misleading conclusions. Their views, not surprisingly, led to protests from several contributors to the journal. The best of these was from Penston (2005), who felt that discussions should be read precisely so that misleading claims could be identified and loudly broadcast - because such claims signal doubts about the entire study.

Perhaps the best advice we can give is 'Be suspicious always' - when you are reading the literature, when you get data from technicians, and when you co-author work with other scientists. Random errors and mistakes are relatively easy to allow for - intentional bias is not!