"It has long been an axiom of mine that the little things are infinitely the most important"
Data verification
What is data verification?
The purpose of data verification is to ensure that data that are gathered are as accurate as possible, and to minimize human and instrument errors - including those which arise during data
Be aware! Some authorities use the terms "data validation" and "data verification" much more narrowly. "Data validation" is taken to refer to an automatic computer check that the data are sensible and reasonable, and "data verification" to refer to a check that the data entered exactly match the original source. Under these definitions neither term refers to
The lack of agreed terms may explain why there is so little interest in these two vitally important aspects of data analysis!
At the data gathering stage
At the data gathering stage it is probably best to make as few assumptions as possible about the accuracy of your equipment, or for that matter the human beings taking the readings. Common problems include mislabelling of samples, poor storage and transport of samples, and erroneous counts because of miscalibration and instrument error.
Observer bias is also common - one example is a carry-over effect where a set of samples containing high counts of eggs in faecal smears tends to be followed by excessively high counts even when numbers are low. Another example is a bias towards even numbers, especially if one is estimating a reading half way between marked positions on the scale. This is sometimes termed digit preference bias. However, observer bias can take many forms - often quite unexpected! Only by appropriate checking can you be confident that the data are as accurate as possible. Familiarity with the type of data you are gathering, and with the common errors, is essential.
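One simple check for digit preference is to tally the final digit of each reading and see whether some digits turn up far more often than others. The sketch below is only an illustration: the function names and the readings are invented, and it assumes whole-number readings spread over a range wide enough that every final digit should be roughly equally likely.

```python
from collections import Counter


def terminal_digit_counts(readings):
    """Count how often each final digit (0-9) occurs in whole-number readings."""
    counts = Counter(abs(int(r)) % 10 for r in readings)
    return [counts.get(digit, 0) for digit in range(10)]


def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected frequencies."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Hypothetical readings, invented purely for illustration.
readings = [12, 14, 20, 18, 16, 22, 13, 24, 10, 28,
            30, 26, 15, 32, 18, 20, 14, 12, 16, 22]

counts = terminal_digit_counts(readings)
even_share = sum(counts[0::2]) / sum(counts)  # proportion ending in 0, 2, 4, 6 or 8
statistic = chi_square_uniform(counts)

print("final-digit counts:", counts)
print("proportion of even final digits:", even_share)
print("chi-square statistic (9 degrees of freedom):", round(statistic, 2))
```

With a realistic number of readings the statistic can be compared with a chi-square distribution on 9 degrees of freedom; with only a handful of readings, as here, the expected counts are too small for that comparison and the tally itself is the more useful output.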
Data gathering using a questionnaire is especially liable to inaccuracies. Many errors and biases are introduced when a questionnaire is translated to another language - the only way to avoid this is to get someone (independent) to back-translate the (translated) questionnaire and compare the two questionnaires. The other big problem, if the questionnaire is given verbally, is interviewer bias. Someone who has done hundreds (or thousands) of questionnaires will expect particular answers to certain questions, and will often stop listening (or even not ask the question) and just insert the expected (or desired) answer. This can only be detected if a sample of interviewees is re-interviewed shortly afterwards by independent interviewers. We consider questionnaire design and implementation in more depth in
At the data entry stage
At the data entry stage, a number of data checking packages are available. These commonly check that data are in a specified format (format check), that they lie within a user-specified range of values (range check) and (sometimes) that they are consistent - for example, that there is no milk yield for male cattle! They cannot tell you if some data have been missed out, nor can they detect errors within the accepted range. Such errors can only be caught by a visual check (that is, proof-reading) or (better) by using double data entry. With this method two data entry operators enter the data independently, and the two data files are compared using a computer program. Even this method may not detect errors arising from misreading of carelessly written numbers (for example 6 and 0), since both operators may misread the same figure in the same way.
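The checks described above can be sketched in a few lines. This is a minimal illustration rather than a substitute for a data checking package: the field names (date, sex, milk_yield), the date format and the range limits are all assumptions made for the example.

```python
import re

DATE_FORMAT = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # assumed format: YYYY-MM-DD
MILK_YIELD_RANGE = (0.0, 60.0)  # hypothetical limits, litres per day


def check_record(record):
    """Return a list of problems found in one record (a dict of strings)."""
    problems = []
    # Format check: the date must match the expected pattern.
    if not DATE_FORMAT.match(record.get("date", "")):
        problems.append("date: bad format")
    raw = record.get("milk_yield", "")
    if raw != "":
        try:
            value = float(raw)
        except ValueError:
            problems.append("milk_yield: not a number")
        else:
            # Range check: the value must lie within the user-specified limits.
            low, high = MILK_YIELD_RANGE
            if not low <= value <= high:
                problems.append("milk_yield: out of range")
            # Consistency check: no milk yield for male cattle.
            if record.get("sex") == "M" and value > 0:
                problems.append("milk_yield: recorded for a male animal")
    return problems


def compare_entries(first, second):
    """Compare two independently entered files (lists of dicts), row by row.

    Returns one (row, field, first_value, second_value) tuple per mismatch.
    """
    mismatches = []
    if len(first) != len(second):
        mismatches.append((None, "row count", len(first), len(second)))
    for row, (a, b) in enumerate(zip(first, second)):
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                mismatches.append((row, field, a.get(field), b.get(field)))
    return mismatches


# Hypothetical example: the second operator transposes two digits.
entry_one = [{"date": "2020-01-05", "sex": "F", "milk_yield": "21.5"}]
entry_two = [{"date": "2020-01-05", "sex": "F", "milk_yield": "12.5"}]

print(check_record(entry_two[0]))             # passes every automatic check
print(compare_entries(entry_one, entry_two))  # but the two entries disagree
```

The transposed value 12.5 lies within the accepted range, so the automatic checks pass it; only the comparison of the two independent entries reveals the discrepancy, which must then be resolved against the original record.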
At the data analysis stage