Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)



simple statistics formulae for beginners

average, arithmetic mean, median, proportion, quantile, standard deviation

R is free, very powerful, and does the boring calculations and graphs for scientists.


  • There are any number of ways of obtaining almost any statistic you can name.

  • Which method is best will depend upon your circumstances.

  • Most basic statistics formulae assume you are using a calculator, and working from the original data.

  • In practice, such formulae are a waste of time, given the ready availability of statistical computer programs.

  • So the formulae below attempt to provide some insight as to each statistic's reasoning and assumptions.

  • For simplicity, these formulae make no assumptions as to how the data were obtained.

average, see beginners statistics: averages 


y(random) provides an estimate of the average
y is a list of values
assuming the values may be numeric or non-numeric
y(i) is the ith value of y
assuming y contains n values, and i is a whole number from one to n
random is a randomly selected integer (from i=1 to n) where every value of i is equally likely to be selected, and the outcome of that selection cannot be predicted in advance
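
As a rough illustration, the definitions above can be sketched in Python (chosen here only for illustration; the page itself recommends R). The function name estimate_average and the example list are hypothetical, and the sketch assumes that picking one value of y at random is all that is intended.

```python
import random

def estimate_average(y, rng=random):
    """Return one value of y, chosen so every index is equally likely.

    Assumption: y is a non-empty list; its values may be numeric or not.
    """
    n = len(y)
    i = rng.randint(1, n)  # whole number from 1 to n, both ends included
    return y[i - 1]        # Python lists are 0-based, so shift the index

values = ["red", "green", "green", "blue"]  # hypothetical example data
print(estimate_average(values))             # the result varies from run to run
```

Because the selection is random, repeated calls return different values, so no single output can be predicted in advance.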

arithmetic mean, see beginners statistics: mean 


sum(y/n) gives the mean of y
sum(y/n) is the sum of y/n, or the (sum of y)/n, or the sum of y(i)/n
assuming y is a list containing n numbers
y(i) is the ith value of y
assuming i is a whole number from one to n, and y(i) is summed over all values of i
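
A minimal sketch of the two equivalent readings described above, the sum of y(i)/n and (the sum of y)/n, written in Python for illustration. The function names and the example data are hypothetical.

```python
def mean_sum_of_parts(y):
    """Sum of y(i)/n: divide each value by n, then add up the pieces."""
    n = len(y)
    return sum(v / n for v in y)

def mean_total_over_n(y):
    """(Sum of y)/n: add up all the values, then divide once by n."""
    return sum(y) / len(y)

y = [2, 4, 6, 8]  # hypothetical example data, n = 4
print(mean_sum_of_parts(y))  # 5.0
print(mean_total_over_n(y))  # 5.0
```

For data like these the two forms agree exactly. With less convenient numbers they can differ by tiny floating-point rounding errors.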

median, see beginners statistics: median 


y[mean(r)] gives the median
y is a list of n rankable values
assuming, when y is non-numeric, that ranking does require criteria external to y.
y[r] is the rth ranked value of y
assuming there are n ranks, in the range 1 to n
mean(r) is the arithmetic mean of the n ranks
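
One way to sketch this in Python, for illustration: rank the values, find the mean of the ranks 1 to n, which is (n+1)/2, and take the value holding that rank. When n is even the mean rank is fractional. Averaging the two neighbouring values, as done here, is an assumption and only works for numeric data. The function name is hypothetical.

```python
def median_by_mean_rank(y):
    """Return the value whose rank equals the mean of the ranks 1..n."""
    ranked = sorted(y)       # ranked[0] holds rank 1, ranked[n-1] holds rank n
    n = len(ranked)
    mean_rank = (n + 1) / 2  # arithmetic mean of the ranks 1, 2, ..., n
    lower = int(mean_rank)   # whole-number part of the mean rank
    if mean_rank == lower:
        return ranked[lower - 1]
    # Fractional rank (n is even). Assumption: average the two neighbours.
    return (ranked[lower - 1] + ranked[lower]) / 2

print(median_by_mean_rank([5, 1, 3]))     # 3
print(median_by_mean_rank([4, 1, 3, 2]))  # 2.5
```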

proportion, see beginners statistics: proportion 


f/n gives the proportion
y is a list of n values
assuming the values of y may be numeric or non-numeric, and each either equals A or does not equal A
y(i) is the ith value of y
assuming y contains n values, and i is a whole number from one to n
f is f(y(i)=A), the number of items whose value equals A
the value of A may be any desired value for comparison
when y can only equal 1 or 0, and A is 1, then f(y(i)=A) is the sum of y, so the proportion is simply the mean of y.
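
A short Python sketch of the counting described above, given for illustration. The function name proportion and the example lists are hypothetical.

```python
def proportion(y, a):
    """Number of items in y equal to a, divided by the number of items."""
    f = sum(1 for value in y if value == a)  # f is the count of y(i) equal to A
    return f / len(y)

colours = ["red", "blue", "red", "green"]  # hypothetical non-numeric data
print(proportion(colours, "red"))          # 0.5

flags = [1, 0, 1, 1]                       # data that can only equal 1 or 0
print(proportion(flags, 1))                # 0.75
print(sum(flags) / len(flags))             # 0.75, the mean of y
```

The last two lines show the special case: with values of 1 and 0 and A set to 1, the proportion and the mean are the same number.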

quantile, see beginners statistics: quantile 


gives the rank of the pth quantile
where r is the rank
y[r] is the rth ranking value of a list (y) containing n, different, rankable items
assuming n is extremely large (approaching infinite)
p is the proportion of ranks below r,
when r is a fraction you interpolate, or choose the best value.
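
For a finite list there are several conventions for turning p into a rank. The Python sketch below uses r = 1 + p(n-1), which is the default rule (type 7) of R's quantile function. That choice is an assumption made for illustration and is not necessarily the formula this page intends. When r is a fraction, the sketch interpolates linearly between the two neighbouring ranked values. The function names are hypothetical.

```python
def quantile_rank(p, n):
    """Rank of the pth quantile under the assumed rule r = 1 + p(n - 1)."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    return 1 + p * (n - 1)

def quantile(y, p):
    """Value at the pth quantile of numeric y, interpolating fractional ranks."""
    ranked = sorted(y)
    r = quantile_rank(p, len(ranked))
    lower = int(r)                # whole-number part of the rank
    fraction = r - lower          # how far r lies towards the next rank
    if fraction == 0:
        return ranked[lower - 1]  # r is a whole rank, no interpolation needed
    below, above = ranked[lower - 1], ranked[lower]
    return below + fraction * (above - below)

print(quantile([10, 20, 30, 40, 50], 0.25))  # 20
print(quantile([1, 2, 3, 4], 0.5))           # 2.5
```

With n very large, the different conventions give almost the same rank, which is why the definition above assumes n approaches infinity.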

standard deviation, see beginners statistics: standard deviations 


sqrt(variance) gives the 'population' standard deviation
where the variance is the mean squared-error, sum(e^2)/n, sometimes described as the 'population variance'
assuming each value of e is the difference between the value of y, and the mean of y
y contains n numbers
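
The steps above can be sketched in Python, for illustration: find each error e as a value of y minus the mean of y, average the squared errors to get the variance, then take the square root. The function name population_sd and the example data are hypothetical.

```python
import math

def population_sd(y):
    """Square root of the mean squared error of y about its own mean."""
    n = len(y)
    mean = sum(y) / n
    errors = [value - mean for value in y]      # e = value of y minus mean of y
    variance = sum(e ** 2 for e in errors) / n  # mean squared-error, sum(e^2)/n
    return math.sqrt(variance)

y = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical example data, mean = 5
print(population_sd(y))       # 2.0
```

Python's standard library function statistics.pstdev computes the same 'population' quantity and is the more practical choice outside a teaching example.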