InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

 

 

simple statistics formulae for beginners

average,  arithmetic mean,  median,  proportion,  quantile,  standard deviation

Note

  • There are any number of ways of obtaining almost any statistic you can name.

  • Which method is best will depend upon your circumstances.

  • Most basic statistics formulae assume you are using a calculator, and working from the original data.

  • In practice, such formulae are a waste of time, given the ready availability of statistical computer programs.

  • So the formulae below attempt to provide some insight as to each statistic's reasoning and assumptions.

  • For simplicity, these formulae make no assumptions as to how the data were obtained.


average, see beginners statistics: averages 

y(i=random)

provides an estimate of the average
where
y is a list of values
assuming the values may be numeric or non-numeric
y(i) is the ith value of y
assuming y contains n values, and i is a whole number from one to n
random is a randomly selected integer (from i=1 to n) where every value of i is equally likely to be selected, and the outcome of that selection cannot be predicted in advance
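
As a minimal sketch of the selection described above, the Python below picks one value of y at a randomly chosen rank i. The function name random_value and the example list are illustrative assumptions, not part of the original formula. Note too that Python's random module is pseudo-random, so it only approximates the requirement that the outcome cannot be predicted in advance.

```python
import random

def random_value(y, rng=random):
    """Return y(i) for a randomly selected i, every i from 1 to n equally likely."""
    n = len(y)
    i = rng.randint(1, n)   # whole number from 1 to n, both ends included
    return y[i - 1]         # Python lists count from 0, the text counts from 1

# Illustrative data only: the values may be numeric or non-numeric.
colours = ["red", "green", "green", "blue"]
pick = random_value(colours)
print(pick)
```

Because a single random pick varies from run to run, no fixed output is shown here.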


arithmetic mean, see beginners statistics: mean 

sum(y/n)

gives the mean of y
where
sum(y/n) is the sum of y/n, or the (sum of y)/n, or the sum of y(i)/n
assuming y is a list containing n numbers
y(i) is the ith value of y
assuming i is a whole number from one to n, and the sum is taken over all n values of y
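
The two readings of sum(y/n) can be checked against each other with a short Python sketch. The function names mean_of and mean_by_parts, and the example data, are assumptions made for illustration.

```python
def mean_of(y):
    """(sum of y)/n: add everything up, then divide once."""
    n = len(y)
    return sum(y) / n

def mean_by_parts(y):
    """sum of y(i)/n: divide each value by n, then add the pieces."""
    n = len(y)
    return sum(value / n for value in y)

data = [2, 4, 4, 10]          # illustrative numbers
print(mean_of(data))          # 20 / 4 = 5.0
print(mean_by_parts(data))    # 0.5 + 1.0 + 1.0 + 2.5 = 5.0
```

With other data the two results may differ in the last decimal place, because of floating-point rounding rather than any difference in the statistic.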


median, see beginners statistics: median 

y[mean(r)]

gives the median
where
y is a list of n rankable values
assuming, when y is non-numeric, that ranking does require criteria external to y.
y[r] is the rth ranked value of y
assuming there are n ranks, in the range 1 to n
mean(r) is the arithmetic mean of the n ranks
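
A minimal Python sketch of y[mean(r)] follows. It assumes the values are numeric, so that when the mean rank falls halfway between two ranks (n is even) the two neighbouring values can be averaged. That averaging rule, the name median_by_rank, and the example data are assumptions for illustration.

```python
import math

def median_by_rank(y):
    """Return the value of y at the mean rank, (n + 1) / 2."""
    ranked = sorted(y)              # ranked[0] holds rank 1, ranked[n-1] holds rank n
    n = len(ranked)
    mean_rank = (n + 1) / 2         # arithmetic mean of the ranks 1..n
    lower = math.floor(mean_rank)
    upper = math.ceil(mean_rank)
    if lower == upper:              # n is odd: the mean rank is a whole number
        return ranked[lower - 1]
    # n is even: the mean rank ends in .5, so average the two neighbours (assumed rule)
    return (ranked[lower - 1] + ranked[upper - 1]) / 2

print(median_by_rank([5, 1, 3]))      # mean rank 2, so the 2nd ranked value: 3
print(median_by_rank([4, 1, 3, 2]))   # mean rank 2.5, so (2 + 3) / 2 = 2.5
```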


proportion, see beginners statistics: proportion 

f/n

gives the proportion
where
y is a list of n values
assuming the values of y may be numeric or non-numeric, and each value either equals A or does not equal A
y(i) is the ith value of y
assuming y contains n values, and i is a whole number from one to n
f is f(y(i)=A), the number of items whose value equals A
assuming A may be any desired value for comparison
note
when y can only equal 1 or 0, and A is 1, then f(y(i)=A) is the sum of y, so the proportion is simply the mean of y.
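
The count f and the proportion f/n can be sketched in Python as below, together with the special case in which y holds only 1s and 0s and A is 1. The function name proportion and both example lists are illustrative assumptions.

```python
def proportion(y, a):
    """Return f/n, where f is the number of values of y equal to a."""
    n = len(y)
    f = sum(1 for value in y if value == a)   # count the items whose value equals a
    return f / n

answers = ["yes", "no", "yes", "yes"]   # non-numeric example
print(proportion(answers, "yes"))       # 3 / 4 = 0.75

flags = [1, 0, 1, 1]                    # the same data coded as 1 and 0
print(proportion(flags, 1))             # 0.75
print(sum(flags) / len(flags))          # the mean of y, also 0.75
```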


quantile, see beginners statistics: quantile 

p × n

gives the rank of the pth quantile
where r is the rank
y[r] is the rth ranked value of a list (y) containing n different, rankable items
assuming n is extremely large (approaching infinity)
p is the proportion of ranks below r
when r is a fraction, you interpolate or choose the best value.
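
The rank formula and the interpolation step can be sketched in Python as follows. The sketch assumes numeric values and straight-line interpolation between the two neighbouring ranks, and it clamps r to the range 1 to n because a small list can give a rank below 1. Those rules, and the names quantile_rank and value_at_rank, are assumptions for illustration. Statistical packages offer several slightly different rules.

```python
import math

def quantile_rank(p, n):
    """Rank of the pth quantile: p times n."""
    return p * n

def value_at_rank(y, r):
    """Value of y at rank r, interpolating when r is a fraction."""
    ranked = sorted(y)               # ranked[0] holds rank 1
    n = len(ranked)
    r = min(max(r, 1), n)            # assumed rule: keep r inside 1..n
    lower = math.floor(r)
    upper = math.ceil(r)
    fraction = r - lower             # how far r lies between the two ranks
    low_value = ranked[lower - 1]
    high_value = ranked[upper - 1]
    return low_value + fraction * (high_value - low_value)

data = [10, 20, 30, 40]                 # illustrative numbers
r = quantile_rank(0.625, len(data))     # 0.625 * 4 = 2.5
print(r)
print(value_at_rank(data, r))           # halfway between 20 and 30: 25.0
```

For a list this short, p = 0.5 gives rank 2 rather than the mean rank of 2.5 used for the median, which is one reason the formula is stated for very large n.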


standard deviation, see beginners statistics: standard deviations 

squareroot(variance)

gives the 'population' standard deviation
where the variance is the mean squared error, sum(e^2)/n, sometimes described as the 'population variance'
assuming each value of e is the difference between a value of y and the mean of y
y contains n numbers
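
As a worked sketch of squareroot(variance), the Python below computes each error e, the mean of the squared errors, and then its square root. The function name population_sd and the example data are illustrative assumptions.

```python
import math

def population_sd(y):
    """Square root of the mean squared error, sum(e^2)/n."""
    n = len(y)
    mean = sum(y) / n
    errors = [value - mean for value in y]        # e = value of y minus the mean of y
    variance = sum(e ** 2 for e in errors) / n    # 'population' variance: divide by n
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, squared errors sum to 32
print(population_sd(data))        # sqrt(32 / 8) = 2.0
```

Python's standard library offers statistics.pstdev for the same divide-by-n statistic. The hand-written version is shown only to make each step of the formula visible.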