InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

simple statistics formulae for beginners

On this page: average, arithmetic mean, median, proportion, quantile, standard deviation

R is free, very powerful, and does the boring calculations and graphs for scientists.

Note

  • There are any number of ways of obtaining almost any statistic you can name.

  • Which method is best will depend upon your circumstances.

  • Most basic statistics formulae assume you are using a calculator, and working from the original data.

  • In practice, such formulae are a waste of time, given the ready availability of statistical computer programs.

  • So the formulae below attempt to provide some insight as to each statistic's reasoning and assumptions.

  • For simplicity, these formulae make no assumptions as to how the data were obtained.


average, see beginners statistics: averages 

y(i=random)

provides an estimate of the average
where
y is a list of values
assuming the values may be numeric or non-numeric
y(i) is the ith value of y
assuming y contains n values, and i is a whole number from one to n
random is a randomly selected integer (from i=1 to n) where every value of i is equally likely to be selected, and the outcome of that selection cannot be predicted in advance
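
As a rough illustration of the formula above, the following Python sketch returns one value of y chosen so that every position is equally likely. The function name random_value_estimate and the example list are assumptions made for illustration. Python's generator is only pseudo-random, so it approximates the unpredictability described above rather than guaranteeing it.

```python
import random

def random_value_estimate(y):
    """Return y(i) for a uniformly random i, as a crude estimate of the average."""
    if not y:
        raise ValueError("y must contain at least one value")
    # randrange(n) yields 0..n-1 with equal probability (pseudo-random);
    # rank i in the text's 1-based numbering corresponds to index i-1 here.
    i = random.randrange(len(y))
    return y[i]

# The values may be numeric or non-numeric.
colours = ["red", "green", "green", "blue"]
print(random_value_estimate(colours))
```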


arithmetic mean, see beginners statistics: mean 

sum(y/n)

gives the mean of y
where
sum(y/n) is the sum of y/n, or the (sum of y)/n, or the sum of y(i)/n
assuming y is a list containing n numbers
y(i) is the ith value of y
assuming i is a whole number from one to n, and y(i) is summed over all values of i
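
The two equivalent readings of sum(y/n) can be sketched in Python as follows. The function names mean_by_parts and mean_by_total, and the example numbers, are illustrative assumptions.

```python
def mean_by_parts(y):
    """Sum of y(i)/n: divide each value by n, then add the pieces."""
    n = len(y)
    if n == 0:
        raise ValueError("y must contain at least one number")
    return sum(value / n for value in y)

def mean_by_total(y):
    """(Sum of y)/n: add the values first, then divide by n."""
    n = len(y)
    if n == 0:
        raise ValueError("y must contain at least one number")
    return sum(y) / n

weights = [2.0, 4.0, 6.0, 8.0]
print(mean_by_parts(weights))
print(mean_by_total(weights))
```

Both routes give the same mean, apart from tiny floating-point rounding differences on some data.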


median, see beginners statistics: median 

y[mean(r)]

gives the median
where
y is a list of n rankable values
assuming, when y is non-numeric, that ranking does require criteria external to y.
y[r] is the rth ranked value of y
assuming there are n ranks, in the range 1 to n
mean(r) is the arithmetic mean of the n ranks
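
A minimal Python sketch of y[mean(r)] follows. The formula above does not say what to do when mean(r) is not a whole number (which happens when n is even), so this sketch assumes the common convention of averaging the two neighbouring ranked values, which only works for numeric data. The function name median_by_mean_rank is hypothetical.

```python
def median_by_mean_rank(y):
    """Return the value at the mean rank of y."""
    ranked = sorted(y)              # ranked[0] holds rank 1, ranked[n-1] holds rank n
    n = len(ranked)
    if n == 0:
        raise ValueError("y must contain at least one value")
    mean_rank = sum(range(1, n + 1)) / n   # arithmetic mean of ranks 1..n
    lower = int(mean_rank)                 # whole-number part of the mean rank
    if mean_rank == lower:
        # Odd n: the mean rank is a whole number, so pick that ranked value.
        return ranked[lower - 1]
    # Even n (assumption): average the values either side of the mean rank.
    return (ranked[lower - 1] + ranked[lower]) / 2

print(median_by_mean_rank([3, 1, 2]))
print(median_by_mean_rank([4, 1, 3, 2]))
```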


proportion, see beginners statistics: proportion 

f/n

gives the proportion
where
y is a list of n values
assuming the values of y may be numeric or non-numeric, and each value either equals A or does not equal A
y(i) is the ith value of y
assuming y contains n values, and i is a whole number from one to n
f is f(y(i)=A), the number of items whose value equals A
the value of A may be any desired value for comparison
note
when y can only equal 1 or 0, and A is 1, then f(y(i)=A) is the sum of y, so the proportion is simply the mean of y.
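
The proportion f/n can be sketched in Python as below, together with a check of the 1-or-0 special case. The function name proportion and the example data are illustrative assumptions.

```python
def proportion(y, a):
    """Return f/n, where f counts the items of y equal to a."""
    n = len(y)
    if n == 0:
        raise ValueError("y must contain at least one value")
    f = sum(1 for value in y if value == a)   # f(y(i)=A)
    return f / n

# Non-numeric data: proportion of items equal to "female".
sexes = ["male", "female", "female", "male", "female"]
print(proportion(sexes, "female"))

# 1-or-0 data with A = 1: the proportion equals the mean of y.
binary = [1, 0, 1, 1]
print(proportion(binary, 1) == sum(binary) / len(binary))
```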


quantile, see beginners statistics: quantile 

pn

gives the rank (r) of the pth quantile
where
y[r] is the rth ranked value of a list (y) containing n different, rankable items
assuming n is extremely large (approaching infinity)
p is the proportion of ranks below r
when r is a fraction you interpolate, or choose the best value.
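
One possible reading of this rule, sketched in Python: compute r as p times n, and interpolate linearly between the neighbouring ranked values when r is fractional. The linear interpolation, the clamping of r to the range 1 to n for small samples, and the function name quantile_by_rank are all assumptions of this sketch; statistical packages offer several other conventions.

```python
import math

def quantile_by_rank(y, p):
    """Return the value at rank r = p*n, interpolating when r is fractional."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a proportion between 0 and 1")
    ranked = sorted(y)              # ranked[0] holds rank 1
    n = len(ranked)
    if n == 0:
        raise ValueError("y must contain at least one value")
    r = p * n
    # Assumption: keep r inside the valid ranks, since n is finite here.
    r = min(max(r, 1), n)
    lower = math.floor(r)
    upper = math.ceil(r)
    if lower == upper:
        return ranked[lower - 1]    # r is a whole number
    # Assumption: linear interpolation between the two neighbouring ranks.
    fraction = r - lower
    return ranked[lower - 1] + fraction * (ranked[upper - 1] - ranked[lower - 1])

values = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(quantile_by_rank(values, 0.5))
print(quantile_by_rank(values, 0.25))
```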


standard deviation, see beginners statistics: standard deviations 

squareroot(variance)

gives the 'population' standard deviation
where the variance is the mean squared-error, sum(e^2)/n, sometimes described as the 'population variance'
assuming each value of e is the difference between the value of y, and the mean of y
y contains n numbers
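
The steps above can be sketched in Python as follows. The function name population_sd and the example numbers are illustrative assumptions. Python's standard library function statistics.pstdev computes the same quantity.

```python
import math

def population_sd(y):
    """Return squareroot(sum(e^2)/n), where e is each value's deviation from the mean."""
    n = len(y)
    if n == 0:
        raise ValueError("y must contain at least one number")
    mean_y = sum(y) / n
    errors = [value - mean_y for value in y]     # e for each value of y
    variance = sum(e ** 2 for e in errors) / n   # mean squared-error
    return math.sqrt(variance)

print(population_sd([2, 4, 4, 4, 5, 5, 7, 9]))  # mean 5, variance 4, so prints 2.0
```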