 InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

### Note

• There are any number of ways of obtaining almost any statistic you can name.

• Which method is best will depend upon your circumstances.

• Most basic statistics formulae assume you are using a calculator, and working from the original data.

• In practice, such formulae are a waste of time, given the ready availability of statistical computer programs.

• So the formulae below attempt to provide some insight as to each statistic's reasoning and assumptions.

• For simplicity, these formulae make no assumptions as to how the data were obtained.

### average, see beginners statistics: averages

y(i=random)
provides an estimate of the average

where
y is a list of values
assuming the values may be numeric or non-numeric
y(i) is the ith value of y
assuming y contains n values, and i is a whole number from one to n
random is a randomly selected integer (from i=1 to n) where every value of i is equally likely to be selected, and the outcome of that selection cannot be predicted in advance
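
As a rough illustration of y(i=random), the sketch below picks one value from a list, giving every position the same chance. The function name `random_value` and the example list are assumptions made for this sketch only. It uses Python's pseudo-random generator, which only approximates a selection whose outcome cannot be predicted in advance.

```python
import random

def random_value(y, rng=random):
    """Return y(i) for a randomly selected i, every i from 1 to n equally likely."""
    n = len(y)
    i = rng.randint(1, n)   # whole number from 1 to n, both ends included
    return y[i - 1]         # Python lists count from 0, so rank i sits at index i - 1

# Values may be numeric or non-numeric.
y = ["red", "green", "green", "blue"]
estimate = random_value(y)
print(estimate)
```

Because the result depends on the random selection, repeated calls can return different values.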

### arithmetic mean, see beginners statistics: mean

sum(y/n)
gives the mean of y
where
sum(y/n) is the sum of y/n, or the (sum of y)/n, or the sum of y(i)/n
assuming y is a list containing n numbers
y(i) is the ith value of y
assuming i is a whole number from one to n, and the sum is taken over all n values of y
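
The two readings of sum(y/n) can be compared directly. This is a minimal sketch: the function names and the example numbers are assumptions chosen for illustration.

```python
import math

def mean_of_parts(y):
    """Sum of y(i)/n: divide each value by n, then add up the parts."""
    n = len(y)
    return sum(value / n for value in y)

def mean_of_total(y):
    """(Sum of y)/n: add up the values, then divide the total by n."""
    return sum(y) / len(y)

y = [2.0, 4.0, 9.0]
# The two forms agree, apart from floating-point rounding.
print(math.isclose(mean_of_parts(y), mean_of_total(y)))
```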

### median, see beginners statistics: median

y[mean(r)]
gives the median
where
y is a list of n rankable values
assuming, when y is non-numeric, that ranking does require criteria external to y.
y[r] is the rth ranked value of y
assuming there are n ranks, in the range 1 to n
mean(r) is the arithmetic mean of the n ranks
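
A small sketch of y[mean(r)] for numeric data follows. When n is even the mean rank is a fraction, and the sketch assumes you then average the two neighbouring ranked values. That choice, and the function name `median_by_rank`, are assumptions and not part of the definition above.

```python
def median_by_rank(y):
    """Return the value at the mean rank of a list of numbers."""
    ranked = sorted(y)                      # ranked[0] holds rank 1
    n = len(ranked)
    mean_rank = sum(range(1, n + 1)) / n    # mean of ranks 1..n, equal to (n + 1) / 2
    lower = int(mean_rank)                  # whole-number part of the mean rank
    if mean_rank == lower:
        return ranked[lower - 1]            # odd n: the mean rank is a whole number
    # even n: mean rank falls midway between two ranks (assumed handling)
    return (ranked[lower - 1] + ranked[lower]) / 2

print(median_by_rank([5, 1, 3]))
print(median_by_rank([4, 1, 3, 2]))
```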

### proportion, see beginners statistics: proportion

f/n
gives the proportion
where
y is a list of n values
assuming y may be numeric or non-numeric, and each value either equals A or does not equal A
y(i) is the ith value of y
assuming y contains n values, and i is a whole number from one to n
f is f(y(i)=A), the number of items whose value equals A
the value of A may be any desired value for comparison
note
when y can only equal 1 or 0, then f(y(i)=A) is the sum of y, so the proportion is simply the mean of y.
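
The count f and the proportion f/n can be sketched as below, including the special case where y holds only 1s and 0s. The function name `proportion` and the example lists are assumptions for illustration.

```python
def proportion(y, a):
    """Return f/n, where f counts the items of y equal to a."""
    n = len(y)
    f = sum(1 for value in y if value == a)   # f(y(i)=A)
    return f / n

colours = ["A", "B", "A", "C"]
print(proportion(colours, "A"))

# When y can only equal 1 or 0, the proportion of 1s is the mean of y.
binary = [1, 0, 1, 1]
print(proportion(binary, 1) == sum(binary) / len(binary))
```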

### quantile, see beginners statistics: quantile

pn
gives the rank of the pth quantile
where r is the rank
y[r] is the rth ranked value of a list (y) containing n different, rankable items
assuming n is extremely large (approaching infinity)
p is the proportion of ranks below r
when r is a fraction, you interpolate or choose the best value.
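
The rank r = pn can be sketched for a finite list as follows. The definition above assumes a very large n, so for a short list this is only an approximation. The linear interpolation, the clamping of r to the range 1 to n, and the name `quantile_by_rank` are all assumptions of this sketch, one choice among several.

```python
import math

def quantile_by_rank(y, p):
    """Return the value at rank r = p * n, interpolating when r is a fraction."""
    ranked = sorted(y)                 # ranked[0] holds rank 1
    n = len(ranked)
    r = min(max(p * n, 1), n)          # keep r within the available ranks (assumed)
    lower = math.floor(r)
    fraction = r - lower
    if fraction == 0:
        return ranked[lower - 1]       # r is a whole rank
    below, above = ranked[lower - 1], ranked[lower]
    return below + fraction * (above - below)   # linear interpolation (assumed)

print(quantile_by_rank(list(range(1, 101)), 0.25))
print(quantile_by_rank([10, 20, 30, 40], 0.625))
```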

### standard deviation, see beginners statistics: standard deviations

squareroot(variance)
gives the 'population' standard deviation
where the variance is the mean squared-error, sum(e^2)/n, sometimes described as the 'population variance'
assuming each value of e is the difference between a value of y and the mean of y
y contains n numbers
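
The steps from errors to variance to standard deviation can be sketched as below. The function name `population_sd` and the example numbers are assumptions for illustration. Note the divisor is n, not n - 1.

```python
import math

def population_sd(y):
    """Return squareroot(sum(e^2)/n), where e is each value's difference from the mean."""
    n = len(y)
    mean = sum(y) / n
    errors = [value - mean for value in y]        # e = y(i) - mean of y
    variance = sum(e * e for e in errors) / n     # mean squared-error
    return math.sqrt(variance)

y = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(y))
```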