Note
- There are any number of ways of obtaining almost any statistic you can name.
- Which method is best will depend upon your circumstances.
- Most basic statistics formulae assume you are using a calculator, and working from the original data.
- In practice, such formulae are a waste of time, given the ready availability of statistical computer programs.
- So the formulae below attempt to provide some insight as to each statistic's reasoning and assumptions.
- For simplicity, these formulae make no assumptions as to how the data were obtained.
y(i=random)
provides an estimate of the average
- where
- y is a list of values
- assuming the values may be numeric or non-numeric
- y(i) is the ith value of y
- assuming y contains n values, and i is a whole number from one to n
- random is a randomly selected integer (from i=1 to n) where every value of i is equally likely to be selected, and the outcome of that selection cannot be predicted in advance
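As a minimal sketch of the selection described above, the Python below picks one value of y at a randomly chosen position. The function name `random_value` and the sample list are illustrative assumptions. Python's `random` module is pseudo-random, so it only approximates the requirement that the outcome cannot be predicted in advance.

```python
import random


def random_value(y):
    """Return y(i) for one i chosen with equal probability from all positions."""
    # randrange(len(y)) gives a 0-based index, so it corresponds to i - 1.
    i = random.randrange(len(y))
    return y[i]


# The values need not be numeric.
y = ["red", "green", "blue", "green"]
picked = random_value(y)
print(picked)
```

A single randomly selected value is a very rough estimate; it is shown here only to make the notation concrete.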
sum(y/n)
gives the mean of y
- where
- sum(y/n) is the sum of y/n, or the (sum of y)/n, or the sum of y(i)/n
- assuming y is a list containing n numbers
- y(i) is the ith value of y
- assuming i is a whole number from one to n, and y(i) is summed over all values of i
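The sum(y/n) form can be sketched in Python as follows. The function name `mean` and the sample numbers are assumptions made for illustration; the point is that summing each y(i)/n gives the same result as dividing the sum of y by n.

```python
def mean(y):
    """Mean of y, computed as the sum of y(i)/n over all i."""
    n = len(y)
    return sum(value / n for value in y)


y = [2, 4, 6, 8]
m = mean(y)               # 0.5 + 1.0 + 1.5 + 2.0
same = sum(y) / len(y)    # (sum of y)/n
print(m, same)            # 5.0 5.0
```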
y[mean(r)]
gives the median
- where
- y is a list of n rankable values
- assuming, when y is non-numeric, that ranking requires criteria external to y.
- y[r] is the rth ranked value of y
- assuming there are n ranks, in the range 1 to n
- mean(r) is the arithmetic mean of the n ranks
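The following Python sketch follows the y[mean(r)] definition literally: it ranks the values, takes the mean of the ranks 1 to n, and looks up the value at that rank. The function name `median_by_rank` is hypothetical. When n is even the mean rank is fractional; averaging the two neighbouring values is a common convention assumed here, and it only works for numeric y.

```python
def median_by_rank(y):
    """Median of y as the value whose rank is the mean of the ranks 1..n."""
    ranked = sorted(y)                 # ranked[0] has rank 1
    n = len(ranked)
    mean_rank = sum(range(1, n + 1)) / n   # equals (n + 1) / 2
    lower = int(mean_rank)             # whole-number part of the mean rank
    if mean_rank == lower:
        return ranked[lower - 1]       # odd n: the rank exists exactly
    # Even n: the mean rank falls halfway between two ranks (assumed convention).
    return (ranked[lower - 1] + ranked[lower]) / 2


odd = median_by_rank([7, 1, 5])        # ranks 1..3, mean rank 2
even = median_by_rank([7, 1, 5, 3])    # ranks 1..4, mean rank 2.5
print(odd, even)                       # 5 4.0
```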
f/n
gives the proportion
- where
- y is a list of n values
- assuming the values of y may be numeric or non-numeric, and each value either equals A or does not equal A
- y(i) is the ith value of y
- assuming y contains n values, and i is a whole number from one to n
- f is f(y(i)=A), the number of items whose value equals A
A may be any value chosen for comparison
- note
when y can only equal 1 or 0, and A is 1, then f(y(i)=A) is the sum of y, so the proportion is simply the mean of y.
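A short Python sketch of f/n, including the 1-or-0 special case from the note. The function name `proportion` and the sample lists are assumptions for illustration only.

```python
def proportion(y, a):
    """Proportion of items in y whose value equals a, that is f/n."""
    f = sum(1 for value in y if value == a)   # f(y(i)=A)
    return f / len(y)


# Non-numeric values work, since only equality with A is tested.
animals = ["cat", "dog", "cat", "bird"]
p_cat = proportion(animals, "cat")

# With values restricted to 1 or 0 and A = 1, f is just the sum of y.
binary = [1, 0, 1, 1]
p_one = proportion(binary, 1)
mean_binary = sum(binary) / len(binary)
print(p_cat, p_one, mean_binary)   # 0.5 0.75 0.75
```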
pn
gives the rank of the pth quantile
- where r is the rank
- y[r] is the rth ranked value of a list (y) containing n different, rankable items
- assuming n is extremely large (approaching infinite)
- p is the proportion of ranks below r
- when r is a fraction, you interpolate between the neighbouring ranked values, or choose the best value.
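Reading pn as p multiplied by n, the lookup can be sketched in Python as below. The function name `quantile_by_rank`, the clamping of r to the range 1 to n, and the choice of linear interpolation are all assumptions made for this sketch. Because the formula assumes a very large n, results for short lists can differ from other quantile conventions.

```python
import math


def quantile_by_rank(y, p):
    """Value at rank r = p * n, interpolating linearly when r is fractional."""
    ranked = sorted(y)                 # ranked[0] has rank 1
    n = len(ranked)
    r = min(max(p * n, 1), n)          # keep r within the valid ranks (assumption)
    lower = math.floor(r)
    fraction = r - lower
    if fraction == 0:
        return ranked[lower - 1]
    # Linear interpolation between ranks lower and lower + 1 (numeric y only).
    below, above = ranked[lower - 1], ranked[lower]
    return below + fraction * (above - below)


y = [40, 10, 30, 20]
whole = quantile_by_rank(y, 0.5)       # r = 2, the 2nd ranked value
between = quantile_by_rank(y, 0.625)   # r = 2.5, halfway between 20 and 30
print(whole, between)                  # 20 25.0
```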
squareroot(variance)
gives the 'population' standard deviation
- where the variance is the mean squared-error, sum(e^2)/n, sometimes described as the 'population variance'
- assuming each value of e is the difference between a value of y and the mean of y
- y contains n numbers
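The steps above can be sketched in Python as follows: compute each error e as a value of y minus the mean of y, average the squared errors, and take the square root. The function name `population_sd` and the sample data are illustrative assumptions.

```python
import math


def population_sd(y):
    """'Population' standard deviation: square root of the mean squared error."""
    n = len(y)
    mean_y = sum(y) / n
    errors = [value - mean_y for value in y]      # e = y(i) - mean of y
    variance = sum(e ** 2 for e in errors) / n    # sum(e^2)/n
    return math.sqrt(variance)


y = [2, 4, 4, 4, 5, 5, 7, 9]
sd = population_sd(y)   # mean is 5, squared errors sum to 32, variance is 4
print(sd)               # 2.0
```

In practice the standard library's `statistics.pstdev` computes the same quantity; the hand-written version is only meant to expose the reasoning.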