# Weighted Averages

[*** Under Construction ***]

# Weighted Averages Background

Weighted averages are the key to all survey adjustment methods. Any time there is conflicting information, a decision has to be made about how much trust (or weight) to give each piece of data. The final result is the sum of some part (possibly zero) of each piece of information.

All of the traditional pre-computer adjustment methods were ways of weighting the information in a survey to come up with a probable location of a survey point. There were many rules (such as the "Transit Rule", the "Compass Rule", "Baarda's modification of Bowditch's method", etc.) that worked well in practice and were carefully studied. These methods, in the end, all either determine a reasonable set of weights to use or actually form the weighted averages.

In the pre-computer days there were also careful theoretical studies of adjustment that were not often used in practice, because they involved too much computation.

With the advent of computers, it became possible to handle computations that were impractical before. The weightings determined by theory could now be used directly. In particular, one method with good statistical properties that had been impractical before (namely "least squares techniques") became practical and common.

As we will see in the Least Squares chapter, this technique, while derived from serious calculus, just forms specially weighted averages and can be explained in those terms.

Geek Note: All the least squares techniques we will cover produce a final result of the form p=As. That is, the shots (s) multiplied by a solution matrix (A) give the points (p). *ANYTHING* of that form is a linear combination of the elements of the s vector, and can be looked at as a weighted average.

# What is a "weight"?

Intuitively, a weight is a (non-negative) number that reflects how much trust you have in a value. A large number means you put a lot of faith in the value, a small number means that you put little trust in the value.

A related concept in mathematics is "standard deviation" (and "variance"). Standard deviation is a measure of the amount of slop a measurement has built in. In many ways standard deviation and variance are easier to compute with than weights are. Almost any book on statistics will tell you more about standard deviations than you would ever want to know, including the full details of how one computes them. Standard deviations determine variances, and variances determine weights.

Mathematically, the concepts are more than just related: there are specific formulas that relate them. Namely:

```
Variance is standard deviation squared.
Weight is the inverse of variance.
```
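The two definitions above can be written as a few small conversion helpers. This is only a minimal sketch; the function names are made up for illustration and are not part of any survey package.

```python
import math

def variance_from_deviation(deviation):
    """Variance is standard deviation squared."""
    return deviation ** 2

def weight_from_variance(variance):
    """Weight is the inverse of variance (the variance must be positive)."""
    if variance <= 0:
        raise ValueError("variance must be positive to form a weight")
    return 1.0 / variance

def weight_from_deviation(deviation):
    """Chain the two definitions: deviation -> variance -> weight."""
    return weight_from_variance(variance_from_deviation(deviation))

def deviation_from_weight(weight):
    """Go back the other way: weight -> variance -> deviation."""
    return math.sqrt(1.0 / weight)
```

For example, a standard deviation of 0.5 gives a variance of 0.25 and a weight of 4.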

These simple definitions allow one to derive the basic properties of weights and variances. A simple example, in one dimension, is measuring a straight line with a short ruler, so that we measure a length A and then a length B. If we have the weights, variances, and deviations of both A and B, we can compute the numbers for the sum A+B.

```
   weight(SUM) = 1.0 / (1.0/weight(A) + 1.0/weight(B))
deviation(SUM) = sqrt(deviation(A)**2 + deviation(B)**2)
 variance(SUM) = variance(A) + variance(B)
```
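As a quick check that the three formulas describe the same thing, here is a small sketch that computes all three for the sum A+B. It assumes A and B are uncorrelated, and the deviations used (0.03 and 0.04) are made-up numbers for illustration only.

```python
import math

def combine_sum(dev_a, dev_b):
    """Return (variance, deviation, weight) of A+B for uncorrelated A and B."""
    var_a, var_b = dev_a ** 2, dev_b ** 2
    weight_a, weight_b = 1.0 / var_a, 1.0 / var_b

    var_sum = var_a + var_b                               # variances simply add
    dev_sum = math.sqrt(dev_a ** 2 + dev_b ** 2)          # deviations add in quadrature
    weight_sum = 1.0 / (1.0 / weight_a + 1.0 / weight_b)  # weights combine by reciprocals
    return var_sum, dev_sum, weight_sum

# Hypothetical deviations, chosen only to make the arithmetic easy to follow.
var_sum, dev_sum, weight_sum = combine_sum(0.03, 0.04)
print(var_sum, dev_sum, weight_sum)
```

The variance line is the only one that is a plain sum, which is the point the formulas make about why variances are the pleasant quantity to carry around.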

The equations above show why people who work with these quantities prefer to use variance, compute weights only when they have to, and don't like to work with standard deviations at all.

This said, weights also have their place. Adjustments, in the end, are usually done from weights, and use weighted averages or equivalent methods (such as least squares).

I am not generally going to go into derivations here, but merely point out that these are well known results that have already been derived in the general literature [See the bibliography].

Geek Note: Note that the equations above mean that two shots with a standard deviation of +-1cm DO NOT produce a sum that has a standard deviation of +-2cm (as some people's intuition would give) but instead give a sum with a standard deviation of +-sqrt(2)cm. This is because the two measurements are unrelated (uncorrelated). If the measurements were perfectly correlated, then when one changed the other would change by the same amount, and the errors of the two would add. This is the first of a number of places where we will see that the difference between correlated and uncorrelated measurements is significant.
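The difference between the two cases can be sketched numerically. The sketch below uses the standard identity variance(A+B) = variance(A) + variance(B) + 2*covariance(A,B); the `correlation` parameter (a correlation coefficient between -1 and 1) is an assumption introduced here for illustration and does not appear elsewhere in this chapter.

```python
import math

def deviation_of_sum(dev_a, dev_b, correlation=0.0):
    """Standard deviation of A+B for a correlation coefficient in [-1, 1]."""
    if not -1.0 <= correlation <= 1.0:
        raise ValueError("correlation must lie between -1 and 1")
    covariance = correlation * dev_a * dev_b
    variance = dev_a ** 2 + dev_b ** 2 + 2.0 * covariance
    # Guard against a tiny negative value caused by floating point rounding.
    return math.sqrt(max(variance, 0.0))

uncorrelated = deviation_of_sum(1.0, 1.0)        # two unrelated +-1cm shots
correlated = deviation_of_sum(1.0, 1.0, 1.0)     # two perfectly correlated shots
print(uncorrelated, correlated)
```

With unrelated shots the result is sqrt(2) cm; with perfectly correlated shots the errors simply add to 2 cm.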

Geek Note: Weights have to be non-negative because they are the inverse of a squaring, and so are the inverse of something positive. In the multiple dimension case they are positive definite because they are the matrix inverse of a positive definite covariance matrix.

# One dimensional weights

If we have some measurements (M1, M2, M3, ...) of a certain quantity, and some measure of the precision of those measurements as standard deviations (D1, D2, D3, ...), we can compute the weights for the measurements (weights being the inverse of the square of the deviations). This gives us some weights (W1, W2, W3, ...).

# One dimensional weighted averages

A "weighted average" of a set of measurements is an average that takes into account the relative precision of the measurements. The weighted average is the quantity W1*M1 + W2*M2 + W3*M3 ... divided by the sum of all the weights.

Or, in other words: (W1*M1 + W2*M2 + W3*M3 ... ) / (W1 + W2 + W3 ...)

When we are dealing with matrices we have to make distinctions that we don't have to with real numbers, and the equation is going to look a little different. To make it easier later, we will give the form of the equation above as:

```
    inverse (W1 + W2 + W3 ...) * (W1*M1 + W2*M2 + W3*M3 ... )
```

Alternately, you can look at this as:

```
  inverse(W1+W2+W3+...) * W1 * M1
+ inverse(W1+W2+W3+...) * W2 * M2
+ inverse(W1+W2+W3+...) * W3 * M3
...
```

In this form it says that the weighted average is just the amount of each measurement that we use for the result.
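The "amount of each measurement" view can be sketched directly: scale each weight by the sum of the weights, and the scaled values are the shares that add up to one. The function names and the sample numbers below are hypothetical.

```python
def shares(weights):
    """Fraction of each measurement that goes into the weighted average."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("the weights must sum to a positive number")
    return [w / total for w in weights]

def weighted_average(measurements, weights):
    """Add up each measurement times its share."""
    return sum(s * m for s, m in zip(shares(weights), measurements))

# Made-up measurements and weights, chosen so the shares are simple fractions.
example_shares = shares([2.0, 1.0, 1.0])
example_average = weighted_average([10.0, 20.0, 30.0], [2.0, 1.0, 1.0])
print(example_shares, example_average)
```

Here half of the first measurement and a quarter of each of the others make up the result.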

Geek note: The matrix form for a weighted average looks like:

```
To keep the expression simple, we will use s to represent
    inverse (W1 + W2 + W3 + ...)

The expression, in matrix form, is now:
    [ s*W1   s*W2   s*W3   ... ] [ M1 ]
                                 [ M2 ]
                                 [ M3 ]
                                 [ .  ]
                                 [ .  ]
                                 [ .  ]
    (Each of the weights, scaled by the sum)
```

If we measure a rod by three different (very sloppy) methods, we could obtain:

```
Measurement     Length     Standard deviation
     1          10.3m           0.5m
     2          10.8m           1.0m
     3          10.5m           0.25m
```

Clearly the measurements are different, but the methods of getting them also have different precisions. The "statistically most likely" value for the measurement is the weighted average.

The weights we get (weight = 1.0/(dev**2)) are W1 = 4.0, W2 = 1.0, W3 = 16.0.

The weighted average is therefore (4*10.3 + 1*10.8 + 16*10.5)/(4+1+16) = roughly 10.48
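The rod calculation can be reproduced in a few lines. This is a sketch of the arithmetic only; the function name is made up for illustration.

```python
def rod_weighted_average(lengths, deviations):
    """Weights are 1/deviation**2; the result is sum(W*M) / sum(W)."""
    weights = [1.0 / (d ** 2) for d in deviations]
    total = sum(weights)
    average = sum(w * m for w, m in zip(weights, lengths)) / total
    return weights, average

lengths = [10.3, 10.8, 10.5]       # metres, from the table above
deviations = [0.5, 1.0, 0.25]      # metres, from the table above

weights, average = rod_weighted_average(lengths, deviations)
print(weights)             # [4.0, 1.0, 16.0]
print(round(average, 2))   # 10.48
```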

Alternately you could have looked at this as:

```
   (Remembering that 4+1+16 is 21)
     4/21 * 10.3  =   1.9619
  +  1/21 * 10.8  =   0.5143
  + 16/21 * 10.5  =   8.0000
  ====================================
                     10.4762
```

This form makes it explicit that a weighted average is just some amount of each measurement, all added together.

*IF* the measurements really have the standard deviations given, then the most likely value of the length of the rod is roughly 10.48m. (If the given precisions are wrong, then the result is statistically meaningless.)

What has happened here? The sloppiest measurement was given a low weight in the adjustment, and the more precise measurements were given higher weights. This is the essence of adjustment of measurements.

We were only able to do an adjustment because we had different measurements and we had some idea of the precision of each of them. If we only had one measurement we wouldn't need to adjust. However, applying an adjustment procedure wouldn't actually hurt anything. In that case, the weighted average of one value is (W1*M1) divided by the sum of the weights. But since there is only one weight, the sum is just W1, and (W1/W1)*M1 is just M1.

Digression: If one were doing "Best fit first" adjustments, the answer in this specific case would have been just the best shot, namely 10.5m. If someone were to weight the shots by shot length, the weighted average would have been (10.3*10.3 + 10.8*10.8 + 10.5*10.5)/(10.3+10.8+10.5), or about 10.54m. While shot length may, in some cases, be a good estimate of the correct weights, one can see here that it is not the same as using the correct weights.

If we were giving shots equal weight, then the weighted average would just be the average (about 10.53m).
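The different weighting choices discussed here can be compared side by side. In this sketch the scheme names are labels made up for illustration, and "best shot only" is represented (as an assumption) by giving the most precise shot a weight of one and every other shot a weight of zero.

```python
def weighted_average(measurements, weights):
    """sum(W*M) divided by sum(W)."""
    return sum(w * m for w, m in zip(weights, measurements)) / sum(weights)

lengths = [10.3, 10.8, 10.5]       # metres
deviations = [0.5, 1.0, 0.25]      # metres

schemes = {
    "inverse variance": [1.0 / d ** 2 for d in deviations],
    "shot length": lengths,
    "equal": [1.0] * len(lengths),
    "best shot only": [1.0 if d == min(deviations) else 0.0 for d in deviations],
}

results = {name: weighted_average(lengths, w) for name, w in schemes.items()}
for name, value in results.items():
    print(f"{name}: {value:.2f}")
```

This prints 10.48, 10.54, 10.53 and 10.50 for the four schemes, matching the numbers worked out by hand.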

Any set of (positive) weights will give you some number between 10.3m and 10.8m in the above example. One can argue for any number of different weights to form that sum. However, ONLY the weights that are derived from the inverse of the variances have the statistical property of being "statistically most likely". Other weights may have appeal on other grounds, but only the covariance-based weights will have the right properties when we go on to the methods of Least Squares for statistical adjustment.

# Multi-dimensional weights

Unfortunately, most people are unused to doing equations in multiple dimensions, and to handling statistics in multiple dimensions. This leads to the common misconception that the way you do three dimensional weights is to just have a weight for X, a weight for Y, and a weight for Z.

The situation is not, however, that simple. There are also weights that measure how much the values in the various coordinates are related to each other.

For example, if someone is measuring a line running north east, any slop in the distance measurements not only affects both X and Y, but it also affects them IN THE SAME DIRECTION. Any slop in the angle measurements also affects both X and Y, but in the OPPOSITE direction. Including this effect makes a big difference when computing the weights. An example of how much of a difference this makes practically is shown in the Linearization chapter.
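A rough sketch of the north east case shows the sign of the relationship. Everything about the setup here is an assumption made for illustration: x = distance*sin(bearing) is east, y = distance*cos(bearing) is north, the bearing is in radians clockwise from north, the distance and angle errors are independent, and the errors are small enough for a first-order (linearized) estimate. The numbers are made up.

```python
import math

def shot_covariance(distance, bearing, dev_distance, dev_bearing):
    """2x2 covariance of (x, y) for one shot, to first order."""
    s, c = math.sin(bearing), math.cos(bearing)
    # Partial derivatives of x = d*sin(b) and y = d*cos(b).
    dx_dd, dx_db = s, distance * c
    dy_dd, dy_db = c, -distance * s
    var_d, var_b = dev_distance ** 2, dev_bearing ** 2
    cov_xx = dx_dd ** 2 * var_d + dx_db ** 2 * var_b
    cov_yy = dy_dd ** 2 * var_d + dy_db ** 2 * var_b
    cov_xy = dx_dd * dy_dd * var_d + dx_db * dy_db * var_b
    return [[cov_xx, cov_xy], [cov_xy, cov_yy]]

north_east = math.radians(45.0)
# Slop only in the distance: X and Y move together.
distance_slop = shot_covariance(15.0, north_east, 0.5, 0.0)
# Slop only in the angle: X and Y move in opposite directions.
angle_slop = shot_covariance(15.0, north_east, 0.0, math.radians(2.0))
print(distance_slop[0][1], angle_slop[0][1])
```

The off diagonal term comes out positive for pure distance slop and negative for pure angle slop, which is the same-direction versus opposite-direction effect described above.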

Geek note: If we are working in N dimensional space, then a value is an N vector. A weight has to relate every coordinate to every other coordinate, so it has to have N columns and N rows; in N dimensions a weight is an NxN matrix. On physical grounds weight matrices also have to be symmetric.

Fortunately, mathematicians have worked out all the details of computing three dimensional weights. In one dimension a variance is just a number, and a weight is just its inverse. In two dimensions one has a two by two variance matrix (called a "covariance" matrix to remind people it relates the variances of the coordinates and is not just two numbers). In three dimensions one has a three by three covariance matrix.

Geek note: This matrix gives the uncertainties of X, Y, and Z, but also gives the uncertainty in X being correlated with Y, Y with Z, and X with Z. If X has a correlation with Y, then Y has the same correlation with X, and so on. Because the correlations between pairs are symmetric, the matrix is forced to be symmetric also.

This may sound like it is getting ugly, but the use of matrices allows the rules to stay simple. In one dimension a weight is the inverse of a variance. In two dimensions a weight is the inverse of the two by two covariance matrix. In three dimensions a weight is the inverse of the three by three covariance matrix. In all cases weights remain the inverse of covariance.

Geek Note: Since a one by one covariance matrix is just a variance, it is proper in all cases to say that the weight matrix is just the inverse of a covariance matrix.

It is a natural thing to wonder what covariance and weights mean in multiple dimensions. Let's consider the covariance in three dimensions. The matrix has elements as follows:

```
[ covariance(x, x)   covariance(y, x)   covariance(z, x) ]
[ covariance(x, y)   covariance(y, y)   covariance(z, y) ]
[ covariance(x, z)   covariance(y, z)   covariance(z, z) ]
```

The covariance(x,y) (for example) is just a measure of how related x and y are. It is zero if the slop in x is totally unrelated to the slop in y. In the real world the covariance(x,y) is the same as the covariance(y,x), because if changing one measurement changes another, changing the other changes the one.

The covariance of a variable with respect to itself is just the variance. So the diagonal elements of the array are just the familiar concept of the amount of "slop" in the measurements of each coordinate.

Geek Note: Since covariance(a,b) = covariance(b,a) the matrix is forced to be symmetric.

A weight matrix is the inverse of a covariance matrix. What then is the meaning of the elements of a weight matrix?

```
[ weight(x,x)  weight(y,x)  weight(z,x) ]
[ weight(x,y)  weight(y,y)  weight(z,y) ]
[ weight(x,z)  weight(y,z)  weight(z,z) ]
```

Weight(x,y) is a number that measures how much we trust that when we increase x, y will have a corresponding change. If x and y are unrelated, then this number is zero.

Weight(x,x) is now just a number for how well we trust x.

Geek Note: Since the weight matrix is the inverse of the covariance matrix, and the covariance matrix is symmetric, the weight matrix is forced to be symmetric.

# Multi-dimensional weighted averages

In one dimension the weighted average was

```
   inverse (W1 + W2 + W3 + ...) * (W1*M1 + W2*M2 + W3*M3 + ...)
```

In three dimensions the weighted average is EXACTLY THE SAME FORMULA. The addition is now addition of matrices, the multiplication is now multiplication of matrices, and the divide is multiplication by the matrix inverse. BUT, the form is the same, and the meaning is the same. There is also a matrix operation (determinants) that gives a value for the "size" of what the matrix represents.

The adjustment problem is just computing a weighted average of conflicting survey data (in several dimensions). The adjusted data isn't magically somewhere other than where the surveys point, it is just an "appropriately" weighted average.

The usual method of computing the "best" weighted averages is a collection of methods called "Least Squares Methods". Least squares will be covered in a later chapter.

# Two dimensional example

Many people work from examples better than from explanation. A three dimensional example tends to get bogged down in details, so I'll give a two dimensional example of weights.

Consider the case of a cave out in the woods, and directions from a town to the cave as given by two different people. Able goes North on a road 10 miles, and then goes east through the woods about 11 miles. Baker goes North East through the woods about 15 miles. These two descriptions don't agree on the location, and we want to come up with the most likely location of the cave.

Using the usual coordinates, Able believes the cave is located at (11, 10), and Baker believes it to be at roughly (10,10).

Able is able to gauge the 10 miles north closely, since there is a "10 miles to town" sign on the road. And he can get pretty close to east as he walks. But he can only estimate the distance through the woods. We'll assume (for the sake of having something to work with) that the variances for his estimate are

```
[ 0.25   0    ]
[ 0      0.10 ]

Which gives a weight of roughly
[ 4   0 ]
[ 0  10 ]
```

Baker is very good at estimating distances, but poor at estimating North East. We will assume that the variances for his estimate are

```
[  0.3 -0.2 ]
[ -0.2  0.3 ]
```

The off diagonal elements reflect the fact that any error he has in X is likely matched by an error in Y, since his angle is likely off. They are negative because an angle error moves X and Y in opposite directions. The weight for that variance is roughly:

```
[ 6 4 ]
[ 4 6 ]
```
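Both weight matrices come from inverting the corresponding covariance matrix. A minimal sketch of that step for the two by two case follows; `invert_2x2` is a helper written for this illustration, not a library routine.

```python
def invert_2x2(m):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular; no weight can be formed")
    return [[d / det, -b / det], [-c / det, a / det]]

able_covariance = [[0.25, 0.0], [0.0, 0.10]]
baker_covariance = [[0.3, -0.2], [-0.2, 0.3]]

able_weight = invert_2x2(able_covariance)     # close to [[4, 0], [0, 10]]
baker_weight = invert_2x2(baker_covariance)   # close to [[6, 4], [4, 6]]
print(able_weight)
print(baker_weight)
```

Note that the negative off diagonal covariance turns into a positive off diagonal weight, and that both results stay symmetric.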

The details of the matrices are NOT the same in the two cases. That is because they reflect differing correlations between the axes.

Able's estimate is an example of X and Y being uncorrelated (unrelated). Baker's estimate is an example of X and Y being correlated.

OK, back to the problem at hand.

We now have to form a weighted average, which is

`   inverse (W1+W2) * (W1*P1 + W2*P2)`

which is what you'd end up programming. However, someone wanting to see the full ugly details can expand this out as:

```
inverse ( [ 4  0 ] + [ 6 4 ] ) * ( [ 4  0 ] * [ 11 ] + [ 6 4 ] * [ 10 ] )
          [ 0 10 ]   [ 4 6 ]       [ 0 10 ]   [ 10 ]   [ 4 6 ]   [ 10 ]

which is
inverse ( [ 10  4 ] ) * ( [  44 ] + [ 100 ] )
          [  4 16 ]       [ 100 ]   [ 100 ]

which reduces to
(1/144) * [ 16  -4 ] * [ 144 ] = [ 10.44 ]
          [ -4  10 ]   [ 200 ]   [  9.89 ]
```

This gives a final location of roughly [ 10.44 9.89 ]'
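The same calculation, as one might program it, is sketched below. The helper functions are written out by hand for the two by two case and are only an illustration; a real program would more likely use a matrix library.

```python
def mat_add(a, b):
    """Element-wise sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def mat_vec(m, v):
    """2x2 matrix times a 2-vector."""
    return [m[i][0] * v[0] + m[i][1] * v[1] for i in range(2)]

def invert_2x2(m):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("summed weight matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def weighted_average_2d(weights, points):
    """inverse(W1 + W2 + ...) * (W1*P1 + W2*P2 + ...)"""
    total_weight = [[0.0, 0.0], [0.0, 0.0]]
    weighted_sum = [0.0, 0.0]
    for w, p in zip(weights, points):
        total_weight = mat_add(total_weight, w)
        wp = mat_vec(w, p)
        weighted_sum = [weighted_sum[0] + wp[0], weighted_sum[1] + wp[1]]
    return mat_vec(invert_2x2(total_weight), weighted_sum)

able_weight = [[4.0, 0.0], [0.0, 10.0]]
baker_weight = [[6.0, 4.0], [4.0, 6.0]]
location = weighted_average_2d(
    [able_weight, baker_weight],
    [[11.0, 10.0], [10.0, 10.0]],   # Able's and Baker's positions
)
print([round(c, 2) for c in location])   # [10.44, 9.89]
```

Carried through without intermediate rounding, this gives roughly (10.44, 9.89). Rounding the entries of the inverse to two or three digits before multiplying can shift the result by a tenth of a mile or more, so it is worth keeping full precision until the end.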

If the two observers had overall weights of the same magnitude, it is reasonable to ask why the final value is not the average of the two values. The answer is important.

The two observers have roughly equal overall weight, but they don't know the location in the same manner. One knows X with less uncertainty than the other does, so one has to give more weight to some individual coordinates from one observer than from the other. In addition, one observer knows a good deal about the relationship between the uncertainties of his coordinates, and the adjustment has to account for that also.

Why isn't Y 10, since that's what they would BOTH guess? Because X and Y are related... so the guessed value of X has an effect on the value of y.

Fortunately, this all falls out automagically from the weighted averages.

# Some Geek Notes

Weight matrices from physical problems are symmetric.

One dimensional weights are positive. Multi-dimensional weights are "positive definite" (and symmetric) matrices. (A one by one matrix is positive definite if the number is positive, so it is proper to say in all cases that they are positive definite.)

Sometimes a weight matrix can't be formed, because the covariance matrix is singular. A covariance matrix that is singular is a sign that the transformation model that produced it had a Jacobian that was not full rank, and the model needs work [i.e. it is not numerically stable, and therefore not statistically significant].
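A program has to notice this case before trying to invert. The sketch below does so for the two by two case by testing the determinant; the function name and the relative tolerance are assumptions chosen for illustration, not a recommended threshold.

```python
def weight_from_covariance(cov, tolerance=1e-12):
    """Return the 2x2 weight matrix, or None if the covariance is singular."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Compare the determinant against the size of the terms it is built from,
    # so that "nearly singular" is judged relative to the matrix's own scale.
    scale = max(abs(a * d), abs(b * c))
    if abs(det) <= tolerance * scale:
        return None
    return [[d / det, -b / det], [-c / det, a / det]]

# X and Y perfectly correlated: the matrix is singular and has no inverse.
singular = weight_from_covariance([[1.0, 1.0], [1.0, 1.0]])
# An ordinary covariance matrix inverts without trouble.
usable = weight_from_covariance([[0.3, -0.2], [-0.2, 0.3]])
print(singular, usable)
```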

This page is http://www.cc.utah.edu/~nahaj/cave/survey/intro/weighted-averages.html