COVARIANCE AND CONTRAVARIANCE

When studying tensor calculus the distinction between covariance and contravariance may be obscure and is rarely explained visually. A geometric explanation will be exhibited here.

First we will explain the distinction between the covariant and contravariant components of vectors, thinking of vector fields, where a vector is defined at a point, rather than of position vectors. This extends naturally to the components of higher-order tensors. Strictly speaking, despite usage to the contrary, there is no such thing as a “covariant vector” or a “contravariant vector”. A vector is a vector is a vector. However, it may be handled in two ways. Firstly, by means of its components parallel to the coordinate directions, which form a parallelogram in the two-dimensional case, in the same way that dx and dy are defined as the sides of the parallelogram related to an infinitesimal displacement ds. These are referred to as its contravariant components. Secondly, we may handle it by means of its resolved parts along the coordinate directions, which are its covariant components. The latter are the inner products of the vector with the coordinate unit vectors. The distinction is important, e.g. when finding inner products such as F·s for the work done by a force F producing a displacement s. We will follow that up later.
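The two ways of handling a vector can be sketched numerically. The following is a minimal illustration only: the axis angle of 60 degrees, the sample vector, and the function names are assumptions made for the example, and the vector is first written in an auxiliary orthonormal frame purely for convenience.

```python
import math

def parallelogram_components(v, e1, e2):
    """Contravariant components (a, b) such that v = a*e1 + b*e2."""
    det = e1[0] * e2[1] - e1[1] * e2[0]
    a = (v[0] * e2[1] - v[1] * e2[0]) / det
    b = (e1[0] * v[1] - e1[1] * v[0]) / det
    return a, b

def resolved_parts(v, e1, e2):
    """Covariant components: inner products of v with the unit axis vectors."""
    return (v[0] * e1[0] + v[1] * e1[1],
            v[0] * e2[0] + v[1] * e2[1])

phi = math.radians(60)                  # assumed angle between OX and OY
e_x = (1.0, 0.0)                        # unit vector along OX
e_y = (math.cos(phi), math.sin(phi))    # unit vector along OY
v = (2.0, 1.0)                          # illustrative vector (auxiliary orthonormal frame)

contra = parallelogram_components(v, e_x, e_y)
co = resolved_parts(v, e_x, e_y)
print("contravariant (parallelogram) components:", contra)
print("covariant (resolved) components:         ", co)
```

With orthogonal axes (phi equal to 90 degrees) the two sets of components coincide, which is why the distinction goes unnoticed in ordinary Cartesian work.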

We will work with vectors in two dimensions to illustrate the principles involved. We will use non-orthogonal Cartesian coordinates, i.e. coordinates defined relative to non-orthogonal straight axes. However, tensors are especially concerned with curvilinear coordinates, where vectors and tensors are referred to curved coordinate lines which approach linearity at infinitesimal distances. In such cases the coordinate axes used below should be regarded as the tangents to the coordinate lines, and vectors as directed magnitudes at an origin O which is a local point in a field. The coordinate directions thus vary as O is varied. This covers cases where both coordinates are of the same type (polar coordinates in two dimensions are an example where they are not).

Figure 1

Contravariant Components

The components of a vector in two dimensions are defined in the literature in relation to a change of coordinates from (x,y) to (x',y'), say. The contravariant components are those which transform as follows, e.g. for the new coordinate x' in terms of the old (x,y):

$x' = \frac{\partial x'}{\partial x}\,x + \frac{\partial x'}{\partial y}\,y$ (1)

and similarly for y'. This is how the coordinates themselves are transformed, and, oddly enough, vectors defined in this way are referred to as contravariant, which at first sight seems rather perverse. However, the comments about inner products below may shed light on this oddity. The relation (1) is far from obvious at first sight, so we will show how the partial derivatives relate to the geometry.

Figure 2

The vector at O is represented by OV and the parallelogram-component on the axis OX is OA, where VA is parallel to the axis OY. We will only illustrate the situation for the x-components. If we change coordinates to OX', OY' then the new x-component is OA' where VA' is parallel to OY'. Now we join A to P on OX' such that AP is parallel to OY'. Using the sine rule we get (2)

where γ = φ+β-α and μ = 180°-φ-β.

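Noting that OA'=x', OA=x and AV=y, partial differentiation of (2) with respect to x, holding y constant, gives ∂x'/∂x from triangle OAP, and partial differentiation with respect to y, holding x constant, gives ∂x'/∂y from triangle VQA'. Substituting these into (2) gives (1), as required. A similar argument holds for the new y coordinate.

The generalised version of (1) for more than two dimensions, using overlines instead of primes, is

$\bar{x}^j = \sum_k \frac{\partial \bar{x}^j}{\partial x^k}\,x^k$

or, using the repeated-index summing convention for k,

$\bar{x}^j = \frac{\partial \bar{x}^j}{\partial x^k}\,x^k$ (3)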
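The figure defining the angles is not reproduced here, so the following check rests on an assumed reading: α is the angle from OX to OX', φ the angle from OX to OY, and β the angle from OY to OY'. Under that assumption γ is the angle between the new axes, and one form of the sine-rule relation consistent with the definitions of γ and μ is x' = x sin μ / sin γ + y sin β / sin γ. The sketch compares this with a direct change of axes; all function names and sample values are illustrative.

```python
import math

def unit(angle):
    """Unit vector at the given angle to OX (auxiliary orthonormal frame)."""
    return (math.cos(angle), math.sin(angle))

def cross(a, b):
    """z-component of the 2D cross product."""
    return a[0] * b[1] - a[1] * b[0]

def new_x_sine_rule(x, y, alpha, phi, beta):
    """Assumed reading of the sine-rule relation for the new x-component."""
    gamma = phi + beta - alpha
    mu = math.pi - phi - beta           # 180 degrees expressed in radians
    return x * math.sin(mu) / math.sin(gamma) + y * math.sin(beta) / math.sin(gamma)

def new_x_direct(x, y, alpha, phi, beta):
    """Parallelogram component on OX' obtained by a direct change of axes."""
    e_x, e_y = unit(0.0), unit(phi)
    e_x2, e_y2 = unit(alpha), unit(phi + beta)
    v = (x * e_x[0] + y * e_y[0], x * e_x[1] + y * e_y[1])
    return cross(v, e_y2) / cross(e_x2, e_y2)

alpha, phi, beta = (math.radians(d) for d in (15, 70, 20))   # illustrative angles
x, y = 1.3, 0.8                                              # illustrative components

gamma = phi + beta - alpha
mu = math.pi - phi - beta
dxp_dx = math.sin(mu) / math.sin(gamma)     # coefficient of x, i.e. the partial derivative w.r.t. x
dxp_dy = math.sin(beta) / math.sin(gamma)   # coefficient of y, i.e. the partial derivative w.r.t. y

print(new_x_sine_rule(x, y, alpha, phi, beta), new_x_direct(x, y, alpha, phi, beta))
print(dxp_dx * x + dxp_dy * y)
```

The coefficients of x and y are constants for straight axes, so they are exactly the partial derivatives appearing in (1).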

For the contravariant components it is customary to use superscripts for the indices such as j and k.
Thus our previous x' = $\bar{x}^1$ and y' = $\bar{x}^2$.
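The summation over the repeated index k can be written out as a short routine. This is a sketch only: the linear change of coordinates `new_coords` is hypothetical, and the partial derivatives are estimated by central differences rather than taken from any formula in the text.

```python
def jacobian(f, x, h=1e-6):
    """Central-difference estimate of J[j][k] = d(xbar^j)/d(x^k) at the point x."""
    n = len(x)
    cols = []
    for k in range(n):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        fp, fm = f(xp), f(xm)
        cols.append([(p - m) / (2 * h) for p, m in zip(fp, fm)])
    # cols[k][j] holds d(xbar^j)/d(x^k); transpose so that J[j][k] matches the index order
    return [[cols[k][j] for k in range(n)] for j in range(len(cols[0]))]

def transform_contravariant(J, comps):
    """New contravariant components: sum over k of J[j][k] * comps[k]."""
    return [sum(J[j][k] * comps[k] for k in range(len(comps))) for j in range(len(J))]

def new_coords(x):
    """Hypothetical linear change of coordinates in three dimensions."""
    return [x[0] + 0.5 * x[1],
            0.8 * x[1] - 0.2 * x[2],
            0.3 * x[0] + x[2]]

comps = [1.0, 2.0, -1.0]                       # illustrative contravariant components
J = jacobian(new_coords, [0.0, 0.0, 0.0])
new_comps = transform_contravariant(J, comps)
print(new_comps)
print(new_coords(comps))
```

Because the change of coordinates here is linear, the transformed components agree with the transformed coordinates themselves, which is the sense in which contravariant components transform like the coordinates.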

Useful expressions for the contravariant components of OV are, using the sine rule, (4)
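As a sketch of what such sine-rule expressions look like, assume OV has length v and makes an angle θ with OX, and that φ is the angle between OX and OY (the figure is not reproduced, so this reading of the angles is an assumption). The triangle OAV then gives x = v sin(φ-θ)/sin φ and y = v sin θ/sin φ, which the code checks by rebuilding the vector from its parallelogram components.

```python
import math

def contravariant_from_length_angle(v, theta, phi):
    """Parallelogram components of a vector of length v at angle theta to OX,
    for axes OX, OY separated by the angle phi (assumed reading of the figure)."""
    x = v * math.sin(phi - theta) / math.sin(phi)
    y = v * math.sin(theta) / math.sin(phi)
    return x, y

v, theta, phi = 2.0, math.radians(25), math.radians(70)   # illustrative values
x, y = contravariant_from_length_angle(v, theta, phi)

# Rebuild the vector as x*e_x + y*e_y in an auxiliary orthonormal frame
rebuilt = (x + y * math.cos(phi), y * math.sin(phi))
expected = (v * math.cos(theta), v * math.sin(theta))
print(rebuilt, expected)
```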

Covariant Components

The covariant components of a vector are defined by the transformation

$\bar{x}_j = \frac{\partial x^k}{\partial \bar{x}^j}\,x_k$ (5)

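using subscripts for the indices in the covariant case. For the x-coordinate in two dimensions, writing $x_1$, $x_2$ for the covariant components on OX, OY and $\bar{x}_1$ for that on OX', this is

$\bar{x}_1 = \frac{\partial x}{\partial x'}\,x_1 + \frac{\partial y}{\partial x'}\,x_2$ (6)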

where the partial derivatives are "inverted" compared with the contravariant case.
We start by assuming we know x, y, α and φ, i.e. we know the initial coordinates of the vector rather than its magnitude OV=v or its angle θ to OX. OA=x and OB=y (Figure 3):

Figure 3

then (7)

Solving for θ gives (8)

Now the covariant component on OX' is the resolved part of OV along OX', which by (7) and then by (8) can be expressed in terms of x, y, α and φ, giving (9). We now encounter a subtlety of the meaning of the "inverted" partial derivatives, for they refer to the coordinates, which are contravariant, so we must relate this back to them as follows:

Figure 4

If OX'=δx', OX=δx and OY=δy then using the sine rule in the infinitesimal case we get the partial derivatives of x and y with respect to x', showing that (9) is the same as (6), as required. For more than two dimensions the principle is the same but OV is no longer necessarily in a coordinate plane.
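The covariant transformation can be checked numerically under an assumed reading of the figures: φ is the angle between OX and OY, α the angle from OX to OX', and OV has length v at angle θ to OX. A small step δx' along OX' then splits, by the sine rule, into δx = δx' sin(φ-α)/sin φ and δy = δx' sin α/sin φ, which supply the "inverted" partial derivatives. The function names and sample values are illustrative.

```python
import math

def covariant_components(v, theta, phi):
    """Resolved parts on OX and OY of a vector of length v at angle theta to OX."""
    return v * math.cos(theta), v * math.cos(phi - theta)

def inverted_partials(alpha, phi):
    """dx/dx' and dy/dx': a unit step along OX' split into old parallelogram components."""
    return math.sin(phi - alpha) / math.sin(phi), math.sin(alpha) / math.sin(phi)

v, theta = 2.0, math.radians(25)                  # illustrative vector
phi, alpha = math.radians(70), math.radians(15)   # illustrative axis angles

x1, x2 = covariant_components(v, theta, phi)
dx_dxp, dy_dxp = inverted_partials(alpha, phi)

new_x1 = dx_dxp * x1 + dy_dxp * x2      # covariant transformation for the x'-component
direct = v * math.cos(theta - alpha)    # resolved part of OV along OX', found directly
print(new_x1, direct)
```

The agreement holds because OX' itself is a combination of the old axis directions with exactly those two coefficients, so resolving OV along OX' combines the old resolved parts in the same proportions.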

We have thus exhibited how the geometrical interpretation of covariance and contravariance relates to the formal definitions when the components are of the same type.

Inner Product

The distinction between contravariance and covariance is important, e.g. when finding inner products such as F·s for the work W done by a force F producing a displacement s. Taking the inner product of the two vectors usually means resolving F along the direction of s. The actual evaluation of W amounts to summing the products of the coordinate-system components of s with the resolved parts of F. That is, we sum the products of the contravariant components of s and the covariant components of F, as for an ordinary inner product of vectors. To use instead the contravariant components of F (which are perfectly respectable quantities) with the contravariant components of s would give the wrong result for W when the axes are not orthogonal. However, we may instead use the covariant components of s multiplied by the contravariant ones of F and get the correct result, but it seems an unnatural way to handle the problem. It is more natural to handle F by means of its covariant components, which is perhaps why the loose description of a force as a “covariant vector” has crept in. Similarly s is most naturally handled by means of its parallelogram components.

We will now show how this works explicitly. Applying (4) to the vector s, represented by OV of length s as in Figure 2 but at an angle ψ to OX, gives its contravariant components. The covariant components of F, represented by OV as in Figure 3, are its resolved parts along OX and OY. Combining the two gives the inner product in tensor form,

$W = F_i\,s^i$

which is the standard expression for the inner product.
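The evaluation of W can be sketched as follows, assuming F has magnitude f at angle θ to OX, s has length s at angle ψ to OX, and φ is the angle between the axes. The numerical values and function names are illustrative assumptions.

```python
import math

def contravariant(length, angle, phi):
    """Parallelogram components of a vector given by its length and angle to OX."""
    return (length * math.sin(phi - angle) / math.sin(phi),
            length * math.sin(angle) / math.sin(phi))

def covariant(length, angle, phi):
    """Resolved parts of a vector along OX and OY."""
    return (length * math.cos(angle), length * math.cos(phi - angle))

phi = math.radians(70)                 # illustrative angle between the axes
f, theta = 5.0, math.radians(40)       # illustrative force
s, psi = 2.0, math.radians(10)         # illustrative displacement

F_co, F_contra = covariant(f, theta, phi), contravariant(f, theta, phi)
s_co, s_contra = covariant(s, psi, phi), contravariant(s, psi, phi)

W = F_co[0] * s_contra[0] + F_co[1] * s_contra[1]            # covariant F with contravariant s
W_swapped = F_contra[0] * s_co[0] + F_contra[1] * s_co[1]    # contravariant F with covariant s
W_wrong = F_contra[0] * s_contra[0] + F_contra[1] * s_contra[1]
W_expected = f * s * math.cos(theta - psi)                   # |F| |s| cos(angle between them)

print(W, W_swapped, W_expected)
print("both contravariant:", W_wrong)
```

Pairing one covariant set with one contravariant set reproduces |F||s|cos(θ-ψ) either way round, while pairing two contravariant sets does not when the axes are oblique.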

If we change the coordinate system then the covariant components of F and the contravariant components of s both change, but in such a way that the above inner product remains invariant (and valid!). This may explain the use of “covariant” for such components.
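The invariance can be illustrated by evaluating the same sum in several pairs of axes. This is a sketch with arbitrarily chosen axis directions and sample vectors; the vectors are specified in an auxiliary orthonormal frame only so that every axis system refers to the same F and s.

```python
import math

def unit(angle):
    return (math.cos(angle), math.sin(angle))

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def cross(a, b):
    return a[0] * b[1] - a[1] * b[0]

def work_in_axes(F, s, axis1_angle, axis2_angle):
    """Sum of (covariant component of F) * (contravariant component of s) for the given axes."""
    e1, e2 = unit(axis1_angle), unit(axis2_angle)
    F_co = (dot(F, e1), dot(F, e2))                        # resolved parts of F
    det = cross(e1, e2)
    s_contra = (cross(s, e2) / det, cross(e1, s) / det)    # parallelogram components of s
    return F_co[0] * s_contra[0] + F_co[1] * s_contra[1]

F, s = (3.0, 1.0), (1.0, 2.0)                              # illustrative force and displacement
axis_pairs = [(0, 90), (0, 60), (20, 110), (-10, 75)]      # axis directions in degrees
results = [work_in_axes(F, s, math.radians(a), math.radians(b)) for a, b in axis_pairs]
print(results)
```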

Generally a tensor is characterised by a set of functions defining how its components vary with the coordinates. A set of functions constitutes a tensor if the components satisfy (3) or (5). Another test is to multiply a set of functions by a tensor, and if the result is a tensor then so are those functions. To find out whether the functions are the simplest possible for a tensor is more difficult, remembering that the tensor is an entity that is described by the functions, just as a velocity is an independent physical entity that may be described in various ways. Such an entity exists independently of the coordinates used to describe it, since any equations involving it will, in view of (3) and (5), be the same in any coordinate system, e.g. work done expressed by an inner product. However, the functions may prove to be simpler in one coordinate system than another, e.g. a radial electric field is better described in polar coordinates than in Cartesian ones.

In three or more dimensions, the resolved parts are obtained by projecting a vector onto the axes, not onto the coordinate planes.

Higher-order tensors are in principle handled similarly, but they may be expressed with mixed index types, i.e. some covariant and some contravariant. The metric tensor is $g_{ij}$ and is most easily understood when represented by a square matrix. Both indices are of the same type in that case, and the square of the infinitesimal distance ds between two points is $ds^2 = g_{ij}\,dx^i\,dx^j$. The repeated-suffix convention then implies summation over i and, independently, over j. For two dimensions i and j vary from 1 to 2, so that $g_{ij}$ is a 2×2 matrix, whereas for three dimensions they vary from 1 to 3 and the matrix is 3×3. This illustrates the power of tensor notation, where the same equation applies for any number of dimensions above 1, but of course the number of expressions for the terms of $g_{ij}$ increases with increasing dimensionality.
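For the oblique axes used throughout, the metric can be written down explicitly. The sketch below assumes unit vectors along OX and OY separated by an angle φ, so that the metric components are the inner products of the axis unit vectors: 1 on the diagonal and cos φ off it. It evaluates the double sum for the squared distance and also uses the metric to turn contravariant components into covariant ones; the values and function names are illustrative.

```python
import math

def metric(phi):
    """Metric for unit axes OX, OY separated by phi: g[i][j] = e_i . e_j."""
    c = math.cos(phi)
    return [[1.0, c], [c, 1.0]]

def line_element_squared(g, dx):
    """Double sum g_ij dx^i dx^j over both repeated indices."""
    n = len(dx)
    return sum(g[i][j] * dx[i] * dx[j] for i in range(n) for j in range(n))

def lower_index(g, contra):
    """Covariant components from contravariant ones: x_i = g_ij x^j."""
    n = len(contra)
    return [sum(g[i][j] * contra[j] for j in range(n)) for i in range(n)]

phi = math.radians(60)       # illustrative angle between the axes
dx = [0.3, 0.4]              # illustrative contravariant displacement components
g = metric(phi)

ds2 = line_element_squared(g, dx)
co = lower_index(g, dx)

# Cross-check in an auxiliary orthonormal frame
px, py = dx[0] + dx[1] * math.cos(phi), dx[1] * math.sin(phi)
print(ds2, px * px + py * py)
print(co, [px, px * math.cos(phi) + py * math.sin(phi)])
```

The same two functions work unchanged for a 3×3 metric, since the loops run over however many indices the matrix has.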