A Simplified Introduction to General Relativity

Before launching into general relativity, I thought I'd review a few basic derivations in differential geometry. The simplest possible case of interest would probably be:

$$s^2 = x^2 + y^2$$

or, over small distances,

$$ds^2 = dx^2 + dy^2.$$

As a review exercise, we could use the calculus of variations to derive the equation for the geodesic curve: i.e., the shortest path between two points through the Euclidean space defined by this Pythagorean-theorem distance formula. (It had better come out to be a straight line!) The distance to be minimized is:

and the standard variational integral is:

Then the standard, grind-the-crank Euler-Lagrange equations are:


But inasmuch as f isn't an explicit function of either x or y,




which means that



so that



Evaluating fx and fy', and remembering that ds = f dx:


What these equations are saying in their cumbersome way is that the line which minimizes the distance between any two points on a plane makes a constant angle with the horizontal and the vertical...that is, that it's a straight line.
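For reference, the steps above can be sketched as follows. This is a reconstruction under stated assumptions rather than the original working: the path is labeled by a parameter z (the same device used later in this write-up), primes denote d/dz, and the symbols k_x, k_y and the angle alpha are introduced here only for illustration.

```latex
% Assumed parametrization: x = x(z), y = y(z); primes are d/dz.
\begin{align*}
s &= \int ds = \int f\,dz, \qquad f = \sqrt{x'^2 + y'^2} \\
\delta s &= \delta \int f\,dz = 0 \\
\frac{d}{dz}\frac{\partial f}{\partial x'} - \frac{\partial f}{\partial x} &= 0, \qquad
\frac{d}{dz}\frac{\partial f}{\partial y'} - \frac{\partial f}{\partial y} = 0 \\
\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} = 0
\;&\Rightarrow\;
\frac{\partial f}{\partial x'} = k_x, \qquad \frac{\partial f}{\partial y'} = k_y \\
\frac{\partial f}{\partial x'} = \frac{x'}{f} = \frac{dx}{ds} &= \cos\alpha = k_x, \qquad
\frac{\partial f}{\partial y'} = \frac{y'}{f} = \frac{dy}{ds} = \sin\alpha = k_y
\end{align*}
```

Since dx/ds and dy/ds are both constant, so is dy/dx = k_y/k_x, which is the straight line we were hoping for.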

Well, okay, that checked out...like using a cannon to shoot a squirrel, but it worked. Now for something more ambitious, like the shortest distance between two points in space-time. The equation for the shortest distance between two points (the geodesic curve) becomes:

Then the Euler-Lagrange equations become:

and as before, since f isn't an explicit function of either x or y,


which means that



And what's this briar patch of equations trying to say? Well, the last equation says that along the paths of shortest distance in space-time, the total speed, v, (which is the slope of 

with respect to distance, ct along the time axis), is constant. This means:

(1) that the paths of shortest distance have constant slopes...that is, are straight lines;

(2) that, in the language of physics, the speeds of particles (or planets), which move along paths of shortest distance (geodesics) in ordinary Euclidean space-time, are constant. In other words, the last equation says that energy is a constant of the motion. It says that particles move at constant speeds in the absence of force fields (which we're about to perceive to be space-time vortices: regions in which space-time becomes warped and non-Euclidean) not because of some principle of conservation of energy but because particles follow geodesics in Euclidean space-time, and the shortest distance between two points in Euclidean space-time is a straight line of constant slope, which is to say: of constant speed. Impressive, huh?

The first two equations say that the x and y components of the velocity will remain constant along straight lines in space-time, and are the basis for conservation of momentum. In effect, we've derived the principle of conservation of energy (or at least of kinetic energy), and Newton's first law of motion, not as postulates but as results.
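A sketch of the flat space-time version, assuming two space dimensions plus time, the sign convention ds^2 = c^2 dt^2 - dx^2 - dy^2 (the Minkowski form quoted later in this write-up), a path parameter z with primes meaning d/dz, and illustrative constants k_x, k_y, k_t that are not in the original:

```latex
\begin{align*}
f &= \sqrt{(ct')^2 - x'^2 - y'^2}, \qquad ds = f\,dz \\
\frac{\partial f}{\partial x'} &= -\frac{x'}{f} = -\frac{dx}{ds} = k_x \\
\frac{\partial f}{\partial y'} &= -\frac{y'}{f} = -\frac{dy}{ds} = k_y \\
\frac{\partial f}{\partial (ct')} &= \frac{ct'}{f} = \frac{d(ct)}{ds} = k_t \\
\frac{v_x}{c} = \frac{dx}{d(ct)} &= -\frac{k_x}{k_t}, \qquad
\frac{v_y}{c} = \frac{dy}{d(ct)} = -\frac{k_y}{k_t}, \qquad
\frac{v^2}{c^2} = \frac{k_x^2 + k_y^2}{k_t^2} = 1 - \frac{1}{k_t^2}
\end{align*}
```

All three k's are constant along the path, so the velocity components and the total speed v are constant too. Under these assumptions k_t works out to 1/sqrt(1 - v^2/c^2).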

Now for the same thing in polar coordinates. Our distance to be minimized is:

and the Euler-Lagrange equations are:

and we go through the usual hocus-pocus to arrive at the following quantities which are constant along the paths of least distance in undistorted space-time:

The last equation gives the same result as before, but to me, the interesting equation is the second one. Multiply it by m0 and it becomes the relativistic angular momentum. It states that angular momentum will be conserved. I had never realized before that conservation of angular momentum was a consequence of the coordinate system we select and has nothing to do with central force fields, but I can see now that that's the case. It makes sense once you think about it. In the absence of any forces, the total speed of a particle moving along a straight line isn't going to change. At the same time, if we put a point of origin in there and start to measure dr/dt's and $d\theta/dt$'s, then at infinity the radial velocity will be the total speed, since there won't be any tangential component. The radial velocity has to go to zero at the point of closest approach, so the tangential velocity has to take up the slack until, at the point of closest approach, it's equal to the total speed.
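A sketch of the polar-coordinate quantities referred to above, assuming the flat-space metric ds^2 = c^2 dt^2 - dr^2 - r^2 dθ^2 quoted later in this write-up, a path parameter z with primes meaning d/dz, and illustrative constant names k_θ and k_t:

```latex
\begin{align*}
f &= \sqrt{(ct')^2 - r'^2 - r^2\theta'^2}, \qquad ds = f\,dz \\
\frac{d}{dz}\frac{\partial f}{\partial r'} - \frac{\partial f}{\partial r} &= 0
\quad\text{with}\quad \frac{\partial f}{\partial r} = -\frac{r\,\theta'^2}{f} \neq 0 \\
\frac{\partial f}{\partial \theta} = 0 \;&\Rightarrow\;
\frac{\partial f}{\partial \theta'} = -\frac{r^2\theta'}{f} = -r^2\,\frac{d\theta}{ds} = k_\theta \\
\frac{\partial f}{\partial (ct)} = 0 \;&\Rightarrow\;
\frac{\partial f}{\partial (ct')} = \frac{ct'}{f} = \frac{d(ct)}{ds} = k_t
\end{align*}
```

The r equation does not reduce to a simple constant, because f depends on r explicitly. The θ line is the quantity that becomes the angular momentum when multiplied by m0 (up to sign and a factor of c under the conventions assumed here).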

A Derivation of General Relativity

    I found this treatment of general relativity in a library book in 1988. Unfortunately, for some unknown reason, I didn't include the book's title or the name of its author in this write-up. Years later, when I went back to the library to find it, I was unable to locate it. (It may have been out on loan.) I wrote this derivation in 1988 in an effort to ensure that I understood the derivation I had found.
    The value of this cut-to-the-chase approach to general relativity is that it arrives at the Schwarzschild solution to two-body central force motion without requiring the machinery of tensor calculus. The author moves directly to the solution.

    The author begins by observing that the most general form for a measure of distance ds would probably be something like:

$$ds = f(x_1, x_2, \ldots, x_n,\; dx_1, dx_2, \ldots, dx_n),$$

where n is the number of dimensions with which we're working.

    But in the real world (as opposed to pure mathematics), the actual length of a displacement can't change if we change dimensional units. If we change our units of measurement from meters to centimeters, then when we multiply dx, dy, dz by 100 to change them to centimeters, ds will also have to increase by 100 to keep the actual distance ds unaltered. Saying this quantitatively, if

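$$dx_1' = \lambda\,dx_1, \quad dx_2' = \lambda\,dx_2, \quad \ldots, \quad dx_n' = \lambda\,dx_n, \quad\text{and}\quad ds' = \lambda\,ds,$$

then $\lambda\,ds = f(x_1, x_2, \ldots, x_n,\; \lambda\,dx_1, \lambda\,dx_2, \ldots, \lambda\,dx_n)$.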

    But $\lambda$ can be factored out if and only if ds has the form of the m'th root of a sum of products of the n dx's ($dx_1, dx_2, \ldots, dx_n$), taken m at a time and multiplied by coefficients $g_{ij\ldots}(x_1, x_2, \ldots, x_n)$ that are functions of the variables $x_1, x_2, \ldots, x_n$...that is, only if $ds^m$ is a homogeneous function of degree m in the dx's. To say it with an equation,

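(2)  $ds^m = \sum_{i,j,\ldots,k} g_{ij\ldots k}(x_1, x_2, \ldots, x_n)\,dx_i\,dx_j \cdots dx_k,$ with m indices on each g and m dx's in each product.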

     For example, if m = 4 (with n = 4 dimensions), there would be 4^4 or 256 terms in this summation for ds.
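As a quick check on that count, here is a minimal sketch in Python. With n coordinates and m differentials per term, each term corresponds to one ordered choice of m indices, so there are n^m of them. The function name `count_terms` is just an illustrative label, and n = 4 is assumed.

```python
from itertools import product

def count_terms(n: int, m: int) -> int:
    """Count ordered index tuples (i, j, ..., k) of length m drawn from n coordinates."""
    return sum(1 for _ in product(range(n), repeat=m))

terms_m4 = count_terms(4, 4)   # degree-4 form in 4 dimensions
terms_m2 = count_terms(4, 2)   # quadratic (Pythagorean-style) form in 4 dimensions
print(terms_m4, terms_m2)
```

The m = 2 case is the one that matters for what follows: a quadratic form in four dimensions has 16 terms.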

     Now at this point, you might well ask, "How come we didn't put $\lambda$ in front of the x's in $f(x_1, x_2, \ldots, x_n,\; \lambda\,dx_1, \lambda\,dx_2, \ldots, \lambda\,dx_n)$?" I don't know. However, I can tell you that the g coefficients in Equation (2) are dimensionless. The reason they're dimensionless is that they contain scale factors that cancel out the units of distance that go with the x's.

For example, the $g_{rr}$ term that gives the spatial compression in the neighborhood of a star is given by $g_{rr} = 1/(1 - r_s/r)$, where $r_s$ is the Schwarzschild radius of the star. The Schwarzschild radius is the radius the star would have to shrink to in order to become a black hole. To say it another way, the Schwarzschild radius is the radius at which a star with a given mass would have an escape speed equal to the speed of light.
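To put a number on this, here is a minimal sketch that evaluates the standard formula r_s = 2GM/c^2 for one solar mass. The formula and the rounded constants are supplied here as assumptions (they are not given in the text above), and the function names are illustrative.

```python
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2 (rounded)
C = 2.998e8          # speed of light, m/s (rounded)
M_SUN = 1.989e30     # mass of the Sun, kg (rounded)

def schwarzschild_radius(mass_kg: float) -> float:
    """Radius at which the Newtonian escape speed sqrt(2GM/r) equals c."""
    return 2.0 * G * mass_kg / C**2

def escape_speed(mass_kg: float, radius_m: float) -> float:
    """Newtonian escape speed from radius_m."""
    return (2.0 * G * mass_kg / radius_m) ** 0.5

rs_sun = schwarzschild_radius(M_SUN)
print(f"r_s for one solar mass: {rs_sun / 1000:.2f} km")
```

For the Sun this comes out at roughly 3 km, and plugging that radius back into `escape_speed` returns the speed of light, which is the "escape speed" description given above.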
    You might also ask, "Aren't measures of distance based upon the Pythagorean Theorem? Shouldn't it be,

$$ds^2 = \sum_{i,j} g_{ij}(x_1, x_2, \ldots, x_n)\,dx_i\,dx_j\,?$$

    Furthermore, we're dealing with four dimensions: x, y, z, and ct. So shouldn't we have 16 terms?"
      To the best of my knowledge, that's correct. I kept things in this more general form because that's the way they were presented in the book, but in fact, we do come right down to,

(3)

$$
\begin{aligned}
ds^2 ={}& g_{xx}\,dx^2 + g_{xy}\,dx\,dy + g_{xz}\,dx\,dz + g_{x(ct)}\,dx\,d(ct) \\
&+ g_{yx}\,dy\,dx + g_{yy}\,dy^2 + g_{yz}\,dy\,dz + g_{y(ct)}\,dy\,d(ct) \\
&+ g_{zx}\,dz\,dx + g_{zy}\,dz\,dy + g_{zz}\,dz^2 + g_{z(ct)}\,dz\,d(ct) \\
&+ g_{(ct)x}\,d(ct)\,dx + g_{(ct)y}\,d(ct)\,dy + g_{(ct)z}\,d(ct)\,dz + g_{(ct)(ct)}\,d(ct)^2
\end{aligned}
$$

where each coefficient g is a function of (x, y, z, ct).

    As you can see, the terms in Equation (3) are organized like a 4 X 4 matrix. The expression for ds^2 isn't itself a matrix, since it's the sum of 16 terms (only 10 of which will turn out to be independent). However, the 4 X 4 matrix whose elements are the above g's is what converts a small displacement (dx, dy, dz, d(ct)) at a given point in our curved space-time into its squared length ds^2 at that point.
    This 4 X 4 matrix, whose elements can change from one location in a gravitational field to another, is called the metric tensor. Why is it called the metric tensor instead of the metric matrix? I don't know. A tensor of rank 2 can always be written out as a matrix.
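A minimal sketch of how the 16 terms of Equation (3) combine, and of why only 10 of them are independent once the g's are symmetric in their two subscripts. The flat (Minkowski) values used for the g's are only a placeholder, and the names `interval_squared` and `independent_components` are illustrative.

```python
from itertools import combinations_with_replacement

COORDS = ("x", "y", "z", "ct")

# Placeholder metric: flat space-time, ds^2 = d(ct)^2 - dx^2 - dy^2 - dz^2.
g = [[-1.0, 0.0, 0.0, 0.0],
     [0.0, -1.0, 0.0, 0.0],
     [0.0, 0.0, -1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]

def interval_squared(metric, displacement):
    """Sum all 16 terms g_ij * dx_i * dx_j of Equation (3)."""
    n = len(displacement)
    return sum(metric[i][j] * displacement[i] * displacement[j]
               for i in range(n) for j in range(n))

def independent_components(n: int) -> int:
    """Number of index pairs (i, j) with i <= j in a symmetric n x n matrix."""
    return sum(1 for _ in combinations_with_replacement(range(n), 2))

ds2 = interval_squared(g, [3.0, 4.0, 0.0, 13.0])  # dx=3, dy=4, dz=0, d(ct)=13
n_indep = independent_components(len(COORDS))
print(ds2, n_indep)
```

In a curved space-time the entries of `g` would be functions of position instead of constants, but the double sum is formed the same way.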

    Next, we note that $dx_i\,dx_j = dx_j\,dx_i$, i.e., they're always commutative. But that means that $g_{ij} = g_{ji}$. (I noticed in the book by the Liebers that this is true for all metrics: the $dx_i\,dx_j$'s are always commutative and all the tensor elements that have the same set of subscripts i, j, ... have the same value independent of the order in which the subscripts appear.) The proof for this is beyond the scope of this paper. A proof is given on page 240 of the book, "The Einstein Theory of Relativity", by Lillian R. Lieber and Hugh Gray Lieber, Rinehart Publishing, 1936. (It involves showing that the Christoffel symbols $\{\epsilon\tau, \alpha\}$ and $\{\tau\epsilon, \alpha\}$ are equal, and that

because, as mentioned above, $\{\epsilon\tau, \alpha\} = \{\tau\epsilon, \alpha\}$.)

  Now consider the case of a spherically symmetric body...a star, for instance. Switching to spherical coordinates, we get:

$$ds^2 = g_{ij}(r, \theta, \phi, ct)\,dx_i\,dx_j$$

    But now, because there's spherical symmetry, the g's can't vary with $\theta$ or $\phi$ but only with r and t. Then if we say that the gravitational field is time-independent...that is, if the star's mass and therefore its gravitational field isn't changing as an explicit function of time...the g's can't vary with time, either, so they can depend only upon r. Thus,

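$$
\begin{aligned}
ds^2 = g_{ij}(r)\,dx_i\,dx_j ={}& g_{11}(r)\,dr^2 + g_{22}(r)\,r^2\,d\theta^2 + g_{33}(r)\,r^2\sin^2\theta\,d\phi^2 - g_{44}(r)\,dt^2 \\
&+ g_{12}(r)\,r\,dr\,d\theta + g_{13}(r)\,r\sin\theta\,dr\,d\phi + \ldots
\end{aligned}
$$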

(The g's are functions of r alone only when we use the usual spherical coordinate displacements, $dr$, $r\,d\theta$, $r\sin\theta\,d\phi$, and $d(ct)$. Otherwise, the g's that go with terms having a $d\phi$ in them have to contain a $\sin\theta$ factor.)

    At this point, all the books on general relativity I've read make the assumption that space-time is isotropic - that things look the same in all directions. If we make this assumption, then ds must remain unaltered if we replace $d\phi$ by $-d\phi$ or $d\theta$ by $-d\theta$. But that can be true only if, for example, $g_{r\theta}\,dr\,d\theta = -g_{r\theta}\,dr\,d\theta$...that is, if all the coefficients of the cross products are zero. The result is that all the off-diagonal terms in the "metric matrix" are zero. The Liebers discuss this on page 235 of their book (cited above). They say,
    "Well, obviously, a term like $dr\,d\theta$ (or $d\theta\,d\phi$ or $dr\,d\phi$) would be different for $\theta$ (or $\phi$ or r) positive or negative, and, consequently, the expression for ds would be different if we turn in opposite directions---which would contradict the experimental evidence that the universe is isotropic. And of course, the use of the same expression for ds from any point reflects the idea of homogeneity. And so we see that it is reasonable to have in (61b) only terms involving $d\theta^2$, $d\phi^2$, $dr^2$, in which it makes no difference whether we substitute $+d\theta$ or $-d\theta$, etc.
    "Similarly, since in getting a measure for ds, we are considering a static condition, and not one that is changing over time, we must therefore not include terms that will have different values for +dt and -dt; in other words we must not include product terms like dr, dt, etc."

    Finally, we end up with the expression

(4)  $ds^2 = g_{11}(r)\,dr^2 + g_{22}(r)\,r^2\,d\theta^2 + g_{33}(r)\,r^2\sin^2\theta\,d\phi^2 - g_{44}(r)\,dt^2,$

all more or less in one fell swoop, and without the use of tensors.
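The reflection argument that removes the cross terms can be written out in a few lines. This is a sketch for the single coefficient of dr dθ, assuming the remaining terms are unchanged by the reflection; the other cross terms go the same way.

```latex
\begin{align*}
ds^2 &= \ldots + g_{r\theta}(r)\,dr\,d\theta + \ldots \\
d\theta \to -d\theta:\quad ds^2 &= \ldots - g_{r\theta}(r)\,dr\,d\theta + \ldots \\
\text{invariance of } ds^2 \;&\Rightarrow\; g_{r\theta}\,dr\,d\theta = -g_{r\theta}\,dr\,d\theta
\;\Rightarrow\; g_{r\theta} = 0
\end{align*}
```

The squared terms survive because they are unchanged when any single differential changes sign.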

Now all we have to do is to determine g11(r), g22(r), g33(r), and g44(r). g22(r) and g33(r) can be set equal to each other and can be absorbed into r, because g11(r) can be chosen so that it incorporates the effects of the other two. The justification for taking this step is supposed to be given on page 239 of R. C. Tolman's book "Relativity, Thermodynamics and Cosmology". But we pay a price for this maneuver, and that is that r is no longer the distance from the origin to a spherical shell of radius r. I presume that's because r shrinks as we get close to the star, so that the r we have to use is some integral of dr' taken up to r.

Incidentally, what does it really mean when we say that r shrinks when we get close to a star? The answer is that the shrunken r we're talking about is the r that we see when we're looking on from outside the neighborhood of the star. To a person living on the star, things seem perfectly normal. All the covariant "laws" of physics are adjusted so that the star-dweller can't perceive any difference in his environment except that things outside the neighborhood of the star look farther away than they really are...like looking through the wrong end of a telescope. Also, his clocks appear to us to run slower than our clocks and our clocks appear to him to be running faster than his clocks...i.e., we both agree that his clocks are running slower than ours.

Note that this is different from the situation in special relativity, where A's clock appears to be running slower than B's, and B's clock appears to be running slower than A's, and either point of view is equally valid. That's because, in special relativity, we're dealing with "hyperbolic rotations", and the apparent distortions of the measuring rods and the clocks actually stem from the fact that we're projecting their space and time coordinates onto ours and projecting our space and time coordinates onto theirs, and looking at, so to speak, perspective effects.
But with general relativity, it's like the "twins paradox": there really is an asymmetry in the situation.

Does the term "general relativity" refer to the fact that we observers sitting outside the distorted space-time around the star may ourselves be (and probably are) in a distorted space-time? If so...if I'm interpreting this situation correctly (and I may not be)...we have no way of knowing it by looking at the local neighborhood around us or by running local physical experiments any more than the star-dweller can tell that he's in a distorted space-time without being able to look outside it. In that sense, there's no way of ever knowing whether we're in the preferred coordinate system. Also, we can't tell from the laws of physics that we're in an accelerated frame because the laws of physics work the same for us as they do for someone in flat space-time. (The reason I'm not sure about what I'm saying is that the Riemann curvature tensor provides a mechanism for determining whether a space-time is curved or flat, so I may be speaking out of school. I'll have to check on that.) So that's what's "relative" about "general relativity".

Anyway, the "r" that's used in the equation for the metric is defined as the radius of curvature of the spherical shell at "radius" r.

Determining g11(r) and g44(r) takes some arithmetic but it's not too bad. I'm going to jump ahead and say that the answer...the Schwarzschild metric...turns out to be fairly simple. It is:

$$ds^2 = \left(1 - \frac{r_S}{r}\right)c^2\,dt^2 - \frac{dr^2}{1 - r_S/r} - r^2\,d\theta^2 - r^2\sin^2\theta\,d\phi^2,$$

where $r_S$ is the Schwarzschild radius, typically of the order of a few kilometers for a star, and is the radius to which a star would have to collapse in order to become a black hole.
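A small numerical sketch of how the two unknown coefficients behave, assuming the standard Schwarzschild form: g44 = 1 - r_S/r for the time term and g11 = 1/(1 - r_S/r) for the radial term. These forms are stated here as an assumption, not derived, r is measured in units of r_S, and the function names are illustrative.

```python
def g44(r_over_rs: float) -> float:
    """Coefficient of the c^2 dt^2 term: 1 - r_S/r."""
    return 1.0 - 1.0 / r_over_rs

def g11(r_over_rs: float) -> float:
    """Coefficient of the dr^2 term: 1 / (1 - r_S/r)."""
    return 1.0 / g44(r_over_rs)

# Tabulate the coefficients from just outside r_S out to large r.
table = [(x, g44(x), g11(x)) for x in (1.5, 2.0, 10.0, 1.0e6)]
for x, time_coeff, radial_coeff in table:
    print(f"r/r_S = {x:>9.1f}   g44 = {time_coeff:.6f}   g11 = {radial_coeff:.6f}")
```

Both coefficients tend to one far from the star, which is the flat space-time limit, and they depart from one more and more strongly as r approaches r_S.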

Now for the derivation.

We can derive the equations for the geodesic curves---the curves which minimize the integrals of the distances ds in the warped space in the neighborhood of the star---with the calculus of variations. If we label each point along the path ds by a parameter z, then

$$dr/dz = r', \quad d\theta/dz = \theta', \quad d\phi/dz = \phi', \quad d(ct)/dz = ct'$$

Normally, you'd use dt or ds as parameters to measure positions along the path, but in this case, ds is the quantity being minimized and dt is one of the dependent variables, so some other symbol has to be used. But not to worry. This symbol z drops out of the picture almost immediately.

Next, we'll take the customary step in dealing with central-force motion of saying that we can orient our coordinate system any way we please, so we might as well orient it so that the orbit lies in a plane of constant $\phi$ ($d\phi = 0$). That way, the $d\phi^2$ term drops out. All the motion will occur in one plane in general relativity, as it does in classical mechanics, because with radial symmetry nature has no preferential reason to drift in one direction or the other out of a plane of orbital motion...no basis for choosing one plane of motion over another. Then our distance to be minimized becomes:

(6)  $ds = \left[ g_{44}(r)\,(ct')^2 - g_{11}(r)\,r'^2 - r^2\theta'^2 \right]^{1/2} dz$

The primes here represent derivatives (slopes) with respect to z, where z is the distance along the path. (z plays the role usually played by x or t, since in this situation, x and t are dependent variables.)

Equation (6) is exactly the same as the equation we got for finding the shortest distance between two points in empty 4-space when using radial coordinates, except for the presence of the g11(r) and the g44(r) terms, which are all that remains that we can adjust in the metric to allow for gravitational distortions. Everything else has been ruled out on the basis of symmetry conditions, the assumption that space-time isn't changing with time, and so forth. If we set the g's equal to one, which we have to do in Euclidean space far from a star, then we'll get exactly the same results we did earlier. So far, we haven't put anything into the mathematics about gravitation...that is, about the fact that mass distorts space-time. (It's worthwhile noting the similarities and differences between the Lagrangian for inverse square law motion and the ds above. The first term in Equation (6) is going to wind up acting like a gravitational force term.) The last two terms are like the velocity terms in the relativistic Lagrangian. The Euler-Lagrange equations are the same as they were before:

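Since the metric function in Equation (6) has no $\theta$ or ct terms, we know immediately that (writing f for the integrand of Equation (6)) $\partial f/\partial\theta$ and $\partial f/\partial(ct)$ = 0, and therefore, $\frac{d}{dz}\left(\partial f/\partial\theta'\right)$ and $\frac{d}{dz}\left(\partial f/\partial(ct')\right)$ = 0, and $\partial f/\partial\theta'$ and $\partial f/\partial(ct')$ = constants k1 and k4, respectively.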

However, this time, because of the presence of g11(r) and g44(r), which are functions of r, we won't assume that

$$\frac{\partial f}{\partial r} = 0,$$

since it isn't. We'll skip trying to work with the first equation for r, since the g's are unknown functions of r. (Actually, we know what they are but we can't prove it yet.) Turning to the second equation, since $\partial f/\partial\theta = 0$,

$r^2\,d\theta/ds = r^2\theta'$ = constant = angular momentum = $J_\theta$. ("J" is a symbol that's frequently used to designate angular momentum.)

the same as it was for special relativity. Hey! We're wheeling and dealing in general relativity as though it were classical mechanics!

Having had such smashing success with our first attempt, let's try the last equation. It's also the same as before except that this time, we're carrying along a g44(r) term.
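A sketch of how the two conserved quantities come out of the variational problem, assuming the integrand f = [g44(r)(ct')^2 - g11(r) r'^2 - r^2 θ'^2]^{1/2} with ds = f dz. The constants are written k_1 and k_4 only to match the labels used above, and the overall signs depend on the sign convention assumed here.

```latex
\begin{align*}
f &= \left[ g_{44}(r)\,(ct')^2 - g_{11}(r)\,r'^2 - r^2\theta'^2 \right]^{1/2}, \qquad ds = f\,dz \\
\frac{\partial f}{\partial \theta'} &= -\frac{r^2\theta'}{f} = -r^2\,\frac{d\theta}{ds} = k_1 \\
\frac{\partial f}{\partial (ct')} &= \frac{g_{44}(r)\,ct'}{f} = g_{44}(r)\,\frac{d(ct)}{ds} = k_4
\end{align*}
```

With g44 set equal to one, the last line reduces to the flat space-time result d(ct)/ds = constant.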

This looks formidable but it's not. g44(r) is only

and d(ct)/ds = 1/(ds/d(ct)) is the reciprocal of the speed of the particle along its trajectory, s. At large values of r, where the conventional $ds^2 = c^2\,dt^2 - dr^2 - r^2\,d\theta^2$ Minkowski metric is applicable:

and in close to a star, the exact metric is given by

The term has been carried along in order to derive the full equations of motion. However, since the g's are only associated with the dr and dt terms, it will be simpler to evaluate the g's using a simple radial free-fall problem with the angular momentum set to zero. Now the expression for the metric can be written:

Looking just at the time term,



If we define  to be  , then the above equation can be written,




$$\delta \int ds = \delta \int \left[ g_{44}(r)\,(ct')^2 - g_{11}(r)\,r'^2 - r^2\theta'^2 \right]^{1/2} dz = 0$$