This page is based on a LiveJournal post that I made back in 2011. The post lacked a lot, since it was a brain-dump of some math I had formulated. I had written a raytracer before, one with some "fish-eye"-looking graphics. That post had the math that fixed the "fish-eye" distortion but didn't adequately explain anything else. I'm also rather proud of the HTML markup on this page; it turned out very presentable. Part of the idea of algorithms and math is that they should be beautiful and easy to understand. So, without further ado...

In our physical world, light travels from the sun or some artificial light source, bounces off all the objects around us, and hits us in the face. More importantly, it enters our eyes and we see the effects of light and its interactions with the geometry surrounding us. Since we can mathematically represent this geometry and simulate rays of light on a computer, it stands to reason that we can generate some pretty realistic graphics by simulating the real-world interactions of light and shape.

Since only an astoundingly small percentage of that light ever reaches your eyes, simulating every photon from the source outward would waste nearly all of our effort. So, when we build our simulation, it will trace light in the opposite direction. We trace the path of a photon from the viewer to a viewed object and then to the area's light source. The entire simulation requires only basic algebra and a minimal amount of trigonometry.

Viewer → Object → Light... can't be that hard, right? Right! There's one last element we need, though, and that's a viewing surface. Without this, our graphics would be very distorted and many calculations would be harder than if a viewing surface did exist. Our viewing surface in the real world is bundled conveniently inside our eye in the form of a retina. In our simulated world, I imagine that you are carrying around a big window in front of your face, or, more accurately, a wire window screen.

Imagine holding this wire window screen a few feet from your face. Imagine
each cell of the screen can only use one color. Light will pass from your
eyes through the hole of the window screen to some object in the distance,
and the *array* of colors defined by the screen creates an image.
Say, for instance, you put a dab of paint in each cell of the wire screen
that was the color of the area behind that cell. This would be hell to do
in real life since moving around to paint the screen would make it difficult
to find what's directly behind it and return to your precise previous viewing
position... but this is just a thought experiment. Your paint-dabs would
paint a picture of the scene behind the screen that you could carry away
and, presumably, hang on your wall by your dinner table.

This wire window screen analogy maps almost exactly onto how a raytracer works. So, the strategy is to:

- define an array or picture object to hold our data (the "screen" in the analogy),
- trace a path of light through the screen to the object behind it,
- hold the color of the object that we find,
- trace the path from that point on the object to the nearest light source,
- change the held color based on any obstructions on our previous path, and
- put that color onto the screen object.

The first thing we need to do is define the screen. Here, I mean both
the screen from our analogy and the screen of your computer. The cells of
the wire screen from the analogy become pixels on your computer screen.
For this, we're going to need some simple trigonometry. If you understand the basic **sin**, **cos**, and **tan** of trigonometry, then you can safely skip the next few paragraphs.

Otherwise, find a piece of scrap paper and draw a horizontal line on it. At a right angle to the line's right end, draw a vertical line going up. Connect the left end of the horizontal line with the top end of the vertical line. In that set of lines, you'll see a right triangle. Go ahead and darken it. The left-most point we'll call *A*, the top-right point we'll call *B*, and the bottom-right point we'll call *C*. The angle ∠*BAC* we'll call *Θ* (theta, "thay-tah"– the Greek letter that looks more-or-less like a "Q").

A long time ago, the inventors of trigonometry noticed that there's a relationship between the lengths of these lines in a right triangle and the angle *Θ*. If we take the length of the line connecting *A* and *B* to be 1, then we can build what's called the **unit circle**. It's pretty intuitive that if we change the lengths of line *AC* and line *BC*, we can make *Θ* anything between nothing at all and a perfect corner (between 0° and 90°). If we drew our triangle in different orientations
and a perfect corner. If we drew our triangle in different orientations
by flipping it horizontally, vertically, or both, then we could generate
any angle in any quadrant using the same proportion of positive or negative
line lengths. This is rather taxing on the imagination, though, so let's
proceed.

These relationships between the lines and angles of a right triangle were
given the names "sine", "cosine" and "tangent". The proportion (fraction)
of line *BC* (the numerator) to line *AB* (the denominator) is
called "the sine of *Θ*". The proportion of *AC* to
*AB* is the cosine of *Θ*, and the proportion of *BC*
to *AC* is the tangent of *Θ*. Since *BC* is on
the __o__pposite side of the triangle as *Θ*, *AB* is
the __h__ypotenuse of the right triangle, and *AC* is
__a__djacent to *Θ*, the mnemonic "SOH-CAH-TOA" arises for
remembering the relationships: Sine is Opposite over Hypotenuse; Cosine is
Adjacent over Hypotenuse; Tangent is Opposite over Adjacent.

The sine, cosine, and tangent of *Θ* are written **sin**( *Θ* ), **cos**( *Θ* ), and **tan**( *Θ* ), respectively.

These mappings work the other way, too. Assuming you have the ratio of the opposite and adjacent sides, and you want to find the angle, you could use the inverse-tangent function. It's written **tan ^{-1}** or called the arctangent. Most programming languages provide it as **atan**, along with a handier two-argument version, **atan2**, which takes the two side lengths separately and keeps their signs straight for you.
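
If you'd rather poke at these relationships in code, here's a quick sanity check. I'll use Python for the code sketches on this page, since its `math` module hands us **sin**, **cos**, **tan**, and **atan2** directly:

```python
import math

theta = math.radians(30)   # a 30-degree angle for our triangle
hypotenuse = 1.0           # the unit circle: line AB has length 1

opposite = hypotenuse * math.sin(theta)   # length of line BC
adjacent = hypotenuse * math.cos(theta)   # length of line AC

# SOH-CAH-TOA: tangent is opposite over adjacent...
assert abs(math.tan(theta) - opposite / adjacent) < 1e-12

# ...and the inverse tangent recovers the angle from the two sides.
assert abs(math.atan2(opposite, adjacent) - theta) < 1e-12
```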

For raytracing, we'll need sine and cosine, and to define some
*Θ*s, and we'll use aviation terms to do it. Turn your head
from side to side. We'll call that our **yaw**. Nod your head up and
down. We'll call that angle our **pitch**. Next, twist your head toward
your left or right shoulders. We'll call that our **roll**. Using our
basic knowledge of trigonometric functions, we can take a point and rotate
it anywhere. The following fourteen lines are an "algorithm", or a set
of steps, that accepts an input point (*P*) and three decimals
representing a pitch, roll, and yaw (*p*, *r*, and *y*,
respectively). Notice the capital *P*. For here, anything that's
lower case is a simple number, and capital letters represent vectors–
a simple set of numbers. Completing the algorithm "returns" a new version
of point *P* following its rotation around the origin.

**algorithm** rotate ( Point *P*, angle *p*, angle *r*, angle *y* ) :

(1) **Let** *d* = **√**( *P _{x}^{2}* + *P _{z}^{2}* )

(2) **Let** *Θ _{yaw}* = **atan2**( *P _{z}*, *P _{x}* ) + *y*

(3) **Let** *P _{x}* = *d* × **cos**( *Θ _{yaw}* )

(4) **Let** *P _{z}* = *d* × **sin**( *Θ _{yaw}* )

(5) **Let** *d* = **√**( *P _{x}^{2}* + *P _{y}^{2}* )

(6) **Let** *Θ _{roll}* = **atan2**( *P _{y}*, *P _{x}* ) + *r*

(7) **Let** *P _{x}* = *d* × **cos**( *Θ _{roll}* )

(8) **Let** *P _{y}* = *d* × **sin**( *Θ _{roll}* )

(9) **Let** *d* = **√**( *P _{y}^{2}* + *P _{z}^{2}* )

(10) **Let** *Θ _{pitch}* = **atan2**( *P _{z}*, *P _{y}* ) + *p*

(11) **Let** *P _{y}* = *d* × **cos**( *Θ _{pitch}* )

(12) **Let** *P _{z}* = *d* × **sin**( *Θ _{pitch}* )

(13) **Return** *P*

In the pseudocode, all the variable names are italicized and function names are put in boldface. The end result of the algorithm is the point *P*, an altered version of the same *P* that went into the algorithm at the top. But let's talk about how it works. *Θ _{yaw}*, *Θ _{roll}*, and *Θ _{pitch}* are the point's current angles around each axis; lines 2, 6, and 10 recover them with the inverse tangent (**atan2**) and then add the requested rotation on top.

Lines 1, 5, and 9 are the simple distance formulas from basic algebra based on the theorem of Pythagoras. Since our imaginary triangle's hypotenuse is no longer 1, we'll use that formula to figure out its true value. Then, lines 3, 4, 7, 8, 11, and 12 multiply by this value accordingly. The first and last lines are notational conveniences so we can say **rotate**( (3,2,3), 1.5, 0.1, 0 ) and have it be meaningful and well-defined.

You'll notice three sets of lines in the algorithm. Lines 1 through 4, 5 through 8, and 9 through 12 are very similar to one another. The first set, made obvious by the notation, rotates the point in the *x*-*z* plane, around the *y* axis. If we treat *y* as the up-axis, you should notice that changing the yaw of an object only changes the *x* and *z* values of the point that we're rotating. The second set takes the point and whirls it around the *z* axis, and the third set changes the pitch by rotating the point around the *x* axis.
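
Here's the same algorithm as a runnable Python sketch, with a point represented as a plain (x, y, z) tuple:

```python
import math

def rotate(P, pitch, roll, yaw):
    """Rotate point P = (x, y, z) around the origin: yaw about the
    y axis, then roll about the z axis, then pitch about the x axis."""
    x, y, z = P

    # Yaw: rotate in the x-z plane (lines 1-4).
    d = math.sqrt(x * x + z * z)
    theta = math.atan2(z, x) + yaw
    x, z = d * math.cos(theta), d * math.sin(theta)

    # Roll: rotate in the x-y plane (lines 5-8).
    d = math.sqrt(x * x + y * y)
    theta = math.atan2(y, x) + roll
    x, y = d * math.cos(theta), d * math.sin(theta)

    # Pitch: rotate in the y-z plane (lines 9-12).
    d = math.sqrt(y * y + z * z)
    theta = math.atan2(z, y) + pitch
    y, z = d * math.cos(theta), d * math.sin(theta)

    return (x, y, z)

print(rotate((3, 2, 3), 1.5, 0.1, 0))
```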

We have the hard part behind us. Even if you're still reeling, trying to figure out why **atan2** was used, the rest should be pretty easy. First, draw a rectangle. That's your screen. We'll define the points across the top as *T* and *U*. They'll have values like *T _{x,y,z}*, etc. The bottom two points will be *V* (below *T*) and *W* (below *U*).

Using all this new terminology and our **rotate** algorithm that we defined above, let's find our values for *T*, *U*, *V*, and *W*. Let's also throw in something new: a view-point that's not at (0, 0, 0). We'll call it our camera *C*. It'll have *x*, *y*, and *z* values, as well as rotations of its own, which we'll call *C _{ΘP}*, *C _{Θr}*, and *C _{Θy}* for its pitch, roll, and yaw. We'll aim the camera using a focal point *F*, a point floating directly ahead of the camera at the center of the screen, and we'll give the view a horizontal angle *h* and a vertical angle *v* (together, the field of view):

(1) **Let** *T* = *C* + **rotate**( *F* - *C*, *C _{ΘP}* + ( *v* / 2 ), *C _{Θr}*, *C _{Θy}* - ( *h* / 2 ) )

(2) **Let** *U* = *C* + **rotate**( *F* - *C*, *C _{ΘP}* + ( *v* / 2 ), *C _{Θr}*, *C _{Θy}* + ( *h* / 2 ) )

(3) **Let** *V* = *C* + **rotate**( *F* - *C*, *C _{ΘP}* - ( *v* / 2 ), *C _{Θr}*, *C _{Θy}* - ( *h* / 2 ) )

(4) **Let** *W* = *C* + **rotate**( *F* - *C*, *C _{ΘP}* - ( *v* / 2 ), *C _{Θr}*, *C _{Θy}* + ( *h* / 2 ) )

Notice that **rotate** rotates a point around the origin. In order
to accomplish this with our focal point, find the focal point's relation
to the camera with subtraction, rotate that point, and then put it back
with the initial addition. It's tricky, but it makes sense.
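
As a Python sketch building on the `rotate` function above (the names here are mine; *h* and *v* are the viewing angles from a moment ago):

```python
def screen_corners(C, F, cam_pitch, cam_roll, cam_yaw, h, v):
    """Return the four corners (T, U, V, W) of the viewing screen."""
    def corner(dp, dy):
        # Find the focal point's relation to the camera, rotate it,
        # and then put it back with the addition.
        rx, ry, rz = rotate((F[0] - C[0], F[1] - C[1], F[2] - C[2]),
                            cam_pitch + dp, cam_roll, cam_yaw + dy)
        return (C[0] + rx, C[1] + ry, C[2] + rz)

    T = corner(+v / 2, -h / 2)   # top-left
    U = corner(+v / 2, +h / 2)   # top-right
    V = corner(-v / 2, -h / 2)   # bottom-left
    W = corner(-v / 2, +h / 2)   # bottom-right
    return T, U, V, W
```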

The next part is to build a test point, move the point along a straight line, and see if it collides with anything. The simplest object to detect collisions on would be a sphere:

**algorithm** sphere_collide ( Point *P*, Sphere *S* ):

(1) **If** √( ( *P _{x}* - *S _{x}* )^{2} + ( *P _{y}* - *S _{y}* )^{2} + ( *P _{z}* - *S _{z}* )^{2} ) ≤ *S _{radius}*, **Return** true

(2) **Return** false

Notice that line 1 is essentially just the distance formula. If the distance from our test point to the sphere's center is no more than the radius of the sphere, then it's true that the point collides with (is inside or on the surface of) the sphere. In order for this to be important, though, we need to actually shoot the ray and see if it collides. So, we'll have an algorithm that "shoots" the ray from camera *C* to some destination point *D*. Once again, each line in the algorithm is really three: one for the *x*, *y*, and *z* parts of each point. We'll also define our screen symbolically as *S*, for "screen". The last thing we'll need is a "ray resolution", *r*, which is just a constant number that says how many steps we want our ray to take from its source (the camera *C*) to its destination (the point *D*). The algorithm then looks like this:

**algorithm** shoot_ray( Point *C*, Point *D*, World *Γ* ):

(1) **For** *i* = 0 → *r*:

(2)  **Let** *A* = *C* + ( *D* - *C* ) × ( *i* / *r* )

(3)  **For** *j* = 0 → **size_of**( *Γ* ):

(4)   **If** **sphere_collide**( *A*, *Γ _{j}* ), **Return** *A*

(5) **Return** 0

We're getting a little fancy, here, but there's nothing to fear. The
*Γ* is a "gamma", a list of sphere objects that make up our
world. I implicitly defined **size_of**, but it seems pretty intuitive:
it gets the number of items in *Γ*. Also the **For** construct
might seem a bit new. On line 1, I define *i* to equal zero, then
do lines 2, 3, and 4 before returning to 1. When I do go back to line 1,
I increment *i* unless it's greater than *r*. If it is, I skip
down to where the indentation jumps back left: line 5. Likewise, the
**For** on line 3 means repeat line 4 until either we exit the algorithm completely through line 4's **Return**, or *j* is greater than the number of elements in *Γ*.

So, if *A* collides with some sphere, we'll get the point of collision
out of the algorithm, otherwise we'll get a zero. Let's look more closely
at line 2. Points *C* and *D* are constant, and so is the
ray resolution *r*. The only thing changing is *i*. When *i* is zero, ( *i* / *r* ) is zero, too. We'd be multiplying ( *D* - *C* ) by zero, and the end result of the equation would be *C*. When *i* and *r* are equal, their quotient, that fraction, becomes one. Now, we're multiplying ( *D* - *C* ) by one and adding it to *C*. *C* + *D* - *C*, then, equals *D*. What we've done is construct a gradient of sorts between *C* and *D* and tested the points in between.
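
Here are both algorithms as a Python sketch. I represent a sphere as a tuple whose first two entries are its center and its radius (leaving room to tack on more data, like a color, later), pick an arbitrary value for the ray resolution *r*, and return `None` in place of the pseudocode's zero:

```python
import math

RAY_RESOLUTION = 1000   # my pick for r: steps from source to destination

def sphere_collide(P, center, radius):
    """True if point P is inside or on the surface of the sphere."""
    dx, dy, dz = P[0] - center[0], P[1] - center[1], P[2] - center[2]
    return math.sqrt(dx * dx + dy * dy + dz * dz) <= radius

def shoot_ray(C, D, world):
    """March a test point from C toward D; return the first point that
    collides with any sphere in the world, or None for a miss."""
    for i in range(RAY_RESOLUTION + 1):
        t = i / RAY_RESOLUTION
        A = (C[0] + (D[0] - C[0]) * t,
             C[1] + (D[1] - C[1]) * t,
             C[2] + (D[2] - C[2]) * t)
        for sphere in world:
            if sphere_collide(A, sphere[0], sphere[1]):
                return A
    return None
```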

So, now we must return to our screen *S* and the four points that
defined it *T*, *U*, *V*, and *W*. We'll use the same
kind of zero-to-one magic that we used in the ray-shooting algorithm:

**algorithm** make_picture( Point *C*, Point *T*, Point *U*, Point *V*, Point *W* ):

(1) **For** *j* = 0 → *S _{height}*:

(2)  **Let** *Q* = *T* + ( *V* - *T* ) × ( *j* / *S _{height}* )

(3)  **Let** *R* = *U* + ( *W* - *U* ) × ( *j* / *S _{height}* )

(4)  **For** *i* = 0 → *S _{width}*:

(5)   **Let** *D* = *Q* + ( *R* - *Q* ) × ( *i* / *S _{width}* )

(6)   **Let** *A* = **shoot_ray**( *C*, *D*, *Γ* )

(7)   **If** *A* ≠ 0, **Let** *S _{ij}* = the color of the shape that *A* struck

Let's look at what's happening here. Just like the **shoot_ray**
algorithm makes a little gradient pathway from one point to another, here
we have that same pattern going on twice on lines 2 and 3 where we're
connecting the top-left (*T*) with the bottom-left (*V*) and
the top-right (*U*) with the bottom-right (*W*). Lastly, we're
doing that same trick one more time connecting these rails that run down
the side of the picture with one that runs across: line 5. After we
build this test point, we shoot a ray to it. If the ray strikes something,
we change the color of *S _{ij}* to the color of whatever we hit.

Notice, too, that **shoot_ray** returns the moment its test point collides with an object in the world *Γ*. Light doesn't travel through objects in our simulation, so continuing the ray past its first collision would be a waste of time. Instead, we record the color, increment *i*, and go on our way to the next test point.
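
In Python, building on `shoot_ray` above and assuming each sphere now carries a color as a third entry (which `shoot_ray` happily ignores):

```python
BACKGROUND = (0, 0, 0)   # black wherever a ray sails off into nothing

def color_of(A, world):
    """The color of whichever sphere the point A is touching."""
    for center, radius, color in world:
        if sphere_collide(A, center, radius):
            return color
    return BACKGROUND

def make_picture(C, T, U, V, W, world, width, height):
    """Trace one ray per pixel; return the screen S as rows of colors."""
    def lerp(P, Q, t):
        return tuple(p + (q - p) * t for p, q in zip(P, Q))

    S = []
    for j in range(height):
        left = lerp(T, V, j / height)    # rail down the left edge
        right = lerp(U, W, j / height)   # rail down the right edge
        row = []
        for i in range(width):
            D = lerp(left, right, i / width)   # test point on the screen
            A = shoot_ray(C, D, world)
            row.append(color_of(A, world) if A is not None else BACKGROUND)
        S.append(row)
    return S
```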

We already have the means of shooting a ray from a camera to a destination point. We have a screen object that captures colors. In short, we have all the things we need to finish basic ray-tracing. Shadows and reflections are, as we'll see a little later, mere applications of what we've already learned. There are more shapes in this world than just spheres.

An unrotated box is probably the next most obvious thing to determine a point of collision against: if the test point's (*P*'s) coordinates are greater than the smallest-numbered coordinates of the box but smaller than the largest-numbered coordinates of the box, then we have a collision. To elaborate, imagine a box. The bottom-left-front point of the box we'll call (-1, -1, -1) and the top-right-back corner we'll call (1, 1, 1), for simplicity. Now, if we're testing a point to determine its collision with this box, then we'll test to see that *P _{x}*, *P _{y}*, and *P _{z}* are each between -1 and 1. If all three are, the point is inside the box.
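
That test is almost shorter in Python than in prose; here's a sketch with the box given by its two extreme corners:

```python
def box_collide(P, low, high):
    """True if P is inside the axis-aligned box spanning low to high."""
    return all(low[k] <= P[k] <= high[k] for k in range(3))

# The example box, from corner (-1, -1, -1) to corner (1, 1, 1):
print(box_collide((0.5, 0.0, -0.9), (-1, -1, -1), (1, 1, 1)))   # True
print(box_collide((0.5, 1.2, -0.9), (-1, -1, -1), (1, 1, 1)))   # False
```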

A tube would be another easy target. The shape is a line segment with bulbous caps on each end. We could use the gradient approach that we used for tracing a line and see if the test point is closer to the line than the radius of the tube. We'd be mixing the sphere code with the line code. POV-Ray calls this shape a "sphere sweep" because our test would essentially be stretching out a number of spheres. Let's look at the formal algorithm for this one. We'll define a tube *T* by its two end points *T _{A}* and *T _{B}* and a radius *T _{radius}*:
**algorithm** tube_collide( Tube *T*, Point *P* ):

(1) **For** *i* = 0 → *r*:

(2)  **Let** *A* = *T _{A}* + ( *T _{B}* - *T _{A}* ) × ( *i* / *r* )

(3)  **If** √( ( *P _{x}* - *A _{x}* )^{2} + ( *P _{y}* - *A _{y}* )^{2} + ( *P _{z}* - *A _{z}* )^{2} ) ≤ *T _{radius}*, **Return** true

(4) **Return** false
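
And as a Python sketch, reusing `sphere_collide` from earlier, with a `steps` parameter standing in for the ray resolution *r*:

```python
def tube_collide(P, end_a, end_b, radius, steps=100):
    """True if P lies within `radius` of the segment from end_a to
    end_b, approximated by stretching spheres along the segment."""
    for i in range(steps + 1):
        t = i / steps
        center = tuple(a + (b - a) * t for a, b in zip(end_a, end_b))
        if sphere_collide(P, center, radius):
            return True
    return False
```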

There are other shapes to bombard with test points, but let's take another step toward having a complete idea of our world's geometry. Oftentimes it's convenient to "subtract" one shape from another. We can cut a box out of another box, or a tube through a sphere. This is simple logic that most raytracers accomplish pretty easily. The concept is called
"constructive solid geometry". One really doesn't need a formal algorithm
to explain the concept, and in fact, it would probably make more of a
mess trying to formalize it. So, imagine that box again. Now, imagine another
box jutting through it, carving into it and out the opposite end. We've got
one box impaling another. Let's call the main box *A* and the
intersecting box *B*. If a ray would strike the box *B* by itself,
we don't care. We're not drawing the box. Therefore, we'll leave it out
of our world *Γ*. Instead, when we program a data structure
for our box *A*, we'll sort of include as a note that we've got a
shape being subtracted. Then, when a line strikes *A*, we'll check
if the line also strikes *B*. If it does, we'll ignore the collision.
It's as easy as that! We call this subtraction a "boolean difference".

There is another element of constructive solid geometry that's equally useful. This is a "boolean intersection". When a ray collides with our *AB* object, we'll ignore it unless it strikes the space occupied by both of them. That means if the ray hits *A* but the space isn't also occupied by *B*, we'll ignore it, and we'll also ignore the collision if the ray strikes *B* but not *A*. What we're doing, then, is binding
one object by the other. In our previous example with the two boxes, the
end result would be the intersecting long box being bounded by the original
box. So, the result is still a box. If we intersected a sphere with a box,
we'd get a more interesting shape. If we intersected a box with our
sphere sweep, we could get a proper cylinder.
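
In code, though, both operations come out as near one-liners if we squint and treat each shape as a function from a test point to a boolean; a sketch:

```python
def difference(collide_a, collide_b):
    """Collision test for shape A with shape B carved out of it."""
    return lambda P: collide_a(P) and not collide_b(P)

def intersection(collide_a, collide_b):
    """Collision test for the space occupied by both A and B."""
    return lambda P: collide_a(P) and collide_b(P)

# A sphere with a long, thin box cut straight through it:
carved = difference(
    lambda P: sphere_collide(P, (0, 0, 0), 1.0),
    lambda P: box_collide(P, (-0.25, -0.25, -2), (0.25, 0.25, 2)),
)
print(carved((0, 0, 0)))     # False: inside the carved-out box
print(carved((0, 0.9, 0)))   # True: inside the sphere, outside the box
```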

So, in our **shoot_ray** algorithm, on line 4, we'd have to change
the **sphere_collide** to map to the correct function based on what
*Γ _{j}*'s type is. In fact, this is where object-oriented
programming becomes a plus. A "shape" interface could contain the structure
for a shape. Every shape has a material, or at least a color and a function
determining if a test point intersects the shape. The raytracer throws a
bunch of different test points at all of the shapes contained within the world
of the raytracer, and the shapes that use the interface would each define their
own special way of determining collisions.
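
Here's how that interface might look as a Python sketch, wrapping the collision tests from earlier (the class names are my own):

```python
class Shape:
    """Interface: every shape has a color and its own collision test."""
    def __init__(self, color):
        self.color = color

    def collide(self, P):
        raise NotImplementedError

class Sphere(Shape):
    def __init__(self, center, radius, color):
        super().__init__(color)
        self.center, self.radius = center, radius

    def collide(self, P):
        return sphere_collide(P, self.center, self.radius)

class Box(Shape):
    def __init__(self, low, high, color):
        super().__init__(color)
        self.low, self.high = low, high

    def collide(self, P):
        return box_collide(P, self.low, self.high)

# shoot_ray then only ever calls shape.collide(A), and each shape
# supplies its own special way of determining collisions.
```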

So, we have a world filled with intersections and differences of spheres, boxes, and tubes. We've got a pretty good thing going for us. In theory, we could already put together a simple architectural model. There are two important features missing, though: shadows and reflections. Both are things raytracers excel at producing.

To continue, we need to understand color. So far, we've used it fairly
abstractly, assigning *S _{ij}* a color without really defining
what a color is. In most programming languages, colors are defined using
red, green, and blue components. We could define a color in the same way
we've been defining points: 3-tuples. If the word "tuple" seems foreign to you, don't be afraid. A 3-tuple is just an extension of the 2-tuple, which we know in informal talk as an ordered pair. (-1, -1, -1) is an example of a 3-tuple.

For colors, though, most systems use values from 0 to 255. Some, like POV-Ray, use values from 0 to 1. Anyway, mixing colors for our raytracer is pretty intuitive. A full value in the red, green, or blue component produces that color: (255, 0, 0) is red, and (0, 255, 0) is green. We can mix them using (255, 255, 0) to make yellow. Besides full-and-empty values for each component, we can mix things up to produce dimmer colors. (128, 128, 0) is a dim yellow. (128, 128, 128) is what's called "50% gray". Green and blue can be mixed to make cyan, and red and blue can be mixed to produce magenta.
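
A couple of Python helpers make this concrete (and we'll want something like `dim` in a moment, for shadows):

```python
RED = (255, 0, 0)
GREEN = (0, 255, 0)

def mix(a, b):
    """Additively mix two colors, clamping to the 0-255 range."""
    return tuple(min(x + y, 255) for x, y in zip(a, b))

def dim(color, factor):
    """Scale a color toward black; factor runs from 0 to 1."""
    return tuple(int(x * factor) for x in color)

print(mix(RED, GREEN))             # (255, 255, 0): yellow
print(dim(mix(RED, GREEN), 0.5))   # (127, 127, 0): a dim yellow
```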

The next thing we'll need is the concept of "ambient light". When we have a shadow, it's very rarely fully black. This is because light bounces around the room and brightens the area containing the shadow, or, beyond the scope of this writing, there are multiple light sources in the area. Let's simplify our simulation, as many popular raytracers do, by defining a single color to represent the ambient light in a room. A good candidate would be a dark gray. Now, when we shoot a ray to our object, we'll then trace a second ray from the point we hit to our light source *L*. If that second ray strikes anything along the way, the point sits in shadow, and only the ambient light colors it. Our updated algorithm looks like this:

**algorithm** make_picture( Point *C*, Point *T*, Point *U*, Point *V*, Point *W* ):

(1) **For** *j* = 0 → *S _{height}*:

(2)  **Let** *Q* = *T* + ( *V* - *T* ) × ( *j* / *S _{height}* )

(3)  **Let** *R* = *U* + ( *W* - *U* ) × ( *j* / *S _{height}* )

(4)  **For** *i* = 0 → *S _{width}*:

(5)   **Let** *D* = *Q* + ( *R* - *Q* ) × ( *i* / *S _{width}* )

(6)   **Let** *A* = **shoot_ray**( *C*, *D*, *Γ* )

(7)   **If** *A* ≠ 0, **Let** *S _{ij}* = the color of the shape at *A*; then, **if** **shoot_ray**( *A*, *L*, *Γ* ) strikes some other shape on its way to *L*, darken *S _{ij}* by the ambient color
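
And here's the shadow test as a Python sketch, reusing `shoot_ray` and `color_of` from above. The little nudge toward the light is my own fudge to keep the shadow ray from instantly re-colliding with the very surface it's leaving, a detail the pseudocode glosses over:

```python
AMBIENT = (90, 90, 90)   # a dark gray standing in for ambient light

def shade(A, L, world):
    """Color at collision point A, darkened if the light L is blocked."""
    color = color_of(A, world)
    # Start the shadow ray slightly toward the light so it doesn't
    # collide with the surface it's leaving.
    start = tuple(a + (l - a) * 0.01 for a, l in zip(A, L))
    if shoot_ray(start, L, world) is not None:
        # Something stands between us and the light: only the ambient
        # light reaches this point.
        return tuple(c * g // 255 for c, g in zip(color, AMBIENT))
    return color
```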