Bounding Volume Hierarchy and Distribution Ray Tracing

Welcome to the second part of my raytracer development blog. Many features have been added since the last post, and we will go over each of them, explaining the implementation details and comparing outputs with the reference images.
First, let me give you a quick table of contents for this post:
  1. Multisampling
  2. Bounding Volume Hierarchy (BVH)
  3. LookAt Camera
  4. PLY Parsing
  5. Smooth Shading
  6. BackFace Culling on Primary Rays
  7. Distribution Ray Tracing
    • Depth of Field
    • Area Lights and Soft Shadows
    • Glossy Reflections
  8. My Problems with DRT
  9. Final Renders 
  10. Problematic Renders and How I Solved Them

Multisampling

Multisampling is a fairly simple way of improving the quality of rendered images. The basic idea is to send multiple rays for each pixel and color the pixel with their average. This smooths the edges of a model or primitive and eliminates sharp transitions between colors; it is also known as anti-aliasing.
Multisampling can be implemented in a purely random fashion, choosing random positions inside the pixel and creating rays accordingly, but this is not very smart. A uniform sampling inside the pixel is not very successful either, since aliasing can still occur. In my implementation, I used Jittered Multisampling. The idea behind it is to divide the pixel into uniform boxes and randomly sample a position inside each box. This is easy to implement, fast, and gives good results overall. As I was looking into multisampling in ray tracing, this method was the most commonly used one. For a comparison, Supersampling Methods can be checked.
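Here is a minimal sketch of jittered sampling for one pixel, assuming n × n samples per pixel; the names are illustrative, not my raytracer's actual API:

```cpp
#include <random>
#include <vector>

struct Sample { float x, y; };   // offsets inside the pixel, in [0, 1)

std::vector<Sample> jitteredSamples(int n, std::mt19937& rng)
{
    std::uniform_real_distribution<float> uni(0.0f, 1.0f);
    std::vector<Sample> samples;
    samples.reserve(n * n);
    // Divide the pixel into an n x n grid and pick one random point per cell.
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            samples.push_back({ (i + uni(rng)) / n, (j + uni(rng)) / n });
    return samples;
}
```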
When you send multiple rays, you need to average them to compute the final color. We can do better than that! Remember, before we were only sending a ray to the center of the pixel. It makes sense to assume that the shading at the center of the pixel matters more than the shading at the very edges. That's why a Gaussian filter can be applied to get a weighted average of the samples. In our case we are working with (x, y) coordinates, so we use a 2D Gaussian. Although I implemented the Gaussian, the reference images use simple box filtering (a plain average), so I won't be applying the Gaussian either.

Bounding Volume Hierarchy (BVH)

So far, when we generate a ray, we test it against every primitive in the scene. Consider a mesh composed of many triangles packed into a small volume. If we could first test whether the ray intersects this small volume, we would save time on the many rays that do not hit the object. Even the rays that do hit the mesh can be handled faster.
There are two main ways to go when you want to add an acceleration structure to your raytracer:
  1. Spatial Subdivision
    • Grid
    • Octrees
    • kd-trees
  2. Object Subdivision
    • BVH
The most effective of these options are kd-trees and BVHs. In my raytracer I went with a BVH; as far as I have seen, it is a fairly common choice.
I have made use of a few resources. I found the book raytracingthenextweek (by Peter Shirley, available on github) to be extremely clear and concise. I also inspected the Realistic Ray Tracing book that I mentioned in the previous post.

Implementation

I started by adding a new class called BBox (Bounding Box). It is simply a 3D box with min and max coordinates as member variables and hit as its only method. This method implements ray-box intersection based on the Liang-Barsky algorithm. A BBox is created for each primitive, and these boxes grow hierarchically, resulting in a single BVH for the scene.
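Below is a minimal slab-style ray/box test in the spirit of the Liang-Barsky clipping idea mentioned above; Vec3 and Ray are simplified stand-ins for the project's own types:

```cpp
#include <algorithm>
#include <utility>

struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin, dir; };   // dir does not need to be normalized here

struct BBox {
    Vec3 min, max;

    bool hit(const Ray& r, float tMin, float tMax) const {
        const float ro[3] = { r.origin.x, r.origin.y, r.origin.z };
        const float rd[3] = { r.dir.x,    r.dir.y,    r.dir.z    };
        const float lo[3] = { min.x, min.y, min.z };
        const float hi[3] = { max.x, max.y, max.z };
        for (int a = 0; a < 3; ++a) {
            float invD = 1.0f / rd[a];
            float t0 = (lo[a] - ro[a]) * invD;
            float t1 = (hi[a] - ro[a]) * invD;
            if (invD < 0.0f) std::swap(t0, t1);
            tMin = std::max(tMin, t0);
            tMax = std::min(tMax, t1);
            if (tMax <= tMin) return false;   // the three slabs do not overlap
        }
        return true;
    }
};
```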
The BVH class has two pointers, left and right, and a BBox as its member variables. BVH inherits from the abstract base class Shape, so it has to implement the Shape interface (hit, shadowHit, boundingBox).
An important note on BVH implementations is that when checking for an intersection, both left and right should be tested, and if both return a hit, the closer hit should be taken. This covers the cases where parts of a primitive reside in two separate BBoxes. The construction of the BVH is probably the most crucial detail. For each primitive, bounding boxes are calculated on the fly, and in the BVH constructor these boxes are merged according to space partitioning. Space partitioning aims to distribute the boxes evenly by dividing the space in two and constructing the left and right BVH according to this division. The alternative is to do this division object-wise; according to the book Realistic Ray Tracing, dividing the space was found to be more effective. Lastly, to cover the case where only a single primitive is left, the primitive is duplicated and its pointer is placed in both the left and the right child of the BVH. This eliminates the need for a nullptr check.
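A sketch of that traversal rule, reusing the Ray and BBox types from the earlier sketch; Shape and HitRecord stand in for my actual interfaces:

```cpp
struct HitRecord { float t; /* normal, material, ... */ };

struct Shape {
    virtual bool hit(const Ray& r, float tMin, float tMax, HitRecord& rec) const = 0;
    virtual ~Shape() = default;
};

struct BVH : Shape {
    Shape* left  = nullptr;
    Shape* right = nullptr;
    BBox   box;

    bool hit(const Ray& r, float tMin, float tMax, HitRecord& rec) const override {
        if (!box.hit(r, tMin, tMax)) return false;
        HitRecord leftRec, rightRec;
        bool hitLeft  = left->hit(r, tMin, tMax, leftRec);
        bool hitRight = right->hit(r, tMin, tMax, rightRec);
        if (hitLeft && hitRight) {                 // both children hit: keep the closer one
            rec = (leftRec.t < rightRec.t) ? leftRec : rightRec;
            return true;
        }
        if (hitLeft)  { rec = leftRec;  return true; }
        if (hitRight) { rec = rightRec; return true; }
        return false;
    }
};
```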
Note that in space partitioning, we find the midpoint along some axis and alternate the axes at each new level of the tree.
I also added calculation of bounding boxes for each primitive. This is fairly simple as well: just compute, along each axis, the extents that completely enclose the primitive and you have your bounding box.
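For example, a triangle's box is just the component-wise min and max of its three vertices, and two boxes are merged the same way while building the hierarchy; this is a sketch using the types defined earlier:

```cpp
#include <algorithm>

// Smallest box containing both a and b, used when merging children during construction.
BBox surround(const BBox& a, const BBox& b) {
    return { { std::min(a.min.x, b.min.x), std::min(a.min.y, b.min.y), std::min(a.min.z, b.min.z) },
             { std::max(a.max.x, b.max.x), std::max(a.max.y, b.max.y), std::max(a.max.z, b.max.z) } };
}

// Bounding box of a single triangle.
BBox triangleBox(const Vec3& v0, const Vec3& v1, const Vec3& v2) {
    return { { std::min({v0.x, v1.x, v2.x}), std::min({v0.y, v1.y, v2.y}), std::min({v0.z, v1.z, v2.z}) },
             { std::max({v0.x, v1.x, v2.x}), std::max({v0.y, v1.y, v2.y}), std::max({v0.z, v1.z, v2.z}) } };
}
```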
After implementing the BVH, the raytracer runs considerably faster. Scenes with an immense number of triangles take only seconds, whereas without the BVH we would be waiting many minutes.

LookAt Camera

We are extending our options in the scene XML files. A look-at camera is a common way of describing a camera; libraries such as glm contain methods to calculate the basis directions of a look-at camera.
Instead of having l, r, b, t variables to describe the image plane, it has a vertical field-of-view angle fovy and assumes the camera is centered with respect to the image plane. It also has a gaze point instead of a gaze vector.
Since I already had a working 'simple' camera implemented, I decided to convert the given look-at camera to the simple camera while parsing the scene.
The conversion is very simple, requires only basic trigonometry, and goes something like this (a small code sketch follows the list):
  1. top = tan(fovy / 2) * near_distance
  2. right = (width / height) * top
  3. left = -right
  4. bottom = -top
  5. gaze = gazepoint - position
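A minimal sketch of that conversion; the names are illustrative, Vec3 is the project's vector type, and I assume fovy arrives in degrees, so it is converted to radians before taking the tangent:

```cpp
#include <cmath>

struct SimpleCamera { float left, right, bottom, top; Vec3 gaze; };

SimpleCamera fromLookAt(float fovyDegrees, float nearDistance, int width, int height,
                        const Vec3& position, const Vec3& gazePoint)
{
    const float PI = 3.14159265358979f;
    SimpleCamera cam;
    cam.top    = std::tan((fovyDegrees * PI / 180.0f) / 2.0f) * nearDistance;
    cam.right  = (float(width) / float(height)) * cam.top;   // aspect ratio * top
    cam.left   = -cam.right;
    cam.bottom = -cam.top;
    cam.gaze   = { gazePoint.x - position.x,
                   gazePoint.y - position.y,
                   gazePoint.z - position.z };                // gaze point -> gaze vector
    return cam;
}
```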

PLY Parsing

PLY parsing was one of the more frustrating parts of this homework. There are a few third-party parsers that I came across; an informative benchmark of different options can be found here: ply_io_benchmark.
Looking at the results, tinyply and happly seem to require the least code to get working, and between them, tinyply 2.2 seems to be almost 5 times faster. Thus I decided to use tinyply.
The problem with tinyply is that it cannot parse variable-length lists (a quad coming after a triangle and such), and I need that functionality as well. The developer of tinyply commented in an issue that this will be added in version 2.3, so I switched to happly. It can parse variable-length faces, and I can check whether a quad or a triangle was read. On a quad, I need to "cut" it into two separate triangles, keeping winding order in mind (indices 0,1,2 and 0,2,3 create two CCW-oriented triangles).
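A sketch of that split: fan-triangulating one face index list (as returned by a happly-style parser) handles triangles, quads and larger convex polygons, and for a quad {0,1,2,3} it yields exactly {0,1,2} and {0,2,3} with the winding order preserved:

```cpp
#include <array>
#include <cstddef>
#include <vector>

void addFace(const std::vector<int>& face, std::vector<std::array<int, 3>>& triangles)
{
    // Fan around the first vertex: (0,1,2), (0,2,3), (0,3,4), ...
    for (std::size_t i = 1; i + 1 < face.size(); ++i)
        triangles.push_back({ face[0], face[i], face[i + 1] });
}
```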
The great thing about having implemented PLY parsing is that now I can find complex scene data on the internet. https://people.sc.fsu.edu/~jburkardt/data/ply/ply.html has many PLY files that I am able to use; the only problem is that I need to manually add lights at the correct positions to get a nice-looking output, but at least the vertex data is there.
airplane model rendered in 0.242s

Smooth Shading

So far, I have been doing what is called flat shading: using the normal vector of the primitive (triangle) when calculating shading at any point of the primitive. An alternative approach is smooth shading. Here, we calculate a normal vector for each vertex and interpolate these normal vectors according to the barycentric coordinates of the hit position. This increases visual complexity without adding much computational cost.
To integrate this into my raytracer, I had to rewrite the API a little. I was storing vertex positions in a vector of Vec3; I needed to add a new Vertex class so that I could store the normal of the vertex along with its position. The term "normal of a vertex" was a little confusing to me when I first heard it. It simply means the average of the normals of the triangles that contain this vertex, so I also needed to count how many triangles each vertex is adjacent to while parsing the scene file.
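A sketch of the two steps, assuming the project's Vec3 with the usual arithmetic operators plus normalize/cross helpers; Vertex here is a stand-in for my class:

```cpp
struct Vertex { Vec3 position; Vec3 normalSum{0, 0, 0}; int faceCount = 0; };

// While parsing: add each triangle's face normal to its three vertices.
void accumulateNormal(Vertex& v0, Vertex& v1, Vertex& v2) {
    Vec3 n = normalize(cross(v1.position - v0.position, v2.position - v0.position));
    for (Vertex* v : { &v0, &v1, &v2 }) { v->normalSum = v->normalSum + n; ++v->faceCount; }
}
// After parsing: the vertex normal is normalSum / faceCount (then normalized).

// At shading time: interpolate the three vertex normals with the hit's
// barycentric coordinates (alpha, beta, gamma), where alpha = 1 - beta - gamma.
Vec3 smoothNormal(const Vec3& n0, const Vec3& n1, const Vec3& n2, float beta, float gamma) {
    float alpha = 1.0f - beta - gamma;
    return normalize(alpha * n0 + beta * n1 + gamma * n2);
}
```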

BackFace Culling on Primary Rays

Backface culling is the process of discarding primitives that face away from the camera.
You may remember the problems I had in getting a closer match to the reference image for the scienceTree.xml scene. Adding backface culling on every ray actually helped me get very close to it, but I still wasn't able to match it exactly.
One of the new inputs is tap_0200.xml. This is a scene I came very close to, but some parts of it were too shadowy. When I talked to my professor, he suggested that I do backface culling only on primary rays. Upon doing so, the shadowy parts were gone and the output matched almost 100%. However, going back to scienceTree, I see it being too bright once again. I may need to look into this while working on the next homework.
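The check itself is just a dot product; a sketch, where dot() is assumed from the project's vector helpers and the isPrimary flag is a hypothetical marker on camera rays:

```cpp
// A triangle is backfacing if its normal points the same way as the viewing ray.
bool backfacing(const Vec3& rayDir, const Vec3& triangleNormal) {
    return dot(rayDir, triangleNormal) > 0.0f;
}

// In the triangle hit routine, applied only to primary (camera) rays:
//   if (ray.isPrimary && backfacing(ray.dir, normal)) return false;
```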
dimmer one is with all rays culled, brighter with only primary rays culled

Distribution Ray Tracing

The main idea behind distribution ray tracing is to perform multisampling while adding randomization to certain choices in order to achieve certain effects. This allows us to get several different effects at almost no extra cost.
I would like to say this part has been extremely hard to get right (my results still differ from the references) while being theoretically and practically the simplest.

Depth of Field

Depth of field is a nice-looking effect that models a real camera with a lens. The new attributes added to the camera are focus distance and aperture size. The main idea is that rays aimed at the same image point converge at the focus distance, so objects there stay sharp, while moving closer to or farther from that distance gives a blurrier view. Thus, the light is focused at some point (compare this with a pinhole camera, where everything is in focus).
The implementation requires some trigonometry, but it can be found in many books. For each ray that should be sent, we randomly choose a point in an area of aperture size². Although cameras usually have a circular aperture, a square one has been found to be faster, and the results do not differ much. My output has some artifacts, which I did not like; I spent many hours making experimental changes but to no avail. I should add that every scene requiring distribution ray tracing (DRT) has had artifacts in general. I will explain how I dealt with them in a later section, although the method I use is not the expected way of doing DRT.
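A minimal sketch of the lens sampling, assuming a square aperture of side apertureSize and the project's Vec3 helpers plus the camera basis (u, v: right/up, gaze: normalized view direction); this is one way to do it, not necessarily my exact code:

```cpp
Ray depthOfFieldRay(const Vec3& eye, const Vec3& u, const Vec3& v, const Vec3& gaze,
                    const Vec3& pixelDir,              // normalized pinhole ray direction
                    float focusDistance, float apertureSize,
                    float rand1, float rand2)           // uniform in [0, 1)
{
    // Where the pinhole ray meets the focal plane (the plane at focusDistance along gaze).
    float t = focusDistance / dot(pixelDir, gaze);
    Vec3 focalPoint = eye + t * pixelDir;
    // Random origin on the square aperture around the eye.
    Vec3 lensOrigin = eye + apertureSize * (rand1 - 0.5f) * u
                          + apertureSize * (rand2 - 0.5f) * v;
    // The jittered ray still passes through the same focal point, so only
    // out-of-focus geometry gets blurred.
    return { lensOrigin, normalize(focalPoint - lensOrigin) };
}
```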
spheres_dof.xml rendered in 4.198s

Area Lights and Soft Shadows

Up to now, we have been using point lights for shading calculations. A point is in shadow if an object exists between the light and the point, and lit otherwise. This means that the transition between a lit pixel and a shadowed pixel is sharp.
To introduce soft shadows, we add a new kind of light, the area light. As the name suggests, it has an area (although in the images it is still invisible), and whether a point counts as shadowed depends on multiple samples. When a ray hits a point, we create a shadow ray towards a randomly chosen point inside the area light. Since we were already sending multiple rays per pixel, some of them will be shadowed and some not, depending on the random samples on the area, and we obtain soft transitions between light and dark regions.
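A sketch of one soft-shadow sample under those assumptions: pick a random point on a rectangular light spanned by two edge vectors, cast a shadow ray towards it, and shade only if nothing blocks the path (the scene.shadowHit call and eps offset are illustrative):

```cpp
Vec3 sampleAreaLight(const Vec3& corner,               // one corner of the light
                     const Vec3& edgeU, const Vec3& edgeV,  // its two edge vectors
                     float rand1, float rand2)          // uniform in [0, 1)
{
    return corner + rand1 * edgeU + rand2 * edgeV;
}

// At shading time, per sample:
//   Vec3 p   = sampleAreaLight(corner, edgeU, edgeV, rand1, rand2);
//   Vec3 dir = p - hitPoint;
//   if (!scene.shadowHit(Ray{ hitPoint + eps * normal, dir }))
//       color += diffuseAndSpecular(p, ...);   // treat the sample like a point light
```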
cornellbox_area.xml rendered in 7.600s

Glossy Reflections 

The last part of DRT for this homework is glossy reflections. It tries to simulate mirror-like objects with a rough or rugged surface. The idea is the same as before: for each ray hitting the glossy object, we create a reflection ray just as in the mirror case, but randomly deviate its direction according to the roughness attribute specific to glossy objects. With the help of multisampling, these rays intersect different locations, producing a blurry reflection instead of a mirror-like one.
There are some noticeable differences from the reference image: on the glossy surface, the reference is blurrier while my output is less blurry. This might be related to the deviation amount, but I wasn't able to figure it out.
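A sketch of the deviation step, assuming the project's Vec3 helpers (cross, normalize): build an orthonormal basis around the ideal mirror direction and nudge it by two random offsets scaled with the roughness:

```cpp
#include <cmath>

Vec3 glossyReflect(const Vec3& r,             // ideal (mirror) reflection direction, normalized
                   float roughness,
                   float rand1, float rand2)   // uniform in [0, 1)
{
    // Pick the axis least aligned with r so the basis stays numerically stable.
    Vec3 a = (std::fabs(r.x) < 0.5f) ? Vec3{1, 0, 0} : Vec3{0, 1, 0};
    Vec3 u = normalize(cross(a, r));
    Vec3 v = cross(r, u);
    return normalize(r + roughness * ((rand1 - 0.5f) * u + (rand2 - 0.5f) * v));
}
```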
cornellbox_area_brushed_metal rendered in 31.843s

My Problems with DRT

As I have mentioned previously, and as can be seen from the images above, I have not been able to remove these artifacts. I noticed that increasing the number of samples per pixel definitely helped with quality. So, for each ray, when I need to randomly sample a point (this could be a camera lens point, an area light point, or a glossy reflection deviation), I also apply jittered multisampling and generate more points and rays. For depth of field and glossy reflections, this inevitably increases the execution time (also for area lights, but I will come to that).
For DOF, this means that each randomly sampled point on the image plane is shaded as the average of multiple calculations resulting from different ray origins, and those results are in turn averaged as in the multisampling part. I sample points on the camera lens according to its size.
Similarly, for glossy reflections, each time a glossy hit happens, multiple deviated reflection rays are created and shaded. I sample deviations according to a preset glossy depth value.
Lastly, for area lights, I sample shadow rays depending on the size of the area light. Although the other two increase the execution time by adding more rays to trace, this one doesn't affect it that much: the shadow rays are immediately discarded if they hit an object along the path, and otherwise only diffuse and specular calculations are done. The sample point on the area light is treated like a point light.
As far as I understand, I shouldn't have been doing something like this, and plain multisampling should be enough. Maybe I will find out what I have been doing wrong later.

Final Renders
cube.xml rendered in 0.159s

chinese_dragon.xml rendered in 2.350s

other_dragon.xml rendered in 3.231s


tap_0200.xml rendered in 4.515s

spheres_dof.xml rendered in 16.630s (recursive MS)

cornellbox_area.xml rendered in 37.792s (recursive MS)

cornellbox_brushed_metal.xml rendered in 3m25.705s (recursive MS)

Problematic Renders and How I Solved Them
cornellbox_area rendered in 9.020s

As you can see, the colors in this picture are "correct" but look extremely dim. The problem was that when sampling from the area light, I needed to multiply the intensity by size², because each sample stands in for the whole area light we are simulating.
cornellbox_area rendered in 7.687s
The problem here is with the light. I was clamping each ray color to 256, whereas I should have let them oversaturate and clamp only after averaging them (i.e., just before writing to the image).
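A tiny sketch of the fix: accumulate unclamped sample colors, average, and only then clamp to the 8-bit range when writing the image (one channel shown):

```cpp
#include <algorithm>

unsigned char toByte(float accumulated, int sampleCount) {
    float averaged = accumulated / float(sampleCount);   // average first, unclamped
    return static_cast<unsigned char>(std::min(averaged, 255.0f));  // clamp only at write time
}
```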

See you in the next post!
