Thursday, November 6, 2014

Thoughts on an alternative approach to distortion correction in the OpenGL pipeline

Despite some of the bad press it's gotten lately, I quite like OpenGL.  However, it has some serious limitations when dealing with the kind of distortion required for VR.

The problem

VR distortion is required because of the lenses in Oculus Rift-style VR headsets.  Put (very) simply, the lenses provide a wide field of view even though the screen isn't actually that large, and make it possible to focus on the screen even though it's very close to your eyes.

However, the lenses introduce curvature into the images seen through them.  If you render a cube in OpenGL that takes up 40° of your field of view, and look at it through the lenses of the Rift, you'll see curvature in the sides, even though they should be straight.

The current approach to correcting this is to render images to textures, and then apply distortion to the textures.  Think of it as painting a scene on a canvas of latex and then stretching the latex onto a curved surface.  The curvature of the surface is the exact inverse of the curvature introduced by the lenses, so when you look at the result through the lens, it no longer appears distorted.

However, this approach is extremely wasteful.  The required distortion magnifies the center of the image, while shrinking the outer edges.  In order to avoid loss of detail at the center, the source texture you're distorting has to have enough pixels so that at the area of maximum magnification, there is a 1:1 ratio of texture pixels to screen pixels.  But towards the edges, you're shrinking the image, so all your extra rendered pixels are essentially going to waste.  A visual representation of this effect can be seen in my video on dynamic framebuffer scaling below, at about 1:12.
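To put a rough number on that waste, here is a back-of-the-envelope sketch.  It assumes the source texture is `scale` times the screen resolution along each axis; both the function name and the example value of 1.25 are mine, chosen for illustration, and not figures for any real headset.  Since the screen can only show one value per pixel, at least this fraction of the rendered pixels has to be merged away or dropped.

```python
def excess_pixel_fraction(scale: float) -> float:
    """Lower bound on the share of rendered source pixels that cannot each
    own a screen pixel, when the source texture is `scale` times the screen
    resolution along each axis (hypothetical parameter)."""
    if scale < 1.0:
        raise ValueError("scale must be at least 1.0")
    # The source holds scale * scale times as many pixels as the screen.
    return 1.0 - 1.0 / (scale * scale)


if __name__ == "__main__":
    # A source texture 1.25x the screen size per axis.
    print(f"{excess_pixel_fraction(1.25):.2f}")  # prints 0.36
```

So even a modest 1.25x oversize means over a third of the fragment shader work produces pixels that never get a screen pixel of their own.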

A possible solution...

So how do we render a scene with distortion but without the cost of all those extra pixels that never make it to the screen?  What if we could modify the OpenGL pipeline so that it renders only the pixels actually required?

The modern OpenGL pipeline is extremely configurable, allowing clients to write software for performing most parts of it.  However, one critical piece of the pipeline remains fixed: the rasterizer.  During rendering, the rasterizer is responsible for taking groups of normalized device coordinates (where the screen is represented as a square with X and Y axes going from -1 to 1) representing a triangle and converting them to lists of pixels which need to be rendered by the fragment shaders.  This is still a fixed function because it's the equivalent of picking 3 points on a piece of graph paper and deciding which boxes are inside the triangle.  It's super easy to implement in hardware, and until now there hasn't been a compelling reason to mess with it.
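The graph paper picture can be sketched on the CPU in a few lines.  This is only an illustration of the idea, not how any particular GPU does it: the function names are mine, it tests cell centers with a plain edge function, and it ignores the tie-breaking fill rules real hardware applies to shared edges.

```python
def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of triangle (a, b, p).  The sign tells which
    side of the edge a->b the point p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(tri, resolution):
    """Return the (column, row) cells whose centers lie inside `tri`.

    `tri` is three (x, y) points in normalized device coordinates, and the
    grid has `resolution` cells per axis covering [-1, 1].
    """
    (ax, ay), (bx, by), (cx, cy) = tri
    covered = []
    for row in range(resolution):
        for col in range(resolution):
            # Center of this cell in normalized device coordinates.
            px = (col + 0.5) / resolution * 2.0 - 1.0
            py = (row + 0.5) / resolution * 2.0 - 1.0
            w0 = edge(ax, ay, bx, by, px, py)
            w1 = edge(bx, by, cx, cy, px, py)
            w2 = edge(cx, cy, ax, ay, px, py)
            # Inside when all three edge tests agree in sign, so either
            # winding order works.  Points exactly on an edge count as inside.
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (
                w0 <= 0 and w1 <= 0 and w2 <= 0
            ):
                covered.append((col, row))
    return covered
```

For the triangle `((-1, -1), (1, -1), (-1, 1))`, which covers half the screen, this returns 10 cells on a 4x4 grid and 36 cells on an 8x8 grid, so the number of outputs depends on both the coverage and the grid resolution.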

But just as the advent of more complex lighting and surface coloring models made the fixed-function vertex and fragment stages of the old pipeline inadequate and led to the rise of the current model, the needs of VR give us a reason to add programmability to the rasterizer.

What we need is a way to take the rasterizer's traditional output (a set of pixel coordinates) and displace them based on the required distortion.

What would such a shader look like?  Well, first let's assume that the rasterizer operates in two separate steps.  The first takes the normalized device coordinates (which are all in the range [-1,1] on both axes) and outputs a set of N values that are still in normalized device coordinates.  The second step displaces the output of the first step based on the distortion function.

In GLSL terms, the first step takes three vec3 values (representing a triangle) and outputs N vec3 coordinates.  The value of N depends on how much of the screen the triangle covers and also on the specific resolution of the rasterization operation.  This would not be the same resolution as the screen, for the same reason that we render to a larger-than-screen-resolution texture in the current distortion method.  This component would remain in the fixed function pipeline.  It's basically the same as the graph paper example, but with a specific coordinate system.

The second step would be programmable.  It would consist of a shader with a single vec2 input and a single vec2 output, and would be run for every output of the first step (the vec3s become vec2s because at this point in the pipeline we aren't interacting with depth, so we only need the xy values of the previous step).

in vec2 sourceCoordinate;
out vec2 distortedCoordinate;

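void main() {
  // Use the distortion function (or a pre-baked structure) to
  // compute the output coordinate based on
  // the input coordinate.
  // Placeholder: identity mapping, i.e. no distortion applied.
  distortedCoordinate = sourceCoordinate;
}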

Essentially this is just a shader that says "If you were going to put this pixel on the screen here, you should instead put it here".  This gives the client the ability to displace the pixels that make up the triangle in exactly the same way they would be displaced using the texture distortion method currently used, but without the cost of running so many extra pixels through the pipeline.
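Since a raster shader stage doesn't exist yet, here is a CPU-side sketch of the kind of mapping such a shader might compute.  The radial formula and the strength `K` are assumptions made up for illustration and are not the Rift's actual distortion function; the only property carried over from the text is that the center gets magnified relative to the edges.

```python
K = 0.25  # hypothetical distortion strength, chosen only for illustration


def distort(x, y, k=K):
    """Map an undistorted NDC coordinate to its displaced NDC coordinate.

    Assumed model: a radial scale that is largest at the center and falls
    off toward the edges.
    """
    r2 = x * x + y * y
    # The (1 + k) factor keeps the points (+-1, 0) and (0, +-1) fixed, so
    # the displaced image still spans the screen along both axes.
    scale = (1.0 + k) / (1.0 + k * r2)
    return x * scale, y * scale
```

With this model the center of the screen stays put, a point halfway to the right edge moves outward (the center is being stretched), and the edge midpoints don't move at all.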

Once OpenGL has all the output coordinates, it can map them to actual screen coordinates.  Where more than one result maps to a single screen coordinate, OpenGL can blend the source pixels together, weighting each by its level of contribution, and send the results as a single set of attributes to the fragment shader.
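A minimal sketch of that blending step, assuming each displaced sample has already been snapped to an integer screen pixel and carries a single interpolated attribute.  Equal weighting of the samples is my simplification; a real implementation would more likely weight by how much of the screen pixel each sample covers.

```python
from collections import defaultdict


def blend_samples(samples):
    """Average the attribute values of all samples that land on one pixel.

    `samples` is an iterable of ((px, py), value) pairs, where (px, py) is
    an integer screen pixel and value is one interpolated attribute.
    Equal weighting is an assumption, not part of any specification.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pixel, value in samples:
        totals[pixel] += value
        counts[pixel] += 1
    # One blended attribute per screen pixel, ready for the fragment shader.
    return {pixel: totals[pixel] / counts[pixel] for pixel in totals}
```

The fragment shader would then run once per key in the returned dictionary, rather than once per raster sample.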

The application of such a rasterization shader would be orthogonal to the vertex/fragment/geometry/tessellation shaders, similar to the way compute shaders are independent.  Binding and unbinding a raster shader would have no impact on the currently bound vertex/fragment/geometry/tessellation shader, and vice versa.

Chroma correction

Physical displacement of the pixels is only one part of distortion correction.  The other portion is correction for chromatic aberration, which this approach doesn't cover.

One approach would be to have the raster shader output three different coordinates, one for each color channel.  The likely outcome is that the pipeline then has to run the fragment shader multiple times, grabbing only one color channel from each run.  Since avoiding running the fragment shader more than we have to is the whole point of this exercise, this is unappealing.

Another approach is to add an additional shader to the program that specifically provides the chroma offset for each pixel.  In the same way you must have both a vertex and a fragment shader to create a rendering program in OpenGL, a distortion correction program might require both a raster and a chroma shader.  This isn't ideal, because only the green channel would be perfectly computed for the output pixel it covers, while the red and blue channels would be covering either slightly more or slightly less of the screen than they actually should be.  Still, it's likely that this imperfection would be well below the level of human perception, so maybe it's a reasonable compromise.


You want to avoid situations where two pixels are adjacent in the raster shader but the outputs have a gap between them when mapped to the screen pixels.  Similar to the way we use a higher resolution than the screen for textures now, we would use a higher resolution than the screen for the rasterization step, thus ensuring that at the area of greatest magnification due to distortion, no two adjacent input pixels cease to be adjacent when mapped to actual physical screen resolution.
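One way to check a candidate raster resolution is to brute-force it: displace every raster sample and measure how far apart neighboring samples end up on the screen.  The sketch below does that under an assumed radial distortion model (the formula and `K` are made up for illustration).  If the largest step is at most one screen pixel, neighbors either share a pixel or land on touching pixels, so no gaps open up.

```python
import math

K = 0.25  # hypothetical distortion strength, chosen only for illustration


def distort(x, y, k=K):
    """Assumed radial model: magnifies the center, keeps (+-1, 0) fixed."""
    scale = (1.0 + k) / (1.0 + k * (x * x + y * y))
    return x * scale, y * scale


def max_step_in_screen_pixels(raster_res, screen_res, k=K):
    """Largest distance, in screen pixels, between the displaced positions
    of horizontally or vertically adjacent raster samples."""

    def to_screen(i, j):
        # Raster sample center in NDC, then displaced, then mapped from
        # [-1, 1] to continuous pixel coordinates [0, screen_res].
        x = (i + 0.5) / raster_res * 2.0 - 1.0
        y = (j + 0.5) / raster_res * 2.0 - 1.0
        dx, dy = distort(x, y, k)
        return (dx + 1.0) * 0.5 * screen_res, (dy + 1.0) * 0.5 * screen_res

    worst = 0.0
    for j in range(raster_res):
        for i in range(raster_res):
            x0, y0 = to_screen(i, j)
            for ni, nj in ((i + 1, j), (i, j + 1)):
                if ni < raster_res and nj < raster_res:
                    x1, y1 = to_screen(ni, nj)
                    worst = max(worst, math.hypot(x1 - x0, y1 - y0))
    return worst
```

With this particular model the magnification at the center is 1 + K, so rasterizing a 64 pixel wide screen at 64 samples leaves steps larger than a pixel, while rasterizing at 80 samples (1.25x) keeps every step within one pixel.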

An unavoidable consequence of distortion, even without the above resolution increase, is that some pixels that are adjacent in the raster shader inputs will end up with their outputs mapping to the same screen pixel.

Depending on the kind of distortion required for a given lens, the calculations called for in the raster shader might be quite complex, and certainly not the kind of thing you'd want to be doing for every pixel of every triangle.  However, that's a fairly easy problem to solve.  When binding a distortion program, the OpenGL driver could precompute the distortion for every pixel, as well as precompute the weight for each rasterizer output pixel relative to the physical screen pixel it eventually gets mapped to.  This computation would only need to be done once for any given combination of raster shader, raster resolution and viewport resolution.  If OpenGL can be told about symmetry, even more optimization is possible.
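Here is a sketch of what that precomputation might look like, again on the CPU and again with an assumed distortion model (the formula, `K`, and all the names are hypothetical).  The table is cached on the raster resolution, screen resolution and distortion strength, which stands in for the "raster shader / raster resolution / viewport resolution" key.  Weights are simply 1/n for the n samples sharing a screen pixel, which is my simplification.

```python
from functools import lru_cache

K = 0.25  # hypothetical distortion strength, chosen only for illustration


def distort(x, y, k=K):
    """Assumed radial model: magnifies the center, keeps (+-1, 0) fixed."""
    scale = (1.0 + k) / (1.0 + k * (x * x + y * y))
    return x * scale, y * scale


@lru_cache(maxsize=None)
def build_displacement_table(raster_res, screen_res, k=K):
    """Precompute, for every raster sample, its screen pixel and weight.

    Returns a dict mapping raster (i, j) -> ((px, py), weight).  Callers
    should treat the cached dict as read-only.
    """
    targets = {}
    counts = {}
    for j in range(raster_res):
        for i in range(raster_res):
            x = (i + 0.5) / raster_res * 2.0 - 1.0
            y = (j + 0.5) / raster_res * 2.0 - 1.0
            dx, dy = distort(x, y, k)
            # Snap to an integer screen pixel, clamped to the viewport.
            px = max(0, min(int((dx + 1.0) * 0.5 * screen_res), screen_res - 1))
            py = max(0, min(int((dy + 1.0) * 0.5 * screen_res), screen_res - 1))
            targets[(i, j)] = (px, py)
            counts[(px, py)] = counts.get((px, py), 0) + 1
    # Equal weights among the samples that share a screen pixel.
    return {ij: (pix, 1.0 / counts[pix]) for ij, pix in targets.items()}
```

After the first call for a given configuration, every later lookup is just a dictionary access, so the cost of a complicated distortion function is paid once rather than per pixel per triangle.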

You end up doing a lot more linear interpolation among vertex attributes during the rasterization stage, but all this computation is still essentially the same kind of work the existing rasterization stage already does, and far less costly than a complex lighting shader executed for a pixel that never gets displayed.

Next steps

  • Writing up something less off the cuff
  • Creating a draft specification for what the actual OpenGL interface would look like
  • Investigating a software OpenGL implementation like Mesa and seeing how hard it would be to prototype an implementation
  • Pestering nVidia for a debug driver I can experiment with
  • Learning how to write a shader compiler
  • Maybe figuring out some way to make someone else do all this
