Friday, August 2, 2013

Digging Into the Oculus SDK - Part 1: The Worker Thread and the Command Queue

When working with new technology, it is not always immediately obvious where one should start when debugging an issue, attempting to experiment, or even just following the code to understand what's going on.  This is the first in a series of guides meant to share some hard-won understanding of some of the internals of the Oculus VR SDK; with luck you might find advice on where you need to focus your efforts to find a specific bit of functionality.

The Oculus VR SDK is multi-threaded and as a result can be tricky to debug.


The SDK relies on a worker thread approach to do most of its work.  Virtually all of the calls that a client makes result in a command being queued and then executed in another thread.  Even if you're making a call that returns some data, your command is queued, and the function blocks until the queued command has finished before returning the results.  This approach drastically simplifies certain aspects of multi-threaded programming.
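To make the pattern concrete, here is a minimal sketch of a worker thread with a command queue, written against the C++11 standard library rather than the SDK's own Kernel classes.  Every name in it (CommandWorker, post, call) is made up for illustration and none of it is taken from the SDK source; it only shows the general shape of "queue the command, then block until the worker has run it".

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical illustration only; these names do not come from the Oculus SDK.
class CommandWorker {
public:
    CommandWorker() : exiting_(false), thread_(&CommandWorker::run, this) {}

    ~CommandWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            exiting_ = true;
        }
        ready_.notify_one();
        thread_.join();  // the worker drains the queue before it returns
    }

    // Queue a command and return immediately.
    void post(std::function<void()> command) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(command));
        }
        ready_.notify_one();
    }

    // Queue a command, then block the caller until the worker has run it
    // and hand back its result.
    template <typename Fn>
    auto call(Fn fn) -> decltype(fn()) {
        typedef decltype(fn()) Result;
        // Calling this from the worker thread itself would deadlock.
        assert(std::this_thread::get_id() != thread_.get_id());
        auto task = std::make_shared<std::packaged_task<Result()>>(std::move(fn));
        std::future<Result> result = task->get_future();
        post([task] { (*task)(); });
        return result.get();  // the "wall" you hit when stepping in from outside
    }

private:
    void run() {
        for (;;) {
            std::function<void()> command;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return exiting_ || !queue_.empty(); });
                if (queue_.empty()) return;  // exiting and nothing left to do
                command = std::move(queue_.front());
                queue_.pop_front();
            }
            command();  // all the real work happens on this one thread
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::function<void()>> queue_;
    bool exiting_;
    std::thread thread_;  // declared last so the other members exist before it starts
};
```

A caller on any thread can write something like `int n = worker.call([] { return 6 * 7; });` and the lambda body still runs on the worker.  Only the queue itself needs a lock; the code inside the commands never has to think about concurrent callers.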

The biggest concern with multi-threaded programs, and the subject of most of what has been written about them, is dealing with synchronization and race conditions.  By adopting the worker thread and command queue approach, the SDK designers ensure that the bulk of the code they write will only ever be executed by a single thread.  You might ask then why the SDK is multi-threaded at all.

First off, even if the SDK didn't launch its own thread to do work, there's no guarantee that every call into the SDK from client code would be from the same thread.  There's no practical way for it to force single-threaded access to itself.  Second, the SDK must be multi-threaded in order to process the head tracker messages in a timely fashion.  The only alternative would be to require the client either to deal with the headache of calling the tracker processing code frequently enough, or to implement their own thread to do so.  The latter would expose the client to all the same issues of thread safety we mention above.  Even though that is a burden many application developers choose to take on anyway, there is no reason for the SDK to make it more difficult by requiring special attention to be paid to which functions must be called in the same thread and which can be called anywhere.

The end result is an interface which doesn't care what the external threading model is and, generally speaking, doesn't need to worry internally about race conditions.  The only downside is that if your application were to flood the command queue faster than the thread could drain it, you'd effectively execute a denial of service attack against the hardware message handling code.  So don't do that.

Tricky to debug

The upshot of all this is that it can be very tricky to debug the SDK.  Tracing into methods you call from outside the SDK leaves you with no way to drill down into the implementation.  Instead, you hit a wall where you see a command queued, and then, depending on the method you called, you see the function wait for the result of the command.  The only way to see the actual work being done would be to have a priori knowledge of where the implementation was and put a breakpoint there so it could be hit by the worker thread.  If you're tracing into methods in order to find out where the implementation is, you're out of luck.

The most important loop

In order to start drilling down into implementations, we need to put a breakpoint inside the worker thread.   I'd love to provide a link to the thread, but as it happens I have to provide three.  Although the SDK contains some amount of abstraction of threading in the Kernel, the exact details of the waiting mechanism are sufficiently different on each platform that it requires a completely different loop for each operating system.

They are, for the three supported operating systems:

int OVR::Win32::DeviceManagerThread::Run()

int OVR::OSX::DeviceManagerThread::Run()

int OVR::Linux::DeviceManagerThread::Run()

In each case, the general outline of the function is the same.  Control enters a loop that isn't terminated until IsExiting() is true.  Inside the loop, the thread checks whether there is a queued command waiting to be handled.  If there is, it handles the command and iterates.  Otherwise it enters an inner loop that's designed to handle messages from the open handles to the hardware devices.  The details of that inner loop vary drastically by platform, but we can disregard them for the moment.  What we're interested in is finding the implementations of methods called from outside the SDK.
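The outline above can be sketched roughly as follows.  This is a simplified, single-threaded stand-in and not the SDK's code: LoopSketch, PopCommand and ServiceDevices are assumed names, the commands are plain std::function objects rather than the SDK's command type, and the device handling is reduced to a placeholder that asks the loop to exit after a fixed number of passes.  Only IsExiting(), Run() and the position of the command execution follow the description in this post.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DeviceManagerThread; only the loop shape
// mirrors the description in the post.
struct LoopSketch {
    std::deque<std::function<void()>> commands;  // the command queue
    std::vector<std::string> trace;              // records what ran, in order
    int devicePasses;                            // device passes left before exit
    bool exiting;

    explicit LoopSketch(int passes) : devicePasses(passes), exiting(false) {}

    bool IsExiting() const { return exiting; }

    // Assumed helper: take the next queued command, if there is one.
    bool PopCommand(std::function<void()>& out) {
        if (commands.empty()) return false;
        out = std::move(commands.front());
        commands.pop_front();
        return true;
    }

    // Placeholder for the platform-specific inner loop that waits on the
    // open device handles and dispatches hardware messages.
    void ServiceDevices() {
        trace.push_back("devices");
        if (--devicePasses <= 0) exiting = true;
    }

    int Run() {
        while (!IsExiting()) {
            std::function<void()> command;
            if (PopCommand(command)) {
                assert(command);  // a popped command is expected to be callable
                command();        // analogue of command.Execute(): break here
                continue;         // look for another command before waiting
            }
            ServiceDevices();     // nothing queued, so tend to the hardware
        }
        return 0;
    }
};
```

Queued commands are always drained before the loop goes back to the devices, which is why a flood of commands can starve the hardware message handling.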

However, if you want to know what's going on with the command processing, it's fairly easy to put a breakpoint on the command.Execute() line near the top of each function.  This breakpoint won't be hit until some function in the SDK is called and the queued command is about to be executed.  At this point it's possible to start stepping down into the code to get to the actual implementation of whatever was called.  It does take a few steps down the stack to get past the command queue boilerplate, but as long as you keep stepping down you'll eventually find it.

Being able to find these implementations allows you to place more specific breakpoints on them and more easily diagnose the problem when you have issues with the SDK not finding the tracker or the display.  It won't help you in diagnosing issues with SensorFusion or other handling of messages from the hardware, because that's handled in the inner, device-handling loop.  That code will be the subject of our next installment.
