Fixes a few bottlenecks that were encountered in the Cascading Shadow
Maps demo from the Microsoft SDK. Performance is now slightly better
than wined3d with CSMT, MESA_NO_ERROR and mesa_glthread enabled.
Command submission now does not synchronize with the device every single
time. Instead, the command list and the fence that was created for it are
added to a queue. A separate thread will then wait for the execution to
complete and return the command list to the device.