Mobile VR Performance
Be conservative on performance. Even though two threads are dedicated to the VR application, a lot happens on Android systems that we can’t control, and performance has more of a statistical character than we would like. Some background tasks even use the GPU occasionally. Pushing right up to the limit will undoubtedly cause more frame drops, and make the experience less pleasant.
You aren’t going to be able to pull off graphics effects under these performance constraints that people haven’t already seen years ago on other platforms, so don’t try to compete there. The magic of a VR experience comes from interesting things happening in well-composed scenes, and the graphics should largely try not to call attention to themselves.
Even if you consistently hold 60 FPS, more aggressive drawing consumes more battery power, and subtle improvements in visual quality generally aren’t worth taking 20 minutes off the battery life for a title.
Keep rendering straightforward. Draw everything to one view, in a single pass for each mesh. Tricks with resetting the depth buffer and multiple camera layers are bad for VR, regardless of their performance issues. If the geometry doesn’t work correctly when rendered that way – everything in a single view (first-person hands, et cetera) – it will cause perception issues in VR, and you should fix the design.
Blending is expensive, so you can’t afford much of it. If you have limited navigation capabilities in the title and can guarantee that the blended effects will never cover the entire screen, then you will be okay.
Don’t use alpha tested / pixel discard transparency – the aliasing will be awful, and performance can still be problematic. Alpha-to-coverage can help, but designing a title that doesn’t require a lot of cut-out geometry is even better.
Most VR scenes should be built to work with 16 bit depth buffer resolution and 2x MSAA. If your world is mostly pre-lit to compressed textures, there will be little difference between 16 and 32 bit color buffers.
Favor modest “scenes” instead of “open worlds”. There are both theoretical and pragmatic reasons why you should, at least in the near term. The first generation of titles should be all about the low hanging fruit, not the challenges.
The best-looking scenes will be uniquely textured models. You can load quite a lot of textures — 128 Megs of textures is okay. With global illumination baked into the textures, or data actually sampled from the real world, you can make reasonably photo realistic scenes that still run 60 FPS stereo. The contrast with much lower fidelity dynamic elements may be jarring, so there are important stylistic decisions to be made.
Panoramic photos make excellent and efficient backdrops for scenes. If you aren’t too picky about global illumination, allowing them to be swapped out is often nice. Full image-based lighting models aren’t performance-practical for entire scenes, but are probably okay for characters that can’t cover the screen.
Thanks to the asynchronous TimeWarp, looking around in the Gear VR will always be smooth and judder-free at 60 FPS, regardless of how fast or slow the application is rendering. This does not mean that performance is no longer a concern, but it gives a lot more margin in normal operation, and improves the experience for applications that do not hold perfectly at 60 FPS.
If an application does not consistently hold 60 FPS, animating objects move more choppily, rapid head turns pull some black in at the edges, player movement doesn’t feel as smooth, and gamepad turning looks especially bad. Even so, these degradations are far milder than they would be without asynchronous TimeWarp, which keeps re-projecting the head orientation on every video frame regardless of the application’s rendering rate.
Drawing anything that is stuck to the view will look bad if the frame rate is not held at 60 FPS, because it will only move on eye frame updates, instead of on every video frame. Don’t make heads-up displays. If something needs to stay in front of the player, like a floating GUI panel, leave it stationary most of the time, and have it quickly rush back to center when necessary, instead of dragging it continuously with the head orientation.
Per scene targets:
- 50k to 100k triangles
- 50k to 100k vertices
- 50 to 100 draw calls
An application may be able to render more triangles by using very simple vertex and fragment programs, minimizing overdraw, and reducing the number of draw calls down to a dozen. However, lots of small details and silhouette edges may result in visible aliasing despite MSAA.
It is good to be conservative! The quality of a virtual reality experience is not just determined by the quality of the rendered images. Low latency and high frame rates are just as important in delivering a high quality, fully immersive experience, if not more so.
Keep an eye on the vertex count because vertex processing is not free on a mobile GPU with a tiling architecture. The number of vertices in a scene is expected to be in the same ballpark as the number of triangles. In a typical scene, the number of vertices should not exceed twice the number of triangles. To reduce the number of unique vertices, remove vertex attributes that are not necessary for rendering.
Textures are ideally stored with 4 bits per texel in ETC2 format for improved rendering performance and an 8x storage space reduction over 32-bit RGBA textures. Loading up to 512 MB of textures is feasible, but the limited storage space available on mobile devices needs to be considered. For a uniquely textured environment in an application with limited mobility, it is reasonable to load 128 MB of textures.
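The budget arithmetic above is easy to sanity-check. The sketch below assumes the RGB variant of ETC2 at 4 bits per texel (the RGBA variant is 8 bits per texel) and counts only the base mip level; a full mip chain adds roughly one third.

```c
#include <stddef.h>

// Bytes for the base level of an ETC2 RGB texture: 4 bits per texel.
static size_t Etc2Bytes(size_t w, size_t h)  { return (w * h) / 2; }

// Bytes for the base level of an uncompressed 32-bit RGBA texture.
static size_t Rgba8Bytes(size_t w, size_t h) { return w * h * 4; }
```

For a 2048×2048 texture this gives 2 MB in ETC2 versus 16 MB uncompressed (the 8x reduction), so roughly 64 such textures fit in the 128 MB budget mentioned above.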
Baking specular and reflections directly into the textures works well for applications with limited mobility. The aliasing from dynamic shader based specular on bumped mapped surfaces is often a net negative in VR, but simple, smooth shapes can still benefit from dynamic specular in some cases.
Dynamic lighting with dynamic shadows is usually not a good idea. Many of the good techniques require using the depth buffer, which is particularly expensive on mobile GPUs with a tiling architecture. Rendering a shadow buffer for a single parallel light in a scene is feasible, but baked lighting and shadowing usually results in better quality.
To be able to render many triangles, it is important to reduce overdraw as much as possible. In scenes with overdraw, it is important that the opaque geometry is rendered front-to-back to significantly reduce the number of shading operations. Scenes that will only be displayed from a single viewpoint can be statically sorted to guarantee front-to-back rendering on a per triangle basis. Scenes that can be viewed from multiple vantage points may need to be broken up into reasonably sized blocks of geometry that will be sorted front-to-back dynamically at run-time.
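The dynamic case can be sketched as below, assuming a hypothetical `GeomBlock` structure (the field names are illustrative). Each block is keyed on the squared distance from its bounding center to the viewpoint; squared distance preserves the ordering, so no square root is needed.

```c
#include <stdlib.h>

// Hypothetical per-block draw record; names are illustrative.
typedef struct {
    float center[3];   // bounding volume center, world space
    float sortKey;     // squared distance to viewpoint, filled per frame
    // ... mesh handle, material, etc.
} GeomBlock;

// qsort comparator: nearest block first.
static int CompareBlocks(const void *a, const void *b) {
    const GeomBlock *ba = (const GeomBlock *)a;
    const GeomBlock *bb = (const GeomBlock *)b;
    if (ba->sortKey < bb->sortKey) return -1;
    if (ba->sortKey > bb->sortKey) return  1;
    return 0;
}

// Re-key and sort the opaque draw list front-to-back for this frame.
void SortBlocksFrontToBack(GeomBlock *blocks, int count, const float eye[3]) {
    for (int i = 0; i < count; i++) {
        float dx = blocks[i].center[0] - eye[0];
        float dy = blocks[i].center[1] - eye[1];
        float dz = blocks[i].center[2] - eye[2];
        blocks[i].sortKey = dx * dx + dy * dy + dz * dz;
    }
    qsort(blocks, (size_t)count, sizeof(GeomBlock), CompareBlocks);
}
```

Drawing the sorted list nearest-first lets early depth rejection discard occluded fragments before they are shaded, which is where the overdraw savings come from.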
Due to distortion from the optics, the perceived size of a pixel on the screen varies across the screen. Conveniently, the highest resolution is in the center of the screen where it does the most good, but even with a 2560×1440 screen, pixels are still large compared to a conventional monitor or mobile device at typical viewing distances.
With the current screen and optics, central pixels cover about 0.06 degrees of visual arc, so you would want a 6000 pixel long band to wrap 360 degrees around a static viewpoint. Away from the center, the gap between samples would be greater than one texel, so mipmaps should be created and used to avoid aliasing.
For general purpose rendering, this requires 90 degree FOV eye buffers of at least 1500×1500 resolution, plus the creation of mipmaps. While the system is barely capable of doing this with trivial scenes at maximum clock rates, thermal constraints make this unsustainable.
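Both figures fall out of the same arithmetic: dividing the field of view by the per-pixel arc gives the pixel count needed to match the display’s central density. A minimal helper, assuming the 0.06 degree figure from above:

```c
// Pixels needed to match a given angular density over a field of view,
// rounded to the nearest integer.
static int PixelsForFov(double fovDegrees, double degreesPerPixelArc) {
    return (int)(fovDegrees / degreesPerPixelArc + 0.5);
}
```

This reproduces both numbers in the text: a full 360 degree wrap needs about 6000 pixels, and a 90 degree FOV eye buffer needs about 1500 per axis.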
Most game style 3D VR content should target 1024×1024 eye buffers. At this resolution, pixels will be slightly stretched in the center, and only barely compressed at the edges, so mipmap generation is unnecessary. If you have lots of performance headroom, you can experiment with increasing this a bit to take better advantage of the display resolution, but it is costly in power and performance.
Dedicated “viewer” apps (e-book reader, picture viewers, remote monitor view, et cetera) that really do want to focus on peak quality should consider using the TimeWarp overlay plane to avoid the resolution compromise and double-resampling of distorting a separately rendered eye view. Using an sRGB framebuffer and source texture is important to avoid “edge crawling” effects in high contrast areas when sampling very close to optimal resolution.