Tuesday, January 7, 2014

Visual performance analysis

Eventually, every game needs one of these:

In order to get the game running at at least 30fps, it's important to know where time is being spent and where to focus optimization efforts.  Early on in the development of Solar Escape, I implemented a timer system to log how long various tasks take - it's a simple piece of code that I think I must have written dozens of times over the years.
  The coloured bars each represent an event in rendering the frame. These can be nested so more general events (update, rendering, etc) are shown higher up and more fine grain events (drawing an individual object, for example) are show further down.  When hovering over the mouse over a bar, the name and duration of the events is displayed and the event itself is high lighted. The yellow bars represent time blocks of 10 milliseconds. The green bars represent targets for 30fps and 60fps (if our general's ribbons are to left of both green bars, we're in a 60fps happy place). Solar Escape is a fast moving game and a nice smooth framerate makes all the difference when playing.
  This is all pretty straight forward but it is an invaluable tool and I wish I'd implemented the visualiser earlier on in the project - the real time feedback and whole-frame visualization allows us to rapidly determine where we're spending most of our time each frame and also to spot, at the time of the event, when issues arise. Debugging performance issues can be a pain partly because you can't debug these issues as you would other errors - setting a break point on a potential issue distorts timing calculations and doesn't often yield any useful answers when debugging performance issues.
  Also, printing out timing information in the console is often not very useful unless that data is formatted well and even then, it quickly becomes overwhelming. Furthermore, outputting a lot of text to the console tends to be more expensive than drawing the visualization seen above. I fill a vertex buffer each frame containing the vertex information for around 4000 events - that takes much less time than sending all that information textually to the console and it's easier to read, too. One issue that the performance timer revealed was that rendering the UI was as expensive as rendering all of the per-pixel shaded, post processed in game graphics. This was due to older code not feeding the GPU correctly and failing to batch data - if you rendered each rectangle individually instead of filling a large vertex buffer, you'd spend all of your time rendering the performance visualization which would kind of ruin the exercise.
 From the above screen shot, there are a few areas that need attention.  The large blocks of cyan and red represent Box2D's physics processing. It seems some physics objects are not being disabled when they leave out of the visible play zone and we're spending a lot of time there. The white and green blocks to the right represent older code performing scenegraph and partitioning recalculation - both of which have a lot of obvious areas in need of optimization.

  The first mistake I usually re-make each time I implement one of these is not using a high resolution timer. Fortunately both windows and Unix based systems support high resolution timers in the order of microseconds or millionths of a second (or even nano seconds). The standard resolution timer on windows only returns values in milliseconds (thousandths of a second). If you're implementing your own visualiser, make sure you use 'QueryPerformanceCounter()' instead of 'TimeGetTime()' on windows. Unixes can use the 'clock_gettime()'  function/

The most useful optimization that these kind of tools provide is of development time - making games is hellishly time consuming!