Monday, October 13, 2014

Silly r/e tool nonsense hacks

In the process of reverse engineering work for freedreno, I've cobbled together some interesting tools.  The earliest and most useful of these is cffdump.  (It is named after some command-stream dumping debug code in the old kgsl android kernel driver, which originally inspired it.)  The cffdump tool knows how to parse out the "toplevel" command-stream stored as an .rd (re-dump) file, finding packets that load state memory, write registers, IB (indirect branch), etc.  The .rd file contains snapshots of gpu buffers, so that gpu pointers can be chased at decode time.  It links in librnn from the nouveau envytools project for the decoding of individual registers, and a few other things.  It also calls out to the freedreno disassembler code to show inline disassembly of shaders, decodes vertex and constant (uniform) buffers, etc.  And even generates pretty color output (thanks to librnn):

A few months back, I added some basic lua scripting support to cffdump, mostly to assist in r/e work for adreno a4xx.  When invoked with the --script argument, cffdump would load the specified lua script, and call the 'draw' function it defines on each CP_DRAW_INDX opcode.  The choice of lua was mostly because it seemed fairly easy to integrate with C code.

Since then, I've had the thought in the back of my mind that adding script bindings to integrate rnn register decode to lua would be useful for much more.  Such as writing a command-stream validator to check for inconsistent programming.  There are a number of places where inconsistencies between various register settings and such will result in gpu lockup.  The general adreno design philosophy appears to be to not ever dedicate transistors to making the driver writer's life easier... which for a SoC gpu is certainly the right choice, but it doesn't make things any easier for me.  Over time, I've discovered many of these rules, but they are mostly all in my head at the moment.  And from time to time, when adding new features to the gallium driver, I inadvertently break one or more of the rules and end up wasting time studying cmdstream dumps from the freedreno gallium driver to figure out what I did wrong.

So, on the way to XDC2014 I started hacking up support for register decoding from lua scripts.  It turns out that time in airports and airplanes, where I can't exactly break out an ifc6410 and hdmi monitor to do some driver work, is a good time to catch up on this sort of project.  Now I can do nifty things like:

-- load rnn database file for a320:
r = rnn.init("a320")

function start_cmdstream(name)
  io.write("START: " .. name .. "\n")
end

function draw(primtype, nindx)
  -- simple full register access:
  io.write("GRAS_CL_VPORT_XOFFSET: " .. r.GRAS_CL_VPORT_XOFFSET .. "\n")
  -- access boolean bitfield Z_ENABLE in RB_DEPTH_CONTROL register:
  io.write("RB_DEPTH_CONTROL.Z_ENABLE: " .. tostring(r.RB_DEPTH_CONTROL.Z_ENABLE) .. "\n")
  -- access ROP_CODE bitfield inside CONTROL register inside RB_MRT[] array:
  io.write("RB_MRT[0].CONTROL.ROP_CODE: " .. r.RB_MRT[0].CONTROL.ROP_CODE .. "\n")
end

function end_cmdstream()
end

function finish()
end

which will generate output like:

[robclark@thunkpad:~/src/freedreno (master)]$ ./cffdump --script test.lua piglit.rd
Reading piglit.rd...
START: piglit.rd


Currently it should handle all of the rnndb constructs that are used for adreno.  Ie. simple registers, arrays of simple registers, arrays of groups of registers, etc.  No support for "stripes" yet since those are not used for freedreno.

At the moment, all the script bindings are in freedreno.git/util/script.c but if there is some interest in this from nouveau or anyone else using librnn then it would be a good idea to try to refactor some of this into more generic code in librnn.  It would still need a bit of glue from the tool linking librnn to get at the actual register values.

Still needed are a few more script hooks (such as CP_LOAD_STATE) to do everything I need for a validator script.  Hopefully I find some time to work on that before the next conference ;-)
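
To give a rough idea of the shape such a validator script might take, here is a minimal sketch built only on the 'draw' hook that exists today.  Everything beyond the hook names and the register names from the example above is an assumption for illustration: the rule table, the stats table, and the fake register snapshot used when the script is run under a plain lua interpreter (where the rnn bindings are not available) are all made up.  In particular the single rule shown is a placeholder and not a real hardware constraint, and comparing ROP_CODE against a number assumes the binding hands back plain lua numbers, which may not be the case.

```lua
-- Under cffdump the script bindings provide `rnn`.  Under a plain lua
-- interpreter we fall back to a fake register snapshot (made-up values)
-- so that the rule logic can still be exercised.
local r
if rnn then
  r = rnn.init("a320")
else
  r = {
    RB_DEPTH_CONTROL = { Z_ENABLE = true },
    RB_MRT = { [0] = { CONTROL = { ROP_CODE = 0 } } },
  }
end

-- Each rule's check() returns true when the state looks consistent.
-- NOTE: this rule is a placeholder, NOT a real adreno constraint.
local rules = {
  {
    name = "placeholder: Z_ENABLE set while RB_MRT[0].CONTROL.ROP_CODE is 0",
    check = function(regs)
      return not (regs.RB_DEPTH_CONTROL.Z_ENABLE and
                  regs.RB_MRT[0].CONTROL.ROP_CODE == 0)
    end,
  },
}

-- hypothetical bookkeeping, global so it can be inspected afterwards
stats = { draws = 0, failed = 0 }

function start_cmdstream(name)
  io.write("START: " .. name .. "\n")
end

function draw(primtype, nindx)
  stats.draws = stats.draws + 1
  -- run every rule against the register state at this draw
  for _, rule in ipairs(rules) do
    if not rule.check(r) then
      stats.failed = stats.failed + 1
      io.write("draw #" .. stats.draws .. ": FAIL: " .. rule.name .. "\n")
    end
  end
end

function end_cmdstream()
end

function finish()
  io.write(stats.failed .. " rule violation(s) in " .. stats.draws .. " draw(s)\n")
end

-- Outside cffdump nothing calls the hooks, so drive them once by hand.
if not rnn then
  start_cmdstream("fake.rd")
  draw(0, 0)
  end_cmdstream()
  finish()
end
```

Keeping the rules in a table means each newly discovered rule is just one more entry, and the same list could be reused from other hooks (CP_LOAD_STATE, etc.) once those exist.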

PS. I hope this post is at least a bit coherent.. I am still a bit jetlagged..

Saturday, October 4, 2014

Freedreno Update

A number of people have recently asked what is new with freedreno.  It has been a while since I posted an update.. and, well, not everyone watches mesa commit logs for fun, or watches #freedreno on freenode, so it seemed like time for another semi-irregular freedreno blog post.

The tl;dr version: recently it has been a lot of robustness work, bug fixes, and smaller feature implementation for piglit, etc.  No one big exciting feature this time.. but lots of little things adding up to make freedreno on a3xx more complete and mature.

And an obligatory screenshot, just because:

(Yeah, webgl should probably be faster in chrome/chromium.. but not packaged for fedora, and chrome build system was invented by someone who wants to make compiling their src as difficult as possible.)


On the mesa/gallium driver front, the big news is that earlier this week we finally achieved a 90% pass ratio for piglit.  (In fact, 90.4%)  To put this in perspective, a little over six months ago freedreno was at just 50% pass.  Since June, we have added around 600 passing tests.  In fact in the last week, an additional ~50 tests are passing, which bumps us up to 91% pass.

For those who are not familiar with it, piglit is an open source OpenGL test suite.  Since the mesa developers are quite good about adding new test cases to piglit whenever adding a new feature/extension to mesa, it is a very comprehensive test suite.  The down side, if you could call it that, is that it has a lot more OpenGL tests compared to OpenGLES (at least for GLES < 3.0).  So getting the pass ratio up involved implementing (and in some cases emulating) a number of features that the blob ES-only driver does not support.  Fortunately enough of the registers and bitfields are known at this point that trial and error with educated guesses (and then see which guesses make piglit tests pass) has worked out reasonably well for some features.  Other features, like GL_CLAMP and two sided color, we need to emulate in the shader, which was implemented as a TGSI to TGSI pass in order to hopefully be useful for other gallium drivers for GLES class hardware.  (And, in fact both of those are things that at least some of the desktop drivers need to emulate as well.)

And big thanks to Ilia Mirkin for a lot of advice and some patches for the failing piglits.  Ilia has also started sending a lot of patches for the compiler to flesh out integer support, add new instructions (in particular texture sample instructions), and other things that will be needed for GL3/GLES3.  In fact as a result of his work, we are already at ~85% pass for GL3 despite missing some bullet-point features!


On the xf86-video-freedreno front, over the last few months we have gained server managed fd's and OutputClass support (so that a sufficiently new xserver can auto-pick the correct driver, like we have had for a long time on desktop/pci systems).  And a hot-off-the-presses 1.3.0 release with a handful of robustness fixes.  I strongly recommend upgrading.


These last few kernel releases have seen a significant improvement in the state of apq8064/ifc6410 support upstream.  As of the 3.17 kernel, the main things missing to work on a pure-upstream[1] kernel are the rpm/rpm-regulators and iommu drivers.  The linaro folks have been a big help there.  In particular, their integration branch, which consists of latest upstream plus in-flight patches, is significantly easier than tracking all the relevant kernel mailing lists.

For drm/msm, the last few kernel releases have seen:  some basic gpu perf and logging debugfs features, DT support for mdp4 (display controller version in apq8064), LVDS and multi-monitor support for mdp4, and mdp5 v1.3 support from qcom for upcoming devices.  And of course bug fixes!

[1] Ie. Linus's tree... kernel-msm or AOSP is not upstream, for any android types who were confused about that.