Skip to main content

Posts

Showing posts from 2015

LD_DEBUG

Posting this mainly as a reminder to myself... If you ever find yourself needing to figure out a dynamic library loading problem on Linux, LD_DEBUG can be a massive help. This is an environment variable you can set to make the dynamic linker print out a ton of useful diagnostic info. There are a number of different values which control the amount and type of diagnostics printed. One of the values is help; if you set LD_DEBUG to this and run executable it will print out a list of all the available options along with brief descriptions. For example, on my Linux workstation at the office: > LD_DEBUG=help cat Valid options for the LD_DEBUG environment variable are: libs display library search paths reloc display relocation processing files display progress for input file symbols display symbol table processing bindings display information about symbol binding versions display version dependencies all all previous options combi...

Octree node identifiers

Let's say we have an octree and we want to come up with a unique integer that can identify any node in the tree - including interior nodes, not just leaf nodes. Let's also say that the octree has a maximum depth no greater than 9 levels, i.e. the level containing the leaf nodes divides space into 512 parts along each axis. The encoding The morton encoding of a node's i,j,k coordinates within the tree lets us identify a node uniquely if we already know it's depth. Without knowing the depth, there's no way to differentiate between cells at different depths in the tree. For example, the node at depth 1 with coords 0,0,0 has exactly the same morton encoding as the node at depth 2 with coords 0,0,0. We can fix this by appending the depth of the node to the morton encoding. If we have an octree of depth 9 then we need up to 27 bits for the morton encoding and 4 bits for the depth, which still fits nicely into a 32-bit integer. We'll shift the morton code up so t...

Common bit prefix length for two integers

Here's a neat trick I discovered a couple of months back: given two signed or unsigned integers of the same bit width, you can calculate the length of their common prefix very efficiently: int common_prefix_length(int a, int b) { return __builtin_clz(a ^ b); } What's this doing? Let's break it down: a ^ b is the bitwise-xor of a and b. The boolean xor operation is true when one of it's inputs is true and the other is false; or false if both have the same value. Another way to put this is that xor returns true when its inputs are different and false if they're the same. The bitwise-xor operation then, returns a value which has zeros for every bit that is the same in both a and b; and ones for every bit that's different. __builtin_clz is a GCC intrisinc function which counts the number of leading zero bits of its argument. It compiles down to a single machine code instruction on hardware that supports it (which includes every Intel chip made i...

Image based lighting notes

These are some of my notes from implementing image based lighting, a.k.a. IBL. I thought I understood it pretty well until I started implementing it. Now, after a lot of reading, discussing and trying things out, I'm finally getting back to the stage where I tentatively believe I understand it. Hopefully these notes will save future me, or anyone else reading them, from going through the same difficulty. I'll update them if I find out anything further. Some helpful references GPU-Based Importance Sampling , GPU Gems 3, Chapter 20 . Real-Time Computation of Dynamic Irradiance Maps , GPU Gems 2, Chaper 10. Physically Based Shading in Theory and Practice , SIGGRAPH 2013 Course. In particular, the course notes for "Real Shading in Unreal Engine 4" are very helpful. The radiance integral $$L_o(v) = L_e(v) + \int_\Omega L_i(l) f(l, v) (n \circ l) dl$$ The amount of light reflected at some point on a surface is the sum, over all incoming directions, o...

Faster morton codes with compiler intrinsics

Today I learned that newer Intel processors have an instruction which is tailor-made for generating morton codes: the PDEP instruction. There's an instruction for the inverse as well, PEXT . These exist in 32- and 64-bit versions and you can use them directly from C or C++ code via compiler intrinsics: _pdep_u32/u64 and _pext_u32/u64 . Miraculously, both the Visual C++ and GCC versions of the intrinsics have the same names. You'll need an Intel Haswell processor or newer to be able to take advantage of them though. Docs for the instructions: Intel's docs GCC docs Visual C++ docs This page has a great write up of older techniques for generating morton codes: Jeroen Baert's blog ...but the real gold is hidden at the bottom of that page in a comment from Julien Bilalte, which is what clued me in to the existence of these instructions. Update: there's some useful info on Wikipedia about these intructions too.

Awesome tools for Windows users

I moved back to Windows on my home computer a few months back. There are a few amazing free tools I've found since then that have been making my life better and I thought they deserved a shout-out. They are: SumatraPDF A fantastic PDF reader. Does everything I want and nothing I don't. RapidEE A sane way to edit environment variables. The simple joy of just being able to resize the window is... incredible. 7-Zip The best tool for dealing with compressed files on windows, bar none. GPU-Z A really handy way to see details about your GPU(s). XnView An amazingly good image viewer, which can also do bulk file format conversions. If you haven't already got these... go get them!

Whole program lexical analysis

I was thinking about parsing and lexical analysis of source code recently (after all who doesn't... right??). Everywhere I've looked - which admittedly isn't in very many places - parsers still seem to treat input as a stream of tokens. The stream abstraction made sense in an era where memory was more limited. Does it still make sense now, when 8 Gb of RAM is considered small? What if, instead, we cache the entire token array for each file? So we mmap the file, lex it in place and store the tokens in an in-memory array. Does this make parsing any easier? Does it lead to any speed savings? Or does it just use an infeasible amount of memory? Time for some back-of-the napkin analysis. Let's say we're using data structures kind of like this to represent tokens: enum TokenType { /* an entry for each distinct type of token */ }; struct Token { TokenType type; // The type of token int start; // Byte offset for the start of the token int end; ...