whizzter a day ago

I wrote a non-RTX on-GPU raytracer a while back (naive compared to this), and it's super interesting to read about the advances in compressing BVH structures.

But the changes also highlight a shift in focus from just implementing this naively (RDNA3 is technically not far removed from the naive raytracer I wrote) to something carefully engineered and optimized for memory bandwidth (with bandwidth-saving circuitry even built into the silicon?).
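
Roughly the idea (my own illustration, not AMD's actual node format): keep the parent box at full precision and quantize the child boxes to 8 bits per axis relative to it, so a wide node packs into a single cache line.

  #include <cstdint>

  // Hypothetical compressed BVH node: child AABBs quantized to 8 bits per
  // axis relative to the parent's bounds, i.e. 6 bytes per child box
  // instead of 24 bytes of floats. With 4 children this fits in 64 bytes.
  struct CompressedNode {
      float    origin[3];      // parent box minimum corner (full precision)
      float    scale[3];       // (parent max - parent min) / 255 per axis
      uint8_t  childMin[4][3]; // quantized child box minima
      uint8_t  childMax[4][3]; // quantized child box maxima
      uint32_t childIndex[4];  // child node / leaf indices
  };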

  • ahartmetz 20 hours ago

    Seems very likely that the hardware decompresses the data more or less on the fly. The acceleration structures are for the hardware, arithmetic hardware is cheap (compared to memory access), and they could have used the compressed structures on older hardware with new drivers if hardware support weren't necessary.
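
    Something like this sketch (mine, not the real hardware format): reconstructing a quantized child box is one multiply-add per component, so the "decompression" is a handful of ALU ops in exchange for reading a quarter of the bytes.

      #include <cstdint>

      struct Aabb { float min[3]; float max[3]; };

      // Decode one child's AABB from 8-bit offsets stored relative to the
      // parent box (origin + per-axis scale). A couple of FMAs per axis.
      inline Aabb decodeChildBox(const float origin[3], const float scale[3],
                                 const uint8_t qmin[3], const uint8_t qmax[3]) {
          Aabb box;
          for (int axis = 0; axis < 3; ++axis) {
              box.min[axis] = origin[axis] + scale[axis] * float(qmin[axis]);
              box.max[axis] = origin[axis] + scale[axis] * float(qmax[axis]);
          }
          return box;
      }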

vardump a day ago

Smaller data is where it’s at when optimizing nowadays: less bandwidth required and a higher cache hit rate.

You can compute a ton per bit transferred from DRAM. On both CPUs and GPUs.
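
Back of the envelope with made-up but plausible numbers (say ~50 TFLOP/s of FP32 and ~1 TB/s of DRAM bandwidth, not any specific GPU):

  #include <cstdio>

  int main() {
      // Illustrative figures only.
      const double flops_per_sec = 50e12; // assumed peak FP32 throughput
      const double bytes_per_sec = 1e12;  // assumed DRAM bandwidth
      printf("flops per byte from DRAM: %.0f\n", flops_per_sec / bytes_per_sec);
      printf("flops per bit from DRAM:  %.2f\n", flops_per_sec / (bytes_per_sec * 8));
      return 0;
  }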