By "down under," I mean as close to the kernel as possible. Before you get too excited, that's not really that close: it's only as close as the system libraries. I finally got audio to work! Not perfectly, but it's functional. Turns out I just needed to use the right library.
This was mostly thanks to this YouTube video explaining what all the libraries are. At first, I thought PulseAudio was an abstraction over ALSA, similar to how GTK is for X11. That was quite the wrong assumption.
Yesterday, while trying to figure out why I couldn't get ALSA to work well with Zig, I discovered that ALSA was in turn calling PulseAudio, which was also the reason I couldn't access the hardware device directly. It turns out only one program can access the audio device at a time, which makes sense. It's not too bad though: unknowingly wrestling with PulseAudio via ALSA made it super easy to switch to using PulseAudio directly.
It also helps that PulseAudio has stayed consistent with the other Linux APIs and has a "simple" version for simple people. It does the same basic job as the complex one, with far fewer moving parts. There are other details, and the complex API does give you a lot more control if you want it. I might switch to it later if I want direct memory access, but for now the simple one will do.
Some patterns
There are a few patterns that I've observed. I wonder if it's because Linux is used on servers so much, but both the window and the audio APIs follow a client-server model: you connect to a server first and then make requests through that connection. This was quite strange to see at first when I tried to open a window using X11.
The second pattern is a favorite of mine: all three of X11, ALSA, and PulseAudio have simple and complex versions of their APIs. PulseAudio goes as far as shipping the simple API as a whole separate library (libpulse-simple) that has to be linked on its own. I learned this after getting stumped by a build error for a while. The good thing is that the simple APIs give you an overview, which you can then use to jump into the complex ones. Although that's just another assumption of mine, as I haven't done it yet. Maybe tomorrow.
Finally, my favorite pattern of all is that everything is just bytes. Just provide a buffer of bytes, let the API know how many there are, and it'll write them to memory. It's so simple and elegant that I quite enjoy it. You get total control, and as long as you can produce a buffer of data, you can output it to a window or to audio.
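To make the "just bytes" idea concrete, here is a minimal sketch in C of producing such a buffer. It is not the code from my project: the function name `fill_square_wave` and all of its parameters are made up for illustration, and it assumes mono, signed 16-bit samples. It uses a square wave so that only integer math is needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill `buf` with a mono square wave made of
 * signed 16-bit samples. Returns the number of BYTES written, which
 * is the size an audio API would want alongside the pointer. */
size_t fill_square_wave(int16_t *buf, size_t sample_count,
                        unsigned sample_rate, unsigned frequency,
                        int16_t amplitude)
{
    if (buf == NULL || sample_rate == 0 || frequency == 0) {
        return 0; /* nothing sensible to write */
    }

    /* Number of samples spent on each half (high or low) of one cycle. */
    size_t half_period = sample_rate / (2u * frequency);
    if (half_period == 0) {
        half_period = 1; /* frequency too high for this rate; clamp */
    }

    for (size_t i = 0; i < sample_count; i++) {
        /* Even-numbered half periods are high, odd-numbered ones are low. */
        int high = ((i / half_period) % 2) == 0;
        buf[i] = high ? amplitude : (int16_t)-amplitude;
    }

    return sample_count * sizeof(int16_t);
}
```

With PulseAudio's simple API, the pointer and the returned byte count are the kind of arguments that `pa_simple_write` takes, assuming the stream was opened with a matching sample format and rate. The same shape (a pointer plus a length) is what shows up on the window side too.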
Just some good programming fun
Overall, this has been quite enjoyable for me. It's also been frustrating at times because I don't really know what I'm doing, but I'm so glad I decided to do this. I'd recommend it to any programmer who wants to build some fun stuff. You'll learn a ton too.
For example, X11 dates from the 80s, and apart from the strange naming convention, the API is super useful and gives you as much control as you'd like. It's a window into programs that ran under much stricter constraints. I've been really enjoying seeing how people programmed back in the day.