First thoughts on Vulkan Video APIs in the context of a post-production pipeline

Vulkan has some new extensions for decoding video. Like everything with Vulkan, they seem to be kind of a pain / kind of awesome. I don’t have experience using them in practice yet, but I have poked through the extensions and some sample code.

So… Disclaimer: I’m not *good* at Vulkan. At best, I know just barely enough to have an opinion. A ton of people have spent way more time with the API than I have, and they have done far more interesting things with it. That said, most of the information out there is focused on gamedev. Vulkan has a ton of functionality that can be handy for offline image-processing tasks, but you have to figure out some of the details yourself. Since I started playing with Vulkan, a few people have asked me questions about it, so I have been making notes that I figured I may as well share in case they prove useful to anybody. The target audience here is admittedly very narrow: people who know enough about Vulkan to want to do stuff with it, but not enough to just go read the extension specifications themselves.

How to use Vulkan Video

An Introduction to Vulkan Video - The Khronos Group Inc
A diagram of cool stuff you can’t actually use until you learn low-level FFmpeg APIs. A maintainer of the wildly under-appreciated FFMS2 library said, “Wrangling the FFmpeg API is truly a fate worse than death.”

Before you get to any of the Vulkan APIs, you need to use something like FFmpeg to read the MP4 file on disk, parse it, and demux it to get the raw compressed video frames to feed into the GPU. So you are already using a library that can decode the video file you want to access. By the time you have gotten far enough to start using Vulkan Video, you have already learned enough FFmpeg that you could absolutely just use FFmpeg and ignore Vulkan Video. Using that library with Vulkan means poking into some slightly odd, low-level usage, and coordinating the GPU yourself. There is no simple vkCmdReadVideoFile(“C:/”); So, consider Vulkan Video a fast-path optimization that will be useful in some contexts. Don’t think of it as an all-in-one floor wax and dessert topping that you can learn in isolation to solve every problem.

If that hasn’t scared you off, Khronos has an introduction to get you started.

Vulkan Video doesn’t work like QuickTime — You can’t have a program encode or decode with codecs that are detected at runtime, or that didn’t exist when a program was written. With an API like QuickTime, the end-user can install codec plugins that are referenced by name, and an old application can present a user with a dialog with a bunch of codec options that the application developer never heard of. In theory, software written in the 90’s can still access R3D footage from a camera that comes out tomorrow using QuickTime with no source modifications to the application. The application just asks QuickTime to figure out the right codec for the file, and QuickTime will give the application decoded frames if there is a suitable codec installed in the system.

Vulkan Video uses explicit extensions with enums to reference H.264 and H.265, and these need to be hard-coded in the application. In theory, somebody could make a Vulkan extension for a more obscure format like R3D. Unfortunately, it doesn’t look like an end-user could just drop a codec plugin for Vulkan R3D decoding into a directory and get support for it by magic. The application would need to explicitly load the hypothetical VK_EXT_video_decode_R3D extension, and know how to use something analogous to the VkVideoDecodeH264PictureInfoEXT data structure. That structure comes from a codec-specific header like vulkan_video_codec_h264std_decode.h, which is needed at compile time. Vulkan is extensible. But it’s extensible at program-creation time, not at runtime.

All of that said, if you get decoding working for one codec, adding another shouldn’t be hard. The actual decode step is done with a single command that is not codec specific: vkCmdDecodeVideoKHR. So if you have command buffer creation that decodes a video frame, that part doesn’t need to change when you add support for “H.266” decoding a few years from now. The codec-specific setup code can be well contained in its own module. If your internal APIs are object-oriented, it’s easy to imagine an abstract VulkanVideoSetup class with some virtual methods, where you just add a new derived class when you want to support a new codec.
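That abstract-class idea might look something like this. To be clear, everything here is hypothetical: VulkanVideoSetup, makeSetup, and the method signatures are names I made up for illustration, and the actual Vulkan plumbing is elided to comments:

```cpp
#include <memory>
#include <string>

// Hypothetical codec-abstraction layer. Everything codec-specific about Vulkan
// Video decode hides behind virtual methods, so the codec-agnostic
// vkCmdDecodeVideoKHR path never has to change when a codec is added.
struct VulkanVideoSetup {
    virtual ~VulkanVideoSetup() = default;
    // Device extension the codec needs enabled at device-creation time.
    virtual std::string requiredExtension() const = 0;
    // Stand-in for filling the codec-specific picture-info structure.
    virtual void fillPictureInfo() = 0;
};

struct H264Setup final : VulkanVideoSetup {
    std::string requiredExtension() const override {
        return "VK_EXT_video_decode_h264"; // provisional-era extension name
    }
    void fillPictureInfo() override { /* populate VkVideoDecodeH264PictureInfoEXT */ }
};

struct H265Setup final : VulkanVideoSetup {
    std::string requiredExtension() const override {
        return "VK_EXT_video_decode_h265";
    }
    void fillPictureInfo() override { /* populate VkVideoDecodeH265PictureInfoEXT */ }
};

// Supporting "H.266" later means one new derived class and one new branch
// here; the shared decode loop stays untouched.
std::unique_ptr<VulkanVideoSetup> makeSetup(const std::string& codec) {
    if (codec == "h264") return std::make_unique<H264Setup>();
    if (codec == "h265") return std::make_unique<H265Setup>();
    return nullptr; // unknown codec: fall back to a software decode path
}
```

The nullptr return for an unknown codec is the important design point: because codecs are compile-time in Vulkan Video, your application needs an explicit "we don't support this on the GPU" answer, not a runtime plugin lookup.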


Can Vulkan Video decode the audio, too?

Nope. And F#&*$ you for asking.

As far as I can tell, you can only decode video with this API. The API includes a VkVideoPictureResourceKHR type, but nothing like a VkVideoAudioResourceKHR. If your goal is to decode some audio from an MP4 and run a compute shader on it, you need to decode the audio on the CPU side and upload the uncompressed samples to the GPU, where you can run VkFFT or whatever on it. Audio samples are unlikely to saturate the upload queue between host and GPU, but uploading the compressed audio would obviously use less bandwidth.

As a result, a plausible scenario for a GPU accelerated multimedia application looks like:

  • Use FFmpeg to read an MP4 file and decode audio on the CPU.
  • Upload full audio samples to GPU.
  • Run a compute shader for whatever audio processing you want.
  • Download the resulting processed samples back to the CPU/Host memory.
  • Copy those samples into an audio playback API like QAudioOutputDevice, or platform specific equivalent.
  • Under the hood, let that API upload those same audio samples back to the GPU, because your sound is actually coming over HDMI to the speakers built into your monitor.

Like I said, audio is smaller than high-resolution image data, so this is unlikely to break your application. But it does mean a lot of copies, concurrency, and coordination, all for something that needs to be real time and low latency. We can see a potential improvement even if we ignore the benefit of using the GPU’s hardware decoder to decode the audio more quickly than software on the CPU: getting the decoded audio samples directly on the GPU would be closer to optimal just because of the fewer copies. That said, Vulkan is unlikely to ever have a vkCmdPlayAudio() function, so this scenario is never going to be perfect. You can potentially reduce the number of copies by calling vkMapMemory on the audio samples and passing a pointer to that mapped memory into QAudioOutputDevice, bypassing the step of copying it into host memory. But then you need to worry a little more about how the audio API is doing copies, and the lifetime of the mapping is one more thing to keep track of. So you might save a copy, but the things that can go wrong are further out of your control. Sigh, computers are terrible and good audio code is impossible if that code needs to run on a computer.


So, Vulkan Video is absolutely awesome for the 99% of users who just need to encode or decode something like H.264. If you are rendering a 3D scene and want to make a live stream, you can go straight from a VkImage to the GPU’s hardware encoder without extra copies of the data. It’s just not flexible enough to build something like a video editor that needs to consume arbitrary codecs. You absolutely need a software or alternate-API fallback path for general-purpose applications like pipeline ingest tooling.

Vulkan 1.2.175 specification brings provisional 'Vulkan Video' extensions -  Neowin
