Chi-Tai, Research, Software and Music


OpenCL: kernel optimization of ARGB to YUV

As a side note to the recent post: the kernel presented there is already superfast, but guess what :-)

It can be made even faster through memory access optimizations. Consider the memory accesses for the U and V planes: both read the same source bits over the same index range, so they can be consolidated into a single work item that reads global memory only once (global memory access is the slowest type of memory access).

Furthermore, “flattening” the work-group from 2D to 1D replaces the 2D access pattern presented before with sequential memory access, which benefits much more from prefetching and probably helps avoid bank conflicts…

So much for the optimizations. If there is demand for such a kernel, drop me an email.

2 comments


  1. Antoine Martin August 26th, 2013 3:34 pm

    Hi,

    We may need OpenCL code to do exactly what you did: ARGB to YUV – can you specify the license for the code you published? (It would need to be GPLv2+ compatible for us to be able to use it.)
    FYI: newer versions of x264 can encode ARGB directly, but we have other plans ;)

    Thanks
    Antoine
    xpra.org

  2. Chi-Tai August 24th, 2014 5:26 am

    Hi Antoine,

    I’m sorry for the really late response. Of course, you can use it under the GPL. It’s published under the GPL as part of the Environs framework http://hcm-lab.de/environs

    Chi-Tai