Chi-Tai, Research, Software and Music

Computer Science, Software and Afro Cosmic Music

OpenCL: kernel optimization of ARGB to YUV

As a side note to the recent post: the kernel presented there is already super fast, but guess what :-)

There are ways to make it even faster by optimizing the memory access. Consider the memory access for the U and V planes: both read the same source pixels at the same positions, so they can be consolidated into a single work item that reads the memory only once (global memory is the slowest memory type to access).
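
To illustrate the consolidation, here is a minimal sketch in plain C. It is a CPU stand-in for a single work item, not the optimized kernel itself. The helper name argb_to_packed_uv is made up for this example, and it assumes the same integer coefficients and every-second-pixel sampling as the U and V branches of the kernel in the ARGB to YUV conversion post.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the original post): converts four ARGB
 * pixels, taken at every second column, into one packed U word and one
 * packed V word. Each source pixel is read once and feeds both planes. */
static void argb_to_packed_uv(const uint32_t *src, uint32_t *u4, uint32_t *v4)
{
    assert(src && u4 && v4);
    uint32_t u_packed = 0, v_packed = 0;

    for (int i = 0; i < 4; i++) {
        uint32_t argb = src[i * 2];              /* single read, shared by U and V */
        int blue  = (int)(argb & 0xff);
        int green = (int)((argb >> 8) & 0xff);
        int red   = (int)((argb >> 16) & 0xff);

        /* Adding 128 << 8 before the shift keeps the operand non-negative,
         * so the result equals (x >> 8) + 128 with an arithmetic shift. */
        int u = (-38 * red - 74 * green + 112 * blue + 32768) >> 8;
        int v = (112 * red - 94 * green - 18 * blue + 32768) >> 8;

        u_packed |= (uint32_t)u << (i * 8);
        v_packed |= (uint32_t)v << (i * 8);
    }
    *u4 = u_packed;
    *v4 = v_packed;
}
```

Each source pixel is loaded once and feeds both the U and the V word, so the second pass over the same pixels disappears.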

Furthermore, “flattening” the work-group from 2D to 1D enables sequential memory access instead of the presented 2D access, which should benefit much more from prefetching and probably helps avoid bank conflicts…
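
As a rough sketch of what flattening means for the index math of the Y plane, written in plain C: the struct y_item and the function y_item_from_flat_id are hypothetical names, and the sketch assumes that the width is a multiple of 4 and that yuv_stride equals src_width / 4 without any row padding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mapping for a 1D launch over the Y plane: work item gid
 * writes packed word gid and reads source pixels 4*gid .. 4*gid + 3. */
typedef struct {
    uint32_t pos_x;   /* packed column, as get_global_id(0) in the 2D version */
    uint32_t pos_y;   /* row, as get_global_id(1) in the 2D version */
    uint32_t pos_src; /* index of the first of the four source pixels */
} y_item;

static y_item y_item_from_flat_id(uint32_t gid, uint32_t src_width,
                                  uint32_t yuv_stride)
{
    /* Assumption of this sketch: rows are tightly packed. */
    assert(yuv_stride != 0 && src_width == yuv_stride * 4);

    y_item it;
    it.pos_y = gid / yuv_stride;
    it.pos_x = gid % yuv_stride;
    it.pos_src = it.pos_y * src_width + it.pos_x * 4; /* equals gid * 4 */
    return it;
}
```

Under these assumptions neighbouring work items touch neighbouring memory, also across row boundaries, which is the sequential access pattern described above.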

So much for optimizations.. If there is demand for such a kernel, drop me an email..


OpenCL: ARGB to YUV conversion

I just submitted a post on Stack Overflow that also fits nicely here, so.. here we go :-)

The question was how to convert ARGB to YUV using the GPU. Some time ago I developed and used the following OpenCL kernel to convert ARGB (the typical Windows bitmap pixel layout) to the Y-plane (full-sized) and U/V half-plane (quarter-sized) memory layout used as input for libx264 encoding.

__kernel void ARGB2YUV (
                        __global unsigned int * sourceImage,
                        __global unsigned int * destImage,
                        unsigned int srcHeight,
                        unsigned int srcWidth,
                        unsigned int yuvStride // must be srcWidth/4 since we pack 4 pixels into 1 Y-unit (with 4 y-pixels)
                    )
{
    int i;
    unsigned int RGBs [ 4 ];
    unsigned int posSrc, RGB, Value4 = 0, Value, yuvStrideHalf, srcHeightHalf, yPlaneOffset, posOffset;
    unsigned char red, green, blue;
 
    unsigned int posX = get_global_id(0);
    unsigned int posY = get_global_id(1);
 
    if ( posX < yuvStride ) { 
        // Y plane - pack 4 y's within each work item
        if ( posY >= srcHeight )
            return;
 
        posSrc = (posY * srcWidth) + (posX * 4);
 
        RGBs [ 0 ] = sourceImage [ posSrc ];
        RGBs [ 1 ] = sourceImage [ posSrc + 1 ];
        RGBs [ 2 ] = sourceImage [ posSrc + 2 ];
        RGBs [ 3 ] = sourceImage [ posSrc + 3 ];
 
        for ( i=0; i<4; i++ ) { 
            RGB = RGBs [ i ]; blue = RGB & 0xff; green = (RGB >> 8) & 0xff; red = (RGB >> 16) & 0xff;
 
            Value = ( ( 66 * red + 129 * green + 25 * blue ) >> 8 ) + 16;
            Value4 |= (Value << (i * 8)); 
        } 
        destImage [ (posY * yuvStride) + posX ] = Value4; 
        return; 
    } 
    posX -= yuvStride; yuvStrideHalf = yuvStride >> 1; 
 
    // U plane - pack 4 u's within each work item 
    if ( posX >= yuvStrideHalf ) // skip work items beyond the half-width U/V row
        return;
 
    srcHeightHalf = srcHeight >> 1;
    if ( posY < srcHeightHalf ) {
        posSrc = ((posY * 2) * srcWidth) + (posX * 8);
 
        RGBs [ 0 ] = sourceImage [ posSrc ];
        RGBs [ 1 ] = sourceImage [ posSrc + 2 ];
        RGBs [ 2 ] = sourceImage [ posSrc + 4 ];
        RGBs [ 3 ] = sourceImage [ posSrc + 6 ];
 
        for ( i=0; i<4; i++ ) { 
            RGB = RGBs [ i ]; 
            blue = RGB & 0xff; green = (RGB >> 8) & 0xff; red = (RGB >> 16) & 0xff;
 
            Value = ( ( -38 * red + -74 * green + 112 * blue ) >> 8 ) + 128;
            Value4 |= (Value << (i * 8));
        } 
        yPlaneOffset = yuvStride * srcHeight; 
        posOffset = (posY * yuvStrideHalf) + posX; 
        destImage [ yPlaneOffset + posOffset ] = Value4; 
        return; 
    } 
    posY -= srcHeightHalf; 
    if ( posY >= srcHeightHalf )
        return;
 
    // V plane - pack 4 v's within each work item
    posSrc = ((posY * 2) * srcWidth) + (posX * 8);
 
    RGBs [ 0 ] = sourceImage [ posSrc ];
    RGBs [ 1 ] = sourceImage [ posSrc + 2 ];
    RGBs [ 2 ] = sourceImage [ posSrc + 4 ];
    RGBs [ 3 ] = sourceImage [ posSrc + 6 ];
 
    for ( i=0; i<4; i++ ) { 
        RGB = RGBs [ i ]; 
        blue = RGB & 0xff; green = (RGB >> 8) & 0xff; red = (RGB >> 16) & 0xff;
 
        Value = ( ( 112 * red + -94 * green + -18 * blue ) >> 8 ) + 128;
        Value4 |= (Value << (i * 8));
    }
 
    yPlaneOffset = yuvStride * srcHeight;
    posOffset = (posY * yuvStrideHalf) + posX;
 
    destImage [ yPlaneOffset + (yPlaneOffset >> 2) + posOffset ] = Value4;
    return;
}

This code performs only 32-bit global memory accesses, while the 8-bit processing happens within each work item.
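
For a feel of the packing, here is a small plain-C sketch of what one Y work item computes. The function name pack_four_y is made up for this example; the coefficients and the shift are the ones from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU stand-in for one Y work item: reads four 32-bit ARGB
 * pixels and returns their four 8-bit Y values packed into one 32-bit word,
 * with the first pixel in the lowest byte. */
static uint32_t pack_four_y(const uint32_t *src)
{
    assert(src);
    uint32_t packed = 0;

    for (int i = 0; i < 4; i++) {
        uint32_t argb  = src[i];
        uint32_t blue  = argb & 0xff;
        uint32_t green = (argb >> 8) & 0xff;
        uint32_t red   = (argb >> 16) & 0xff;

        /* Same integer approximation as in the kernel. */
        uint32_t y = ((66 * red + 129 * green + 25 * blue) >> 8) + 16;
        packed |= y << (i * 8);
    }
    return packed;
}
```

With the input pixels black, white, red and blue, the four Y values are 16, 235, 81 and 40, so the packed word is 0x2851EB10: one 32-bit store instead of four 8-bit stores.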

Oh.. and the proper code to invoke the kernel

unsigned int width = 1024;
unsigned int height = 768;
 
unsigned int frameSize = width * height;
const unsigned int argbSize = frameSize * 4; // ARGB pixels
 
const unsigned int yuvSize = frameSize + (frameSize >> 1); // Y,U,V planes
 
const unsigned int yuvStride = width >> 2; // since we pack 4 RGBs into "one" YYYY
 
// Allocates ARGB buffer
ocl_rgb_buffer = clCreateBuffer ( context, CL_MEM_READ_WRITE, argbSize, 0, &error );
// ... error handling ...
 
ocl_yuv_buffer = clCreateBuffer ( context, CL_MEM_READ_WRITE, yuvSize, 0, &error );
// ... error handling ...
 
error = clSetKernelArg ( kernel, 0, sizeof(cl_mem), &ocl_rgb_buffer );
error |= clSetKernelArg ( kernel, 1, sizeof(cl_mem), &ocl_yuv_buffer );
 
error |= clSetKernelArg ( kernel, 2, sizeof(unsigned int), &height);
error |= clSetKernelArg ( kernel, 3, sizeof(unsigned int), &width);
 
error |= clSetKernelArg ( kernel, 4, sizeof(unsigned int), &yuvStride);
// ... error handling ...
 
const size_t local_ws[] = { 16, 32 };
const size_t global_ws[] = { yuvStride + (yuvStride>>1), height };
 
error = clEnqueueNDRangeKernel ( queue, kernel, 2, NULL, global_ws, local_ws, 0, NULL, NULL );
// ... error handling ...

Note: have a look at the work item calculations. Some additional code needs to be added (e.g. using mod to add sufficient spare items) to make sure that the global work sizes are multiples of the local work sizes.
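
One possible way to add the spare items, as a minimal sketch in plain C. The helper name round_up_to_multiple is hypothetical; it rounds a global work size up to the next multiple of the local work size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the smallest multiple of `multiple`
 * that is greater than or equal to `value`. */
static size_t round_up_to_multiple(size_t value, size_t multiple)
{
    assert(multiple != 0);
    size_t remainder = value % multiple;
    return remainder == 0 ? value : value + (multiple - remainder);
}
```

It could be applied to each dimension before enqueueing, e.g. round_up_to_multiple(yuvStride + (yuvStride >> 1), local_ws[0]) and round_up_to_multiple(height, local_ws[1]). The spare work items created this way lie outside the image, so the kernel has to return early for them.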


Canada – ITS 2009 Conference



I’ve been to Banff, Canada for a week to attend the International Conference on Interactive Tabletops and Surfaces 2009. Apart from the high quality of the conference, there were a lot of social happenings with the researchers, e.g. having interesting and productive talks while enjoying Canadian beer in some of the beautiful pubs and restaurants in Banff. The banquet with banquet speaker Chia Shen was great too. I haven’t had such a delicious dinner so far and I’m pretty sure that it was quite expensive.. Thank goodness it was paid for by the conference hehe

Oh and the view of Toronto at night (from the airplane) is soo beautiful.

On Tuesday I presented results of my research from the University of Augsburg and of course I was quite nervous hehe. Here are some pictures of my presentation and the talk afterwards.
CIMG1245

CIMG1244

This is an elk in the gray of dawn. Unfortunately there wasn’t sufficient light for a perfect picture…
CIMG1248

A great diner burger :)
CIMG1258

The conference hotel – the Fairmont Banff Springs!
CIMG1226

Banff at night…
CIMG1240

A view from the bridge in Banff..
CIMG1195

Some mountains…
CIMG1198
CIMG1217


PhD Trip to Israel

I was lucky enough to attend the “German-Israeli Minerva School for Ubiquitous Display Environments” from the end of August on, and it was a great experience to learn from the lectures and the different cultures. It was also an experience to see how other researchers move towards possible solutions within the workshop and collaboratively create and work on ideas. Many thanks to all the responsible staff at this place. Oh and really delicious food…
And of course (a little late) here are some impressions I took with my camera hehe..

The minerva crew:
Minerva Team

Some impressions from Israel, Haifa, Jerusalem, …:
Israel1032
Israel0985
Israel0986
Israel1027
Israel1028

Israel1035

Israel1036
Israel1037
Israel1038
Israel1039
Israel1043
Israel1047
Israel1096
Israel1104
Israel1115
Israel1130
Israel1133


USA 2009 in Michigan – Some Impressions

I’ve been to America for some weeks to visit my sister, take a vacation, and see my niece for the first time. Apart from the great weather, it was a wonderful experience to live like and with Americans and have time with my very cute and already sooo smart niece :)
I’ve also been to a nice club in Rochester where a DJ from 95.5, wearing sunglasses, broadcast live to the whole state of Michigan. While the club music was kinda used (but great though), the thought that any of his act is broadcast statewide was… interesting hehe..
Now vacation is over and guess what… somehow I miss the automatic transmission in cars, AC everywhere, and so on hehe
During this great time I took a lot of pictures, and here are some of them showing impressions of those weeks. All in all, America is a great country and I guess I’ll be back :D

  • That’s me sitting somewhere at the beach of Lake Huron.. a little windy…
    usa01
  • usa02
  • Townsend Hotel
    usa03
  • The corner at the Townsend Hotel
    usa04
  • The way to the tunnel to Canada.. oh, clearance is such a keyword I like for another reason hehe..
    usa05
  • Near Detroit Windsor Tunnel
    usa06
  • That’s me in front of the bridge to Canada
    usa07
  • Caesars casino in Canada
    usa08
  • The bridge to Canada
    usa09
  • Me again and the Hard Rock Cafe behind (in Detroit)
    usa10
  • usa11
  • An ongoing baseball game at the stadium in Detroit
    usa12
  • Wayne State University in Detroit
    usa13
  • Ghostbusters in action found at the annual Woodworth Cruise… there were so many old and crazy cars… unbelievable..
    usa14
  • Exit to 8 Mile
    usa15
