GLSL Optimizations

Some implementations perform many of the optimizations in this article automatically, but often they do not. It therefore helps to apply these optimizations by hand, and they do not make your code any harder to read.

Use Swizzle

Swizzle masks are essentially free in hardware. Use them where possible.

Swizzling

You can access the components of vectors using the following syntax:

vec4 someVec;
someVec.x + someVec.y;

This is called swizzling. You can use x, y, z, or w, referring to the first, second, third, and fourth components, respectively.

It is called “swizzling” because the following syntax is entirely valid:

vec2 someVec;
vec4 otherVec = someVec.xyxx;
vec3 thirdVec = otherVec.zyy;

You can use any combination of up to 4 of the letters to create a vector (of the same basic type) of that length, so long as the source vector actually has those components. So otherVec.zyy is a vec3, which is how we can initialize a vec3 value with it. Attempting to access the ‘w’ component of a vec3, for example, is a compile-time error.

Swizzling also works on l-values:

vec4 someVec;
someVec.wzyx = vec4(1.0, 2.0, 3.0, 4.0); // Reverses the order.
someVec.zx = vec2(3.0, 5.0); // Sets the 3rd component of someVec to 3.0 and the 1st component to 5.0

However, when you use a swizzle as a way of setting component values, you cannot use the same swizzle component twice. So someVec.xx = vec2(4.0, 4.0); is not allowed.

Additionally, there are 3 sets of swizzle masks. You can use xyzw, rgba (for colors), or stpq (for texture coordinates). These three sets have no actual difference; they’re just syntactic sugar. You cannot combine names from different sets in a single swizzle operation. So “.xrs” is not a valid swizzle mask.
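As a minimal sketch of the three naming sets, the fragment shader below uses each set once. The variable names (vertColor, texCoord, fragColor) are hypothetical and chosen only for illustration; the commented-out line shows the kind of mixed mask that is rejected at compile time.

```glsl
#version 330 core

// Hypothetical inputs and output; the names are illustrative only.
in vec4 vertColor;
in vec2 texCoord;
out vec4 fragColor;

void main()
{
    vec2 uv   = texCoord.st;       // stpq names on a texture coordinate
    vec3 tint = vertColor.rgb;     // rgba names on a color
    vec3 same = vertColor.xyz;     // xyzw names select exactly the same components

    // vec2 bad = vertColor.xg;    // error: mixes the xyzw and rgba sets

    fragColor = vec4(tint * uv.x + same * uv.y, vertColor.a);
}
```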

In OpenGL 4.2 or ARB_shading_language_420pack, scalars can be swizzled as well. They obviously only have one source component, but it is legal to do this:

float aFloat;
vec4 someVec = aFloat.xxxx;

Swizzles also let you merge per-component assignments into a single statement:

in vec4 in_pos;
// The following two lines:
gl_Position.x = in_pos.x;
gl_Position.y = in_pos.y;
// can be simplified to:
gl_Position.xy = in_pos.xy;

Swizzling can make your shader faster and your code more readable.

Avoid changing precision while swizzling

Converting from one precision type to another in a shader can be a costly operation, and converting the precision type while simultaneously swizzling can be particularly expensive. If you have mathematical operations that use swizzling, ensure that they do not also convert the precision type. In these cases, it is wiser either to use the high-precision data type from the very beginning or to reduce precision across the board, so that no change in precision is needed.
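The following is a sketch of the "reduce precision across the board" option, assuming an OpenGL ES 3.0 fragment shader. The names u_offset, v_color, and fragColor are hypothetical, and whether the mixed-precision variant is actually slower depends on the hardware and compiler.

```glsl
#version 300 es
precision mediump float;

// Pattern to avoid: declaring this uniform as highp while v_color is mediump,
// which would force a precision conversion inside the swizzled expression below.
uniform mediump vec4 u_offset;   // hypothetical uniform
in mediump vec4 v_color;         // hypothetical input
out mediump vec4 fragColor;

void main()
{
    // Both operands are mediump, so the swizzles involve no precision change.
    fragColor = vec4(v_color.rgb + u_offset.zyx, v_color.a);
}
```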

Get MAD

MAD is short for multiply, then add. It is generally assumed that MAD operations are “single cycle”, or at least faster than the alternative.

// A naive compiler might use these as written: a divide, then an add.
vec4 result1 = (value / 2.0) + 1.0;
vec4 result2 = (value / 2.0) - 1.0;
vec4 result3 = (value / -2.0) + 1.0;

// These are most likely converted to a single MAD operation (per line).
vec4 result1 = (value * 0.5) + 1.0;
vec4 result2 = (value * 0.5) - 1.0;
vec4 result3 = (value * -0.5) + 1.0;

The divide and add variant might cost 2 or more cycles.

Even between mathematically equivalent expressions, one form might be better than the other. For example:

result = 0.5 * (1.0 + variable);
result = 0.5 + 0.5 * variable;

The first one may be converted into an add followed by a multiply. The second one is expressed in a way that more explicitly allows for a MAD operation.
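The same idea applies to range remapping. The sketch below, with hypothetical names (normal, rangeMin, rangeMax, sampleValue), rewrites (x - min) / (max - min) as x * scale + bias so that the per-fragment work is a single multiply and add. Whether a given compiler would have made this transformation on its own is not guaranteed either way.

```glsl
#version 330 core

// Hypothetical inputs; the names are illustrative only.
in vec3 normal;
uniform float rangeMin;
uniform float rangeMax;
uniform float sampleValue;
out vec4 fragColor;

void main()
{
    // Map [-1, 1] to [0, 1] in multiply-then-add form.
    vec3 encoded = normalize(normal) * 0.5 + 0.5;

    // (x - min) / (max - min) == x * scale + bias
    // scale and bias depend only on uniforms, so they could also be
    // computed once on the CPU and passed in directly.
    float scale = 1.0 / (rangeMax - rangeMin);
    float bias  = -rangeMin * scale;
    float t     = sampleValue * scale + bias;

    fragColor = vec4(encoded, t);
}
```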

Assignment with MAD

Assume that you want to set the alpha component of the output color to 1.0. Here is one method:

  myOutputColor.xyz = myColor.xyz;
  myOutputColor.w = 1.0;
  gl_FragColor = myOutputColor;

The above code can be 2 or 3 move instructions, depending on the compiler and the GPU’s capabilities. Newer GPUs can handle setting different parts of gl_FragColor, but older ones can’t, which means they need to use a temporary to build the final color and set it with a 3rd move instruction.

You can use a MAD instruction to set all the fields at once:

  const vec2 constantList = vec2(1.0, 0.0);
  gl_FragColor = myColor.xyzw * constantList.xxxy + constantList.yyyx;

This does it all with one MAD operation, assuming that the building of the constant is compiled directly into the executable.
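The same pattern extends to other masks. As a sketch, assuming a hypothetical varying named myColor, the shader below keeps red and green, forces blue to 0.0, and forces alpha to 1.0 by choosing a different multiply mask.

```glsl
#version 120

// Hypothetical input color; the name is illustrative only.
varying vec4 myColor;

void main()
{
    const vec2 constantList = vec2(1.0, 0.0);

    // Multiply mask is (1, 1, 0, 0); additive term is (0, 0, 0, 1).
    // Result: (myColor.r, myColor.g, 0.0, 1.0).
    gl_FragColor = myColor * constantList.xxyy + constantList.yyyx;
}
```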

Fast Built-ins

There are a number of built-in functions that are quite fast, if not “single-cycle” (to the extent that this term is meaningful across different hardware).

Linear Interpolation

Let’s say we want to linearly interpolate between two values, based on some factor:

vec3 colorRGB_0, colorRGB_1;
float alpha;
resultRGB = colorRGB_0 * (1.0 - alpha) + colorRGB_1 * alpha;

// The above can be converted to the following for MAD purposes:
resultRGB = colorRGB_0 + alpha * (colorRGB_1 - colorRGB_0);

// GLSL provides the mix function. This function should be used where possible:
resultRGB = mix(colorRGB_0, colorRGB_1, alpha);
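For context, here is a minimal sketch of mix in a complete fragment shader that cross-fades two textures. The sampler and uniform names (texA, texB, blend) are hypothetical; the clamp is only there to keep the factor in the interpolation range.

```glsl
#version 330 core

// Hypothetical resources; the names are illustrative only.
uniform sampler2D texA;
uniform sampler2D texB;
uniform float blend;
in vec2 texCoord;
out vec4 fragColor;

void main()
{
    vec4 a = texture(texA, texCoord);
    vec4 b = texture(texB, texCoord);

    // mix(a, b, t) returns a * (1.0 - t) + b * t for all four components.
    fragColor = mix(a, b, clamp(blend, 0.0, 1.0));
}
```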

Dot products

It is reasonable to assume that dot product operations, despite their apparent complexity, will be fast (possibly single-cycle). Given that knowledge, the following code can be optimized:

vec3 fvalue1;
result1 = fvalue1.x + fvalue1.y + fvalue1.z;
vec4 fvalue2;
result2 = fvalue2.x + fvalue2.y + fvalue2.z + fvalue2.w;

This is essentially a lot of additions. Using a simple constant and the dot-product operator, we can have this:

const vec4 AllOnes = vec4(1.0);
vec3 fvalue1;
result1 = dot(fvalue1, AllOnes.xyz);
vec4 fvalue2;
result2 = dot(fvalue2, AllOnes);

This performs the computation all at once.
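The constant does not have to be all ones. As a sketch with hypothetical names (color, fragColor), folding a scale or a set of weights into the constant turns an average or a weighted sum into a single dot product. The luma weights shown are the Rec. 709 coefficients.

```glsl
#version 330 core

// Hypothetical input and output; the names are illustrative only.
in vec3 color;
out vec4 fragColor;

void main()
{
    // (r + g + b) / 3.0 with the division folded into the constant.
    const vec3 oneThird = vec3(1.0 / 3.0);
    float average = dot(color, oneThird);

    // Weighted sum of the components using Rec. 709 luma coefficients.
    const vec3 lumaWeights = vec3(0.2126, 0.7152, 0.0722);
    float luma = dot(color, lumaWeights);

    fragColor = vec4(vec3(luma), average);
}
```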