vec4<\/span> someVec<\/span>;<\/span>\r\nsomeVec<\/span>.<\/span>x<\/span> +<\/span> someVec<\/span>.<\/span>y<\/span>;<\/span>\r\n<\/pre>\n<\/div>\nThis is called swizzling<\/i>. You can use x, y, z, or w, referring to the first, second, third, and fourth components, respectively.<\/p>\n
It is called \u201cswizzling\u201d because the following syntax is entirely valid:<\/p>\n
\nvec2<\/span> someVec<\/span>;<\/span>\r\nvec4<\/span> otherVec<\/span> =<\/span> someVec<\/span>.<\/span>xyxx<\/span>;<\/span>\r\nvec3<\/span> thirdVec<\/span> =<\/span> otherVec<\/span>.<\/span>zyy<\/span>;<\/span>\r\n<\/pre>\n<\/div>\nYou can use any combination of up to 4 of the letters to create a vector (of the same basic type) of that length. So otherVec.zyy<\/span><\/code> is a vec3, which is how we can initialize a vec3 value with it. Letters may repeat and appear in any order, so long as the source vector actually has those components. Attempting to access the \u2018w\u2019 component of a vec3, for example, is a compile-time error.<\/p>\n
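As a minimal sketch of the rule above, the following fragment shader shows how the number of letters determines the result type. The shader itself and the names `base`, `single`, `pair`, `wide`, and `fragColor` are hypothetical and exist only for illustration; GLSL 3.30 is assumed.

```glsl
#version 330 core

// Hypothetical output variable, used only so the shader is complete.
out vec4 fragColor;

void main()
{
    vec3 base = vec3(0.2, 0.4, 0.6);

    float single = base.y;     // one letter   -> float, value 0.4
    vec2  pair   = base.zx;    // two letters  -> vec2(0.6, 0.2)
    vec4  wide   = base.xyzz;  // four letters -> vec4(0.2, 0.4, 0.6, 0.6)

    // float bad = base.w;     // would not compile: a vec3 has no w component

    fragColor = vec4(pair, single, wide.w);
}
```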
Swizzling also works on l-values:<\/p>\n
\nvec4<\/span> someVec<\/span>;<\/span>\r\nsomeVec<\/span>.<\/span>wzyx<\/span> =<\/span> vec4<\/span>(<\/span>1.0<\/span>,<\/span> 2.0<\/span>,<\/span> 3.0<\/span>,<\/span> 4.0<\/span>);<\/span> \/\/ Reverses the order.<\/span>\r\nsomeVec<\/span>.<\/span>zx<\/span> =<\/span> vec2<\/span>(<\/span>3.0<\/span>,<\/span> 5.0<\/span>);<\/span> \/\/ Sets the 3rd component of someVec to 3.0 and the 1st component to 5.0<\/span>\r\n<\/pre>\n<\/div>\nHowever, when you use a swizzle as a way of setting component values, you cannot<\/i> use the same swizzle component twice. So someVec.xx = vec2(4.0, 4.0);<\/span> is not allowed.<\/p>\n
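A small sketch of swizzled writes, assuming GLSL 3.30; the variable `v` and the output `fragColor` are hypothetical. The right-hand side is evaluated before the store, so a swizzle can be used to swap components in place.

```glsl
#version 330 core

// Hypothetical output variable, used only so the shader is complete.
out vec4 fragColor;

void main()
{
    vec4 v = vec4(1.0, 2.0, 3.0, 4.0);

    v.xy = v.yx;             // swaps x and y: v is now (2.0, 1.0, 3.0, 4.0)
    v.zw = vec2(0.0, 1.0);   // writes z and w: v is now (2.0, 1.0, 0.0, 1.0)

    // v.xx = vec2(4.0);     // would not compile: x appears twice in the write mask

    fragColor = v;
}
```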
Additionally, there are 3 sets of swizzle masks. You can use xyzw<\/b>, rgba<\/b> (for colors), or stpq<\/b> (for texture coordinates). These three sets have no actual difference; they\u2019re just syntactic sugar. You cannot combine names from different sets in a single swizzle operation. So \u201c.xrs\u201d is not a valid swizzle mask.<\/p>\n
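To illustrate that the three sets are interchangeable, here is a short sketch (GLSL 3.30 assumed; `texel`, `a`, `b`, `c`, and `fragColor` are hypothetical names). All three swizzles read the same first three components.

```glsl
#version 330 core

// Hypothetical output variable, used only so the shader is complete.
out vec4 fragColor;

void main()
{
    vec4 texel = vec4(0.1, 0.2, 0.3, 1.0);

    vec3 a = texel.xyz;   // position-style names
    vec3 b = texel.rgb;   // color-style names, same components as a
    vec3 c = texel.stp;   // texture-coordinate names, same components as a

    // vec3 d = texel.xgb; // would not compile: mixes names from two sets

    // a, b, and c are identical, so this is simply 3.0 * texel.xyz.
    fragColor = vec4(a + b + c, texel.a);
}
```

Note that the third letter of the texture-coordinate set is `p`, not `r`, because `r` already belongs to the color set.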
In OpenGL 4.2 or ARB_shading_language_420pack, scalars can be swizzled as well. They obviously only have one source component, but it is legal to do this:<\/p>\n
\nfloat<\/span> aFloat<\/span>;<\/span>\r\nvec4<\/span> someVec<\/span> =<\/span> aFloat<\/span>.<\/span>xxxx<\/span>;<\/span><\/pre>\n<\/div>\n\nin<\/span> vec4<\/span> in_pos<\/span>;<\/span>\r\n\/\/ The following two lines:<\/span>\r\ngl_Position<\/span>.<\/span>x<\/span> =<\/span> in_pos<\/span>.<\/span>x<\/span>;<\/span>\r\ngl_Position<\/span>.<\/span>y<\/span> =<\/span> in_pos<\/span>.<\/span>y<\/span>;<\/span>\r\n\/\/ can be simplified to:<\/span>\r\ngl_Position<\/span>.<\/span>xy<\/span> =<\/span> in_pos<\/span>.<\/span>xy<\/span>;<\/span>\r\n<\/pre>\n<\/div>\nSwizzling can make your shader both faster and more readable.<\/p>\n
Avoid changing precision while swizzling<\/h3>\nConverting from one precision type to another in a shader can be a costly operation, and converting the precision type while simultaneously swizzling can be particularly painful. If you have mathematical operations that use swizzling, ensure that they don\u2019t also convert the precision type. In these cases, it is wiser to use the high-precision data type from the very beginning, or to reduce precision across the board, so that no change in precision is needed.<\/p>\n
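The following is a hedged sketch of that advice, assuming GLSL ES 3.00; the names `v_position`, `u_tint`, `pos`, `tinted`, and `fragColor` are hypothetical. Whether the separated form is actually cheaper depends on the GPU and the compiler, so treat it as something to measure rather than a guarantee.

```glsl
#version 300 es
precision mediump float;

in highp vec4 v_position;      // hypothetical high-precision input
uniform mediump vec4 u_tint;   // hypothetical medium-precision uniform
out mediump vec4 fragColor;

void main()
{
    // Mixed form (commented out): the swizzle and the highp/mediump
    // combination happen in the same expression.
    // mediump vec3 tinted = v_position.zyx * u_tint.rgb;

    // Separated form: change precision once, then swizzle within one precision.
    mediump vec4 pos = v_position;
    mediump vec3 tinted = pos.zyx * u_tint.rgb;

    fragColor = vec4(tinted, u_tint.a);
}
```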
Get MAD<\/span><\/h2>\nMAD is short for multiply, then add. It is generally assumed that MAD operations are \u201csingle cycle\u201d, or at least faster than the alternative.<\/p>\n
\n\/\/ A naive compiler might use these as written: a divide, then add.<\/span>\r\nvec4<\/span> result1<\/span> =<\/span> (<\/span>value<\/span> \/<\/span> 2.0<\/span>)<\/span> +<\/span> 1.0<\/span>;<\/span>\r\nvec4<\/span> result2<\/span> =<\/span> (<\/span>value<\/span> \/<\/span> 2.0<\/span>)<\/span> -<\/span> 1.0<\/span>;<\/span>\r\nvec4<\/span> result3<\/span> =<\/span> (<\/span>value<\/span> \/<\/span> -<\/span>2.0<\/span>)<\/span> +<\/span> 1.0<\/span>;<\/span>\r\n\r\n\/\/ These are most likely converted to a single MAD operation (per line).<\/span>\r\nvec4<\/span> result1<\/span> =<\/span> (<\/span>value<\/span> *<\/span> 0.5<\/span>)<\/span> +<\/span> 1.0<\/span>;<\/span>\r\nvec4<\/span> result2<\/span> =<\/span> (<\/span>value<\/span> *<\/span> 0.5<\/span>)<\/span> -<\/span> 1.0<\/span>;<\/span>\r\nvec4<\/span> result3<\/span> =<\/span> (<\/span>value<\/span> *<\/span> -<\/span>0.5<\/span>)<\/span> +<\/span> 1.0<\/span>;<\/span>\r\n<\/pre>\n<\/div>\nThe divide and add variant might cost 2 or more cycles.<\/p>\n
One expression might be better than the other. For example:<\/p>\n
\nresult<\/span> =<\/span> 0.5<\/span> *<\/span> (<\/span>1.0<\/span> +<\/span> variable<\/span>);<\/span>\r\nresult<\/span> =<\/span> 0.5<\/span> +<\/span> 0.5<\/span> *<\/span> variable<\/span>;<\/span>\r\n<\/pre>\n<\/div>\nThe first one may be converted into an add followed by a multiply. The second one is expressed in a way that more explicitly allows for a MAD operation.<\/p>\n
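A common place where the multiply-then-add form shows up is remapping a value from [-1, 1] to [0, 1], for example to display a normal as a color. The sketch below assumes GLSL 3.30; `v_normal`, `n`, `rgb`, and `fragColor` are hypothetical names.

```glsl
#version 330 core

in vec3 v_normal;     // hypothetical interpolated normal
out vec4 fragColor;   // hypothetical output variable

void main()
{
    vec3 n = normalize(v_normal);

    // Written as multiply, then add: each component goes from [-1, 1] to [0, 1].
    vec3 rgb = n * 0.5 + 0.5;

    // The equivalent 0.5 * (n + 1.0) is an add followed by a multiply.

    fragColor = vec4(rgb, 1.0);
}
```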
Assignment with MAD<\/span><\/h3>\nAssume that you want to set the output alpha to 1.0. Here is one method:<\/p>\n
\n  myOutputColor<\/span>.<\/span>xyz<\/span> =<\/span> myColor<\/span>.<\/span>xyz<\/span>;<\/span>\r\n  myOutputColor<\/span>.<\/span>w<\/span> =<\/span> 1.0<\/span>;<\/span>\r\n  gl_FragColor<\/span> =<\/span> myOutputColor<\/span>;<\/span>\r\n<\/pre>\n<\/div>\nThe above code can be 2 or 3 move instructions, depending on the compiler and the GPU\u2019s capabilities. Newer GPUs can handle setting different parts of gl_FragColor<\/span><\/code>, but older ones can\u2019t, which means they need to use a temporary to build the final color and set it with a 3rd move instruction.<\/p>\n
You can use a MAD instruction to set all the fields at once:<\/p>\n
\n  const<\/span> vec2<\/span> constantList<\/span> =<\/span> vec2<\/span>(<\/span>1.0<\/span>,<\/span> 0.0<\/span>);<\/span>\r\n  gl_FragColor<\/span> =<\/span> myColor<\/span>.<\/span>xyzw<\/span> *<\/span> constantList<\/span>.<\/span>xxxy<\/span> +<\/span> constantList<\/span>.<\/span>yyyx<\/span>;<\/span>\r\n<\/pre>\n<\/div>\nThis does it all with one MAD operation, assuming that the building of the constant is compiled directly into the executable.<\/p>\n
Fast Built-ins<\/span><\/h2>\nThere are a number of built-in functions that are quite fast, if not \u201csingle-cycle\u201d (to the extent that this means something for various different hardware).<\/p>\n
Linear Interpolation<\/span><\/h3>\nLet\u2019s say we want to linearly interpolate between two values, based on some factor:<\/p>\n
\nvec3<\/span> colorRGB_0<\/span>,<\/span> colorRGB_1<\/span>;<\/span>\r\nfloat<\/span> alpha<\/span>;<\/span>\r\nresultRGB<\/span> =<\/span> colorRGB_0<\/span> *<\/span> (<\/span>1.0<\/span> -<\/span> alpha<\/span>)<\/span> +<\/span> colorRGB_1<\/span> *<\/span> alpha<\/span>;<\/span>\r\n\r\n\/\/ The above can be converted to the following for MAD purposes:<\/span>\r\nresultRGB<\/span> =<\/span> colorRGB_0<\/span>  +<\/span> alpha<\/span> *<\/span> (<\/span>colorRGB_1<\/span> -<\/span> colorRGB_0<\/span>);<\/span>\r\n\r\n\/\/ GLSL provides the mix function. This function should be used where possible:<\/span>\r\nresultRGB<\/span> =<\/span> mix<\/span>(<\/span>colorRGB_0<\/span>,<\/span> colorRGB_1<\/span>,<\/span> alpha<\/span>);<\/span>\r\n<\/pre>\n<\/div>\nDot products<\/span><\/h3>\nIt is reasonable to assume that dot product operations, despite their complexity, will be fast operations (possibly single-cycle). Given that knowledge, the following code can be optimized:<\/p>\n
\nvec3<\/span> fvalue1<\/span>;<\/span>\r\nresult1<\/span> =<\/span> fvalue1<\/span>.<\/span>x<\/span> +<\/span> fvalue1<\/span>.<\/span>y<\/span> +<\/span> fvalue1<\/span>.<\/span>z<\/span>;<\/span>\r\nvec4<\/span> fvalue2<\/span>;<\/span>\r\nresult2<\/span> =<\/span> fvalue2<\/span>.<\/span>x<\/span> +<\/span> fvalue2<\/span>.<\/span>y<\/span> +<\/span> fvalue2<\/span>.<\/span>z<\/span> +<\/span> fvalue2<\/span>.<\/span>w<\/span>;<\/span>\r\n<\/pre>\n<\/div>\nThis is essentially a lot of additions. Using a simple constant and the dot-product operator, we can have this:<\/p>\n
\nconst<\/span> vec4<\/span> AllOnes<\/span> =<\/span> vec4<\/span>(<\/span>1.0<\/span>);<\/span>\r\nvec3<\/span> fvalue1<\/span>;<\/span>\r\nresult1<\/span> =<\/span> dot<\/span>(<\/span>fvalue1<\/span>,<\/span> AllOnes<\/span>.<\/span>xyz<\/span>);<\/span>\r\nvec4<\/span> fvalue2<\/span>;<\/span>\r\nresult2<\/span> =<\/span> dot<\/span>(<\/span>fvalue2<\/span>,<\/span> AllOnes<\/span>);<\/span>\r\n<\/pre>\n<\/div>\nThis performs the computation all at once.<\/p>\n
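The same idea extends from a plain sum to any weighted sum of components. As a sketch, assuming GLSL 3.30, the shader below computes a grayscale value with a single `dot` call; the names `v_color`, `LumaWeights`, `luma`, and `fragColor` are hypothetical, and the weights are the standard Rec. 709 luma coefficients.

```glsl
#version 330 core

in vec3 v_color;      // hypothetical interpolated color
out vec4 fragColor;   // hypothetical output variable

// Rec. 709 luma weights; with vec3(1.0) instead, this would be a plain sum.
const vec3 LumaWeights = vec3(0.2126, 0.7152, 0.0722);

void main()
{
    // Equivalent to v_color.r * 0.2126 + v_color.g * 0.7152 + v_color.b * 0.0722.
    float luma = dot(v_color, LumaWeights);

    fragColor = vec4(vec3(luma), 1.0);
}
```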