{"id":586,"date":"2018-05-24T07:53:19","date_gmt":"2018-05-24T07:53:19","guid":{"rendered":"http:\/\/imalogic.com\/blog\/?p=586"},"modified":"2018-05-24T07:53:19","modified_gmt":"2018-05-24T07:53:19","slug":"glsl-optimizations","status":"publish","type":"post","link":"https:\/\/imalogic.com\/blog\/2018\/05\/24\/glsl-optimizations\/","title":{"rendered":"GLSL Optimizations"},"content":{"rendered":"<body><p>Many of the optimizations in this article are done automatically by some implementations, but often they are not. Therefore it helps to apply these optimizations by hand, and they do not make your code any harder to read.<\/p>\n<h2><span id=\"Use_Swizzle\" class=\"mw-headline\">Use Swizzle<\/span><\/h2>\n<p>Swizzle masks are essentially free in hardware. Use them where possible.<\/p>\n<h4><span id=\"Swizzling\" class=\"mw-headline\">Swizzling<\/span><\/h4>\n<p>You can access the components of vectors using the following syntax:<\/p>\n<div class=\"mw-highlight mw-content-ltr\" dir=\"ltr\">\n<pre><span class=\"k\">vec4<\/span> <span class=\"n\">someVec<\/span><span class=\"p\">;<\/span>\r\n<span class=\"n\">someVec<\/span><span class=\"p\">.<\/span><span class=\"n\">x<\/span> <span class=\"o\">+<\/span> <span class=\"n\">someVec<\/span><span class=\"p\">.<\/span><span class=\"n\">y<\/span><span class=\"p\">;<\/span>\r\n<\/pre>\n<\/div>\n<p>This is called <i>swizzling<\/i>. 
You can use x, y, z, or w, referring to the first, second, third, and fourth components, respectively.<\/p>\n<p>It is called \u201cswizzling\u201d because the following syntax is entirely valid:<\/p>\n<div class=\"mw-highlight mw-content-ltr\" dir=\"ltr\">\n<pre><span class=\"k\">vec2<\/span> <span class=\"n\">someVec<\/span><span class=\"p\">;<\/span>\r\n<span class=\"k\">vec4<\/span> <span class=\"n\">otherVec<\/span> <span class=\"o\">=<\/span> <span class=\"n\">someVec<\/span><span class=\"p\">.<\/span><span class=\"n\">xyxx<\/span><span class=\"p\">;<\/span>\r\n<span class=\"k\">vec3<\/span> <span class=\"n\">thirdVec<\/span> <span class=\"o\">=<\/span> <span class=\"n\">otherVec<\/span><span class=\"p\">.<\/span><span class=\"n\">zyy<\/span><span class=\"p\">;<\/span>\r\n<\/pre>\n<\/div>\n<p>You can use any combination of up to 4 of the letters to create a vector (of the same basic type) of that length. So <code><span style=\"font-family: Courier New;\">otherVec.zyy<\/span><\/code> is a vec3, which is how we can initialize a vec3 value with it. Any such combination is acceptable, so long as the source vector actually has those components. 
Attempting to access the \u2018w\u2019 component of a vec3 for example is a compile-time error.<\/p>\n<p>Swizzling also works on l-values:<\/p>\n<div class=\"mw-highlight mw-content-ltr\" dir=\"ltr\">\n<pre><span class=\"k\">vec4<\/span> <span class=\"n\">someVec<\/span><span class=\"p\">;<\/span>\r\n<span class=\"n\">someVec<\/span><span class=\"p\">.<\/span><span class=\"n\">wzyx<\/span> <span class=\"o\">=<\/span> <span class=\"k\">vec4<\/span><span class=\"p\">(<\/span><span class=\"mf\">1.0<\/span><span class=\"p\">,<\/span> <span class=\"mf\">2.0<\/span><span class=\"p\">,<\/span> <span class=\"mf\">3.0<\/span><span class=\"p\">,<\/span> <span class=\"mf\">4.0<\/span><span class=\"p\">);<\/span> <span class=\"c1\">\/\/ Reverses the order.<\/span>\r\n<span class=\"n\">someVec<\/span><span class=\"p\">.<\/span><span class=\"n\">zx<\/span> <span class=\"o\">=<\/span> <span class=\"k\">vec2<\/span><span class=\"p\">(<\/span><span class=\"mf\">3.0<\/span><span class=\"p\">,<\/span> <span class=\"mf\">5.0<\/span><span class=\"p\">);<\/span> <span class=\"c1\">\/\/ Sets the 3rd component of someVec to 3.0 and the 1st component to 5.0<\/span>\r\n<\/pre>\n<\/div>\n<p>However, when you use a swizzle as a way of setting component values, you <i>cannot<\/i> use the same swizzle component twice. So <span class=\"tpl-code\">someVec.xx = vec2(4.0, 4.0);<\/span> is not allowed.<\/p>\n<p>Additionally, there are 3 sets of swizzle masks. You can use <b>xyzw<\/b>, <b>rgba<\/b> (for colors), or <b>stpq<\/b> (for texture coordinates). These three sets have no actual difference; they\u2019re just syntactic sugar. You cannot combine names from different sets in a single swizzle operation. So \u201c.xrs\u201d is not a valid swizzle mask.<\/p>\n<p>In OpenGL 4.2 or ARB_shading_language_420pack, scalars can be swizzled as well. 
They obviously only have one source component, but it is legal to do this:<\/p>\n<div class=\"mw-highlight mw-content-ltr\" dir=\"ltr\">\n<pre><span class=\"k\">float<\/span> <span class=\"n\">aFloat<\/span><span class=\"p\">;<\/span>\r\n<span class=\"k\">vec4<\/span> <span class=\"n\">someVec<\/span> <span class=\"o\">=<\/span> <span class=\"n\">aFloat<\/span><span class=\"p\">.<\/span><span class=\"n\">xxxx<\/span><span class=\"p\">;<\/span><\/pre>\n<\/div>\n<p>Swizzles also let you read or write several components in a single statement:<\/p>\n<div class=\"mw-highlight mw-content-ltr\" dir=\"ltr\">\n<pre><span class=\"k\">in<\/span> <span class=\"k\">vec4<\/span> <span class=\"n\">in_pos<\/span><span class=\"p\">;<\/span>\r\n<span class=\"c1\">\/\/ The following two lines:<\/span>\r\n<span class=\"n\">gl_Position<\/span><span class=\"p\">.<\/span><span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">in_pos<\/span><span class=\"p\">.<\/span><span class=\"n\">x<\/span><span class=\"p\">;<\/span>\r\n<span class=\"n\">gl_Position<\/span><span class=\"p\">.<\/span><span class=\"n\">y<\/span> <span class=\"o\">=<\/span> <span class=\"n\">in_pos<\/span><span class=\"p\">.<\/span><span class=\"n\">y<\/span><span class=\"p\">;<\/span>\r\n<span class=\"c1\">\/\/ can be simplified to:<\/span>\r\n<span class=\"n\">gl_Position<\/span><span class=\"p\">.<\/span><span class=\"n\">xy<\/span> <span class=\"o\">=<\/span> <span class=\"n\">in_pos<\/span><span class=\"p\">.<\/span><span class=\"n\">xy<\/span><span class=\"p\">;<\/span>\r\n<\/pre>\n<\/div>\n<p>Swizzling can make your shader faster and your code more readable.<\/p>\n<h3>Avoid changing precision while swizzling<\/h3>\n<p>Converting from one precision type to another in a shader can be a costly operation, but converting the precision type while simultaneously swizzling can be particularly painful. If you have mathematical operations that use swizzling, ensure that they don\u2019t also convert the precision type. 
In these cases, it would be wiser to simply use the high-precision data type from the very beginning or reduce precision across the board to avoid the need for changes in precision.<\/p>\n<h2><span id=\"Get_MAD\" class=\"mw-headline\">Get MAD<\/span><\/h2>\n<p>MAD is short for multiply, then add. It is generally assumed that MAD operations are \u201csingle cycle\u201d, or at least faster than the alternative.<\/p>\n<div class=\"mw-highlight mw-content-ltr\" dir=\"ltr\">\n<pre><span class=\"c1\">\/\/ A stupid compiler might use these as written: a divide, then add.<\/span>\r\n<span class=\"k\">vec4<\/span> <span class=\"n\">result1<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"n\">value<\/span> <span class=\"o\">\/<\/span> <span class=\"mf\">2.0<\/span><span class=\"p\">)<\/span> <span class=\"o\">+<\/span> <span class=\"mf\">1.0<\/span><span class=\"p\">;<\/span>\r\n<span class=\"k\">vec4<\/span> <span class=\"n\">result2<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"n\">value<\/span> <span class=\"o\">\/<\/span> <span class=\"mf\">2.0<\/span><span class=\"p\">)<\/span> <span class=\"o\">-<\/span> <span class=\"mf\">1.0<\/span><span class=\"p\">;<\/span>\r\n<span class=\"k\">vec4<\/span> <span class=\"n\">result3<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"n\">value<\/span> <span class=\"o\">\/<\/span> <span class=\"o\">-<\/span><span class=\"mf\">2.0<\/span><span class=\"p\">)<\/span> <span class=\"o\">+<\/span> <span class=\"mf\">1.0<\/span><span class=\"p\">;<\/span>\r\n\r\n<span class=\"c1\">\/\/ These are most likely converted to a single MAD operation (per line).<\/span>\r\n<span class=\"k\">vec4<\/span> <span class=\"n\">result1<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"n\">value<\/span> <span class=\"o\">*<\/span> <span class=\"mf\">0.5<\/span><span class=\"p\">)<\/span> <span class=\"o\">+<\/span> <span 
class=\"mf\">1.0<\/span><span class=\"p\">;<\/span>\r\n<span class=\"k\">vec4<\/span> <span class=\"n\">result2<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"n\">value<\/span> <span class=\"o\">*<\/span> <span class=\"mf\">0.5<\/span><span class=\"p\">)<\/span> <span class=\"o\">-<\/span> <span class=\"mf\">1.0<\/span><span class=\"p\">;<\/span>\r\n<span class=\"k\">vec4<\/span> <span class=\"n\">result3<\/span> <span class=\"o\">=<\/span> <span class=\"p\">(<\/span><span class=\"n\">value<\/span> <span class=\"o\">*<\/span> <span class=\"o\">-<\/span><span class=\"mf\">0.5<\/span><span class=\"p\">)<\/span> <span class=\"o\">+<\/span> <span class=\"mf\">1.0<\/span><span class=\"p\">;<\/span>\r\n<\/pre>\n<\/div>\n<p>The divide and add variant might cost 2 or more cycles.<\/p>\n<p>One expression might be better than the other. For example:<\/p>\n<div class=\"mw-highlight mw-content-ltr\" dir=\"ltr\">\n<pre><span class=\"n\">result<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">0.5<\/span> <span class=\"o\">*<\/span> <span class=\"p\">(<\/span><span class=\"mf\">1.0<\/span> <span class=\"o\">+<\/span> <span class=\"n\">variable<\/span><span class=\"p\">);<\/span>\r\n<span class=\"n\">result<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">0.5<\/span> <span class=\"o\">+<\/span> <span class=\"mf\">0.5<\/span> <span class=\"o\">*<\/span> <span class=\"n\">variable<\/span><span class=\"p\">;<\/span>\r\n<\/pre>\n<\/div>\n<p>The first one may be converted into an add followed by a multiply. The second one is expressed in a way that more explicitly allows for a MAD operation.<\/p>\n<h3><span id=\"Assignment_with_MAD\" class=\"mw-headline\">Assignment with MAD<\/span><\/h3>\n<p>Assume that you want to set the output value ALPHA to 1.0. 
Here is one method:<\/p>\n<div class=\"mw-highlight mw-content-ltr\" dir=\"ltr\">\n<pre>  <span class=\"n\">myOutputColor<\/span><span class=\"p\">.<\/span><span class=\"n\">xyz<\/span> <span class=\"o\">=<\/span> <span class=\"n\">myColor<\/span><span class=\"p\">.<\/span><span class=\"n\">xyz<\/span><span class=\"p\">;<\/span>\r\n  <span class=\"n\">myOutputColor<\/span><span class=\"p\">.<\/span><span class=\"n\">w<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">1.0<\/span><span class=\"p\">;<\/span>\r\n  <span class=\"n\">gl_FragColor<\/span> <span class=\"o\">=<\/span> <span class=\"n\">myOutputColor<\/span><span class=\"p\">;<\/span>\r\n<\/pre>\n<\/div>\n<p>The above code can compile to 2 or 3 move instructions, depending on the compiler and the GPU\u2019s capabilities. Newer GPUs can handle setting different parts of <code><span style=\"font-family: Courier New;\">gl_FragColor<\/span><\/code>, but older ones can\u2019t, which means they need to use a temporary to build the final color and set it with a 3rd move instruction.<\/p>\n<p>You can use a MAD instruction to set all the fields at once:<\/p>\n<div class=\"mw-highlight mw-content-ltr\" dir=\"ltr\">\n<pre>  <span class=\"k\">const<\/span> <span class=\"k\">vec2<\/span> <span class=\"n\">constantList<\/span> <span class=\"o\">=<\/span> <span class=\"k\">vec2<\/span><span class=\"p\">(<\/span><span class=\"mf\">1.0<\/span><span class=\"p\">,<\/span> <span class=\"mf\">0.0<\/span><span class=\"p\">);<\/span>\r\n  <span class=\"n\">gl_FragColor<\/span> <span class=\"o\">=<\/span> <span class=\"n\">myColor<\/span><span class=\"p\">.<\/span><span class=\"n\">xyzw<\/span> <span class=\"o\">*<\/span> <span class=\"n\">constantList<\/span><span class=\"p\">.<\/span><span class=\"n\">xxxy<\/span> <span class=\"o\">+<\/span> <span class=\"n\">constantList<\/span><span class=\"p\">.<\/span><span class=\"n\">yyyx<\/span><span class=\"p\">;<\/span>\r\n<\/pre>\n<\/div>\n<p>This does it all with one MAD operation, 
assuming that the constant is compiled directly into the executable.<\/p>\n<h2><span id=\"Fast_Built-ins\" class=\"mw-headline\">Fast Built-ins<\/span><\/h2>\n<p>There are a number of built-in functions that are quite fast, if not \u201csingle-cycle\u201d (to the extent that this means something across different hardware).<\/p>\n<h3><span id=\"Linear_Interpolation\" class=\"mw-headline\">Linear Interpolation<\/span><\/h3>\n<p>Let\u2019s say we want to linearly interpolate between two values, based on some factor:<\/p>\n<div class=\"mw-highlight mw-content-ltr\" dir=\"ltr\">\n<pre><span class=\"k\">vec3<\/span> <span class=\"n\">colorRGB_0<\/span><span class=\"p\">,<\/span> <span class=\"n\">colorRGB_1<\/span><span class=\"p\">;<\/span>\r\n<span class=\"k\">float<\/span> <span class=\"n\">alpha<\/span><span class=\"p\">;<\/span>\r\n<span class=\"n\">resultRGB<\/span> <span class=\"o\">=<\/span> <span class=\"n\">colorRGB_0<\/span> <span class=\"o\">*<\/span> <span class=\"p\">(<\/span><span class=\"mf\">1.0<\/span> <span class=\"o\">-<\/span> <span class=\"n\">alpha<\/span><span class=\"p\">)<\/span> <span class=\"o\">+<\/span> <span class=\"n\">colorRGB_1<\/span> <span class=\"o\">*<\/span> <span class=\"n\">alpha<\/span><span class=\"p\">;<\/span>\r\n\r\n<span class=\"c1\">\/\/ The above can be converted to the following for MAD purposes:<\/span>\r\n<span class=\"n\">resultRGB<\/span> <span class=\"o\">=<\/span> <span class=\"n\">colorRGB_0<\/span> <span class=\"o\">+<\/span> <span class=\"n\">alpha<\/span> <span class=\"o\">*<\/span> <span class=\"p\">(<\/span><span class=\"n\">colorRGB_1<\/span> <span class=\"o\">-<\/span> <span class=\"n\">colorRGB_0<\/span><span class=\"p\">);<\/span>\r\n\r\n<span class=\"c1\">\/\/ GLSL provides the mix function. 
This function should be used where possible:<\/span>\r\n<span class=\"n\">resultRGB<\/span> <span class=\"o\">=<\/span> <span class=\"n\">mix<\/span><span class=\"p\">(<\/span><span class=\"n\">colorRGB_0<\/span><span class=\"p\">,<\/span> <span class=\"n\">colorRGB_1<\/span><span class=\"p\">,<\/span> <span class=\"n\">alpha<\/span><span class=\"p\">);<\/span>\r\n<\/pre>\n<\/div>\n<h3><span id=\"Dot_products\" class=\"mw-headline\">Dot products<\/span><\/h3>\n<p>It is reasonable to assume that dot products, despite their complexity, will be fast operations (possibly single-cycle). Given that knowledge, the following code can be optimized:<\/p>\n<div class=\"mw-highlight mw-content-ltr\" dir=\"ltr\">\n<pre><span class=\"k\">vec3<\/span> <span class=\"n\">fvalue1<\/span><span class=\"p\">;<\/span>\r\n<span class=\"n\">result1<\/span> <span class=\"o\">=<\/span> <span class=\"n\">fvalue1<\/span><span class=\"p\">.<\/span><span class=\"n\">x<\/span> <span class=\"o\">+<\/span> <span class=\"n\">fvalue1<\/span><span class=\"p\">.<\/span><span class=\"n\">y<\/span> <span class=\"o\">+<\/span> <span class=\"n\">fvalue1<\/span><span class=\"p\">.<\/span><span class=\"n\">z<\/span><span class=\"p\">;<\/span>\r\n<span class=\"k\">vec4<\/span> <span class=\"n\">fvalue2<\/span><span class=\"p\">;<\/span>\r\n<span class=\"n\">result2<\/span> <span class=\"o\">=<\/span> <span class=\"n\">fvalue2<\/span><span class=\"p\">.<\/span><span class=\"n\">x<\/span> <span class=\"o\">+<\/span> <span class=\"n\">fvalue2<\/span><span class=\"p\">.<\/span><span class=\"n\">y<\/span> <span class=\"o\">+<\/span> <span class=\"n\">fvalue2<\/span><span class=\"p\">.<\/span><span class=\"n\">z<\/span> <span class=\"o\">+<\/span> <span class=\"n\">fvalue2<\/span><span class=\"p\">.<\/span><span class=\"n\">w<\/span><span class=\"p\">;<\/span>\r\n<\/pre>\n<\/div>\n<p>This is essentially a lot of additions. 
Using a simple constant and the dot-product operator, we can have this:<\/p>\n<div class=\"mw-highlight mw-content-ltr\" dir=\"ltr\">\n<pre><span class=\"k\">const<\/span> <span class=\"k\">vec4<\/span> <span class=\"n\">AllOnes<\/span> <span class=\"o\">=<\/span> <span class=\"k\">vec4<\/span><span class=\"p\">(<\/span><span class=\"mf\">1.0<\/span><span class=\"p\">);<\/span>\r\n<span class=\"k\">vec3<\/span> <span class=\"n\">fvalue1<\/span><span class=\"p\">;<\/span>\r\n<span class=\"n\">result1<\/span> <span class=\"o\">=<\/span> <span class=\"n\">dot<\/span><span class=\"p\">(<\/span><span class=\"n\">fvalue1<\/span><span class=\"p\">,<\/span> <span class=\"n\">AllOnes<\/span><span class=\"p\">.<\/span><span class=\"n\">xyz<\/span><span class=\"p\">);<\/span>\r\n<span class=\"k\">vec4<\/span> <span class=\"n\">fvalue2<\/span><span class=\"p\">;<\/span>\r\n<span class=\"n\">result2<\/span> <span class=\"o\">=<\/span> <span class=\"n\">dot<\/span><span class=\"p\">(<\/span><span class=\"n\">fvalue2<\/span><span class=\"p\">,<\/span> <span class=\"n\">AllOnes<\/span><span class=\"p\">);<\/span>\r\n<\/pre>\n<\/div>\n<p>This performs the computation all at once.<\/p>\n<p>\u00a0<\/p>\n<\/body>","protected":false},"excerpt":{"rendered":"<p>Many of the optimizations in this article are done automatically by some implementations, but often they are not. 
Therefore it<\/p>\n","protected":false},"author":1,"featured_media":481,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[7,66],"tags":[],"class_list":["post-586","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-coding","category-computer-graphics"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/imalogic.com\/blog\/wp-content\/uploads\/2017\/05\/glsl.png?fit=225%2C225&ssl=1","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p8J21V-9s","jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/posts\/586","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/comments?post=586"}],"version-history":[{"count":1,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/posts\/586\/revisions"}],"predecessor-version":[{"id":588,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/posts\/586\/revisions\/588"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/media\/481"}],"wp:attachment":[{"href":"
https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/media?parent=586"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/categories?post=586"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/imalogic.com\/blog\/wp-json\/wp\/v2\/tags?post=586"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}