**Quake III Arena shader tricks**

2024.09.29 first publication
2024.10.01 last edit

Introduction
================================================================================

The game *Quake III Arena*, which I'll refer to as Q3 from now on, came out before hardware pixel shaders became available. Everything in this article will be about writing Q3 "shaders" (which we'll define and quickly explain in the next section) that the *original* Q3 engine (1.32c) can use. Some of its limitations are absolutely trivial to lift (e.g. changing one number) but that is not the point. There is still content out there made to work with every version of the engine. Besides, if you make a map that only works on a subset of engines that doesn't include the original, you haven't produced a Q3 map. To be clear, I think such content is perfectly fine on its own (there's plenty of great Doom and Quake content built for improved engines); it's just a matter of not misrepresenting one's work.

The Q3 Shader Manual can be found here:
https://garux.github.io/shaderManual/contents.html

The shader system
================================================================================

Overview
--------------------------------------------------------------------------------

Q3 shaders are more like a simple configuration system than actual code. You can't write your own expressions in them at all because the fixed-function hardware of the time didn't allow it. Instead, you configure a list of render passes called "stages" where each stage can only read one texture and one color from interpolated vertex attributes. Each render pass configures the rendering pipeline: texture, vertex color generation, texture coordinate generation/transformation, rasterizer state, etc.

A Q3 shader takes this form:

```q3shader
ShaderName
{
    GeneralDirective1
    GeneralDirective2
    {
        StageDirective1
        StageDirective2
        ...
    }
    {
        StageDirective1
        StageDirective2
        ...
    }
    ...
}
```

General directives specify items such as:
- `cull` for the culling mode (back, front, two-sided)
- `polygonOffset` to set the constant depth bias and the slope-scaled depth bias rasterizer states (glPolygonOffset in OpenGL) such that decals don't Z-fight the surfaces they're on
- `deformVertexes` for animating vertex positions

Stage directives specify items such as:
- `map` to bind a texture in repeat mode and `clampMap` for clamped mode
- `animMap` to bind a series of textures that loops over time in repeat mode (there is no clampAnimMap)
- `rgbGen` and `alphaGen` to set or generate animations of the RGB & A channels of the stage's vertex color attribute
- `tcGen` to generate new texture coordinates for e.g. fake environment mapping
- `tcMod` to transform and animate texture coordinates
- `depthFunc` for the type of depth test (less equal or less, assuming a non-reversed depth buffer)
- `depthWrite` to force depth writes when the shader's first stage isn't opaque
- `alphaFunc` to select one of three alpha test conditions (`> 0`, `< 0.5`, `>= 0.5`)

A simple Q3 shader that renders a lightmapped surface would look like this:

```q3shader
textures/cpm3b_b1/conc_floor_0017
{
    {
        // first stage: render_target = diffuse * white
        map textures/cpm3b_b1/conc_floor_0017.tga // diffuse texture
        rgbGen identity // let's assume it returns pure white for this example
        // no blendFunc specified -> alpha blending is disabled
    }
    {
        // second stage: render_target *= lightmap * white
        map $lightmap // lightmap texture
        rgbGen identity // let's assume it returns pure white for this example
        blendFunc filter // multiplicative blending, same as blendFunc GL_DST_COLOR GL_ZERO
    }
}
```

If you've ever used Direct3D FX or CgFX, then this should all look very familiar to you. A Q3 shader is like a D3D/Cg technique and a Q3 shader stage is like a D3D/Cg pass.

References:
- Direct3D FX: https://learn.microsoft.com/en-us/windows/win32/direct3d9/writing-an-effect
- CgFX: https://developer.download.nvidia.com/CgTutorial/cg_tutorial_appendix_c.html

A single stage can correspond to a single draw call but some advanced hardware at the time was able to read from multiple textures and apply certain operations on the samples. This means that certain consecutive stages could be "collapsed" (the term used in the source code) into a single multi-textured stage.

All color values we work with are in the normalized [0;1] range and results are clamped to it.

With that out of the way, I will now quickly recap the most important stage directives that we'll be using.

animMap
--------------------------------------------------------------------------------

`animMap <FPS> <texture1> [texture2 ... texture8]`

This cycles the specified (1 to 8) textures as a looping animation at the specified framerate. The full animation's duration is `frameCount / FPS` and its frequency `FPS / frameCount`.

blendFunc
--------------------------------------------------------------------------------

`blendFunc <sourceFunc> <destFunc>`

Sets the alpha blending mode such that the final output is computed like so:
`dest = source * sourceFunc + dest * destFunc`
where dest is the render target and source the color computed by the render pass
`source = textureColor * vertexColor`
In other words:
`rtColor = textureColor * vertexColor * sourceFunc + rtColor * destFunc`

sourceFunc can be one of:
`GL_ONE, GL_ZERO, GL_DST_COLOR, GL_ONE_MINUS_DST_COLOR, GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA, GL_DST_ALPHA, GL_ONE_MINUS_DST_ALPHA`

destFunc can be one of:
`GL_ONE, GL_ZERO, GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA, GL_DST_ALPHA, GL_ONE_MINUS_DST_ALPHA, GL_SRC_COLOR, GL_ONE_MINUS_SRC_COLOR`

I think you can easily tell what graphics API informed the name of these symbols.

Shorthands:
- `blendFunc add` is additive blending: `blendFunc GL_ONE GL_ONE`
- `blendFunc blend` is standard alpha blending: `blendFunc GL_SRC_ALPHA GL_ONE_MINUS_SRC_ALPHA`
- `blendFunc filter` is multiplicative blending: `blendFunc GL_DST_COLOR GL_ZERO` or `blendFunc GL_ZERO GL_SRC_COLOR`

There is no pre-multiplied alpha blending shorthand but you can still use it just fine: `blendFunc GL_ONE GL_ONE_MINUS_SRC_ALPHA`

rgbGen and alphaGen
--------------------------------------------------------------------------------

`rgbGen` specifies the vertex color generation mode for the RGB channels.

Key `rgbGen` modes:
- `rgbGen identity` generates pure white
- `rgbGen const ( <R G B> )` uses the specified normalized constant color
- `rgbGen wave <function base amplitude phase frequency>` generates an animated color based on the waveform function and its parameters

`alphaGen` specifies the vertex color generation mode for the A channel.

Key `alphaGen` modes:
- `alphaGen const <A>` uses the specified normalized constant value
- `alphaGen wave <function base amplitude phase frequency>` generates an animated alpha based on the waveform function and its parameters

Waveform functions:
- `sin` is a sine wave
  -> value range: `[base - amplitude;base + amplitude]`
- `triangle` increases linearly then decreases linearly (the function is weirdly shifted on the X axis, don't ask me why)
  -> value range: `[base - amplitude;base + amplitude]`
- `square` sits at the minimum value for the first half of the cycle and then sits at the maximum value for the second half of the cycle
  -> value range: `[base - amplitude;base + amplitude]`
- `sawtooth` linearly increases from the minimum value to the maximum value and then drops to the minimum value instantly
  -> value range: `[base;base + amplitude]`
- `inverseSawtooth` linearly decreases from the maximum value to the minimum value and then drops to the maximum value instantly
  -> value range: `[base;base + amplitude]`

Limitations
--------------------------------------------------------------------------------

Here are a number of really annoying limitations of the original Q3 shader system:
- shaders can't be loaded from memory
  -> the mod can't generate a shader on the fly without writing its "code" to a file
- you can't make shader templates
  -> you need to copy-paste shaders all over the place when the logic is the same but the textures are different
- animMap sequences can only be 8 frames long
  -> at 32 FPS, that's only 0.25 second
- rgbGen/alphaGen wave square can't have different durations for the two values
  -> square waves used for masking purposes can only split a cycle into 2 equal halves
- clamped texture repeat mode for animMap isn't available (clampedAnimMap doesn't exist)
  -> animations can never be zoomed out, otherwise we'll see the pattern repeat
  -> to be able to zoom out, you have to waste texture space by leaving the borders "empty", which only works for textures that don't tile
- you can't select a destination with a stage directive (e.g. 4 destinations: main, temp1, temp2, temp3)
  -> fewer math expressions can be implemented
- blendFunc applies to all channels, there was no RGB/A split at the time (like e.g. OpenGL 2.0's glBlendFuncSeparate)
  -> fewer math expressions can be implemented

Desynchronized animations
================================================================================

The more easily you can tell that an effect has a time loop, the less natural and believable it will look. A single shader can have multiple animations such as the image sequence, vertex positions, texture coordinates, vertex colors, alpha value, etc.

A great way to improve the shader's final look is to simply tweak the durations of your animations such that:
- 2+ given animations start at the same time after the longest possible time
- the number of animations that start at the same time is minimized

The most effective method for this is to map animation durations to prime numbers. Two animations with prime number durations in seconds `d1` and `d2` can only start at the same time every `d1 * d2` seconds.

Let's pick a simple dummy shader as an example with a few lines omitted:

```q3shader
{
    // the last parameter of deformVertexes/rgbGen/tcMod is the frequency
    deformVertexes wave 40 sin -1.5 1.5 0.07 0.14
    {
        map textures/cpm3b_b1/lavapool.jpg
        rgbGen wave sin 0.75 0.25 0 0.14
        tcMod turb 0 0.1 0 0.14
    }
}
```

We pick a trio of prime numbers that we will scale linearly later: `11`, `13`, `17`. This shader uses the same frequency `0.14` three times. We decide that the middle prime (`13`) maps to the current duration of `1 / 0.14` seconds. We can then compute the frequencies by multiplying all 3 primes by `0.14 / 13` to desynchronize the animations effectively.
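A quick sketch of that frequency computation (plain Python, using the numbers from the example above):

```python
# Sketch: desynchronize three animations that share a 0.14 Hz frequency.
# The primes (11, 13, 17) are scaled so that the middle one keeps the
# original frequency of the example shader.
primes = (11, 13, 17)
base_frequency = 0.14  # the frequency used 3 times in the example shader
scale = base_frequency / primes[1]

frequencies = [p * scale for p in primes]
print(frequencies)  # the middle value stays at 0.14

# Because each animation completes a whole (prime) number of cycles in
# 1 / scale seconds, all three only line up again after that long.
print(1.0 / scale)  # about 92.86 seconds instead of ~7.14
```

Note that because the frequencies (rather than the durations) are proportional to the primes, the three periods still have pairwise coprime ratios, which is what matters for the desynchronization.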
The resulting shader:

```q3shader
{
    // the last parameter of deformVertexes/rgbGen/tcMod is the frequency
    deformVertexes wave 40 sin -1.5 1.5 0.07 0.11846153846153846153846153846153
    {
        map textures/cpm3b_b1/lavapool.jpg
        rgbGen wave sin 0.75 0.25 0 0.14
        tcMod turb 0 0.1 0 0.18307692307692307692307692307692
    }
}
```

The interpolated animMap
================================================================================

Shaders with this trick have shipped with the game's pak0.pk3 file, so it has been used since 1999 or earlier.

Let's assume we want an additive blended rocket explosion shader with an animation that lasts 0.5 second. Since animMap is limited to 8 images, we have to target a framerate of 8 / 0.5 = 16 FPS. That's obviously not enough to look acceptable. What if we could smooth things out by linearly interpolating the frames?
`lerp(A, B, t)`
where:
- A is the previous animation frame's texture
- B is the next animation frame's texture
- t is the normalized time since we started displaying A:
  - `t = 0` is the time we switch to frame A as the previous frame
  - `t = 1` is the time we switch to frame B as the previous frame
  - the time for `t` to go from 0 to 1 is therefore the duration of one frame, i.e. `1 / FPS = 1 / 16` second

Let's break the expression down:
`lerp(A, B, t) = A * (1 - t) + B * t`
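This decomposition can be sanity-checked with a small sketch (plain Python; `sawtooth` and `inverse_sawtooth` are hypothetical helpers mirroring the `rgbGen wave` functions with base 0, amplitude 1, and `t` already normalized to one frame):

```python
# Sketch: the two blend weights used by the interpolated animMap trick.
# With additive blending, the two stages sum to:
#   A * inverse_sawtooth(t) + B * sawtooth(t) = A * (1 - t) + B * t = lerp(A, B, t)

def sawtooth(t):
    # like `rgbGen wave sawtooth 0 1 0 f`: ramps from 0 to 1 over each cycle
    return t % 1.0

def inverse_sawtooth(t):
    # like `rgbGen wave inverseSawtooth 0 1 0 f`: ramps from 1 to 0 over each cycle
    return 1.0 - (t % 1.0)

def lerp(a, b, t):
    return a * (1.0 - t) + b * t

# Halfway between two frames, both stages contribute half of their texture.
a, b, t = 0.8, 0.2, 0.5
two_stage = a * inverse_sawtooth(t) + b * sawtooth(t)
assert abs(two_stage - lerp(a, b, t)) < 1e-9
```

The two weights always sum to 1, so the blended result never over- or under-shoots the source frames.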
Since we can only read 1 texture per stage, we now have 2 stages that need to be added: `A * (1 - t)` and `B * t`. A and B are specified through animMap, so we just need to figure out how to generate `(1 - t)` and `t` using vertex colors since each stage multiplies the texture color by the vertex color. It turns out to be trivial to do with `rgbGen wave` since `f(t) = t` is a sawtooth waveform and `f(t) = 1 - t` is an inverse sawtooth waveform.

The result is therefore:

```q3shader
{
    {
        // A * (1 - t) = previousFrame * inverseSawtooth(t)
        animMap 16 01.tga 02.tga 03.tga 04.tga 05.tga 06.tga 07.tga 08.tga
        blendFunc add
        rgbGen wave inverseSawtooth 0 1 0 16 // base=0 amplitude=1 phase=0 frequency=FPS=16
    }
    {
        // B * t = nextFrame * sawtooth(t)
        // since we specify the next frames, we start with image #2 instead of #1 in the animMap
        animMap 16 02.tga 03.tga 04.tga 05.tga 06.tga 07.tga 08.tga 01.tga
        blendFunc add
        rgbGen wave sawtooth 0 1 0 16 // base=0 amplitude=1 phase=0 frequency=FPS=16
    }
}
```

That's pretty much what the CPMA "rocketExplosion" shader does. The original Q3 "rocketExplosion" shader (in baseq3/pak0.pk3/scripts/gfx.shader) is the same except the animation duration was 1 second.

The 16-frame animMap
================================================================================

Let's assume we want an additive blended flame shader with an animation that lasts 0.5 second. We have seen that we can do a 16 FPS animation with 2 animMap stages that interpolate 8 unique frames. But what if we could somehow use 2 shader stages to use 16 unique frames for a 32 FPS animation?

Here's a naive first stab at the problem:

```q3shader
{
    {
        animMap 32 01.tga 02.tga 03.tga 04.tga 05.tga 06.tga 07.tga 08.tga
        blendFunc add
    }
    {
        animMap 32 09.tga 10.tga 11.tga 12.tga 13.tga 14.tga 15.tga 16.tga
        blendFunc add
    }
}
```

Well, that obviously doesn't work since we're adding pairs of frames: (1, 9), (2, 10), etc. But can we fix it?
Only the first stage should be visible for frameCount / FPS = 8 / 32 = 0.25 second and then only the second stage should be visible for the next 0.25 second. We thus need the following waveform functions for the 2 shader stages:
- 1 for 0.25 second and then 0 for 0.25 second
- 0 for 0.25 second and then 1 for 0.25 second

That's just 2 square waves with different time offsets! So all we need is to use the same `rgbGen wave square` directive but with different phases to implement the time offset:

```q3shader
{
    // the phase value is normalized, so 0.5 is half the duration of a full waveform cycle
    // the frequency is 2 because the 0.5 second duration is for the full cycle (min value and max value)
    // it means the value is `base - amplitude = 0` for 0.25 second and `base + amplitude = 1` for 0.25 second
    {
        animMap 32 01.tga 02.tga 03.tga 04.tga 05.tga 06.tga 07.tga 08.tga
        blendFunc add
        rgbGen wave square 0.5 0.5 0 2 // base=0.5 amplitude=0.5 phase=0 frequency=2
    }
    {
        animMap 32 09.tga 10.tga 11.tga 12.tga 13.tga 14.tga 15.tga 16.tga
        blendFunc add
        rgbGen wave square 0.5 0.5 0.5 2 // base=0.5 amplitude=0.5 phase=0.5 frequency=2
    }
}
```

I came up with this trick independently as I've never seen it used or mentioned anywhere, but I don't claim to be the first. It's definitely not a well-known trick in the Q3 mapping community like the interpolation one, hence this article. This trick can be used with other blending modes and the alphaGen directive as well.

Unfortunately, `rgbGen/alphaGen wave square` doesn't allow setting the duration of the min/max signals separately. If it were possible, we could have trivially added even more shader stages for more unique frames of animation.

Can we do a 32-frame animation by combining 2 square waves to selectively mask the right 8-frame group? Doing this would require 4 stages for the animMap directives and high-frequency waveforms and 2 extra stages for low-frequency waveforms:
16 frames: `A*sqr(t) + B*sqr(t + 0.5)` -> 2 stages
32 frames: `(A*sqr(t) + B*sqr(t + 0.5)) * sqr(t/2) + (C*sqr(t) + D*sqr(t + 0.5)) * sqr(t/2 + 0.5)` -> 6 stages
where sqr generates the square wave and wraps its input into the normalized, repeating [0;1) range, mapping e.g. 1.25 to 0.25 and -1.25 to 0.75.
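The complementary masking of the 2-stage case can be verified with a small sketch (plain Python; `sqr` is a hypothetical helper mirroring `rgbGen wave square 0.5 0.5 <phase> <frequency>`, here following the min-first convention described earlier — which group is shown first depends on the engine's sign convention, but the masks being complementary is what makes the trick work):

```python
# Sketch: square-wave masking for the 16-frame animMap trick.
# sqr(t) sits at 0 for the first half of each cycle and at 1 for the
# second half, after wrapping t into the normalized, repeating range.

def sqr(t):
    t = t % 1.0  # wrap: 1.25 -> 0.25, -1.25 -> 0.75
    return 0.0 if t < 0.5 else 1.0

def two_stage(a, b, t):
    # A*sqr(t) + B*sqr(t + 0.5): the two phases select complementary
    # halves of the cycle, so exactly one group is visible at any time
    return a * sqr(t) + b * sqr(t + 0.5)

assert two_stage(1.0, 2.0, 0.25) == 2.0  # first half: only one group contributes
assert two_stage(1.0, 2.0, 0.75) == 1.0  # second half: only the other group
assert sqr(1.25) == sqr(0.25) and sqr(-1.25) == sqr(0.75)  # wrapping behavior
```

The 32-frame attempt would multiply each such pair by a second, half-frequency square wave, which is exactly the sub-expression storage we don't have.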
6 stages is a lot but what's even more problematic is that we don't have anything to store/load the results of sub-expressions, so this specific idea doesn't work.

General advice
================================================================================

The simplest advice
--------------------------------------------------------------------------------

Here is some advice that I think should help anyone writing shaders:
- Make sure you fully understand how the shader system returns `textureColor*vertexColor` for each stage, how vertex colors can be generated and how the stages can be combined using blend functions.
- Write what you're trying to do as a formula and then see how you can map that to the Q3 shader system with the tools available to you. When you can't, you can get creative by pre-computing expressions, simplifying your formula, approximating using incorrect but close enough blend modes, etc.
- When you create your own textures for alpha-blended effects, use pre-multiplied alpha instead of the standard alpha blend.
  Tom Forsyth writes about it [here](https://tomforsyth1000.github.io/blog.wiki.html#%5B%5BPremultiplied%20alpha%5D%5D) and [here](https://tomforsyth1000.github.io/blog.wiki.html#%5B%5BPremultiplied%20alpha%20part%202%5D%5D), and Inigo Quilez writes about it [here](https://iquilezles.org/articles/premultipliedalpha/).

Opaque for more freedom
--------------------------------------------------------------------------------

Opaque surfaces allow for more advanced formulas of the form `A + B*C*D*E*...` where each letter is the result of a shader stage. All that's needed is outputting B in the first opaque/replace stage, then C/D/E/... in multiplicative stages afterward. You then finalize with an additive pass that returns A.

In other words, it would look like this for `A + B*C*D`:

```q3shader
{
    {
        // render_target = B
        map B
    }
    {
        // render_target = B*C
        map C
        blendFunc filter
    }
    {
        // render_target = B*C*D
        map D
        blendFunc filter
    }
    {
        // render_target = A + B*C*D
        map A
        blendFunc add
    }
}
```

Single-stage expression
--------------------------------------------------------------------------------

Remember that you can't store the result of the blending of 2 consecutive stages in temporary storage. However:
- If you can pre-compute some expression in the texture data to simplify the formula enough that it can be implemented, go for it.
- If you can apply the same blend operation to color and alpha, you can re-use the alpha in a later stage with GL_DST_ALPHA or GL_ONE_MINUS_DST_ALPHA. It is unfortunately pretty difficult to find scenarios where this works out because you need the same operation to yield RGB you keep *and* useful A you can use for a follow-up operation.
- If the intermediate expressions are multiplications, you might be able to use vertex colors to generate the value you want. `A*B + C*D` is not possible to implement if each letter is the result of its own stage, but if you can implement either `A*B` or `C*D` as a single stage, you're good to go.
For example, if you can compute `A*B` in 1 stage thanks to vertex color math but `C*D` requires 2 stages:

```q3shader
{
    {
        // render_target = C
        map C
    }
    {
        // render_target = C*D
        map D
        blendFunc filter
    }
    {
        // render_target = A*B + C*D
        map A
        rgbGen ... // returns B
        blendFunc add
    }
}
```

This is exactly what was done in both animMap tricks where the vertex colors were generated using (inverse) sawtooth and square waveforms.

Results and conclusion
================================================================================

Variants of these tricks can be found in the new CPMA beta maps:
https://www.playmorepromode.com/maps

Here's a short breakdown video of an original *id Software* flame shader and multiple new shaders from the aforementioned CPMA maps:
![](https://www.youtube.com/watch?v=NcmXlue66aU)

The brand new looping image sequences were generated from a small C++ application I wrote that uses modified GLSL code ported from [ShaderToy](https://www.shadertoy.com/). Using ShaderToy for live editing effects is really great if you don't already have your own solution for that. The GLM library (https://github.com/g-truc/glm) was used to make the porting of GLSL code to C++ easier. The methodology used for making the animations loop and tile was the one I described in my [previous article about repeating shaders](../shader_repeat).

All these ideas are intuitively obvious in retrospect and pretty easy to apply. It didn't take much effort to put them in practice and I think the small time investment was well worth it. The majority of the time was spent tweaking the code that generates the image sequences but when you can use ShaderToy, it's not a chore.

--------------------------------------------------------------------------------