Saturday, April 9, 2011

Final Report

Introduction

NVidia’s Compute Unified Device Architecture (CUDA) provides parallel computation for physically based systems of millions of particles directly on a graphics card. On a GeForce GTX 260 I have been able to simulate and render over a million interactive fluid particles. This document covers the topics I researched, learned and implemented through the semester in CIS 4900 Special Topics. Knowledge was gained in several areas, including developing kernels with CUDA for C, implementing algorithms for physically based fluid effects, and working with less common features of OpenGL and C.

Setting up CUDA within Microsoft Visual Studio requires adding the CUDA runtime API and linking to the nvcc compiler. Multiple CUDA utility libraries must be added to a project before a CUDA kernel function can be compiled to run on the GPU. The CUDA Dynamic Link Libraries must also be present in the same folder as the compiler output in order for the program to run.

CUDA kernel functions can be declared as either global or device kernels. Global kernels can be called from C, whereas device kernels can only be called from within other CUDA kernels. Global kernels must be launched with a GPU grid size and block size, which specify the number of blocks and the number of threads allocated per block. When calling a global CUDA kernel you can pass it both primitive data types and full data structures.
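
As a quick illustration (the kernel and variable names here are my own, not from the project), a device kernel can be factored out of a global kernel and launched from C like so:

/* A device kernel: callable only from other kernels running on the GPU */
__device__ float scaleValue(float v, float factor)
{
    return v * factor;
}

/* A global kernel: callable from C with a launch configuration */
__global__ void scaleArray(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = scaleValue(data[i], factor);
}

/* Host-side launch: grid size = number of blocks, block size = threads per block */
void runScale(float *gpuData, int n)
{
    int blockSize = 256;
    int gridSize = (n + blockSize - 1) / blockSize;
    scaleArray<<<gridSize, blockSize>>>(gpuData, n, 0.5f);
}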


Objectives

The objectives of this course were to create water, fire and smoke simulations on a GPU, using parallel computation for the effects. They were to be animated, interactive in real time, and able to output the effects to a set of textures. These textures could then be used on a memory-constrained device to display high-quality animated effects. The final program should be a demo of a desktop-based particle-effect generator that can output animated textures.


Algorithm Overview

Water simulation consists of four major steps: addition of forces, advection of velocity, diffusion of velocity, and advection of particles. Addition of forces simply determines velocities from user input. A velocity grid stores data about each particle's movement, and during the velocity advection step this grid is traced through time using bilinear interpolation. Velocity diffusion then applies viscosity and wave properties to the velocity grid to restrict particle motion. Finally, the resulting velocities are applied to the particles during particle advection at each time step, as sketched below.
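
A sketch of the per-frame sequence, with illustrative kernel names and parameters rather than the project's actual ones:

/* Illustrative kernel declarations; the real project kernels differ */
__global__ void addForces(float2 *vel, float2 force, float2 pos);
__global__ void advectVelocity(float2 *vel, float dt);
__global__ void diffuseVelocity(float2 *vel, float viscosity, float dt);
__global__ void advectParticles(float2 *particles, float2 *vel, float dt);

/* One full simulation step per frame */
void stepWater(float2 *gpuVel, float2 *gpuParticles, float2 force,
               float2 pos, float viscosity, float dt, dim3 grid, dim3 block)
{
    addForces<<<grid, block>>>(gpuVel, force, pos);              // 1. forces from user input
    advectVelocity<<<grid, block>>>(gpuVel, dt);                 // 2. trace velocity through time
    diffuseVelocity<<<grid, block>>>(gpuVel, viscosity, dt);     // 3. apply viscosity
    advectParticles<<<grid, block>>>(gpuParticles, gpuVel, dt);  // 4. move the particles
}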

Smoke and fire simulation break down into three components: particle initialization, particle simulation and particle rendering. Initialization randomizes particle properties so that colour, size, velocity and position vary for each particle. The simulation is driven by a reduction in each particle's lifetime, which affects its transparency, size, colour, velocity and position. Particles are then either drawn if they are still “alive” or reinitialized if they are not.


Implementation Details

For water simulation an array of floats is passed to each kernel function, and for smoke simulation an array of particle structures is used. All data must be copied into CUDA memory on the graphics card before running a kernel that operates on it.
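
The upload follows the standard CUDA allocate-and-copy pattern; a minimal sketch, assuming a host-side array named hostParticleArray:

particle *gpuParticleArray;
size_t size = MAX_PARTICLES * sizeof(particle);

/* Allocate device memory and copy the host array onto the card */
cudaMalloc((void **)&gpuParticleArray, size);
cudaMemcpy(gpuParticleArray, hostParticleArray, size, cudaMemcpyHostToDevice);

/* ... launch kernels that operate on gpuParticleArray ... */

/* Copy results back only when the CPU actually needs them */
cudaMemcpy(hostParticleArray, gpuParticleArray, size, cudaMemcpyDeviceToHost);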

For water simulation I began by adapting a 2D fluid simulation from NVidia. Moving the simulation into 3D required adding a depth (z) component to the points in each kernel, which produced a projected plane of points in 3D. These points were then rendered to a texture attached to a framebuffer object. Finally, the texture was applied to a plane intersecting a terrain map to appear as water.

Smoke and fire simulation can be performed with fewer particles than water requires, due to the low density of smoke and fire. Maintaining a particle structure that contains the lifetime, speed, size and colour of each particle allows the simulation kernels to easily modify each particle in the array in parallel.
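
The structure looks roughly like this (a sketch; the exact fields and names in my code differ slightly):

typedef struct {
    float x, y, z;     // position
    float xv, yv, zv;  // velocity
    float xg, yg, zg;  // gravity/acceleration per axis
    float r, g, b;     // colour
    float size;        // point sprite size
    float life;        // remaining lifetime (1.0 = new, 0.0 = dead)
    float fade;        // how quickly life decreases per step
} particle;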

Particle positions move slowly upwards on each step and velocities increase over time. The lifetime of each particle is reduced by a “fade” value that can be modified in the user interface. The user interface also contains size and starting-colour parameters. For smoke simulation the colour parameter simply represents the upper-bound grey value before randomization. For fire simulation it represents the starting values of red and green and adjusts the effect of blue's colour contribution.


Results

Simply rendering the points to appear as water within the terrain map was not possible, since the points needed to ignore the depth buffer. The solution was to render to a texture using a framebuffer object. This worked very well and made it easy to export animated water textures from the application. After adjusting the viscosity level and colour of the water particles, they became much more realistic in the texture. Adding alpha blending and GLSL bump mapping produced the best result while the water particles were stationary.

In smoke and fire simulations the performance was almost double that of calculating each step on the CPU. However, performance was limited by the random number generation required to re-initialize particles; this was solved by using a table of random numbers in CUDA.


Limitations

Computing normal maps for water particles in real time decreased system performance below a manageable frame rate. The best result, performance-wise and visually, was to skip the normal map calculation while the water is animating and compute it only when textures are written to files.

Writing textures to a file is a costly operation due to the time required to write to a hard disk. To prevent slowdowns while rendering, I set up record and print keys that control when textures are captured from each frame into memory and when they are written to files all at once, as sketched below.
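
A sketch of that mechanism, with illustrative key bindings and buffer sizes (writeFramesToDisk is a hypothetical stand-in for the batched file output):

#define MAX_FRAMES 64
#define TEX_W 256
#define TEX_H 256

static unsigned char frames[MAX_FRAMES][TEX_W * TEX_H * 4];
static int frameCount = 0;
static int recording = 0;

/* Called once per rendered frame; grabs pixels into memory only */
void captureFrame(void)
{
    if (recording && frameCount < MAX_FRAMES)
        glReadPixels(0, 0, TEX_W, TEX_H, GL_RGBA, GL_UNSIGNED_BYTE,
                     frames[frameCount++]);
}

/* GLUT keyboard callback: 'r' toggles recording, 'p' flushes to disk */
void keyboard(unsigned char key, int x, int y)
{
    if (key == 'r') recording = !recording;
    if (key == 'p') writeFramesToDisk(frames, frameCount);
}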

Since CUDA kernels run on the GPU, standard C library functions cannot be called from within a kernel. This means that random number generators and other convenience utilities need to be recreated as kernel functions. For simplicity, I chose to use a lookup table of random values to provide apparent randomness in the smoke simulation. The random table is accessed through a device kernel called from the global kernel that updates the smoke particles.
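
A minimal sketch of the lookup-table approach (the table size, indexing and reinitialized fields here are illustrative):

#define RAND_TABLE_SIZE 1024

/* Filled once on the host with rand() and copied over
   with cudaMemcpyToSymbol at startup */
__device__ float randTable[RAND_TABLE_SIZE];

/* Device kernel: look up a pseudo-random float in [0,1) for thread i */
__device__ float deviceRandom(int i, int offset)
{
    return randTable[(i + offset) % RAND_TABLE_SIZE];
}

/* Global kernel: re-initialize "dead" particles using table values */
__global__ void reinitSmoke(particle *particles, int numParticles, int offset)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numParticles && particles[i].life <= 0.0f) {
        particles[i].life = 1.0f;
        particles[i].xv = deviceRandom(i, offset) * 0.1f - 0.05f;
        particles[i].zv = deviceRandom(i, offset + 1) * 0.1f - 0.05f;
    }
}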


Experience Gained

I gained a great deal of knowledge in calculating normal maps, DOT3 bump mapping techniques and GLSL bump mapping for this part of the project. I became much more familiar with OpenGL techniques I previously had little experience with, including rendering to a texture, writing textures to a file, GLUI interfaces, and multi-texturing with transparency. Rendering to a texture is very useful in situations where textures need to be modified at runtime.

GLUI simplified setting the many parameters, especially for the smoke simulation, allowing me to find the parameters that give the best visual result. Multi-texturing gave much more realistic looking water, but care must be taken to ensure that each texture uses the same alpha value.

I learned that limiting the number of memory copies and CPU-side calculations greatly increases the performance gained from using CUDA. I also grasped the CUDA for C syntax and parallel computing methods while applying algorithms for physically based effects. This was a useful application of knowledge I’ve gained in other Computer Science and Physics courses. I also feel much more confident in my ability to use OpenGL and GLSL in a Visual Studio project.


Conclusion

In conclusion, the course has tested my ability to maintain a reasonably large project in Visual Studio while allowing me to apply knowledge of physically based effects in a parallel GPU computing environment. I've also strengthened my knowledge of OpenGL, GLSL and managing file input and output in Windows. I can now create CUDA functions that replace CPU-side iteration over arrays with parallel computation. The application will serve as part of my portfolio and as a tool to produce animated texture effects for use on mobile devices.

Wednesday, April 6, 2011

CUDA Fluid Effects In Action

Here's a short video I put together using the effects:

Fire Effect

The only major differences in rendering a fire particle effect compared to the previous smoke particle effect were in the colour and density of the particles. Fire particles are smaller and denser than smoke particles, and vary in colour from yellow toward red at the end of their lifespan. The change in colour at each step is calculated as follows:


particle[i].r *= 1.1f;                    // red saturates quickly
particle[i].g *= (particle[i].life / 2);  // green fades out with remaining life
particle[i].b += (1 / particle[i].life);  // blue term shifts late in life

This makes the particles become more red as they age, starting from yellow and moving through orange. The best visual results come when the fade time of the particles is set very low so they disappear quickly.

A screenshot of fire and smoke side by side:

Tuesday, March 22, 2011

Update Smoke Particles in a CUDA Kernel

The smoke update step is now performed on the GPU using the following call:

updateSmoke<<<gpuGridSize, gpuBlockSize>>>(gpuParticleArray, MAX_PARTICLES, interval);

which launches the kernel function:


/* Update smoke particle positions */
__global__ void
updateSmoke(particle *particles, int numParticles, int timeInterval) {

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numParticles) return; // guard threads that fall outside the array

    // Position += velocity scaled by age; the (1 - life) factor gives
    // greater dispersion in x and z for older particles
    particles[i].x += ((particles[i].xv * (1.0f - particles[i].life)) / LIFE_TIME) * timeInterval;
    particles[i].y += (particles[i].yv / LIFE_TIME) * timeInterval;
    particles[i].z += ((particles[i].zv * (1.0f - particles[i].life)) / LIFE_TIME) * timeInterval;

    // Velocity change based on gravity
    particles[i].xv += particles[i].xg * timeInterval;
    particles[i].yv += particles[i].yg * timeInterval;
    particles[i].zv += particles[i].zg * timeInterval;

    // Update the lifetime based on the fade/mass value
    particles[i].life -= particles[i].fade * timeInterval;
}

The kernel was adapted from Encelo's smoke tutorial (http://encelo.netsons.org/programming/opengl) and modified into a CUDA kernel so that particles are updated in parallel.

The demo can now render up to about 8,000 smoke particles (combined with the water particles and terrain) while staying around 30 frames per second. The limitation I've encountered now is in handling the re-initialization of dispersed particles: since my method relies on the standard C random number generator, it cannot run inside a CUDA kernel. I've looked at implementing a basic random number generator kernel in CUDA, but have yet to obtain desirable results.

Here's a shot of the particles in action:

Next week I plan on finishing the random number generator in CUDA to improve re-initialization of particles, and then proceeding to fire simulation.

Wednesday, March 16, 2011

Smoke Particle Effect

The current smoke particle calculations are performed on a single processor and limited to 2,000 particles per smoke instance. Once 10,000 particles are on screen, performance drops well below 30 fps. I'm currently working on implementing a 3D CUDA function that will perform the smoke step operation, but I'm not getting proper data out of it yet:

pos.x += speed.x * lifetime/interval
pos.y += speed.y * interval
pos.z += speed.z * lifetime/interval

and then update the lifetime based on the fade/mass value.

I currently have a few GLUI controls set up for changing parameters of the smoke. They are fade/mass, size and colour. Below are some screenshots with different settings:




I'm hoping to render up to 30,000 particles (or more) with the CUDA step operation, compared to the current frame rate using single-processor instructions. There are also many more GLUI controls that could be added to change the effect of gravity on the smoke and its speed.

Tuesday, March 1, 2011

CUDA Smoke Simulation

The main differences in a fluid solver for smoke compared to water are as follows:

  • Decrease in viscosity
  • Addition of density, mass and temperature factors
  • Buoyancy based on the additional factors
Obviously buoyancy is relevant to water as well, but since the water I rendered was on a fixed plane in 2D, this factor was not necessary. Buoyancy is needed for smoke movement, affecting its position along the Y-axis. GPU Gems (Chpt. 38) gives the buoyancy force as:

$$f_{buoy} = \left(-k\,d + s\,(T - T_0)\right)\hat{\jmath}$$

where $k$ is the mass factor, $d$ is the smoke density, $s$ is a constant time scale factor, and $T$ is the temperature compared to the ambient temperature $T_0$.

I plan to implement this function in my current CUDA kernel by adding the buoyancy value at each grid cell to the velocity grid in the Y-direction. This should simulate the effect of gravity based on the weight of the smoke.
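
A sketch of what that kernel addition might look like, with illustrative grid layout and parameters:

/* Add buoyancy to the Y-velocity at each grid cell:
   f_buoy = -k*d + s*(T - T0), applied along +Y */
__global__ void addBuoyancy(float *yVelocity, float *density,
                            float *temperature, int gridWidth, int gridHeight,
                            float k, float s, float ambientT, float dt)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < gridWidth && y < gridHeight) {
        int i = y * gridWidth + x;
        float buoyancy = -k * density[i] + s * (temperature[i] - ambientT);
        yVelocity[i] += buoyancy * dt;
    }
}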

GLSL Bump Mapping

Following the 'Swiftless' GLSL tutorial at http://www.swiftless.com/tutorials/glsl/8_bump_mapping.html, bump mapping is now working on the water while it is stationary. Generating a normal map while animating, however, still gives a frame rate of less than 1 fps, making the demo unusable.

While stationary, the bump map is convincing:

I'm thinking it would be possible to calculate and output the normal map only when saving the actual textures to disk. That way the animation would be less choppy while interacting with it.

At this point I've added GLUI controls to adjust the water level and colour, as well as record and print keys for animation. I'm going to move on to smoke simulation next, and will likely work out the normal map output at some point.

GLSL Multitexturing

After a bit of a refresher on OpenGL 2's GLSL, I managed to get a multitexturing fragment shader working for water. The fragment shader code is as follows:


varying vec3 lightDir, normal;
uniform sampler2D color_texture;
uniform sampler2D color_texture2;

void main() {
    // Standard diffuse term from the interpolated normal
    float diffuse = max(dot(lightDir, normalize(normal)), 0.0);
    float alpha = gl_FrontMaterial.diffuse.a;

    // Modulate the lighting by the material's ambient colour
    vec3 lighting = diffuse * gl_FrontMaterial.ambient.rgb;

    // Combine both texture units under the same lighting
    vec3 color = lighting * (texture2D(color_texture, gl_TexCoord[0].st).rgb +
                             texture2D(color_texture2, gl_TexCoord[0].st).rgb);

    gl_FragColor = vec4(color, alpha);
}

The results for various coloured textures are as follows:



Wednesday, February 16, 2011

More Bump Mapping woes

One item I was previously missing was a set of texture coordinates for the second texture unit. But even after correcting this, the bump mapping still doesn't appear as desired.

What I've gathered from continued research after trying several methods is that I may be missing a tangent-space vertex light calculation, which I didn't think was necessary for Dot3 bump mapping. As you can see in the screenshot below, specular light is included in the scene but is not shown on the water. Also, detail in the original texture appears to be lost and simply replaced by the bumps in the normal map, rather than being combined with them.


Still, the best result is without using bump mapping:

Depending on how much time I have over the break, I may take a look at rendering a bump map using a GLSL shader instead of Dot3. Otherwise, I think it's about time to move on to smoke simulation, since I'm reasonably happy with the texture output and appearance of the water so far.

Other benefits of GLSL include:
- More straightforward bump mapping that may be easier to debug
- Easier reflection mapping onto the water
- Easy addition of a cosine function for 3D waves and/or Perlin noise

Tuesday, February 15, 2011

Alpha Blend and Normal improvements

As it turns out, the main reason textures (and light) weren't looking quite right was that there weren't enough vertices and normals on the plane I was rendering onto. Recreating the plane as a flat 256x256 slab of terrain made a big improvement in lighting, alpha blending and texturing. The alpha blending now properly shows the terrain below the water and properly obscures areas of higher velocity.

Tuesday, February 8, 2011

Colour Blending vs. Normal Mapping

Progress this week includes optimizations to the output of rendered textures and a comparison between colour blending and a normal-mapped texture combination.

By storing rendered textures and delaying writing to disk until a key is pressed, the demo now continues to animate at 30 fps.  This also allows better control over animation and produces more natural looking animated textures.

Currently, colour blended with the rendered texture, rather than multiple textures (the rendered texture and a normal map), seems to achieve the better visual result. Rendering a proper normal map as the animation occurs has also proven quite difficult. I encountered texture-flickering issues that I'll continue to work out, but for now it appears the best method for continuing with fire and smoke simulation is to use blended colours with the rendered texture.

Normal-mapped texture combination - appears dark, but additional detail is visible. I'm not convinced this is working properly yet.

Blended colours - still currently gives the best visual result

Tuesday, February 1, 2011

Keyframe Animated Textures

Rendered textures can now be saved to a file using a simple image-writing library from http://www.lonesock.net/soil.html. It takes care of creating the bitmap header info and writing the data from a texture, and it works very well with OpenGL since it uses the same RGBA byte ordering.
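
Writing one keyframe out is then just a read-back followed by a single SOIL call; a sketch, assuming the rendered texture object is named renderTexture:

unsigned char pixels[256 * 256 * 4];

/* Read the rendered texture back into client memory... */
glBindTexture(GL_TEXTURE_2D, renderTexture);
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, pixels);

/* ...and let SOIL handle the bitmap header and file output */
SOIL_save_image("keyframe00.bmp", SOIL_SAVE_TYPE_BMP,
                256, 256, 4, pixels);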

Below is the output of 16 keyframes over roughly 30 seconds:

16 keyframes each scaled to 256x256

Unfortunately, attempting to write the texture to a file once every second (or ideally once every frame) decreases the frame rate to a point where it is difficult to interact with the water. Next I will focus on allocating and rendering to an array of textures and performing all write operations after rendering is complete. The large white alpha areas on the rendered textures will also have to be coloured or handled differently. The smaller alpha areas actually appear as a reflection, which is interesting, but I hope to achieve a better effect using a normal map.

Flowing River & Render to Texture vs. Rendering Points

After much manipulation, the velocity grid now renders very well to a 1024x1024 texture attached to a plane intersecting the terrain. Rendering a moving velocity grid as points vs. rendering off-screen to a texture applied to a plane gave the following results:

512x512 grid points to screen: 60+ fps
512x512 grid to 512x512 texture map: 60+ fps

1024x1024 grid points to screen: 20+ fps
1024x1024 grid to 1024x1024 texture map: 30+ fps

As the render-to-texture method is performed off-screen, it seems to be the more resolution-scalable solution. It also results in a convincing water effect when combined with terrain:

Wednesday, January 26, 2011

Rendering to a Texture

Maintaining the correct viewport and projection/model matrices proved to be the most difficult part of rendering particles to a texture. When rendering to a texture, the viewport must be set to the same size as the texture attached to the framebuffer, which changes the perspective relative to the default window framebuffer. The velocity grid must be aligned so that it intersects the texture framebuffer's viewing frustum exactly. Unfortunately, an orthogonal projection will not work in this case, since z values are needed for the depth renderbuffer attached to the texture framebuffer.
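
The per-frame sequence ends up looking roughly like this (the sizes, camera values and drawVelocityGrid call are illustrative, not the project's exact code):

/* Render the velocity grid into the texture framebuffer */
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
glViewport(0, 0, 1024, 1024);           // must match the texture size

glMatrixMode(GL_PROJECTION);
glLoadIdentity();
gluPerspective(45.0, 1.0, 0.1, 100.0);  // perspective, not orthogonal,
                                        // so z reaches the depth buffer
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
gluLookAt(0, 0, 2.5, 0, 0, 0, 0, 1, 0); // grid fills the frustum exactly

drawVelocityGrid();                     // the particle draw calls

/* Restore the default framebuffer and the window's viewport */
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
glViewport(0, 0, windowWidth, windowHeight);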

I managed to get the particles rendering to a texture (still animating in real-time) and apply them to a simple cube:

Next I will focus on rendering and animating this texture in the terrain scene and continuing to modify colour values for the output textures.  Also, currently only one texture is rendered at the end of the velocity computation, but ideally multiple textures will be rendered and combined later on.

Tuesday, January 25, 2011

Framebuffer Objects

Rather than rendering individual points for each position within our velocity grid, we can render this data into a texture using a Framebuffer Object (FBO). An additional framebuffer, separate from the default OpenGL framebuffer, must be created and attached to a texture.


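A minimal sketch of the setup, using the EXT framebuffer extension (the 1024x1024 size and formats here are illustrative):

GLuint fbo, depthBuffer, renderTexture;

/* Create the texture that will receive the rendered image */
glGenTextures(1, &renderTexture);
glBindTexture(GL_TEXTURE_2D, renderTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 1024, 1024, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

/* Create a depth renderbuffer so depth testing works off-screen */
glGenRenderbuffersEXT(1, &depthBuffer);
glBindRenderbufferEXT(GL_RENDERBUFFER_EXT, depthBuffer);
glRenderbufferStorageEXT(GL_RENDERBUFFER_EXT, GL_DEPTH_COMPONENT24, 1024, 1024);

/* Create the FBO and attach the texture and depth buffer */
glGenFramebuffersEXT(1, &fbo);
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                          GL_TEXTURE_2D, renderTexture, 0);
glFramebufferRenderbufferEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT,
                             GL_RENDERBUFFER_EXT, depthBuffer);

/* Unbind to return to the default window framebuffer */
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);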

Once the FBO is set up and bound during rendering, all draw calls are written to the off-screen buffer and copied to the specified texture. If the particle grid is rendered at each stage of the velocity and pressure calculation, the resulting textures can be combined to form an animated texture at each time step. Applying this texture to a surface makes it appear fluid-like even though it is only 2D. With more room for post-processing, rendering to a texture seems like a better approach than simply rendering particles as points.

Tuesday, January 18, 2011

Point Based Water Progress

I've managed to manipulate the perspective, size and viscosity of the NVidia Fluids CUDA sample code:
http://developer.download.nvidia.com/compute/cuda/sdk/website/Physically-Based_Simulation.html

It is now displayed in a 3D perspective at a similar size to the terrain below it. The colour has been changed and the viscosity reduced to feel more like water.

Next I will take a look at changing the individual point colour values in another CUDA kernel function to try to make the colour more realistic.  Also, work needs to be done to figure out how to properly display the terrain within the water (depth issues).

Introduction

This blog will serve as record keeping for work using NVidia's CUDA technology to simulate fluid effects including: water, smoke and fire.

Numerous resources that have been helpful in grasping CUDA concepts are located at:
[1] http://developer.nvidia.com/object/gpucomputing.html

Two chapters from GPU Gems 1 & 3 have been the most helpful for this application:
[2] http://http.developer.nvidia.com/GPUGems/gpugems_ch38.html - this chapter deals with creating a grid-based Navier-Stokes equation solver that writes data to textures for fluid simulation, with extensions for cloud or smoke simulation
[3] http://http.developer.nvidia.com/GPUGems3/gpugems3_ch30.html - this chapter outlines a more dynamic and complex approach to rendering different types of 3D fluids

The classic method of solving fluids is to use either a 2D or 3D Navier-Stokes equation solver:

$$\frac{\partial u}{\partial t} = -(u \cdot \nabla)u - \frac{1}{\rho}\nabla p + f, \qquad \nabla \cdot u = 0$$

where $p$ is the pressure, $\rho$ is the mass density, $f$ represents any external forces (such as gravity), and $\nabla$ is the gradient differential operator [3].


In short, solving for the velocity u at positions within a grid can be used to simulate different types of fluid based on the other parameters.