Saturday, April 9, 2011

Final Report

Introduction

NVidia’s Compute Unified Device Architecture (CUDA) provides parallel computation for physically based systems of millions of particles directly on a graphics card. On a GeForce GTX 260 I have been able to simulate and render over a million interactive fluid particles. This document covers the topics I researched, learned and implemented through the semester in CIS 4900 Special Topics. Knowledge was gained in several areas, including developing kernels with CUDA for C, implementing algorithms for physically based fluid effects, and working with less common features of OpenGL and C.

Setting up CUDA within Microsoft Visual Studio requires adding the CUDA runtime API and pointing the build process at the nvcc compiler. Several CUDA utility libraries must also be added to a project before a CUDA kernel function can be compiled to run on the GPU. Finally, the required Dynamic Link Libraries must be present in the same folder as the compiler output for the program to run.

CUDA kernel functions can be declared as either global or device kernels. Global kernels can be called from C code, whereas device kernels can only be called from within other CUDA kernels. Global kernels must be launched with a GPU grid size and block size: the grid size is the number of blocks and the block size is the number of threads allocated to each block of data. When calling a global CUDA kernel you can pass it both primitive data types and full data structures.
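For example, a global kernel, a device helper it calls, and the host-side launch look roughly like this (the names scaleValues and scaleOne are placeholders for illustration, not functions from the project):

/* Device kernels run on the GPU and can only be called from other GPU code. */
__device__ float scaleOne(float v, float s) {
    return v * s;
}

/* Global kernels are the entry points callable from C host code. */
__global__ void scaleValues(float *data, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = scaleOne(data[i], s);
}

/* Host-side launch: the first launch parameter is the grid size (number of
   blocks), the second is the block size (threads per block). */
int blockSize = 256;
int gridSize = (n + blockSize - 1) / blockSize;
scaleValues<<<gridSize, blockSize>>>(devData, n, 2.0f);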


Objectives

The objectives of this course were to create water, fire and smoke simulations on a GPU, using parallel computation for the effects. The effects were to be animated and interactive in real time, and the program was to allow them to be output to a set of textures. These textures could then be used on a memory-constrained device to display high quality animated effects. The final program should be a demo of a desktop-based particle effect generator that can output animated textures.


Algorithm Overview

Water simulation consists of four major steps: addition of forces, advection of velocity, diffusion of velocity, and advection of particles. Addition of forces simply converts user input into velocities. A velocity grid stores data about each particle's movement, and at the velocity advection step this grid is traced through time using bilinear interpolation. Velocity diffusion then applies viscosity and wave properties to the velocity grid to restrict particle motion. Finally, the resulting velocities are applied to the particles during particle advection at each time step.
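As a rough sketch of the velocity advection step (the grid layout, names and time step handling here are assumptions, not the project's exact kernel), each cell traces its position backwards along the velocity field and bilinearly interpolates the previous velocity at that point:

/* Semi-Lagrangian advection sketch: W x H velocity grid stored as one
   float2 per cell, with the time step dt expressed in grid units. */
__global__ void advectVelocity(float2 *velOut, const float2 *velIn,
                               int W, int H, float dt) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;

    float2 v = velIn[y * W + x];

    /* Trace this cell backwards through the velocity field. */
    float px = fminf(fmaxf(x - dt * v.x, 0.0f), W - 1.001f);
    float py = fminf(fmaxf(y - dt * v.y, 0.0f), H - 1.001f);

    /* Bilinear interpolation of the four surrounding cells. */
    int x0 = (int)px, y0 = (int)py;
    float fx = px - x0, fy = py - y0;
    float2 v00 = velIn[y0 * W + x0];
    float2 v10 = velIn[y0 * W + x0 + 1];
    float2 v01 = velIn[(y0 + 1) * W + x0];
    float2 v11 = velIn[(y0 + 1) * W + x0 + 1];

    float2 r;
    r.x = (1 - fy) * ((1 - fx) * v00.x + fx * v10.x) + fy * ((1 - fx) * v01.x + fx * v11.x);
    r.y = (1 - fy) * ((1 - fx) * v00.y + fx * v10.y) + fy * ((1 - fx) * v01.y + fx * v11.y);
    velOut[y * W + x] = r;
}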

Smoke and fire simulation break down into the following components: particle initialization, particle simulation and particle rendering. Initialization randomizes each particle's properties so that colour, size, velocity and position vary from particle to particle. The simulation reduces each particle's lifetime, which in turn affects its transparency, size, colour, velocity and position. Each particle is then either drawn, if it is still “alive”, or reinitialized if it is not.


Implementation Details

For water simulation an array of floats is passed to each kernel function and for smoke simulation an array of particle structures is used. All data must be copied into CUDA memory on the graphics card before running a kernel that operates on the data.
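As a sketch of that copy step for the smoke case (hostParticles and devParticles are illustrative names, not the project's actual variables):

/* Allocate device memory, copy the host particle array to the GPU,
   run the kernel, then copy the results back and free the memory. */
particle *devParticles;
size_t bytes = MAX_PARTICLES * sizeof(particle);

cudaMalloc((void **)&devParticles, bytes);
cudaMemcpy(devParticles, hostParticles, bytes, cudaMemcpyHostToDevice);

updateSmoke<<<gpuGridSize, gpuBlockSize>>>(devParticles, MAX_PARTICLES, interval);

cudaMemcpy(hostParticles, devParticles, bytes, cudaMemcpyDeviceToHost);
cudaFree(devParticles);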

For water simulation I began by adapting a 2D fluid simulation from NVidia. Moving the simulation into 3D required adding a depth (z) component to the points in each kernel, which produced a projected plane of points in 3D. These points were then rendered to a texture attached to a framebuffer object. Finally, the texture was applied to a plane intersecting a terrain map so that it appears as water.
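In outline, the render-to-texture setup looks something like this (using GLEW-style framebuffer object calls; the texture size and names are illustrative):

/* Create a texture and attach it to a framebuffer object so the fluid
   points can be rendered into it instead of directly to the screen. */
GLuint fbo, waterTex;

glGenTextures(1, &waterTex);
glBindTexture(GL_TEXTURE_2D, waterTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 512, 512, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, waterTex, 0);
glBindFramebuffer(GL_FRAMEBUFFER, 0);

/* Each frame: bind the FBO, draw the fluid points into waterTex, then
   unbind and apply waterTex to the plane intersecting the terrain. */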

Smoke and fire simulation can be performed using fewer particles than required for water due to the low density of smoke and fire. Maintaining a particle structure that contains the lifetime, speed, size and colour of each particle allowed the simulation kernels to easily modify each particle in the array in parallel.
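Based on the fields used in the update code later in this document, the structure looks roughly like the following (the exact layout in the project may differ):

/* One smoke/fire particle. */
typedef struct {
    float life, fade;     /* remaining lifetime and per-step fade rate */
    float x, y, z;        /* position */
    float xv, yv, zv;     /* velocity */
    float xg, yg, zg;     /* gravity / acceleration */
    float r, g, b;        /* colour */
    float size;           /* rendered point size */
} particle;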

Particle positions slowly move upwards on each step and velocities increase over time. The lifetime of each particle is reduced by a “fade” value that can be modified in the user interface. The user interface also contains size and starting colour parameters. For smoke simulation the colour parameter is simply the upper bound on the grey value before randomization. For fire simulation it is the starting value of red and green, and it adjusts how much blue contributes to the colour.
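For illustration, initialization along these lines would randomize those properties on the host (the ranges and the greyMax/baseSize parameter names are assumptions standing in for the GLUI values, not the project's actual code):

/* Host-side initialization sketch: randomize a smoke particle's starting
   properties; uses rand() from <stdlib.h>. */
void initSmokeParticle(particle *p, float greyMax, float baseSize) {
    float grey = greyMax * (rand() / (float)RAND_MAX);      /* upper-bounded grey value */
    p->r = p->g = p->b = grey;

    p->life = 1.0f;
    p->fade = 0.002f + 0.01f * (rand() / (float)RAND_MAX);  /* random fade rate */

    p->x = 0.0f; p->y = 0.0f; p->z = 0.0f;                  /* emitter origin */
    p->xv = 0.02f * (rand() / (float)RAND_MAX - 0.5f);      /* slight sideways drift */
    p->yv = 0.05f + 0.05f * (rand() / (float)RAND_MAX);     /* upward velocity */
    p->zv = 0.02f * (rand() / (float)RAND_MAX - 0.5f);
    p->xg = 0.0f; p->yg = 0.0f; p->zg = 0.0f;

    p->size = baseSize;
}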


Results

Simply rendering the points so that they appear as water within a terrain map was not possible, since the points needed to ignore the depth buffer. The solution was to render them to a texture using a framebuffer object. This worked very well and made it easy to export animated water textures from the application. After adjusting the viscosity level and colour of the water particles, the resulting texture became much more realistic. Adding alpha blending and GLSL bump mapping produced the best result while the water particles were stationary.

In the smoke and fire simulations the performance gain compared to computing each step on the CPU was almost double. Performance was limited, however, by the need for a random number generator when re-initializing particles. This was solved by using a table of random numbers in CUDA memory to increase performance.


Limitations

Computing normal maps for the water particles in real time reduced performance below a manageable frame rate. The best result, both performance-wise and visually, was to not calculate a normal map while the water is animating, only when textures are written to files.

Writing textures to a file is a costly operation because of the time required to write to the hard disk. To prevent slowdowns while rendering, I set up record and print keys: the record key captures textures from each frame into memory, and the print key writes them all to files at once.
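In outline, the record/print mechanism works like this (the buffer sizes and file names are illustrative):

/* Record key: grab the current frame into host memory (cheap).
   Print key: write all recorded frames to disk at once. */
#define MAX_FRAMES 128
#define TEX_W 512
#define TEX_H 512

static unsigned char *frames[MAX_FRAMES];
static int numFrames = 0;

void recordFrame(void) {
    if (numFrames >= MAX_FRAMES) return;
    frames[numFrames] = (unsigned char *)malloc(TEX_W * TEX_H * 4);
    glReadPixels(0, 0, TEX_W, TEX_H, GL_RGBA, GL_UNSIGNED_BYTE, frames[numFrames]);
    numFrames++;
}

void printFrames(void) {
    char name[64];
    int i;
    for (i = 0; i < numFrames; i++) {
        sprintf(name, "water_%03d.raw", i);
        FILE *f = fopen(name, "wb");
        fwrite(frames[i], 1, TEX_W * TEX_H * 4, f);
        fclose(f);
        free(frames[i]);
    }
    numFrames = 0;
}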

Because CUDA kernels run on the GPU, they cannot call standard C library functions such as rand(). This means that random number generators and other convenience utilities need to be recreated as kernel functions. For simplicity, I chose to use a lookup table of random values to get apparent randomness in the smoke simulation. The random table is accessed in a device kernel called from the global kernel that updates the smoke particles.
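The lookup-table approach amounts to something like the following (the table size and indexing scheme here are an approximation, not necessarily what the project uses):

#define RAND_TABLE_SIZE 1024

/* Table of uniform random floats in [0,1), filled on the host with rand()
   and uploaded once with cudaMemcpyToSymbol(). */
__device__ float d_randTable[RAND_TABLE_SIZE];

/* Device kernel: return a pseudo-random value for particle i, offset so
   neighbouring particles and successive frames differ. */
__device__ float randomValue(int i, int offset) {
    return d_randTable[(i * 7 + offset) % RAND_TABLE_SIZE];
}

/* Used from the global update kernel when a particle's life runs out, e.g.
   particles[i].yv = 0.05f + 0.05f * randomValue(i, frame); */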


Experience Gained

I gained a great amount of knowledge about calculating normal maps, DOT3 bump mapping techniques and GLSL bump mapping for this part of the project. I became much more familiar with OpenGL techniques that I previously had little experience with, including rendering to a texture, writing textures to a file, GLUI interfaces, and multi-texturing with transparency. Rendering to a texture is very useful in situations where you would like to modify textures at runtime.

GLUI simplified setting many parameters especially for the smoke simulation – allowing me to find the parameters that give the best result visually. Multi-texturing gave much more realistic looking water, but care must be taken to ensure that each texture uses the same alpha value.

I learned that limiting the number of memory copies and CPU side calculations greatly increases the performance gained from using CUDA. I also grasped the CUDA for C syntax and parallel computing methods while applying algorithms for physically based effects. This was a useful application of knowledge I’ve gained in other Computer Science and Physics courses. I also feel much more confident in my ability to use OpenGL and GLSL in a Visual Studio project.


Conclusion

In conclusion, the course has tested my ability to maintain a reasonably large sized project in Visual Studio while allowing me to apply knowledge of physically based effects in a parallel GPU computing environment. Also, I've strengthened my knowledge of OpenGL, GLSL and managing input and output of files in Windows. I am now able to apply the skills I've gained to create CUDA functions that avoid the need to iterate over arrays while computing. The application will now serve as part of my portfolio and also as a tool to produce animated texture effects for use on mobile devices.

Wednesday, April 6, 2011

CUDA Fluid Effects In Action

Here's a short video I put together using the effects:

Fire Effect

The only major differences in rendering a fire particle effect compared to the previous smoke particle effect were the colour and density of the particles. Fire particles are smaller and denser than smoke particles, and their colour varies from yellow towards red at the end of their lifespan. The change in colour at each step is calculated as follows:


particle[i].r *= 1.1f;                   // red ramps up each step
particle[i].g *= (particle[i].life/2);   // green fades as the particle's life runs out
particle[i].b += (1/particle[i].life);   // blue contribution grows as life approaches zero

This makes the particles become more red as they age, starting from yellow and moving through orange. The best results visually are when the fade time of the particles is set very low so they disappear quickly.

A screenshot of fire and smoke side by side:

Tuesday, March 22, 2011

Update Smoke Particles in a CUDA Kernel

The smoke update step is now performed on the GPU using the following call:

updateSmoke<<<gpuGridSize, gpuBlockSize>>>(gpuParticleArray, MAX_PARTICLES, interval );

which launches the kernel function:


/*Update smoke particle positions*/
__global__ void
updateSmoke(particle *particles, int numParticles, int timeInterval) {

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numParticles) return;   // guard against threads past the end of the array

    // Position += velocity, scaled by the particle's age and the time step
    particles[i].x += ((particles[i].xv * (1.0f-particles[i].life))/LIFE_TIME) * timeInterval; // greater dispersion for older particles
    particles[i].y += (particles[i].yv/LIFE_TIME) * timeInterval;
    particles[i].z += ((particles[i].zv * (1.0f-particles[i].life))/LIFE_TIME) * timeInterval; // greater dispersion for older particles

    // Speed change based on gravity
    particles[i].xv += particles[i].xg * timeInterval;
    particles[i].yv += particles[i].yg * timeInterval;
    particles[i].zv += particles[i].zg * timeInterval;

    // Update the lifetime based on the fade/mass value.
    particles[i].life -= particles[i].fade * timeInterval;
}

which was adapted from Encelo's smoke tutorial (http://encelo.netsons.org/programming/opengl) and modified into a CUDA kernel for parallel access of particles.

The demo can now render up to about 8,000 smoke particles (combined with the water particles and terrain) and stay around 30 frames per second. The limitation that I've encountered now is in handling the re-initialization of dispersed particles. Since the method I use relies on the standard random number generator in C, it cannot be used in a CUDA kernel. I've looked at implementing a basic random number generator kernel in CUDA, but have yet to obtain desirable results. 

Here's a shot of the particles in action:
Next week I plan on finishing the random number generator in CUDA to improve re-initialization of particles and then proceeding to fire simulation. 

Wednesday, March 16, 2011

Smoke Particle Effect

The current particle smoke calculations are performed on a single processor and limited to 2000 particles per smoke instance. Once 10,000 particles are on screen the performance drops well below 30 fps. I'm currently working on implementing a 3D CUDA function that will perform the smoke step operation, but I'm not getting proper data out of it yet:

pos.x += speed.x * lifetime/interval
pos.y += speed.y * interval
pos.z += speed.z * lifetime/interval

and then update the lifetime based on the fade/mass value.

I currently have a few GLUI controls set up for changing parameters of the smoke. They are fade/mass, size and colour. Below are some screenshots with different settings:




I'm hoping for the ability to render up to 30,000 particles (or more) with the CUDA step operation, compared to the current frame-rate using single-threaded instructions. There are also many more GLUI controls that could be added to change the effect of gravity on the smoke and its speed.

Tuesday, March 1, 2011

CUDA Smoke Simulation

The main differences in a fluid solver for smoke compared to water are as follows:

  • Decrease in viscosity
  • Addition of density, mass and temperature factors
  • Buoyancy based on the additional factors

Obviously buoyancy is relevant to water as well, but since the water I rendered was on a fixed plane in 2D this factor was not necessary. Buoyancy will be needed for smoke movement, since it affects the smoke's position along the Y-axis. GPU Gems (Chpt. 38) gives the buoyancy force as:

f_buoy = (-k*d + s*(T - T0)) * j

where k is the mass factor, d is the smoke density, s is a constant scale factor, T is the temperature, T0 is the ambient temperature and j is the vertical (Y) direction.

I plan to implement this function in my current CUDA kernel by adding the value at each fragment to the velocity grid in the Y-direction. This should simulate the effect of gravity based on the weight of the smoke.
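A sketch of what that could look like in a kernel (the density and temperature grids, the constants, and the float2 velocity layout are placeholders rather than the project's final code):

/* Add the buoyancy force f = -k*d + s*(T - T0) to the Y component of the
   velocity grid at each cell. */
__global__ void addBuoyancy(float2 *vel, const float *density, const float *temp,
                            int W, int H, float k, float s, float T0, float dt) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;

    int i = y * W + x;
    float f = -k * density[i] + s * (temp[i] - T0);
    vel[i].y += f * dt;   /* buoyancy acts along the Y axis only */
}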

GLSL Bump Mapping

Following the 'Swiftless' GLSL tutorial at http://www.swiftless.com/tutorials/glsl/8_bump_mapping.html, bump mapping is now working on the water while it is stationary. Generating a normal map while animating, however, still drops the frame rate below 1, making the demo unusable.

While stationary, the bump map is convincing:

I'm thinking it would be possible to calculate and output the normal map only when saving the actual textures to disk. That way the animation would be less choppy while interacting with it.

At this point I've added  GLUI controls to adjust the water level and colour, as well as assigning record and print keys for animation. I'm going to move onto smoke simulation next, and likely work out the normal map output at some point.