Saturday, April 9, 2011

Final Report

Introduction

NVIDIA’s Compute Unified Device Architecture (CUDA) can run parallel computation for physically based systems of millions of particles directly on a graphics card. On a GeForce GTX 260 I’ve been able to simulate and render over a million interactive fluid particles. This report covers the topics that I’ve researched, learned and implemented through the semester in CIS 4900 Special Topics. I gained knowledge in several areas, including developing kernels with CUDA for C, implementing algorithms for physically based fluid effects, and using less common functions of OpenGL and C.

Setting up CUDA within Microsoft Visual Studio requires adding the CUDA runtime API to the project and pointing the build at the nvcc compiler. Several CUDA utility libraries must also be added to a project before it is possible to compile a CUDA kernel function to run on the GPU. In addition, the required Dynamic Link Libraries must be present in the same folder as the compiled executable in order for it to run.

CUDA functions that run on the GPU are declared with either the __global__ or the __device__ qualifier. Global kernels can be launched from C host code, whereas device functions can only be called from within other GPU code. A global kernel launch must be given a grid size and a block size: the block size is the number of threads in each block and the grid size is the number of blocks, so together they determine how many threads work on the data. When calling a global CUDA kernel you can pass it primitive data types and full data structures by value; arrays are passed as pointers to memory on the graphics card.
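As a rough illustration of how grid and block sizes relate to the amount of data, here is the host-side arithmetic in plain C. The helper names are hypothetical and the block size of 256 used in the note below is only an assumption, not a value taken from the project.

```c
#include <stddef.h>

/* Hypothetical helper: number of blocks needed so that
   blocks * threads_per_block >= n_items (ceiling division). */
size_t blocks_for(size_t n_items, size_t threads_per_block)
{
    return (n_items + threads_per_block - 1) / threads_per_block;
}

/* Mirrors the element index a CUDA thread typically computes as
   blockIdx.x * blockDim.x + threadIdx.x. */
size_t global_index(size_t block, size_t threads_per_block, size_t thread)
{
    return block * threads_per_block + thread;
}
```

With one million particles and an assumed 256 threads per block, blocks_for gives 3907 blocks. The last block is only partly used, so a kernel would need to check that its index is less than the particle count before touching the array.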


Objectives

The objectives of this course were to create water, fire and smoke simulations on a GPU using parallel computation for the effects. They were to be animated and interactive in real-time and allow for output of the effects to a set of textures. These textures could then be used on a memory constrained device to display high quality animated effects. The final program should be a demo of a desktop-based particle effect generator that can output animated textures.


Algorithm Overview

Water simulation consists of four major steps: addition of forces, advection of velocity, diffusion of velocity, and advection of particles. Addition of forces simply determines velocities from user input. A velocity grid is then used to store data about each particle’s movement, and this is traced through time using bilinear interpolation at the velocity advection step. Velocity diffusion then takes the velocity grid and applies viscosity and wave properties to restrict particle motion. Finally, the resulting velocities are applied to the particles during particle advection at each time step.
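The advection step described above can be sketched on the CPU in plain C. This is a minimal sketch rather than the project's kernel: the function names, the row-major grid layout and the clamp-to-edge boundary handling are all assumptions made for illustration.

```c
/* Sample a w-by-h row-major grid at a fractional position (x, y),
   clamping to the grid edges (an assumed boundary rule). */
float sample_bilinear(const float *grid, int w, int h, float x, float y)
{
    if (x < 0.0f) x = 0.0f;
    if (y < 0.0f) y = 0.0f;
    if (x > (float)(w - 1)) x = (float)(w - 1);
    if (y > (float)(h - 1)) y = (float)(h - 1);

    /* x and y are non-negative here, so truncation equals floor. */
    int x0 = (int)x;
    int y0 = (int)y;
    int x1 = (x0 + 1 < w) ? x0 + 1 : x0;
    int y1 = (y0 + 1 < h) ? y0 + 1 : y0;
    float fx = x - (float)x0;
    float fy = y - (float)y0;

    /* Blend along x on the two rows, then blend those results along y. */
    float row0 = grid[y0 * w + x0] * (1.0f - fx) + grid[y0 * w + x1] * fx;
    float row1 = grid[y1 * w + x0] * (1.0f - fx) + grid[y1 * w + x1] * fx;
    return row0 * (1.0f - fy) + row1 * fy;
}

/* Trace cell (i, j) backwards along the velocity for one time step
   and return the interpolated field value found there. */
float advect_cell(const float *field, const float *vx, const float *vy,
                  int w, int h, int i, int j, float dt)
{
    float src_x = (float)i - dt * vx[j * w + i];
    float src_y = (float)j - dt * vy[j * w + i];
    return sample_bilinear(field, w, h, src_x, src_y);
}
```

In a GPU version each thread would handle one cell (i, j), which is why this step parallelizes well: every cell only reads from the previous grid and writes its own result.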

Smoke and fire simulation are broken down into the following components: particle initialization, particle simulation and particle rendering. Initialization of particles handles randomizing properties so that colours, size, velocity and position vary for each particle. The simulation is based on a reduction in each particle’s lifetime that affects its transparency, size, colour, velocity and position. The particles are then either drawn if they are still “alive” or reinitialized if they are not.


Implementation Details

For water simulation an array of floats is passed to each kernel function and for smoke simulation an array of particle structures is used. All data must be copied into CUDA memory on the graphics card before running a kernel that operates on the data.

For the water simulation I began by adapting a 2D fluid simulation from NVIDIA. Moving the simulation into 3D required adding a depth (z) component for the points in each kernel. The result was a projected plane of points in 3D. These points were then rendered to a texture attached to a framebuffer. Finally, the texture was applied to a plane intersecting a terrain map to appear as water.

Smoke and fire simulation can be performed using fewer particles than required for water due to the low density of smoke and fire. Maintaining a particle structure that contains the lifetime, speed, size and colour of each particle allowed the simulation kernels to easily modify each particle in the array in parallel.

Positions of particles slowly move upwards on each step and velocities increase over time. The lifetime of each particle is reduced based on a “fade” value that can be modified in the user interface. The user interface also contains size and starting colour parameters. For smoke simulation the starting colour simply represents the upper bound grey value before randomization. For fire simulation it represents the starting value of red and green and adjusts the effect of blue’s colour contribution.
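The particle life cycle described above can be sketched in plain C as follows. This is only an illustration under stated assumptions: the struct layout, the function names, the constants (0.01 starting speed, 1.02 speed-up factor) and a lifetime that runs from 1.0 down to 0.0 are all hypothetical, and the randomization of starting values is left out for brevity.

```c
/* Hypothetical particle layout; the project's actual struct may differ. */
typedef struct {
    float x, y, z;     /* position */
    float speed;       /* upward speed per step */
    float size;
    float r, g, b, a;  /* colour and transparency */
    float life;        /* assumed range: 1.0 = new, <= 0.0 = dead */
} Particle;

/* Reset a dead particle to its starting state (no randomization here). */
void respawn_particle(Particle *p, float start_grey, float start_size)
{
    p->x = p->y = p->z = 0.0f;
    p->speed = 0.01f;
    p->size = start_size;
    p->r = p->g = p->b = start_grey;
    p->a = 1.0f;
    p->life = 1.0f;
}

/* One simulation step for a single particle. */
void update_particle(Particle *p, float fade,
                     float start_grey, float start_size)
{
    p->life -= fade;            /* "fade" comes from the user interface */
    if (p->life <= 0.0f) {
        respawn_particle(p, start_grey, start_size);
        return;
    }
    p->y += p->speed;           /* drift upwards */
    p->speed *= 1.02f;          /* velocity increases over time */
    p->a = p->life;             /* become more transparent with age */
}

/* CPU loop; on the GPU each thread would update one array element. */
void update_all(Particle *ps, int n, float fade,
                float start_grey, float start_size)
{
    for (int i = 0; i < n; ++i)
        update_particle(&ps[i], fade, start_grey, start_size);
}
```

Because update_particle only reads and writes its own particle, the loop body maps directly onto one thread per particle.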


Results

Attempting to simply render the points to appear as water within a terrain map was not possible since the points needed to ignore the depth buffer. The solution to this was rendering to a texture using a Framebuffer. This worked very well and allowed for animated water textures to be exported from the application easily. After adjusting the viscosity level and colour of the water particles they became much more realistic in the texture. Adding alpha blending and GLSL bump mapping produced the best result while the water particles were stationary.

The smoke and fire simulations ran almost twice as fast on the GPU as they did when each step was calculated on the CPU. However, performance was limited by the need for a random number generator to re-initialize particles. This was solved by using a table of random numbers in CUDA.


Limitations

Computing normal maps for water particles in real-time ended up decreasing system performance below a manageable frame-rate. The best result performance-wise and visually was to not calculate a normal map while water is animating – only when textures are written to files.

Writing textures to a file is a costly operation due to the time required to write to a hard disk. To prevent slowdowns while rendering I set up record and print keys to handle when to record textures from each frame in memory and then write them to files all at once.

Because CUDA kernels execute on the GPU rather than the CPU, it is not possible to call host-side standard C library functions such as rand() from within a kernel. This means that random number generators and other convenience utilities need to be recreated as GPU code. For simplicity I chose to use a lookup table of random values to give apparent randomness in the smoke simulation. The random table is read by a device function, which the global kernel calls when updating smoke particles.
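A lookup table of this kind can be sketched in plain C. The table size, the function names and the way a particle index and step are combined into a slot are assumptions for illustration, not the project's actual code.

```c
#include <stdlib.h>

#define RAND_TABLE_SIZE 1024   /* assumed size */

static float rand_table[RAND_TABLE_SIZE];

/* Fill the table once on the CPU with values in [0, 1].
   In a CUDA program the table would then be copied to GPU memory. */
void fill_rand_table(unsigned int seed)
{
    srand(seed);
    for (int i = 0; i < RAND_TABLE_SIZE; ++i)
        rand_table[i] = (float)rand() / (float)RAND_MAX;
}

/* Pick a table entry from the particle index and the step counter,
   so neighbouring particles and successive steps read different slots. */
float table_rand(unsigned int particle_index, unsigned int step)
{
    unsigned int slot = (particle_index * 31u + step) % RAND_TABLE_SIZE;
    return rand_table[slot];
}
```

The values repeat once every slot has been visited, which is the trade-off of this approach: it only gives apparent randomness, but a lookup is a single memory read with no generator state to keep per thread.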


Experience Gained

I gained a great amount of knowledge in calculating normal maps, DOT3 bump mapping techniques and GLSL bump mapping for this part of the project. I became much more familiar with OpenGL techniques that I previously didn't have much experience with, including rendering to a texture, writing textures to a file, GLUI interfaces, and multi-texturing with transparency. Rendering to a texture is very useful in situations where you would like to modify textures at runtime.

GLUI simplified setting many parameters especially for the smoke simulation – allowing me to find the parameters that give the best result visually. Multi-texturing gave much more realistic looking water, but care must be taken to ensure that each texture uses the same alpha value.

I learned that limiting the number of memory copies and CPU side calculations greatly increases the performance gained from using CUDA. I also grasped the CUDA for C syntax and parallel computing methods while applying algorithms for physically based effects. This was a useful application of knowledge I’ve gained in other Computer Science and Physics courses. I also feel much more confident in my ability to use OpenGL and GLSL in a Visual Studio project.


Conclusion

In conclusion, the course has tested my ability to maintain a reasonably large project in Visual Studio while allowing me to apply knowledge of physically based effects in a parallel GPU computing environment. I've also strengthened my knowledge of OpenGL, GLSL and managing input and output of files in Windows. I am now able to apply the skills I've gained to create CUDA functions that avoid the need to iterate over arrays while computing. The application will now serve as part of my portfolio and also as a tool to produce animated texture effects for use on mobile devices.

Wednesday, April 6, 2011

CUDA Fluid Effects In Action

Here's a short video I put together using the effects:

Fire Effect

The only major differences in rendering a fire particle effect compared to the previous smoke particle effect were in the colour and density of the particles. Fire particles are smaller and denser than smoke particles and vary in colour from yellow towards red at the end of their lifespan. The change in colour at each step is calculated as follows:


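particle[i].r *= 1.1f;                        /* red grows slightly each step */
particle[i].g *= (particle[i].life / 2);      /* green is scaled by half the remaining life */
particle[i].b += (1.0f / particle[i].life);   /* 1.0f keeps the division in floating point */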

This makes the particles become more red as they age, starting from yellow and moving through orange. The best visual results come when the fade time of the particles is set very low so they disappear quickly.

A screenshot of fire and smoke side by side: