Atulit Kumar

#### Parallel computing using GPUs

Here are some of the projects I worked on that use the massively parallel compute capability of NVIDIA GPUs. This page is still under construction, and I will be adding my other projects soon.

Parallel Fractal Generation

The aim of this project was to generate Mandelbrot images using Pthreads. The Mandelbrot set is the set of complex numbers c for which the iteration z → z² + c, starting from z = 0, stays bounded; its boundary is a distinctive and easily recognizable 2D fractal shape. I later reimplemented it, first using ISPC and then using CUDA, to further speed up the process.
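To make the per-pixel work concrete, here is a minimal serial sketch in C++ of the usual escape-time approach. It is not the project's actual Pthreads, ISPC, or CUDA code; the function names, the view window, and the iteration cap are all assumptions chosen for illustration. Each pixel is computed independently of the others, which is what makes the problem a natural fit for threads, SIMD lanes, or GPU threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Escape-time count for one point c = (cx, cy): iterate z = z^2 + c from z = 0
// until |z| > 2 or max_iter is reached. Returns max_iter if z stayed bounded.
int mandel_iterations(float cx, float cy, int max_iter) {
    float zx = 0.0f, zy = 0.0f;
    int i = 0;
    while (i < max_iter && zx * zx + zy * zy <= 4.0f) {
        float next_zx = zx * zx - zy * zy + cx;  // real part of z^2 + c
        zy = 2.0f * zx * zy + cy;                // imaginary part of z^2 + c
        zx = next_zx;
        ++i;
    }
    return i;
}

// Serial reference renderer (hypothetical helper): one iteration count per pixel.
// A parallel version would hand out rows or pixels of this loop nest to workers.
std::vector<int> render_mandelbrot(int width, int height, int max_iter) {
    assert(width > 0 && height > 0);
    const float x0 = -2.0f, x1 = 1.0f, y0 = -1.0f, y1 = 1.0f;  // assumed view window
    std::vector<int> out(static_cast<std::size_t>(width) * height);
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            float cx = x0 + (x1 - x0) * col / width;
            float cy = y0 + (y1 - y0) * row / height;
            out[static_cast<std::size_t>(row) * width + col] =
                mandel_iterations(cx, cy, max_iter);
        }
    }
    return out;
}
```

Points inside the set take the full max_iter iterations while points far outside escape almost immediately, so the work per pixel is uneven; how rows or pixels are divided among workers therefore matters for load balance.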

A Simple CUDA Renderer

The aim of this project was to write a parallel renderer in CUDA that draws colored circles. Apparently you can do a lot with 'colored circles' :)

Presently, I am working on further optimizing the code.

Color to Grayscale conversion

The aim of this project was to convert a color JPEG image to a grayscale one. Initially I just took the RGB channels of the image and computed the new luma component by averaging the values, i.e. Y = (R + G + B) / 3 for each pixel. But I later realized that our eyes perceive the colors differently: they are most sensitive to green and least sensitive to blue. I tried different weightings of the color channels before settling on the following formula to compute the luma component:

Y' = 0.299 * R + 0.587 * G + 0.114 * B

I took this formula from the Wikipedia article on grayscale.
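As a rough illustration of the weighted formula, here is a minimal serial C++ sketch. The interleaved R, G, B buffer layout, the rounding step, and the function names are assumptions made for this example, not the project's actual code. Every output pixel depends only on its own three input bytes, so a GPU version could assign one thread per pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Weighted luma for one pixel, rounded to the nearest 8-bit value.
std::uint8_t luma(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    float y = 0.299f * r + 0.587f * g + 0.114f * b;
    return static_cast<std::uint8_t>(y + 0.5f);
}

// Convert an interleaved RGB buffer (R, G, B, R, G, B, ...) to one gray byte
// per pixel. The buffer layout is an assumption for this sketch.
std::vector<std::uint8_t> rgb_to_gray(const std::vector<std::uint8_t>& rgb) {
    assert(rgb.size() % 3 == 0);
    std::vector<std::uint8_t> gray(rgb.size() / 3);
    for (std::size_t i = 0; i < gray.size(); ++i) {
        gray[i] = luma(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]);
    }
    return gray;
}
```

With these weights a pure green pixel comes out noticeably brighter than a pure red one, and pure blue comes out darkest, whereas a plain average would map all three to the same gray.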

This is the original image.

This is the image blurred using a Gaussian blur with a 9 px radius and a sigma of 4.0.

This is the image blurred using a Gaussian blur with a 9 px radius and a sigma of 2.0.

This is the original image.

Gaussian Filter for smooth blurring

The aim of this project was to blur an image using a Gaussian filter. This seemingly simple project was easy to implement but much, much harder to optimize. I started off with a compute time of 62.6 ms for this 1280x1600 image of the Leaning Tower of Pisa and was initially happy with the result. Later, though, I felt that it could be optimized. Instead of separating the color channels, blurring each one individually, and then recombining them, I could remove the separation and recombination steps entirely! This alone gave a 2x speedup. After some more optimizations (using shared memory and not recomputing already computed data), I finally got it down to just over 17 ms! That is almost 3.7x faster than my original implementation! I am reading more about Gaussian blurring in general and hope to decrease the compute time further.
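For reference, the filter weights themselves can be precomputed once and reused for every pixel. The sketch below builds a normalized 2D Gaussian kernel in plain C++. It assumes that "radius" means the kernel extends that many pixels on each side of the center (so a 9 px radius gives a 19x19 kernel), which may differ from the convention used in the project; the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Build a normalized (2*radius+1) x (2*radius+1) Gaussian kernel in row-major
// order. Treating radius as the half-width is an assumption of this sketch.
std::vector<float> gaussian_kernel(int radius, float sigma) {
    assert(radius >= 0 && sigma > 0.0f);
    const int side = 2 * radius + 1;
    std::vector<float> k(static_cast<std::size_t>(side) * side);
    float sum = 0.0f;
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            // Unnormalized Gaussian weight at offset (dx, dy) from the center.
            float w = std::exp(-(dx * dx + dy * dy) / (2.0f * sigma * sigma));
            k[static_cast<std::size_t>(dy + radius) * side + (dx + radius)] = w;
            sum += w;
        }
    }
    for (float& w : k) w /= sum;  // weights sum to 1 so brightness is preserved
    return k;
}
```

Each blurred pixel is then the weighted sum of its neighborhood using these weights. A smaller sigma concentrates the weight near the center and gives a milder blur for the same radius.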

More about Gaussian Filter here.