New approaches
and possibilities in computer graphics using high level shading languages
Karlis Salitis, Student, Computer Science
Daugavpils University, Daugavpils, Latvia
e-mail: ksalitis@hotmail.com
Computer graphics is widely used in many areas of our work and entertainment, and there are many approaches we can choose from when rendering something. Our task is to concentrate on the largest sector of computer graphics, PC graphics, and to learn more about it.
In this research we want to look at a new approach in the graphics path that gives both hardware acceleration and great flexibility in terms of the effects that can be created. Along the way we also want to find out what solutions are currently available and what kind of improvements we might expect in the future, from both software and hardware.
Computer graphics is advancing increasingly fast nowadays. Impressive effects are merged into movies, a huge number of cartoons are designed entirely on computers, and graphics in games is starting to approach movie-like quality.
Although computer graphics might seem to be a straightforward science, there are many ways to do any one thing, and in most cases the way you choose changes the rendered output radically.
There are two forces, though, that drive the graphics industry. The first is the look of the image we render: "If it looks like computer graphics, it is not good computer graphics" – Jeremy Birn. The second is the amount of time we need to spend in order to get the image done. The ratio between them is usually the main factor that decides when we use one approach and when another.
The best rendering quality is usually achieved using ray tracing, radiosity or photon mapping [1], as they take into account not only the object we are drawing but also its surroundings, and therefore deal effectively with reflections, refractions and other phenomena. Unfortunately, they are overkill for everyday tasks, as rendering a single frame may take weeks on a standard PC. These approaches are mostly used in movie production and digital imaging.
PC graphics, on the other hand, is usually needed for simulating processes or for entertainment. High-quality static images are not what we are looking for; we need the PC to render images in real time, which means updating the image at least 10 times per second. Visual flaws are usually acceptable as long as the process itself looks smooth.
The task of this paper is not to analyze every approach; instead we want to concentrate on the most widely used one and show how it can be used more efficiently by taking advantage of recent improvements in hardware and software. Computer clusters are clearly not widespread enough to be considered widely used, so we will stay with PC graphics.
There are many ways to do graphics on a PC, from simple raster graphics up to ray-traced images, but, as you may have guessed, most of the approaches are either too limited or too slow to be useful. There is, however, a middle way that offers the best of both worlds: the two industry-supported graphics application programming interfaces, OpenGL [2] and DirectX [3]. These interfaces are nothing more than two large sets of graphics manipulation commands, but their power lies in hardware support and worldwide usage.
Let's take a look at the principles behind them. Both interfaces share a common structure: graphics tasks are ordered in a pipeline, which describes what happens to the data we pass in and in what order. Geometric data is passed in vector form along with parameters that describe how it should be handled and what calculations must be applied. The resulting image is rendered to the screen or to a chosen pixel buffer.
The graphics pipeline can be divided into three stages – application, geometry and rasterizer – where each stage is responsible for a specific task.
The application stage is the entry point. It is the base of our application, where we decide which data must be drawn, set up the things we do or do not want to see, do the physics calculations and handle user input. Essentially, we feed the next stage with the data we want to draw and define how it will be handled further.
The geometry stage works only with the vertex data passed from the previous stage and is subdivided into five sub-stages (Fig. 1).
The first sub-stage transforms [4] all object coordinates relative to the eye position for the subsequent lighting calculation. The second sub-stage applies the Phong lighting model [5] to every vertex that is lit by one or more lights and passes the results on. The projection sub-stage transforms the geometry coordinates again, this time into the unit cube that represents the viewing volume. The clipping sub-stage makes sure that after projection there are no vertices outside this volume; if there are, vertices are destroyed or created so that no geometry remains outside of it. Finally, the unit cube is stretched so that its corners match screen coordinates, and the result is passed on to the rasterizer.
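To make the per-vertex work of this stage more concrete, here is a small sketch written in Cg-like code (Cg is one of the high-level shading languages discussed later in this paper). It performs the eye-space transform, a reduced diffuse-only lighting term and the projection to clip space; all parameter names (modelView, modelViewProj, lightPosition and so on) are our own illustrative assumptions, not part of any API.

    // Sketch of the fixed-function geometry stage expressed as a vertex program.
    void geometry_stage(float4 position      : POSITION,
                        float3 normal        : NORMAL,
                        out float4 oPosition : POSITION,   // clip-space position
                        out float4 oColor    : COLOR,      // per-vertex lit color
                        uniform float4x4 modelView,        // transform sub-stage
                        uniform float4x4 modelViewProj,    // projection sub-stage
                        uniform float3   lightPosition,    // light position in eye space
                        uniform float4   diffuseColor)
    {
        // Transform sub-stage: bring the vertex into eye space.
        float3 eyePosition = mul(modelView, position).xyz;
        float3 eyeNormal   = normalize(mul((float3x3)modelView, normal));

        // Lighting sub-stage: a simplified, diffuse-only lighting term.
        float3 L = normalize(lightPosition - eyePosition);
        oColor   = diffuseColor * max(dot(eyeNormal, L), 0);

        // Projection sub-stage: map into the unit cube (clip space).
        // Clipping and screen mapping happen after this, in hardware.
        oPosition = mul(modelViewProj, position);
    }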
The rasterizer stage, in turn, takes the shapes one by one and turns them into pixels. It also assigns color values from a texture if applicable, multiplies them by the light intensity and outputs the pixel if it is not occluded. Blending operations may also occur during this stage.
Such an architecture is very handy, since the programmer mostly has to worry only about the application stage; the API does the rest. However, there are cases in which this model is very limiting. What if you want to override the API lighting function with a more precise one? There is simply no way to do that. This is why this approach is often referred to as the fixed-function pipeline.
There is one field, however, in which the fixed-function pipeline excels, and that is optimization. Because the pipeline always works the same way, it is possible to optimize the performance of each stage. Moreover, an n-stage pipeline can give up to an n-fold speed boost: when the first stage has finished its job and handed its results to the second stage, it can start working on the next chunk of data without stalling.
If you have paid attention to computer prices, you may have noticed that a high-end video card is quite an expensive component of a PC. The reason is that video cards are becoming more and more complicated by taking many tasks (pipeline stages) off the CPU. It is hard to imagine a situation today where the CPU would be responsible for rasterization, as it is one of the most time-demanding tasks, but during the 1990s this was common practice.
As we mentioned earlier, the APIs are industry supported. In practice this means that hardware manufacturers implement more and more features in hardware and supply the API function code with their drivers. When we call one of the API's functions, we are calling a function supplied with our hardware, and if the video card can perform the task without the CPU, it does so.
There is one more thing that makes hardware optimization very simple for the pipelined approach. Contrary to many mathematical problems, pixels and vertices do not affect each other directly. Take rasterization as an example: once a geometric primitive has been rasterized into pixels, we can work on all of those pixels at the same time if we wish, as the only things that matter are the current texture coordinates and light intensities. Even if one pixel occludes another, the hidden one is discarded during the depth test later on. It is therefore a common trend among hardware manufacturers to increase the number of pixel and vertex pipelines on their graphics chips. For example, the NVIDIA GeForce 6800 GT/Ultra can work on 6 vertices and 16 pixels in parallel, giving a performance boost that would be impossible to achieve even on a 20 GHz CPU.
To better understand the tendencies and problems of graphics development, we should take a brief look at the history of video card development.
Based on the classification used in the book "The Cg Tutorial", video cards have gone through five generations. The cards produced up to 1990 could be called an additional generation, but because they existed only as a storage area for the monitor to read from, they are not counted.
All cards produced after the VGA controllers and up to 1998 are considered to belong to the first generation. Their only task was to accelerate rasterization by offloading it from the CPU to the video card.
The second generation of cards appeared around 1999 and can be recognized by the marketing buzzword TnL, which stands for transform and lighting. These cards implemented the geometry stage in hardware, leaving the CPU to work on the application stage only.
The third generation came with an innovative idea: allow the fixed-function pipeline to be reprogrammed in the geometry stage, effectively replacing the lighting and geometry calculations while still executing the program on the video card. We know these cards as the GeForce 3/4, excluding the MX series. They also included limited configurability for pixel processing, which allowed some calculations the fixed-function rasterizer stage was never meant to perform.
The fourth generation of cards appeared based on the success of the third. Given the effects that could be achieved with the reprogrammable vertex pipelines of third-generation cards, it was clear that similar flexibility would be more than welcome at the rasterizer stage too, and the first programmable pixel pipelines appeared on video cards. The most famous card of this generation is the Radeon 9700/9800, as it was first on the market and allowed fragment programs on the rasterizer side to run reasonably fast. This generation of cards is simple to use, but unfortunately limited feature-wise. There are precision/speed trade-offs, but the main flaws are the highly limited instruction count and the absence of true branching. Speed also suffered with longer fragment programs.
Finally, this year a new generation of cards appeared in the form of the GeForce 6 series. These cards have approximately twice the shading power of the previous generation, true branching, floating-point texture formats with filtering options, and the ability to access textures during the geometry stage for approaches such as displacement mapping. There are some flaws, but it is clear that programmable pipelines are now the key focus of hardware manufacturers.
If we use a vertex program in the geometry stage, it will be executed for every vertex we pass in. Similarly, if we have enabled a fragment program, it will execute for every fragment, even one that is occluded later. Newer cards implement an approach known as early Z, moving the depth test before the fragment program so that a pixel can be rejected without executing the program for it. Still, early Z does not work with fragment programs that change the pixel's depth value on the fly, or when blending is enabled.
Parallel processing is preserved when the pipes are reprogrammed, so if we have 16 pixel units, they will work on 16 fragments simultaneously. There is a suspicion that early exits from branches might not work as expected on the current generation of hardware, and that a unit will wait until all parallel pixel units have finished their job.
Now let's take a look at what happens when we replace the fixed-function pipeline. As we have already seen, our program can substitute vertex or pixel processing, effectively replacing the API's built-in algorithms. There is a limitation, though: if we replace a part of the pipeline, we must also take care of all the functions it is supposed to perform. For example, if we use fog in our application, we should remember to compute the fog coordinate in the vertex program so that the pixel pipeline, in case it is not overridden, can complete the calculation. If we use textures in our pixel program, the vertex unit must supply their lookup coordinates, or we must calculate them on the fly.
There are also some requirements that must not be forgotten. A reprogrammed vertex pipeline must always supply the fragment pipeline with a clip-space vertex position, and the fragment pipeline, in turn, must set the pixel color and, optionally, its depth. Each of these operations takes just a line of code, but they are vital for the program to work.
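As a minimal sketch of such a replacement, the following Cg-style pair outputs the required clip-space position, passes a texture coordinate through to the fragment program, and sets the mandatory pixel color there. The parameter and sampler names (modelViewProj, decalMap) are our own assumptions for illustration.

    // Minimal vertex program: its only obligations are the clip-space
    // position and any values the fragment program will need later.
    void minimal_vertex(float4 position      : POSITION,
                        float2 texCoord      : TEXCOORD0,
                        out float4 oPosition : POSITION,   // required clip-space output
                        out float2 oTexCoord : TEXCOORD0,  // lookup coordinates for the fragment side
                        uniform float4x4 modelViewProj)
    {
        oPosition = mul(modelViewProj, position);
        oTexCoord = texCoord;
    }

    // Minimal fragment program: its only obligation is the pixel color.
    void minimal_fragment(float2 texCoord   : TEXCOORD0,
                          out float4 oColor : COLOR,       // required color output
                          uniform sampler2D decalMap)
    {
        oColor = tex2D(decalMap, texCoord);
    }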
When the first programmable video cards appeared, the biggest problem was that they were difficult to use. The API functions had not been designed with pipeline programming in mind, and the only way to reprogram a part of the pipeline was to write the program in vendor-specific assembly code and upload it via API extension calls.
This issue was first addressed by a group founded by NVIDIA. They created Cg, the first compiler to accept C-like programs and, depending on the target hardware, compile them into the matching assembly. Cg can also be used at program run time.
Afterwards it became clear that there should be an easy way to use a high-level shading language from within the API itself. So Microsoft implemented their HLSL in DirectX, and the OpenGL development team agreed to include functions for uploading program code to the driver, which is then responsible for compiling it to native assembly.
In both cases the compilation is performed during the application stage, on the CPU.
As we have seen, in order to use programmed pipeline replacements the whole chain must be maintained: you must have a working program, geometry and a buffer to output to. More generally speaking, this means writing the whole application from scratch while keeping in mind how and where the programmable paths will be included. Such an approach is not always acceptable, as you often do not want, or do not know how, to write graphics programs using the APIs.
We have found that there are solutions to this problem. You can download free packages that let you use programmable graphics via high-level shading languages directly from the most popular 3D modeling applications such as Maya and 3ds Max, as well as packages that let you experiment with different shading algorithms. More information can be found on the program home pages at the NVIDIA developer site [6][7].
Can and should we always use reprogrammed pipes? Often API calls do just fine, and if you can live with them, do so. According to information found on the www.beyond3d.com and www.opengl.org discussion boards, on the newest hardware the driver replaces the fixed-function pipeline with equivalent vertex and fragment programs anyway, so technically there is no difference.
Also, if you take a look at the fixed-function pipeline implementation in the Cg language [8], you will see that it includes many things that are not always needed, so the calculations can often be optimized or improved.
Let's take a look at the lighting equation implemented in the fixed pipeline and at a way to improve it. The fixed-function lighting model is very simple, easy to compute and easy to understand. To speed things up, it is evaluated during the geometry stage, and the resulting light intensity is interpolated across the pixels later on. Such an approximation works just fine for well-tessellated geometry, but image quality suffers a lot when geometric primitives cover a large area or lights are very close to the object. An easy way to increase lighting quality is to move the lighting equation from the geometry stage to the rasterizer stage. But because lighting is one of the most important aspects of computer graphics, a lot of research has been done in this field, and there are far better methods than simply making the light calculations per-pixel.
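As a sketch of this first step – evaluating the lighting per pixel at the rasterizer stage – the fragment program below computes a diffuse-plus-specular Blinn-Phong term from interpolated eye-space values. The input semantics and uniform names are our own assumptions; the vertex program is only expected to pass down the eye-space position and normal.

    // Per-pixel lighting: the real work happens for every fragment.
    void phong_fragment(float3 eyePosition : TEXCOORD0,  // interpolated from the vertex program
                        float3 eyeNormal   : TEXCOORD1,
                        out float4 oColor  : COLOR,
                        uniform float3 lightPosition,    // light position in eye space (assumed)
                        uniform float4 diffuseColor,
                        uniform float4 specularColor,
                        uniform float  shininess)
    {
        float3 N = normalize(eyeNormal);                 // re-normalize after interpolation
        float3 L = normalize(lightPosition - eyePosition);
        float3 V = normalize(-eyePosition);              // the eye sits at the origin in eye space
        float3 H = normalize(L + V);                     // Blinn-Phong half vector

        float diffuse  = max(dot(N, L), 0);
        float specular = (diffuse > 0) ? pow(max(dot(N, H), 0), shininess) : 0;

        oColor = diffuseColor * diffuse + specularColor * specular;
    }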
The first thing you can do is implement bump mapping [9], proposed by Jim Blinn, at the rasterizer stage. This approach simulates a bumpy appearance for surfaces with low tessellation by supplying per-pixel normal data in a texture instead of using the interpolated normals from the geometry stage. Bump mapping gives far better quality, and the implementation is quite simple.
However, there are also better approaches based on bump mapping that further improve the lighting calculations. Two of the most anticipated are parallax mapping [10] and relief mapping [11], which uses a binary search in the pixel shader. These approaches offset the base texture and normal map lookups depending on the view vector, improving the appearance of bumpiness even more, as the sketch below illustrates.
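Here is a hedged sketch of parallax mapping in this spirit: the texture coordinate is shifted along the tangent-space view direction by an amount read from a height map before the usual bump-mapped shading is evaluated. The scale and bias values and all names are our own assumptions for illustration, not taken from [10].

    // Parallax mapping: offset the lookup coordinates along the view direction
    // according to a height map, then shade as in ordinary bump mapping.
    void parallax_fragment(float2 texCoord   : TEXCOORD0,
                           float3 lightDirTS : TEXCOORD1,   // tangent-space light vector
                           float3 viewDirTS  : TEXCOORD2,   // tangent-space view vector
                           out float4 oColor : COLOR,
                           uniform sampler2D decalMap,
                           uniform sampler2D normalMap,
                           uniform sampler2D heightMap,
                           uniform float scale,             // e.g. 0.04 (assumed)
                           uniform float bias)              // e.g. -0.02 (assumed)
    {
        float3 V = normalize(viewDirTS);

        // Read the height and shift the texture coordinates toward the viewer.
        float height   = tex2D(heightMap, texCoord).r * scale + bias;
        float2 offsetT = texCoord + height * V.xy;

        // Usual bump-mapped diffuse term, but with the offset coordinates.
        float3 N = normalize(tex2D(normalMap, offsetT).rgb * 2 - 1);
        float3 L = normalize(lightDirTS);

        oColor = tex2D(decalMap, offsetT) * max(dot(N, L), 0);
    }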
We would like to stress that these approaches do not just make the rendered image look better; they can also help a lot with modeling tasks. If you have a high-quality height map, you can use parallax or relief mapping to achieve the effect of drawing geometry consisting of millions of polygons, while a single quad and a texture map are all you actually need. Moreover, height maps are easy to draw in any paint program, and a shader you have created once can be reused later on.
But lighting is just a small part of what you can do. We recommend visiting sites such as ShaderTech [12] to learn more and to see images rendered in real time on a PC via the reprogrammed pipeline.
High-quality graphics on a PC in a reasonable amount of time is only possible using one of the graphics APIs. The APIs have all the basic functions you might wish for when working with graphics, but for speed reasons their execution is limited to some extent and they are locked to the pipeline architecture. Then again, thanks to high demand and hardware improvements, there is a way to get off the fixed pipeline model and so gain more flexibility and opportunities.
After implementing new algorithms that replace parts of the fixed pipeline, it becomes clear that the programmable approach gives much better image quality and can simplify modeling tasks considerably. There are also three widely known shading languages in which you can express your ideas relatively easily.
For 3D content creators and artists there are free software packages that allow them to concentrate on shading problems rather than programming tasks.
Since shading language compilers are now core elements of API and driver development, we believe that the compiled code will be optimized further with each new revision.
Looking at the hardware side, it is clear that shading power is the main target of hardware manufacturers. We can also expect that branching and pixel-discarding capabilities will improve in the next generation of hardware, allowing us to use even more sophisticated and demanding effects.
Bibliography
[1] Real-Time Rendering, second edition; Tomas Akenine-Möller, Eric Haines; A K Peters, 2002
[2] OpenGL http://www.opengl.org/
[3] DirectX http://www.microsoft.com/windows/directx/default.aspx
[4] Matrix and quaternion FAQ http://skal.planet-d.net/demo/matrixfaq.htm
[5] Phong lighting model
http://www.delphi3d.net/articles/viewarticle.php?article=phong.htm
[6] Cg Toolkit and documentation
http://developer.nvidia.com/object/cg_toolkit_1_1.html
[7] FX Composer http://developer.nvidia.com/object/fx_composer_home.html
[8] Fixed-function pipeline in Cg
http://developer.nvidia.com/object/cg_fixed_function.html
[9] Bump mapping http://www.tweak3d.net/articles/bumpmapping/
[10] Parallax mapping www.infiscape.com/doc/parallax_mapping.pdf
[11] Relief mapping with binary search
www.paralelo.com.br/arquivos/ReliefMapping.pdf
[12] Shadertech http://www.shadertech.com/