dxsdk3/docs/web/d3dramp.htm

49 lines
6.5 KiB
HTML
Raw Normal View History

<html>
<head>
<title>Running fast with the D3D Ramp Driver</title>
</head>
<body bgcolor=#000000 text=#FFFFFF link=#FF0000 vlink=#FFFF00>
<basefont face="Times New Roman" size=3>
<TABLE CELLPADDING=0 CELLSPACING=0 BORDER=0>
<BLOCKQUOTE>
<FONT FACE="arial" SIZE="2">
<!--DocHeaderEnd-->
<!-- This is a PANDA Generated HTML file. The source is a WinWord Document. -->
<h2>Running fast with the D3D Ramp Driver</h2>
<P>The DirectX Team
<P>September 5, 1996
<P><h4>Introduction</H4>
<P>There are a number of ways to get maximum performance from the ramp driver. This document details the two most useful texturing modes,<i> Copy Mode and Shaded Ramp Mode</i>. This document covers rasterization performance tips for these modes. When used correctly, the ramp driver should be able to run faster than some hardware alternatives.
<P><h4>Copy Mode</H4>
<P>Copy Mode is the simplest form of rasterization and hence the fastest. When copy mode rasterization is used, no lighting or shading is performed on the texture. The bytes from the texture is copied directly to the screen and mapped onto the polygon using the texture co-ordinates in the vertex. Hence, when using copy mode, the texture format must match the format of the screen. For this reason copy mode is only really useful in 8 bit colour as in 16 bit colour the texture size is doubled, this makes copy mode slower than other rendering modes in 16 bit colour.
<P>Copymode only implements two rasterization options, Z buffering and chroma key transparency. The fastest mode is to just map with no transparency and no Z. Enabling chroma key accelerates the rasterization of invisible pixels as just the texture read is performed, but visible pixels will incur a slight performance hit due to the chroma test.
<P>Enabling Zbuffering incurs the biggest hit for 8 bit copy mode as now a 16 bit value has to be read and conditionally written per pixel. Z buffering copymode can only be faster than non Z copy mode when the average overdraw goes over 2, and the scene is rendered in front to back polygon order.
<P>If your scene has overdraw less than 2 (which is very likely) you should ignore using the Z buffer. The only exception to this rule is if the scene complexity is very high. For example, if you have more than about 1500 rendered polys in the scene the sort overhead begins to get high and it may be worth considering a Z buffer again.
<P>Direct3D will go fastest when all it needs to do is draw one long triangle instruction. Render state changes just get in the way of this, the longer the average triangle instruction the better triangle throughput. Hence peak sorting performance can be achieved when all the texture for a given scene is contained in only one texture map or texture page. This adds the restriction that no texture co-ordinate can be larger than 1.0 but has the performance benefit of no texture state changes at all!
<P>So for normal simple scenes use one texture, one material and sort your triangles, only use Z when the scene gets complex.
<P><h4>A Few Further Tips On Sorting</H4>
<P>If you are sorting, you need to implement your own transform and clipping as Direct3D only accepts unclipped TLVertices (until DX4). In this scenario you will almost certainly be building execute buffers on the fly. One of the simplest ways of doing this is to put all the instructions in the front of the buffer and all the vertices at the end. Say your whole buffer is big enough to contain 1024 vertices, the first triangle generated would be placed at the front of the buffer indexing vertex 1022,1023 and 1024 at the end of the buffer. This scheme means that no buffer space is wasted and there is no dependency on maximum buffer size, as when the instructions growing from the bottom up meet the vertices growing from the top down the buffer is dispatched and a new one started.
<P><h4>Introducing Lighting</H4>
<P>When lighting is used we will almost certainly want to Gouraud shade. Ramp mode Gouraud shading is about 15-20% faster in 8 bit than in 16 bit colour due the decreased memory bandwidth. When lighting, we are using more CPU for rasterization than copy mode, it is not so crucial (but still advisable) to avoid state changes. If you have a few large background polygons in the scene you should try and flat shade them as this is 20-30% faster than gouraud for large polygons.
<P>We now need to worry about materials, in a textured scene you should only need two materials and may even be able to get away with one. Materials allow modulation of the texture colour, this operation should be left to RGB drivers (mainly MMX and HW) in ramp it is best just to stick with intensity modulation.ff
The Ramp driver is named after its use of ramps (colour lookup tables) A d3d material contains a set of colour values and a texture handle. For each colour in the textures palette the ramp driver builds a lookup table of intensities (ramp). The size of the ramps are determined by the shades field in the d3d material. So a texture with a palette of 32 colours in a material with shades set to 16 will generate 32 16 entry ramps. The more ramps there are in a scene the slower the performance due to the worse use of the CPU cache. It is best to use one material setting for all textures and the same palette of 64 or preferably less colours for all textures in the scene. If two textures use the same material any colours that are the same between the two texture palettes will share the same ramp.
<P>As for copy mode Z buffering does incur a performance hit for low overdraw scenes and the same technique of tiling as many textures as possible in to larger texture pages is still beneficial, all the copy mode sorting tips still apply when using shading.
<P><h4>System or Video Memory</H4>
<P>Where should the back buffer go? This can have a large affect on performance but can vary wildly depending on the application and the HW configuration. On faster CPU's, both the above rendering modes will tend to render faster to system memory than video memory as they are helped by the CPU cache. This performance boost is offset by the fact the app can no longer use page flipping and instead must blit from the system memory back buffer to a video memory back buffer. The cost of this blit will often be the decider in the performance. This is especially true of the MMX driver which renders MUCH faster to system memory than video. The only way to really know which is to run and time the app for a few frames in each configuration. All the D3D samples have a system memory option to enable this.
</body>
</html>