fiadot_old

A blog for in-depth reflection and introspection on game server development methodologies that let you leave work on time!

DSP 2005. 4. 1. 16:23

Optimization

texture
KVRian
Posted: Thu Jan 01, 2004 6:09 am

I could do with learning a bit more about optimization techniques for C/C++, in particular for AMD or Intel processors.

Does anyone know of any decent books / resources?
Joined: 26 Mar 2003 Posts: 640 Location: Hassocks, England
benski
KVRer
Posted: Thu Jan 01, 2004 8:18 am

texture,

Don't know of any books. It is a bit of a "gray art" these days and a lot of the older techniques will actually slow down a modern processor. Your best bet may be to check the music-dsp mailing list archives for information.

I'll post some specific examples in a later reply, but just to get things started, here are the basics of AMD/Intel optimization.

Vector Optimization:
SSE lets you load 16 bytes of memory, representing 4 floats, into one of 8 large 128-bit "multimedia" (XMM) registers (16 in 64-bit mode). You can then do most floating point operations on it together with other 128-bit XMM registers. It basically does 4 floating point operations in the time it normally takes to do one. You can then "stream" the register back to another place in memory. Very useful for volume envelopes, FIR filters, and mixing.
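A minimal sketch of the mixing case with SSE intrinsics from `<xmmintrin.h>`, assuming plain `float` buffers. The function name `mix_gain_sse` and the "sum, then apply gain" behaviour are hypothetical choices for illustration. It uses unaligned loads and stores so any buffer works; `_mm_stream_ps` is the streaming (non-temporal) store, but it requires 16-byte-aligned addresses.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: out[i] = (a[i] + b[i]) * gain, four samples per step. */
void mix_gain_sse(const float *a, const float *b, float *out, size_t n, float gain)
{
    const __m128 g = _mm_set1_ps(gain);      /* gain broadcast to all 4 lanes */
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);     /* load 4 floats (16 bytes) */
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 sum = _mm_add_ps(va, vb);     /* 4 additions in one instruction */
        _mm_storeu_ps(out + i, _mm_mul_ps(sum, g)); /* 4 multiplies, then store */
    }
    for (; i < n; ++i)                       /* scalar tail for the last n % 4 samples */
        out[i] = (a[i] + b[i]) * gain;
}
```

For a volume envelope, the same loop shape works with a per-sample gain buffer loaded through `_mm_loadu_ps` in place of the broadcast `g`.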

Cache Control:
This is the real make-or-break optimization. L1 cache is very fast (only a few clock cycles to access). L2 is also fast, but takes noticeably more clock cycles than L1. Anything in RAM must be moved into cache before use, which can take on the order of 100 clock cycles.
In a multiprocess, multithreaded environment, you have no guarantee as to what will be in cache (after a context switch, another thread or process may have evicted your data).
In C/C++ you can't control the cache very directly, but you can be mindful of how it works. Some of the old optimization techniques, like look-up tables and decrementing loop variables to cut one instruction out of the loop, no longer apply because of cache issues. Depending on the calculation, a look-up table can be slower to access than the actual calculation, and worse, it may bump something important out of cache. Incrementing the loop variable is faster because the processor pulls memory into the cache a chunk (a cache line) at a time, so if you've pulled in MyArray[0], it is likely that MyArray[1] through MyArray[15] also got pulled in (assuming 4-byte floats and a 64-byte cache line); the same doesn't hold going the other way.
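To make the access-order point concrete, here is a sketch that sums the same row-major matrix in two orders. Both return the same result; the row-major walk reads memory sequentially and uses every float of each cache line it pulls in, while the column-major walk strides `cols * sizeof(float)` bytes per step and, on large matrices, tends to miss cache far more often. The function names and the flat-array layout are assumptions for illustration; time them on your own machine rather than trusting the comments.

```c
#include <stddef.h>

/* Matrix stored row-major in a flat array: element (r, c) is m[r * cols + c]. */

/* Sequential walk: consecutive iterations read adjacent floats, so each
   cache line fetched from RAM is fully used before moving on. */
double sum_row_major(const float *m, size_t rows, size_t cols)
{
    double s = 0.0;
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            s += m[r * cols + c];
    return s;
}

/* Strided walk over the same data: each step jumps a whole row ahead,
   so on a large matrix nearly every read can land on a different line. */
double sum_col_major(const float *m, size_t rows, size_t cols)
{
    double s = 0.0;
    for (size_t c = 0; c < cols; ++c)
        for (size_t r = 0; r < rows; ++r)
            s += m[r * cols + c];
    return s;
}
```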

Float-to-int conversions:
The default way compilers do this on the x87 FPU (switching the rounding mode to truncation, converting, then switching it back) causes a "stall" on the floating point unit, so no other floating point work can get done while the conversion completes. There is a short assembly routine (I believe it is available somewhere at www.musicdsp.org) that does it much more quickly.
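The assembly routine mentioned above is not reproduced here. As a hedged alternative on SSE-capable CPUs, the intrinsics below convert without touching the x87 control word: `_mm_cvttss_si32` always truncates (matching a C cast), and `_mm_cvtss_si32` rounds with the current SSE rounding mode, which defaults to round-to-nearest-even. The wrapper names are hypothetical. x86-64 compilers already emit the truncating SSE instruction for a plain `(int)` cast, so this mainly matters for x87 builds.

```c
#include <xmmintrin.h>  /* SSE: _mm_set_ss, _mm_cvttss_si32, _mm_cvtss_si32 */

/* Truncating conversion, same result as (int)x for in-range values.
   cvttss2si always truncates, so no rounding-mode switch is needed. */
int float_to_int_trunc(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}

/* Conversion using the current MXCSR rounding mode
   (round-to-nearest-even unless something has changed it). */
int float_to_int_nearest(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}
```

The rounding variant gives 3 for `2.7f` where a C cast gives 2, so pick the one that matches what the surrounding code expects.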

Denormalization:
When floating point numbers get very small, the CPU switches to a special "denormal" representation in order to maintain precision, and arithmetic on these values is very, very slow =) If you've ever hit "stop" in your host and seen the CPU meter hit 100%, this is likely the reason. For single-precision floats the change-over occurs below about 1.2e-38, roughly -758dB relative to 1.0. There are various techniques for overcoming this problem, such as adding in noise or a slight DC offset (even an offset around -300dB is waaaay below human hearing, yet still far above the denormal range).
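A minimal sketch of the DC-offset technique, assuming a one-pole feedback filter whose input has just gone silent (as after hitting "stop"). The offset `1e-18f` (about -360dB relative to 1.0), the coefficient, and the function names are illustrative choices, not values from the thread. The helper counts how many output samples land in the denormal range with and without the offset. SSE also offers a flush-to-zero mode (`_MM_SET_FLUSH_ZERO_MODE`) as another way around the problem.

```c
#include <float.h>   /* FLT_MIN: smallest normal float, about 1.2e-38 */
#include <stddef.h>

/* Nonzero and smaller in magnitude than FLT_MIN means denormal. */
static int is_denormal(float v)
{
    return v != 0.0f && v > -FLT_MIN && v < FLT_MIN;
}

/* Run y[n] = x[n] + offset + coeff * y[n-1] on silent input, starting from
   y = 1.0, and count the steps whose output is denormal. */
size_t count_denormal_steps(float coeff, float offset, size_t steps)
{
    float y = 1.0f;                 /* last output before the input went silent */
    size_t denormals = 0;
    for (size_t i = 0; i < steps; ++i) {
        const float x = 0.0f;       /* silent input */
        y = x + offset + coeff * y; /* offset keeps y near offset / (1 - coeff) */
        if (is_denormal(y))
            ++denormals;
    }
    return denormals;
}
```

With `coeff = 0.9f` the offset settles the output at about offset / (1 - coeff) = 1e-17 (around -340dB), far above the denormal range and far below audibility; without it, the decaying output eventually drops below `FLT_MIN`.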

Benchmarking:
You should always measure any change you make. Because of cache issues, a speed-up in one section of code may well slow down the section following it. There is an x86 instruction, RDTSC, that reads the clock cycle count (you run it before and after a block of code and calculate the difference).
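A minimal sketch of the before/after measurement, using the `__rdtsc()` compiler intrinsic for RDTSC (declared in `<x86intrin.h>` on GCC/Clang; MSVC declares it in `<intrin.h>`). `scale_buffer` is a hypothetical workload. On newer CPUs the time-stamp counter ticks at a constant rate rather than exactly following the core clock, and a single reading is noisy, so compare the minimum or median over many runs.

```c
#include <stddef.h>
#include <x86intrin.h>  /* __rdtsc() on GCC/Clang */

/* Hypothetical workload to be timed. */
void scale_buffer(float *buf, size_t n, float gain)
{
    for (size_t i = 0; i < n; ++i)
        buf[i] *= gain;
}

/* Read the time-stamp counter before and after the block; the difference
   is the elapsed tick count, including noise such as interrupts. */
unsigned long long ticks_for_scale(float *buf, size_t n, float gain)
{
    unsigned long long start = __rdtsc();
    scale_buffer(buf, n, gain);
    unsigned long long end = __rdtsc();
    return end - start;
}
```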

-benski
Joined: 20 Nov 2003 Posts: 19 Location: Pittsburgh, PA
texture
KVRian
Posted: Thu Jan 01, 2004 8:37 am

Thanks very much


I've been having a look around and have found the following if anyone is interested...


"Programming SIMD computers is no easy task, but here you will find all the information you need to build fast applications that fully exploit the power of MMX and SSE instructions on the current microarchitectures. "

http://www.tommesani.com/Docs.html
Joined: 26 Mar 2003 Posts: 640 Location: Hassocks, England
benski
KVRer
Posted: Thu Jan 01, 2004 11:39 am

Looks like a good site! Thanks


Source: http://www.kvraudio.com/forum/viewtopic.php?t=32922