Notices

Overclockers Forums > Software > Programming Tips and Tricks
Programming Tips and Tricks
Forum Jump

Wow, a big C/C++ code speedup!!!

Post Reply New Thread Subscribe Search this Thread
 
 
Thread Tools
Old 04-14-05, 05:50 PM Thread Starter   #1
macklin01
Computational Oncologist / Biomathematician / Moderator on Vacation, Ph.D.

 
macklin01's Avatar 

Join Date: Apr 2002
Location: Pasadena, CA

 
Talking Wow, a big C/C++ code speedup!!!


I was messing with some C++ code today using the standard gcc/g++ compiler.

I'm not sure about you all, but I use the function "pow( double, double)" all the time. Particularly useful for scientific computing, I suppose, but also for graphics programming. In particular, I use things like

Radius = pow( x, 2.0) + pow( y , 2.0);

quite a bit.

I found that code I compiled ran very slowly. In particular, one algorithm I have for reinitializing a 150x150 data set in a particular way would take 5-10 minutes on a P4 running with fully optimized code. (-O3, -march=pentium4, etc.)

Well, the other day, I was writing some scratch code, and for the heck of it, I tried things like x*x instead of pow(x,2.0), and the resulting code was ten times faster. I couldn't believe it. Well, today I went back and made a new function:

Code:
double intpow( double base, int exponent )
{
 int i;
 double out = base;
 for( i=1 ; i < exponent ; i++ )
 {
  out *= base;
 }
 return out;
}
and used that anytime I had integer powers. As expected, it sped the costliest function of my computational tumor growth function up from taking 5-10 minutes to 4-5 seconds. It was just amazing.

I'd assume this is because pow(double,double) needs to be a lot more complicated to support things like pow( 3.5, 1.5 ). It leaves me very tempted to compare the performance of sqrt(x) and pow(x,0.5). Fascinating stuff.

Anyway, since exponents are used so frequently in coding, I thought I'd share that with all of you. It's amazing what a difference such a small change can make!

-- Paul

__________________
My heatware (macklin01)

Need image I/O for your science apps? Try EasyBMP

My biomedical research: Mathematical Cancer Modeling & Simulation

I'm on vacation as a moderator as I devote more time to my faculty position.
Thank you for your understanding if I don't respond to your PM. -- Paul
macklin01 is offline Folding Profile Heatware Profile   QUOTE Thanks
Old 04-18-05, 09:45 AM   #2
mccoyn
Senior Member

 
mccoyn's Avatar 

Join Date: Nov 2003
Location: Michigan, USA

 
Good work. You might see a similiar advantage if you make that an inline function. Function calls are very expensive in terms of time. Loops are also somewhat expensive because of how processors are designed. The best possible solution (C++) is to use template recursion, which could use multiple inlining and come up with the best possible code, just a few inline multiplications and no loop. Heres a quick example:

Code:
template <unsigned int exponent>
inline double intpow(double base)
{
  return intpow<exponent-1>(base) * base;
}

template <>
inline double intpow<0>(double base)
{
  return 1;
}

int main()
{
  cout << intpow<12>(1.2345) << " == " << pow(1.2345, 12) << endl;
}
This only works with constant exponents though.
mccoyn is offline   QUOTE Thanks
Old 04-18-05, 11:47 AM   #3
Jarlax
Member

 
Jarlax's Avatar 

Join Date: Aug 2003
Location: Asylum

 
I may be wrong, I am pretty new to C++, but isn't inline like garbage collecting in Java in that it is only a suggestion to the compiler but is not forced? The actual compiler decides (based on some algorithm) if it will actually add the code inline.

Not saying it is a bad suggestion or unwarrented but I wanted to clarify for myself.

__________________

Q6600 2.4 -> 3.0
Asus P5B Deluxe, 2x1G G.Skill PC6400
PC Power & Cooling 680 silencer
eVGA 8800 GTS (640 meg)

2.4c -> 3.3 (275x12) @ 1.55v, Zalman 7000-AlCu
Asus P4P800, 2x512 PC3700 Buffalo (BH-5) [2-2-2-5] @ 3.2v
Thermaltake 480 watt
eVGA 7800GS

Jarlax is offline   QUOTE Thanks
Old 04-18-05, 12:59 PM   #4
mccoyn
Senior Member

 
mccoyn's Avatar 

Join Date: Nov 2003
Location: Michigan, USA

 
Yes, I think you are correct. Most compiliers honor it, I believe. The academic and express versions of MSVS are the only exceptions I know of. Another problem is it often isn't honored if optimizations are turned off, so the code will run very slow during debugging. I guess the best answer requires some empirical testing to decide.

Another problem is that compiliers might not allow too many layers of templates. I think the standard says this should be configurable, but default to at least 32. So my code my not work with something like intpow<33>(x).

Another trick I use is macros. Even less flexible than templates, but they work in C and they are fast.
Code:
#define SQUARE(x) x*x
#define CUBE(x) x*x*x
mccoyn is offline   QUOTE Thanks
Old 04-18-05, 03:15 PM Thread Starter   #5
macklin01
Computational Oncologist / Biomathematician / Moderator on Vacation, Ph.D.

 
macklin01's Avatar 

Join Date: Apr 2002
Location: Pasadena, CA

 
Hmm, thanks for the tips. I may have to give this a try later on. Defining square in this way is certainly the optimal solution for the most commonly used power; I guess that there's always a tradeoff between ultimate flexibility ( pow( double, double ) ) and ultimate speed ( #define square ... )

Would your define macro be faster that a function double square( double )? Thanks -- Paul

__________________
My heatware (macklin01)

Need image I/O for your science apps? Try EasyBMP

My biomedical research: Mathematical Cancer Modeling & Simulation

I'm on vacation as a moderator as I devote more time to my faculty position.
Thank you for your understanding if I don't respond to your PM. -- Paul
macklin01 is offline Folding Profile Heatware Profile   QUOTE Thanks
Old 04-18-05, 03:32 PM   #6
mccoyn
Senior Member

 
mccoyn's Avatar 

Join Date: Nov 2003
Location: Michigan, USA

 
The macro should always be at least as fast as the function, when it has a simple arugment. Its entirly possible that that compilier can make the function as fast as the macro (by inlining it and such,) but it should never get faster.

One gotcha is argument expansion. pow(0.1 / 0.4, 2) and intpow(0.1 / 0.4, 2) will evaluate 0.1/0.4 exactly one time. The macro SQUARE(0.1 / 0.4) will be expanded into (0.1 / 0.4) * (0.1 / 0.4), which will be slower than intpow because it evaluates the divide twice. Any attempt to work around this is hacky at best. Heres how I might do that.

Code:
double fast_square_x;
#define FAST_SQUARE(x) (((fast_square_x=x)&0) + fast_square_x * fast_square_x)
I used a global variable and an embeded assignement. I'm also preying that the compilier optimizer doesn't throw out the assignement with the &0, uses left to right ordering for addition and that this macro isn't used in multiple threads simultaneously.

Inline functions are a much cleaner way to do this sort of thing.
mccoyn is offline   QUOTE Thanks
Old 04-18-05, 04:42 PM   #7
JigPu
Inactive Pokémon Moderator

 
JigPu's Avatar 

Join Date: Jun 2001
Location: Vancouver, WA

10 Year Badge
 
Another tiny optimization you could throw in for the original for loop intpow(x,y) would be to use a decrementing loop like so:

for( i=exponent-1 ; i!=0 ; i-- )

I don't know for sure what gcc would do it, but the assembly for decrementing loops is quicker than incrementing, so it might shave off a tiny fraction of overhead. EDIT: Heck, if you're adventurous and can get ASM compiling to work, just write an assembly language version of intpow(x,y)

JigPu

__________________
.... ASRock Z68 Extreme3 Gen3
.... Intel Core i5 2500 ........................ 4 thread ...... 3300 MHz ......... -0.125 V
2x ASUS GTX 560 Ti ............................... 1 GiB ....... 830 MHz ...... 2004 MHz
.... G.SKILL Sniper Low Voltage ............. 8 GiB ..... 1600 MHz ............ 1.25 V
.... OCZ Vertex 3 ................................. 120 GB ............. nilfs2 ..... Arch Linux
.... Kingwin LZP-550 .............................. 550 W ........ 94% Eff. ....... 80+ Plat
.... Nocuta NH-D14 ................................ 20 dB ..... 0.35 C°/W ................ 7 V


"In order to combat power supply concerns, Nvidia has declared that G80 will be the first graphics card in the world to run entirely off of the souls of dead babies. This will make running the G80 much cheaper for the average end user."
"GeForce 8 Series." Wikipedia, The Free Encyclopedia. 7 Aug 2006, 20:59 UTC. Wikimedia Foundation, Inc. 8 Aug 2006.
JigPu is offline   QUOTE Thanks
Old 05-14-05, 09:09 PM   #8
emboss
Member



Join Date: Nov 2003
Location: Canberra, Down Under

 
This would be faster in many cases:

Code:
double intpow( double base, int exponent )
{
  int i;
  double out = 1, curpwr = base;
  for( ; exponent > 0; exponent = exponent >> 1)
  {
    if ((exponent & 1) > 0)
      base *= curpwr;
    curpwr *= curpwr;
  }
  return out;
}
As a template:
Code:
template <unsigned int exponent>
inline double intpowt(double base)
{
  return intpowt<exponent >> 1>(base*base) * (((exponent & 1) > 0) ? base : 1);
}

template <>
inline double intpowt<0>(double base)
{
  return 1;
}
Or for constant powers if you dislike templates:
Code:
#define intpow5(b,e) \
(((e & 1) > 0) ? b : 1)
#define intpow4(b,e) \
((((e & 1) > 0) ? b : 1) * intpow5((b*b), (e >> 1)))
#define intpow3(b,e) \
((((e & 1) > 0) ? b : 1) * intpow4((b*b), (e >> 1)))
#define intpow2(b,e) \
((((e & 1) > 0) ? b : 1) * intpow3((b*b), (e >> 1)))
#define intpow1(b,e) \
((((e & 1) > 0) ? b : 1) * intpow2((b*b), (e >> 1)))
#define intpow(b,e) \
((((e & 1) > 0) ? b : 1) * intpow1((b*b), (e >> 1)))
Add additional layers for exponents > 32.

If you know M4 you could write a recursive macro version as well

edit: Fixed a few typos ...
emboss is offline   QUOTE Thanks
Old 05-16-05, 08:42 PM   #9
mccoyn
Senior Member

 
mccoyn's Avatar 

Join Date: Nov 2003
Location: Michigan, USA

 
Quote:
Originally Posted by emboss
If you know M4 you could write a recursive macro version as well
Whats M4?
mccoyn is offline   QUOTE Thanks
Old 05-16-05, 10:33 PM   #10
emboss
Member



Join Date: Nov 2003
Location: Canberra, Down Under

 
A macro preprocessor; sorta the C preprocessor on steroids One neat feature is that you're allowed to write recursive macros (with some limitations), so instead of the multiple intpow functions you could just do
Code:
#define intpow(b,e) \
((((e & 1) > 0) ? b : 1) * intpow((b*b), (e >> 1)))
with some syntatic changes to show that you're allowed recursion (I don't know M4 so can't put the right alphabet soup in).
emboss is offline   QUOTE Thanks
Old 05-18-05, 08:59 AM   #11
mccoyn
Senior Member

 
mccoyn's Avatar 

Join Date: Nov 2003
Location: Michigan, USA

 
There is a boost preprocessor library that allows you do iteration and recursion using the usual C++ preprocessor. I suspect that it will work for C just as well since it uses the same preprocessor.

I've never used it and I have no idea how they did it.
mccoyn is offline   QUOTE Thanks
Old 05-18-05, 09:50 AM   #12
Patrick_
Registered



Join Date: May 2005

 
Hey thanks for the tip :-D I'll test it out later.

__________________
My Pride and Joy:

P4 2.4C GHz @ 3.2 GHz (self-built watercooled)
ABIT IS7 i865PE
512MB PC3200 GeIL Value
300GB Maxtor Diamondmax 9 ATA/133 16MB Cache
Albatron NVIDIA GeForce FX 5600 TURBO 128-bit 128MB DDR
LG CD-RW (w00t)
430W Award PSU
400W Something PSU
OS: GNU
Distro: Gentoo
Kernel: 2.6.13-rc6-ck1
Patrick_ is offline   QUOTE Thanks
Old 06-06-05, 01:37 PM   #13
khiloa
Open Source Senior

 
khiloa's Avatar 

Join Date: Jul 2004
Location: /usa/sc/florence

 
Interesting stuff, I have not done any C++ lately, except a few little progs, but I am sure I could use some of these tips before too long.

__________________


Say NO to ads!

Did someone just help you? Thank them.

Along with F@H, please help support some of our other teams.

khiloa is offline   QUOTE Thanks

Post Reply New Thread Subscribe


Overclockers Forums > Software > Programming Tips and Tricks
Programming Tips and Tricks
Forum Jump

Thread Tools Search this Thread
Search this Thread:

Advanced Search


Mobile Skin
All times are GMT -5. The time now is 10:40 AM.
Powered by vBulletin® Version 3.8.7
Copyright ©2000 - 2014, vBulletin Solutions, Inc.
You can add these icons by updating your profile information to include your Heatware ID, Benching Profile ID or your Folding/SETI profile ID. Edit your profile!
X

Welcome to Overclockers.com

Create your username to jump into the discussion!

New members like you have made this the best community on the Internet since 1998!


(4 digit year)

Why Join Us?

  • Share experience
  • Max out your hardware
  • Best forum members anywhere
  • Customized forum experience

Already a member?