Learning performance tricks and c++11 > features.

kamran · Jun 18, 2016

Hi, So I've learned c++ from cplusplus.com and tutorialspoint.com and the other things by experience or from other places (while learning OpenGL qt and other stuff for example)

My problem is I don't know many performance related stuff. for example I only recently learned that using a reference in the function reduces the time the function takes, specially while in a loop!
Or I don't know anything about the lambda expressions. my firend gave me a 900 page book "C++ Primer (5th Edition) by Lippmann" And I find reading the book a torture, I already know half the stuff, it's all basics, every 70 pages I learn something little like the decltype(), isn't there any better way to learn these? Like a book only related to these with no basics. or a website:?

JrClocker · Jun 18, 2016

Any reason for C++ and not C#?

When you say performance...what exactly do you mean? The code you write gets compiled into an intermediate step, and runs directly from the CLR.

If you are on a 64-bit machine, only use double precision math, as single precision is slower (the native CLR is double precision, so it has to convert back and forth to single precision).

It would be easier to help with optimizations if I knew a context.

For example, if you are doing large loops, .NET has a "parallel.for" and a "parallel.foreach" that will run your loops in a multithreaded fashion.

kamran · Jun 18, 2016

JrClocker said:
Any reason for C++ and not C#?

When you say performance...what exactly do you mean? The code you write gets compiled into an intermediate step, and runs directly from the CLR.

If you are on a 64-bit machine, only use double precision math, as single precision is slower (the native CLR is double precision, so it has to convert back and forth to single precision).

It would be easier to help with optimizations if I knew a context.

For example, if you are doing large loops, .NET has a "parallel.for" and a "parallel.foreach" that will run your loops in a multi-threaded fashion.

well AFAIK c# is windows only
and I just prefer c++

Don't know why, I don't know much about c# (only bare basics) C++ is my first language
well I might have a algorithm that needs run faster! for example I also know that using Vector is faster then arrays! And better in total! I'm just talking about things to know to use when I need my code to run faster, And also, isn't it always good to have a fast code ^_^? who knows maybe one day I might end up programming for games and in that area afaik, performance is SUPER super important! Or I might even do scientific programming

JrClocker · Jun 18, 2016

I started with assembly, then BASIC, then C, then Visual Basic, then C++, then Java, then C#, then ...

I don't really like programming in C++...too cryptic. Plus, header files are just stupid!

But it all boils down to what you want to use or have to use.

Speed is important (i.e. Efficiency) but so is readability. There are many ways to write the program...all else being equal, choose readability in case you come back to it a few years from now...or someone else has to maintain your code.

Comments are a must.

Use a prefix for class member variables so that they are easy to distinguish from method variables (I use m_ as the prefix).

Avoid "magic numbers" in your code...make a constant instead. When I bane constants, I use all caps for the name to make it easier to read.

Avoid having multiple class definitions in the same file...easier to read!

Good luck! :thup:

JrClocker · Jun 18, 2016

Oh - check out lists...they are efficient.

If you are going to have multiple dimension arrays, this[][] is much faster than this[,]. You just have to remember to initialize the second array group [x][].

jlangfo5 · Jun 19, 2016

Hey! Welcome to the world of programming!

My first language was C++ also, but these days, I program mostly in C.

So, I will start off by saying that I am not an expert at C++, especially with regards to the new standards, so my comments will be mostly language independent.

First thing, making code that runs fast can be dependent on your algorithm itself, the hardware executing instructions, or both.

For example, if you are writing a program to look up a phone number basee off of a last name, you have several options.

We will first focus on algorithm efficiency. The easiest thing to do, would be to have a linear array holding pairs of names and phone numbers, and you would start from the first entry and traverse the array one element at a time until you find the name you want, then return the corresponding phone number.

This kind of lookup is said to have a linear run time complexity, meaning the number of operations increase linearly with the number of elements in the array.

While this solution works, if you have 10 million entries, you would have to traverse the entire 10 million elements to find out of an entry exists or not.

A better solution would be to first ensure that the array elements are sorted by name (this can be expensive but if your list never changes, you will have to sort the names only once, read about sorting algorithms if you haven't yet)
then you can do lookup using what's called a binary search, then your lookup across 10 million elements is guaranteed to never exceed 24 look up operations, as a binary search has what is called logarithmic run time complexity meaning that the number of operations is proportional to log base 2 of the number of elements in the list.

This 'trick' just decrease the lookup operations from 10 million worst case, down to 24.

The other part of writing fast code is understanding the impact that your code will have on the hardware. This is a subject that shows up a lot in embedded systems were people program a lot in C, because they don't want to have to guess to hard about what the compiler is going to do to their code.

A prolific example would be writing code in a way that should make it easy for the cache controller to keep the right data in the cache. Referring back to the old example, the linear array access had very good cache performance because the next needed piece of data was always the element at the next memory address, meaning that there would be few instances were the processor would try and fetch from the cache and find out that the data was not cached, meaning the processor would have to stall while data was loaded from the RAM and into the cache, or even worse, loaded from the disk, then the RAM, and then the cache.

The binary search would have worse cache performance because it's memory access pattern is harder to predict, because you are accessing elements in a manner similar to a tree traversal. (Lots of jumping around in memory making it hard for the cache to "guess" what to load next).

These are just two examples here, and I tried to make them language agnostic. These kinds of ideas are learne from a computer architecture book, and a data structure and algorithms book.

If you are doing things the C++ way, it means knowing the language well enough and the STL too, so that you can pick the right data structure or algorithm to optimize performance.

kamran · Jun 20, 2016

jlangfo5 said:
Hey! Welcome to the world of programming!

My first language was C++ also, but these days, I program mostly in C.

So, I will start off by saying that I am not an expert at C++, especially with regards to the new standards, so my comments will be mostly language independent.

First thing, making code that runs fast can be dependent on your algorithm itself, the hardware executing instructions, or both.

For example, if you are writing a program to look up a phone number basee off of a last name, you have several options.

We will first focus on algorithm efficiency. The easiest thing to do, would be to have a linear array holding pairs of names and phone numbers, and you would start from the first entry and traverse the array one element at a time until you find the name you want, then return the corresponding phone number.

This kind of lookup is said to have a linear run time complexity, meaning the number of operations increase linearly with the number of elements in the array.

While this solution works, if you have 10 million entries, you would have to traverse the entire 10 million elements to find out of an entry exists or not.

A better solution would be to first ensure that the array elements are sorted by name (this can be expensive but if your list never changes, you will have to sort the names only once, read about sorting algorithms if you haven't yet)
then you can do lookup using what's called a binary search, then your lookup across 10 million elements is guaranteed to never exceed 24 look up operations, as a binary search has what is called logarithmic run time complexity meaning that the number of operations is proportional to log base 2 of the number of elements in the list.

This 'trick' just decrease the lookup operations from 10 million worst case, down to 24.

The other part of writing fast code is understanding the impact that your code will have on the hardware. This is a subject that shows up a lot in embedded systems were people program a lot in C, because they don't want to have to guess to hard about what the compiler is going to do to their code.

A prolific example would be writing code in a way that should make it easy for the cache controller to keep the right data in the cache. Referring back to the old example, the linear array access had very good cache performance because the next needed piece of data was always the element at the next memory address, meaning that there would be few instances were the processor would try and fetch from the cache and find out that the data was not cached, meaning the processor would have to stall while data was loaded from the RAM and into the cache, or even worse, loaded from the disk, then the RAM, and then the cache.

The binary search would have worse cache performance because it's memory access pattern is harder to predict, because you are accessing elements in a manner similar to a tree traversal. (Lots of jumping around in memory making it hard for the cache to "guess" what to load next).

These are just two examples here, and I tried to make them language agnostic. These kinds of ideas are learne from a computer architecture book, and a data structure and algorithms book.

If you are doing things the C++ way, it means knowing the language well enough and the STL too, so that you can pick the right data structure or algorithm to optimize performance.

I have followed an amazing course on corusra about algorithms, It covered these things

what I am more worried about is the hardware part. Cache stuff and things like that, sorry for late reply I was busy

The other part of writing fast code is understanding the impact that your code will have on the hardware. This is a subject that shows up a lot in embedded systems were people program a lot in C, because they don't want to have to guess to hard about what the compiler is going to do to their code.

exactly this is exactly what I am talking about

I guess continuing the course will be helpful then :?

jlangfo5 · Jun 20, 2016

kamran said:
I have followed an amazing course on corusra about algorithms, It covered these things

what I am more worried about is the hardware part. Cache stuff and things like that, sorry for late reply I was busy

exactly this is exactly what I am talking about

I guess continuing the course will be helpful then :?

Just glancing at it, their computer architecture course seems like it would cover the backgroundmaterial that should give insight in how to write code that is conscious about the hardware.

https://www.coursera.org/course/comparch

Don't be surprised if it does not actually teach you "Yes! And this is why we always prefer to put the if(TRUE == x) outside of the loop" kind of knowledge. But, it should teach you what the processor has to do in order to run your program, and how the cache behaves. it's normal for similar courses to have you write some assembly by hand, and then go clock cycle by clock cycle through what's going on inside of the CPU pipeline in order to execute each instruction. It can be tedious, but it's fascinating, and if you grasp the concepts well, you should be able to use some imagination to apply the concepts efficient programming practices. It tends to be one of those courses which removes some of the magic from the computer.

It does look like it recommends some previous practice in digital logic and some other subjects.

JrClocker · Jun 21, 2016

Performance of an algorithm is VERY hardware specific...what will work well on a Intel ix machine may not work well/efficient on an embedded platform.

When I am optimizing algorithms, I look at the code that the compiler generates...and see if it is doing anything stupid.

In general, a few that come to mind are:

- Any sort of "jump" is inefficient
---> Processors run at their highest efficiency when they can load up their instruction queue.
---> Any sort of "jump" causes them to have to flush the instruction queue to redirect to the new code
---> Modern processors do well at "predictive branching", but you still pay a performance hit
---> You can look at your code to see how it is compiled for statements like: (1) if-then-else; (2) for(); (3) switch; (4) Do loops; etc.
---> A switch statement may compile with less jumps than an if-then-else...or a for() loop may compile with less jumps than a Do loop.

- For algorithm efficiency, minimize the use of "global variables"
---> If you keep your variables within a subroutine, the compiler will often use the processor registers for the variables...this is the absolute fastest

- When passing data into a subroutine, pass by pointer or "reference"
---> If you don't do this, the compiler "may" create a local copy of the variable on the system stack
---> It takes time to make the copy and push the stack on subroutine entry, and pop the stack on subroutine exit

- There are compiler switches you can use to force register usage, then cache usage, etc. to maximize speed
---> Try to minimize the cache usages in a multi-tasking environment...as the task switch will most likely flush the cache...the performance gain won't be as much as you think.
---> Register variables are the absolute fastest

kamran · Jun 22, 2016

JrClocker said:
Performance of an algorithm is VERY hardware specific...what will work well on a Intel ix machine may not work well/efficient on an embedded platform.

When I am optimizing algorithms, I look at the code that the compiler generates...and see if it is doing anything stupid.

In general, a few that come to mind are:

- Any sort of "jump" is inefficient
---> Processors run at their highest efficiency when they can load up their instruction queue.
---> Any sort of "jump" causes them to have to flush the instruction queue to redirect to the new code
---> Modern processors do well at "predictive branching", but you still pay a performance hit
---> You can look at your code to see how it is compiled for statements like: (1) if-then-else; (2) for(); (3) switch; (4) Do loops; etc.
---> A switch statement may compile with less jumps than an if-then-else...or a for() loop may compile with less jumps than a Do loop.

- For algorithm efficiency, minimize the use of "global variables"
---> If you keep your variables within a subroutine, the compiler will often use the processor registers for the variables...this is the absolute fastest

- When passing data into a subroutine, pass by pointer or "reference"
---> If you don't do this, the compiler "may" create a local copy of the variable on the system stack
---> It takes time to make the copy and push the stack on subroutine entry, and pop the stack on subroutine exit

- There are compiler switches you can use to force register usage, then cache usage, etc. to maximize speed
---> Try to minimize the cache usages in a multi-tasking environment...as the task switch will most likely flush the cache...the performance gain won't be as much as you think.
---> Register variables are the absolute fastest

Thanks a lot for your answers

JrClocker · Jun 22, 2016

No problem!

A lot depends on what you are trying to do, and the platform you are coding on.

What platform? Windows?

Learning performance tricks and c++11 > features.

kamran

Member

JrClocker

AKA: JrMiyagi

kamran

Member

JrClocker

AKA: JrMiyagi

JrClocker

AKA: JrMiyagi

jlangfo5

New Member

kamran

Member

jlangfo5

New Member

JrClocker

AKA: JrMiyagi

kamran

Member

JrClocker

AKA: JrMiyagi

Similar threads