Hey! Welcome to the world of programming!
My first language was C++ also, but these days, I program mostly in C.
So, I will start off by saying that I am not an expert at C++, especially with regards to the new standards, so my comments will be mostly language independent.
First thing, making code that runs fast can be dependent on your algorithm itself, the hardware executing instructions, or both.
For example, if you are writing a program to look up a phone number based on a last name, you have several options.
We will first focus on algorithm efficiency. The easiest thing to do would be to have a linear array holding pairs of names and phone numbers: you start from the first entry and traverse the array one element at a time until you find the name you want, then return the corresponding phone number.
This kind of lookup is said to have linear run time complexity, meaning the number of operations increases linearly with the number of elements in the array.
While this solution works, if you have 10 million entries, you may have to traverse all 10 million elements just to find out whether an entry exists or not.
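Here is a minimal sketch of that linear lookup (the `Entry` struct and function name are just made up for illustration):

```cpp
#include <optional>
#include <string>
#include <vector>

struct Entry {
    std::string name;
    std::string phone;
};

// Linear search: check every entry until we find the name or run out.
// Worst case we touch all N elements before concluding the name is absent.
std::optional<std::string> find_phone_linear(const std::vector<Entry>& book,
                                             const std::string& name) {
    for (const Entry& e : book) {
        if (e.name == name) return e.phone;
    }
    return std::nullopt;  // name not in the book
}
```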
A better solution would be to first ensure that the array elements are sorted by name (this can be expensive, but if your list never changes, you only have to sort the names once; read about sorting algorithms if you haven't yet).
Then you can do the lookup using what's called a binary search. Your lookup across 10 million elements is then guaranteed to never exceed 24 comparisons, because a binary search has what is called logarithmic run time complexity, meaning the number of operations is proportional to log base 2 of the number of elements in the list.
This 'trick' alone decreases the worst-case lookup from 10 million operations down to 24.
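In C++ you don't even have to write the binary search yourself; a sketch using the standard library's `std::lower_bound` over a name-sorted vector (same made-up `Entry` type as above) could look like this:

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

struct Entry {
    std::string name;
    std::string phone;
};

// Binary search over a vector that MUST already be sorted by name.
// Each comparison halves the remaining range, so at most ~log2(N)
// comparisons: about 24 for 10 million entries.
std::optional<std::string> find_phone_binary(const std::vector<Entry>& book,
                                             const std::string& name) {
    auto it = std::lower_bound(
        book.begin(), book.end(), name,
        [](const Entry& e, const std::string& n) { return e.name < n; });
    if (it != book.end() && it->name == name) return it->phone;
    return std::nullopt;  // name not in the book
}
```

Note the precondition: if the vector is not sorted by name, `std::lower_bound` will silently give wrong answers.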
The other part of writing fast code is understanding the impact your code has on the hardware. This subject shows up a lot in embedded systems, where people program mostly in C because they don't want to guess too hard about what the compiler is going to do to their code.
A classic example is writing code in a way that makes it easy for the cache controller to keep the right data in the cache. Referring back to the earlier example, the linear array access had very good cache performance because the next needed piece of data was always at the next memory address. There would be few instances where the processor tries to fetch from the cache and finds the data missing (a cache miss), forcing it to stall while the data is loaded from RAM into the cache, or even worse, loaded from the disk, then into RAM, and then into the cache.
The binary search has worse cache performance because its memory access pattern is harder to predict: you are accessing elements in a manner similar to a tree traversal, with lots of jumping around in memory, making it hard for the cache to "guess" what to load next.
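A toy way to see the two access patterns side by side (the actual speed difference depends on your hardware, so this sketch only contrasts the layouts, not timings):

```cpp
#include <memory>
#include <numeric>
#include <vector>

// Contiguous storage: elements sit next to each other in memory, so a
// linear pass is friendly to the cache and the hardware prefetcher.
long sum_contiguous(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0L);
}

// Scattered storage: each element lives in its own heap allocation, so
// the same logical pass jumps around in memory, which is much harder
// for the cache to predict.
long sum_scattered(const std::vector<std::unique_ptr<int>>& v) {
    long s = 0;
    for (const auto& p : v) s += *p;
    return s;
}
```

Both functions compute the same result; only the memory layout (and therefore the cache behavior) differs.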
These are just two examples, and I tried to keep them language agnostic. These kinds of ideas are learned from a computer architecture book and a data structures and algorithms book.
If you are doing things the C++ way, that means knowing the language and the STL well enough that you can pick the right data structure or algorithm for your performance needs.
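For this phone-book problem specifically, the STL already packages these tradeoffs for you: `std::map` keeps its keys sorted in a balanced tree (logarithmic lookup, like the binary search), while `std::unordered_map` is a hash table with average constant-time lookup. A quick sketch of the hash-table version (the function name is my own):

```cpp
#include <string>
#include <unordered_map>

// std::unordered_map hashes each name to a bucket: average O(1) lookup,
// at the cost of unsorted keys and some memory overhead.
// std::map would give sorted keys and O(log N) lookup instead.
std::string find_phone(
        const std::unordered_map<std::string, std::string>& book,
        const std::string& name) {
    auto it = book.find(name);
    return it != book.end() ? it->second : "";  // "" means not found
}
```

Which container wins depends on your workload: if you never need the entries in sorted order, the hash table is usually the right pick.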