Hello World

Note

This is very much a work in progress! We’ll be publishing this and updating it periodically, so please bear with us as we get the content ready. Feel free to check out the posted content in the meantime!

We also very clearly need a better introduction.

As we go up the levels of abstraction, it becomes easy to forget to peek under the covers. This book hopes to serve as a reference for what things look like when you do peek under the covers. We’ll talk about high-level algorithms, but we’ll focus on how they look under a microscope. Everyone talks about Big O notation, and everyone uses it as a guideline for determining what is faster. In the territory we are about to enter, Big O can’t help us. It might even lead us astray. We will show you cases where `O(n)` can be faster than `O(1)`.

How about we try an example

Let’s say we have a 2-D matrix (5x5) that looks something like this:

```
----------------
| 1  2  3  4  5|
| 6  7  8  9 10|
|11 12 13 14 15|
|16 17 18 19 20|
|21 22 23 24 25|
----------------
```

and we want to add all the values in it. We could do it manually by doing:

```
1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 = ???
```

That’s a lot of mental math to do, so let’s just write a program instead!

```
#include <iostream>
#include <vector>
#include <chrono>
#include <cstdint>
```
```
auto by_row_column(const std::vector<std::vector<uint32_t>>& matrix)
{
    // We start by creating a variable to hold our sum (don't forget to initialize it to 0!)
    uint32_t sum = 0;

    // For simplicity, we will pretend we will always be working with a square matrix here
    const std::size_t n = matrix.size();

    for (std::size_t row = 0; row < n; ++row)
        for (std::size_t column = 0; column < n; ++column)
            sum += matrix[row][column];

    return sum;
}
```
```
auto by_column_row(const std::vector<std::vector<uint32_t>>& matrix)
{
    uint32_t sum = 0;

    const std::size_t n = matrix.size();

    for (std::size_t row = 0; row < n; ++row)
        for (std::size_t column = 0; column < n; ++column)
            sum += matrix[column][row];

    return sum;
}
```
```
// Let's start by populating our example matrix:
const std::vector<std::vector<uint32_t>> small_matrix{
    { 1,  2,  3,  4,  5},
    { 6,  7,  8,  9, 10},
    {11, 12, 13, 14, 15},
    {16, 17, 18, 19, 20},
    {21, 22, 23, 24, 25}
};
```
```
// Let's see what the output of our function looks like!
std::cout << by_row_column(small_matrix) << std::endl;
```
```
325
```
```
// Let's make sure that both functions return the same value:
std::cout << by_column_row(small_matrix) << std::endl;
```
```
325
```

Great! No funny business here so far. Everything looks the same. The only line that’s different between the two is:

```
            sum += matrix[row][column];
```

and

```
            sum += matrix[column][row];
```

But is everything actually the same? Let’s see what happens when we benchmark the two! If they are really the same, we’d expect them to take about the same time to run, right?

```
auto to_millis = [](const auto start, const auto stop) { return std::chrono::duration<double, std::milli>(stop - start).count(); };

{
    auto start = std::chrono::high_resolution_clock::now();
    auto sum = by_row_column(small_matrix);

    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "Took " << to_millis(start, end) << " millis to run the row, column version. Calculated sum: " << sum << std::endl;
}

{
    auto start = std::chrono::high_resolution_clock::now();
    auto sum = by_column_row(small_matrix);

    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "Took " << to_millis(start, end) << " millis to run the column, row version. Calculated sum: " << sum << std::endl;
}
```
```
Took 0.001546 millis to run the row, column version. Calculated sum: 325
```
```
Took 0.000777 millis to run the column, row version. Calculated sum: 325
```

They look pretty close to each other. The difference might just be measurement jitter. Let’s try a larger input just to make sure:

```
// Let's create a 10,000x10,000 matrix instead. We aren't crazy enough to
// hand-write a matrix that large, so let's use two for loops to create it for us.
std::vector<std::vector<uint32_t>> large_matrix;
for (std::size_t i = 0; i < 10000; ++i)
{
    large_matrix.push_back(std::vector<uint32_t>{});
    for (std::size_t j = 0; j < 10000; ++j)
    {
        large_matrix[i].push_back(i * j);
    }
}

auto to_millis = [](const auto start, const auto stop) { return std::chrono::duration<double, std::milli>(stop - start).count(); };

// Back to our benchmarks now!
{
    auto start = std::chrono::high_resolution_clock::now();
    auto sum = by_row_column(large_matrix);

    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "Took " << to_millis(start, end) << " millis to run the row, column version. Calculated sum: " << sum << std::endl;
}

{
    auto start = std::chrono::high_resolution_clock::now();
    auto sum = by_column_row(large_matrix);

    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "Took " << to_millis(start, end) << " millis to run the column, row version. Calculated sum: " << sum << std::endl;
}
```
```
Took 344.453 millis to run the row, column version. Calculated sum: 857419840
```
```
Took 983.367 millis to run the column, row version. Calculated sum: 857419840
```

What?? How is one nearly 3x slower than the other when all we did was change one line. One line!

If we did the algorithmic runtime analysis like one would in an introductory computer science course, we’d say that both of them have the same algorithmic complexity of

```
O(n^2) for an n by n matrix
```

So shouldn’t they have been the same? There isn’t even a special constant here that we are discarding.

If this sparks your curiosity, we hope you are excited for the material to come. We will spend the rest of this book talking about different situations like these, what causes them, and most importantly, why.

How to approach this book

We encourage you, as a reader, to focus on the how and the why, and less on the what. Don’t worry too much about the syntax or language-specific details; you can always look those up once you know what you are looking for. Focus on the concepts and on understanding how things are designed. As these ideas become more and more intuitive, you should be able to take most of what you learn here into other languages as well. The ideas we discuss are not limited to C++; it’s just easier to pick one language and show you what can be done and how. We also encourage you to try out the examples we discuss: check them out on gcc.godbolt.org or run them locally. Reading will only get you so far in the process of learning. Maybe you’ll even discover something interesting that we don’t cover in this book!

Expectations for this book

What we will not cover

Before we talk about what this book will cover, it’s important that we set expectations about what will not be discussed. This book is not a good introduction to programming or to the C++ language. We expect a baseline understanding of what C++ is: what a `class` is, what functions are, and basic data types like `int`, `float`, and `std::string`. Below is a list of wonderful references that should help in getting up to speed on the basics.

1. The C++ Programming Language by Bjarne Stroustrup

2. Cpp Reference

• This is a wonderful resource for brushing up on things in the C++ standard library. We use this almost everyday to look up details and nuances.

We will not spend any time discussing coding styles or programming languages in the abstract.

So what is this book about then?

The primary objective of this book is to show you how you can write performant code. We will deconstruct topics and ideas that will help you build a mental model for accomplishing this goal. We’ll expose you to tips and tricks, do our best to show you things visually, and help you understand the why. We will dive into assembly code when it makes sense. We’ll show you how to compute things at compile time so that you don’t have to calculate them at runtime. We will show what happens in related systems such as caches, how you can take advantage of them, and more importantly, why. The prerequisites for reading this book are not too high, and we will do our best to be a bit more verbose than usual so that there are multiple ways of thinking about the same idea.

When it comes to defining terms, we will start by being generally vague. We’ll then walk through examples, progressively narrowing down the scope of each term and its meaning. It’s not really exciting to be hit by walls of definitions before you even get a chance to develop a feel for what something is. We find that it’s better to ignore a few exceptions early on, so that the core idea is easier to understand, than to focus on every edge case and fail to properly grasp the topic at hand.

Throughout the process, we will try and make it as easy as possible for you to interact with the code: easy to run locally or online, easy to reason about, and set up so that you don’t have to take our word for anything. You can just try it out yourself!