Asynchronous computing and parallel algorithms¶

  • Asynchronous programming using std::async and std::future
  • Asynchronous programming using hpx::async and hpx::future
  • Experimental support of the parallel algorithms in GCC and MSVC
  • Full support of parallel algorithms in HPX

Asynchronous programming using std::async and std::future¶

Documentation:

  • std::future
  • std::async

Book:

  • Williams, Anthony.C++ concurrency in action : practical multithreading. Shelter Island, NY:Manning, 2012.ISBN: 9781933988771.
In [1]:
%%writefile example.cpp
#include <future>
#include <iostream>

int add(int a, int b)
{
    return a + b;
}

int main(void)
{
int a = 5;
int b = 5;
std::future<int> result = std::async(add,a,b);

std::cout<< result.get() << std::endl;

return EXIT_SUCCESS;
}
Out[1]:
file written: example.cpp

Compilation¶

We need to add -phtread for the asynchronous execution

In [2]:
%%bash
g++ example.cpp -pthread -o example
Out[2]:

Execution¶

In [3]:
%%bash
./example
Out[3]:
10

Asynchronous programming using hpx::async and hpx::future¶

In [4]:
#include<run_hpx.cpp>
#include <hpx/future.hpp>
#include <iostream>
Out[4]:

In [5]:
int add(int a, int b)
{
    return a + b;
}
Out[5]:

In [6]:
run_hpx([](){

hpx::lcos::future<int> result = hpx::async(add,5,5);

std::cout << result.get() ;

});
10
Out[6]:
(void) @0x7fe5377c8e60

Example Numerical integration¶

For this example the Taylor series for the $\sin(x)$ function is computed. The Taylor series is given by,

$$ \sin(x) \approx = \sum\limits_{n=0}^N (-1)^{n-1} \frac{x^{2n}}{(2n)!}.$$

For the concurrent computation, the interval $[0, N]$ is split in two partitions from $[0, N/2]$ and $[(N/2)+1, N]$, and these are computed asynchronously using hpx::async. Note that each asynchronous function call returns an hpx::future which is needed to synchronize the collection of the partial results.

In [8]:
#include <cmath>

// Define the partial Taylor function
double Taylor(size_t begin, size_t end, size_t n, double x)
{
 double denom = fact(2 * n);
 double res = 0;
 for (size_t i = begin; i != end; ++i)
 {
 res += std::pow(-1, i - 1) * std::pow(x, 2 * n) / denom;
 }
 return res;
}
Out[8]:

In [9]:
run_hpx([](){

// Compute the Taylor series sin(2.0) for 25 iterations
size_t n = 25;

// Launch two concurrent computations of each partial result
hpx::future<double> f1 = hpx::async(taylor, 0, n / 2, n, 2.);
hpx::future<double> f2 = hpx::async(taylor, (n / 2) + 1, n, n, 2.);

// Introduce a barrier to gather the results
double res = f1.get() + f2.get();

// Print the result
std::cout << "Sin(2.) = " << res << std::endl;

});
Sin(2.) = -0.000691055
Out[9]:
(void) @0x7fe5377c8e60

alt text

Experimental support of the parallel algorithms in GCC and MSVC¶

  • C++17 added support for parallel algorithms to the standard library, to help programs take advantage of parallel execution for improved performance.
  • Parallelized versions of 69 algorithms from , and are available
  • Only recently released compilers (gcc >= 9 and MSVC >= 19.14) 1implement these new features and some of them are still experimental

Links:

  • Compiler support
  • Parallelism in C++ 17

Example¶

We want to compute the sum of all elements in some std::vector n

$$ \sum_{i=0}^N r+= n[i] $$

in sequential and parallel using the C++ Standard template library without using any for loop.

Links:

  • std::vector
  • std::iterators
  • std::fill
  • std::accumulate
  • std::reduce

Naive implementation¶

In [10]:
size_t len = 1000000000;
std::vector<int> n = std::vector<int>(len);

for (size_t i = 0; i < n.size(); i++)
n[i] = -1;

int result = 0;

for (size_t i = 0; i < n.size(); i++)

result += n[i];

std::cout << "Result= " << result << std::endl;
Result= -1000000000
Out[10]:
(std::basic_ostream<char, std::char_traits<char> >::__ostream_type &) @0x7fe551ce7500

Implementation using the C++ STL sequential algorithms¶

In [11]:
std::vector<int> n2 = std::vector<int>(len);

std::fill(n2.begin(),n2.end(),-1);

result = std::accumulate(n2.begin(),n2.end(),0.0);

std::cout << "Result= " << result << std::endl;
Result= -1000000000
Out[11]:
(std::basic_ostream<char, std::char_traits<char> >::__ostream_type &) @0x7fe551ce7500

Implementation using the C++ STL parallel algorithms¶

In [12]:
%%writefile parallel.cpp
#include<execution>
#include<iostream>

int main(void){

size_t len = 1000000000;
std::vector<int> n = std::vector<int>(len);

std::fill(n.begin(),n.end(),-1);

int result = 0;

result = std::reduce(std::execution::par,n.begin(), n.end());

std::cout << "Result= " << result << std::endl;

return EXIT_SUCCESS;
}
Out[12]:
file written: parallel.cpp
Compilation¶

The parallelism for the gcc is based-on the Threading Building Blocks library. Therefore, we need to add -ltbb to the compiler. Since these feature are experimental, we need to use the following C++ standard -std=c++1z.

In [13]:
%%bash 
g++ -std=c++1z -ltbb parallel.cpp -o parallel
Out[13]:

In [14]:
%%bash
./parallel
Out[14]:
Result= -1000000000

Execution policies¶

  • std::reduce(std::execution::par,n.begin(), n.end()); - Parallel execution
  • std::reduce(std::execution::seq,n.begin(), n.end()); - Sequential execution
  • std::reduce(std::execution::par_unseq,n.begin(), n.end()); - Parallel execution with vectorization

Documentation

Fore more details: CppCon 2016: Bryce AdelsteinLelbach “The C++17 Parallel Algorithms Library and Beyond”

Full support of parallel algorithms in HPX¶

  • HPX provides the same set of parallel algorithms as the experimental ones in the C++ STL
  • In addition, HPX provides the functionality to receive a future and combined the parallel algorithms with asynchronous computation.
  • HPX provides some more convenient way to iterate over a std::vector
In [15]:
#include<hpx/include/parallel_reduce.hpp>
Out[15]:

In [27]:
run_hpx([](){

std::cout << "Result:"  << hpx::ranges::reduce(hpx::execution::par,
n.begin(),n.end(),0) << std::endl;

std::cout << "Result:"  << hpx::ranges::reduce(hpx::execution::seq,
n.begin(),n.end(),0) << std::endl;

});
Result:-1000000000
Result:-1000000000
Out[27]:
(void) @0x7fe5377c8e60

Combing the parallel algorithms and futurization¶

In [30]:
run_hpx([](){

auto f =
hpx::ranges::reduce(
hpx::execution::par(
hpx::execution::task),
n.begin(),
n.end(),0);

std::cout<< f.get();

});
-1000000000
Out[30]:
(void) @0x7fe5377c8e60

Using parallel algorithms to iterated over a std::vector¶

In [78]:
%%writefile loop.cpp
#include<execution>
#include<iostream>
#include<vector>
#include<numeric>
#include<algorithm>
#include <ctime>
#include <experimental/random>

int main(void)
{

std::vector<int> l = std::vector<int>(10);
std::srand (time(NULL));
std::generate(l.begin(), l.end(), std::rand);

std::vector<int> i = std::vector<int>(10);

std::iota(std::begin(i), std::end(i), 0);

std::for_each(
    std::execution::par,
    i.begin(),
    i.end(),
    [&](auto&& item)
    {
        std::cout << "Element: " << l[item] << " at index: " << item << std::endl;
    });

return EXIT_SUCCESS;
}
Out[78]:
file written: loop.cpp
In [79]:
%%bash
g++ loop.cpp -std=c++1z -ltbb -o loop
Out[79]:

In [80]:
%%bash 
./loop
Out[80]:
Element: 858665547 at index: 0
Element: 44651610 at index: 1
Element: 1758033419 at index: 2
Element: 1001389801 at index: 3
Element: 1841445098 at index: 4
Element: 827817333 at index: 5
Element: 1382917348 at index: 6
Element: 840201672 at index: 7
Element: 98077046 at index: 8
Element: 1061316923 at index: 9

Using range-based HPX loops¶

In [82]:
#include<hpx/include/parallel_for_loop.hpp>
Out[82]:

In [93]:
std::vector<int> l = std::vector<int>(10);
std::srand (time(NULL));
std::generate(l.begin(), l.end(), std::rand);
Out[93]:
(void) nullptr
In [97]:
run_hpx([](){

hpx::for_loop(
hpx::execution::par,
0,
l.size(),
[](boost::uint64_t i)
{
 std::cout << "Element: " << l[i] << " at index: " << i << std::endl;
}
);

});
Element: Element: Element: 336799978 at index: 1538852047 at index: Element: 1335964673 at index: 01831084038 at index: 3

2
Element: 1790697844 at index: 1
4
Element: 2083382621 at index: 7
Element: 1512287832 at index: 8
Element: 1623962572 at index: 5
Element: 1685298515 at index: 9
Element: 612437479 at index: 6
Out[97]:
(void) @0x7fe5377c8e60

alt text