Chapter 20: Multi Threading

The 98 C++ standard did not acknowledge the existence of multi-threading. Between then and the release of the current C++ standard computers have evolved to multi-core machines, and using multi-threading by now is a real option to consider when developing software.

Multi-threading is an extensive and complex subject, and many good reference texts on the subject exist. The C++ multi-threading is built upon the facilities offered by the pthreads library (cf. Nichols, B, et al.'s Pthreads Programming, O'Reilly ). However, in line with C++'s current-day philosophy the multi-threading implementation offered by the language offers a high level interface to multi-threading, and using the raw pthread building blocks is hardly ever necessary (cf. Williams, A. (2012): C++ Concurrency in action, Manning).

This chapter covers the facilities for multi-threading as supported by C++. Although the coverage aims at providing the tools and examples allowing you to create your own multi-threaded programs, coverage necessarily is far from complete. The topic of multi threading is too extensive for that. The mentioned reference texts provide a good starting point for any further study of multi threading.

A thread of execution (commonly abbreviated to a thread) is a single flow of control within a program. It differs from a separately executed program, as created by the fork(1) system call in the sense that threads all run inside one program, while fork(1) creates independent copies of a running program. Multi-threading means that multiple tasks are being executed in parallel inside one program, and no assumptions can be made as to which thread is running first or last, or at what moment in time. Especially when the number of threads does not exceed the number of cores, each thread may be active at the same time. If the number of threads exceed the number of cores, the operating system will resort to task switching, offering each thread time slices in which it can perform its tasks. Task switching takes time, and the law of diminishing returns applies here as well: if the number of threads greatly exceeds the number of available cores (also called overpopulation), then the overhead incurred may exceed the benefit of being able to run multiple tasks in parallel.

Since all threads are running inside one single program, all threads share the program's data and code. When the same data are accessed by multiple threads, and at least one of the threads is modifying these data, access must be synchronized to avoid that threads read data while these data are being modified by other threads, and to avoid that multiple threads modify the same data at the same time.

So how do we run a multi-threaded program in C++? Let's look at hello world, the multi-threaded way:

     1: #include <iostream>
     2: #include <thread>
     3:
     4: void hello()
     5: {
     6:     std::cout << "hello world!\n";
     7: }
     8:
     9: int main()
    10: {
    11:     std::thread hi(hello);
    12:     hi.join();
    13: }

At line 2 the header thread is included, informing the compiler about the existence of the class std::thread (cf. section 20.2.2);
At line 11 the std::thread hi object is created. It is provided with the name of a function (hello) which will be called in a separate thread. Actually, the second thread, running hello, is immediately started when a std::thread is defined this way;
The main function itself also represents a thread: the program's first thread. It should wait until the second thread has finished. This is realized in line 12, where hi.join() waits until the thread hi has finished its job. Since there are no further statements in main, the program itself ends immediately thereafter.
The function hello itself, defined in lines 4 through 7, is trivial: it simply inserts the text `hello world' into cout, and terminates, thus ending the second thread.

When compiling multi-threaded programs using the Gnu g++ compiler the -pthread option must be specified. At link-time the libpthread library must be available as well.

To create a multi-threaded program defined in a source file multi.cc the g++ compiler can be called like this:

    g++ --std=c++11 -pthread -Wall multi.cc

When several pre-compiled objects must be linked, the -lpthread linker option should also be specified.

20.1: Handling time (absolute and relative)

The C programming language offered tools like sleep(3) and select(2) to suspend program execution for a certain amount of time.

In multi threaded programs threads are frequently suspended, albeit usually for a very short time interval. E.g., when a thread must access a variable, but the variable is currently updated by another thread, then the first thread should wait until the second thread has completed the update. Updating a variable usually doesn't take much time, but if it may take an unexpectedly long time, then the waiting thread may want to be informed about it, so it can do something else as long as the second thread hasn't finished updating the variable.

Sleep and select can be used for wating, but as they were designed in an era when multi threading was commonly unavailable, their capabilities are limited when used in multi threaded programs.

The gap is bridged by the STL offering dedicated classes for specifying time, which mix well with time-dependent thread members. Threads are the topic of the next section (20.2). Before that, we'll first have a look at facilities for specifying time.

20.1.1: Units: the class std::ratio

Thread execution may have to be suspended until a specific point in time, for a specific amount of time, or until some event has occurred. When specifying time an appropriate time unit must be selected. Units (not necessarily time units) are defined in the class template std::ratio.

Before the class ratio can be used, the <ratio> header file must be included. Usually just the <chrono> header file is included, as chrono includes ratio, and facilities for specifying time are available after including the chrono header file.

The class template ratio expects two integral template arguments, defining, respectively, the numerator (amount) and denominator (fraction) of an amount. By default the denominator equals 1, resulting in the ratio's first argument (the numerator) being interpreted as the represented amount. Examples:

    ratio<1>        - representing one; 
    ratio<60>       - representing 60
    ratio<1, 1000>  - representing 1/1000.

The class template ratio defines two directly accessible static data fields: num represents its numerator, den its denominator. A ratio definition by itself simply defines a certain amount. E.g., when executing the following program

    #include <ratio>
    #include <iostream>
    using namespace std;
    int main()
    {
        cout << ratio<5, 1000>::num << ',' << ratio<5, 1000>::den << '\n';
    }

the text 1,200 is displayed, as that's the `amount' represented by ratio<5, 1000>: ratio simplifies the fraction whenever possible.

A fairly large number of predefined ratio types exist. They can be used instead of the more cumbersome ratio<x> or ratio<x, y> specification:


yocto	10^-24	zepto	10^-21

atto	10^-18	femto	10^-15	pico	10^-12
nano	10^-9	micro	10^-6	milli	10^-3
centi	10^-2	deci	10^-1

deca	10¹	hecto	10²	kilo	10³
mega	10⁶	giga	10⁹	tera	10¹²
peta	10¹⁵	exa	10¹⁸

zetta	10²¹	yotta	10²⁴

(note: the definitions of the types yocto, zepto, zetta and yotta use integral constants exceeding 64 bits. Although these constants are defined in C++, they are not available on 64 bit or smaller architectures.)

20.1.2: Amounts of time: std::chrono::duration

The class template std::chrono::duration is defined in the std::chrono namespace. Objects of the class duration define amounts of time.

Before using the class duration the <chrono> header file must be included (which in turn includes the <ratio> header file).

The class template duration requires two template type arguments: a numeric type (commonly int64_t) defining the duration's value, and a time-unit, called its Period, usually defined using the class template ratio.

Here predefined ratio types simplify the job of using the right granularity. E.g., to define 30 minutes you could use

    std::chrono::duration<int64_t, std::deca> halfHr(180)

(resulting in 180 deca-seconds, so 1800 seconds) but even if you specify `using namespace std' and `using namespace chrono' this is rather complex and non-intuitive. Fortunately, various predefined duration types exist:


nanoseconds		`duration<int64_t, nano>`
microseconds		`duration<int64_t, micro>`
milliseconds		`duration<int64_t, milli>`
seconds		`duration<int64_t>`
minutes		`duration<int64_t, ratio<60>>`
hours		`duration<int64_t, ratio<3600>>`

Using these types, a time amount of 30 minutes can now simply be defined as std::chrono::minutes halfHour(30).

The class template duration itself defines two types:

std::chrono::duration<Value, Period>::rep:
the Value type used by the duration type (e.g., int64_t)
std::chrono::duration<Value, Period>::period:
the ratio type used by the duration (e.g., std::ratio::nano) (so period itself has static values num and (den), as mentioned in the previous section).

These types can be retrieved from a duration object using decltype. E.g.,

    auto time(minutes(3) * 3);
    cout << decltype(time)::period::num;    // displays 60

In addition to these types the class template duration offers the following constructors:

constexpr duration(): the default constructor defines a duration of zero time units.
constexpr explicit duration(Value const &value): a specific duration of value time units. Here Value refers to the duration's numeric type (e.g., int64_t). So, when defining
```
    std::chrono::minutes halfHour(30);
```
the argument 30 is stored inside an int64_t.
duration also offers copy and move constructors.

In addition, duration has these members:

Arithmetic operators:
Duration objects may be added, or subtracted, and they may be multiplied and divided by a numeric value. They also support the modulo operator, using an integral constant for its right-hand side operand. The binary arithmetic and the binary compound assignment operators are also available.
constexpr Value count() const:
returns the value that is stored inside a duration object. For halfHour this would return 30, not 1800, as the time unit itself is obtained from its duration<Value, Unit>::period type.
static constexpr duration zero():
this is a static member, returning a duration object representing a duration of zero time units.
static constexpr duration min():
this is a static member, returning a duration object representing numeric_limits<Rep>::lowest().
static constexpr duration max():
this is a static member, returning a duration object representing numeric_limits<Rep>::max().

Different duration types may be combined, unless precision would be lost. When the binary arithmetic operators are used the resulting duration uses the finer of the two granularities. When the binary compound assignment operator is used the granularity of the left-hand side operand must at least be equal to the granularity of the right-hand side operand, or a compilation error is issued. E.g.,

    minutes halfHour(30);
    seconds half_a_minute(30);

    cout << (halfHour + half_a_minute).count(); // displays 1830

    //halfHour += half_a_minute;    won't compile: precision is lost

    half_a_minute += halfHour;
    cout << half_a_minute.count();              // displays 1830

The C++14 standard defines suffixes h, min, s, ms, us, ns for integral values, creating the corresponding duration time intervals. E.g., minutes oneMinute = 1m.

20.1.3: Clocks measuring time

Since time plays an important role, not only in real life, but also in multi-threaded programs, we need clocks to represents and measure time. C++ offers several predefined clock types, and all these clocks are defined in the std::chrono namespace.

Before using these clocks the <chrono> header file must be included.

A clock type must be specified when referring to a point in time using std::chrono::time_point (covered by the next section). It is also possible to define your own clock type (not covered by the C++ Annotations (clause 20.11.3 of the C++11 standard lists the requirements for a clock type)).

If Clock is a predefined clock type, then Clock defines the following types:

std::chrono::Clock::duration:
the duration type which used by Clock (e.g., std::chrono::nanoseconds)
std::chrono::Clock::period:
the time period used by Clock (e.g., std::ratio::nano)
std::chrono::Clock::rep:
defining the value type used by Clock to store duration values (e.g., int64_t)
std::chrono::Clock::time_point:
time points used by Clock (e.g., std::chrono::time_point<system_clock, duration>)

In addition to these types predefined clocks offer a member

static std::chrono::time_point std::chrono::Clock::now():
returns the time_point representing the current time

There are three predefined clock types:

std::chrono::system_clock:
the `wall clock' time, using the system's real time clock. In addition to now the class system_clock offers these two static members:
- static time_t std::chrono::system_clock::to_time_t
  (std::chrono::time_point const & timePoint)
  a time_t value (the same type as returned by C's time(2) function) as used by the representing the same point in time as timePoint. Example:
```
    std::chrono::system_clock::to_time_t(
                                    std::chrono::system_clock().now()
                                )
```
- static std::chrono::time_point std::chrono::system_clock::from_time_t
  (time_t seconds)
  
  a time_point value representing the same point in time as time_t.
std::chrono::steady_clock:
a clock whose time increases in parallel with the increase of real time.
std::chrono::high_resolution_clock:
the computer's fastest clock (i.e., the clock having the shortest timer-tick period). In practice this is the same clock as system_clock.

As an example: to access the current time you could use:

    auto point = std::chrono::system_clock::now();

20.1.4: Points in time: std::chrono::time_point

The class time_point is defined in the std::chrono namespace. Objects of the class std::chrono::time_point define a point in time.

Before using the class std::chrono::time_point the <chrono> header file must be included.

The class time_point is a class template, requiring two template type arguments: a Clock type and a Duration type. The Clock type usually is one of the predefined clock types, e.g., chrono::system_clock. The Duration type may be omitted, in which case the Clock's duration type is used. An explicit duration type may also be provided.

In the previous section auto was used to specify the type of the return value of system_clock::now. The explicit definition looks like this:

    std::chrono::time_point<std::chrono::system_clock> now = 
        std::chrono::system_clock::now();

The class std::chrono::time_point features three constructors:

time_point():
the default constructor represents the beginning of the clock's epoch (E.g., Jan, 1, 1970, 00:00h);
time_point(time_point<Clock, Duration> const &timeStep):
initializes a time_point object to represent a point in time timeStep Duration units byond the clock's epoch;
time_point(time_point<Clock, Duration2> const &timeStep):
this constructor is defined as a member template, using the template header template <typename Duration2>. The type Duration2 is a std::chrono::duration (or comparable) type, using a possibly larger period for its unit than time_point's Duration type. It initializes a time_point object to represent a point in time timeStep Duration2 units byond the clock's epoch.

The class std::chrono::time_point has these operators and members:

std::chrono::time_point &operator+=(Duration const &duration):
this operator is also available as binary arithmetic operator, expecting a std::chrono::time_point const & and a Duration const & argument (in any order). The amount of time represented by duration is added to the current time_point value. Example:
```
    std::chrono::system_clock::now() + seconds(5);
```
std::chrono::time_point &operator-=(Duration const &duration):
this operator is also available as binary arithmetic operator, expecting a std::chrono::time_point const & and a Duration const & argument (in any order). The amount of time represented by duration is subtracted from the current time_point value. Example:
```
    auto point = std::chrono::system_clock::now();
    point -= seconds(5);
```
constexpr Duration time_since_epoch() const:
returns the object's Duration since the epoch.
static constexpr time_point min():
a static member returning the value returned by the time point's duration::min value.
static constexpr time_point max():
a static member returning the value returned by the time point's duration::max value.

All predefined clocks use nanoseconds as their time unit. To obtain the time expressed in a larger time unit, divide the value returned by the time_point's count value by larger time unit converted to nanoseconds. E.g., the number of hours passed since the beginning of the epoch is:

    using namespace std;
    using namespace chrono;     // for brevity

    cout << system_clock::now().time_since_epoch().count() /
            nanoseconds(hours(1)).count() << " hours since the epoch\n";

20.1.5: Converting time to a NTBS

To convert time to a textual representation standard C functions can be used. These functions usually expect arguments in seconds, as returned by, e.g., the std::chrono::system_clock::to_time_t function. Standard C functions can be used to convert the returned time_t values to a textual representation. E.g.,

    using namespace std;
    using namespace chrono;     // for brevity

    time_t tm = system_clock::to_time_t(system_clock::now() + hours(1));
    cout << asctime(localtime(&tm));

Here are some additional examples showing how time_point objects can be used:

    #include <iostream>
    #include <chrono>

    using namespace std;
    using namespace chrono;

    int main()
    {
            // the current time (or use `auto')
            // 'now' is a time_point<system_clock>
        auto now(system_clock::now());

            // its value in seconds:
        cout << system_clock::to_time_t(now) << '\n';

            // now + two hours:
        cout << system_clock::to_time_t(now + hours(2)) << '\n';

            // define a time_point 1 hour after the epoch:
        time_point<system_clock> oneHrLater(hours(1));

            // show the epoch and the time in seconds of oneHrLater:
        cout << system_clock::to_time_t(time_point<system_clock>()) << ' ' <<
                system_clock::to_time_t(oneHrLater) << '\n';
    }

20.2: Multi Threading

In C++ multi threading may be implemented at various levels of abstraction. In general the highest level of abstraction which is available to implement a mult-threaded problem should be used. Not so much because it's often simpler than using lower levels of abstraction, but because higher levels of abstraction are usually semantically closer to the original problem description, resulting in code which is easier to understand and therefore easier to maintain. Also, high-abstraction classes also provide exception safety and prevent the occurrence of memory leaks.

C++'s main tool for creating multi-threaded programs is the class std::thread, and some examples of its use have already been shown at the beginning of this chapter.

Characteristics of individual threads can be queried from the std::this_thread namespace. Also, std::this_thread offers some control over the behavior of an individual thread.

To synchronize access to shared data C++ offers mutexes (implemented by the class std::mutex) and condition variables (implemented by the class std::condition_variable).

Members of these classes may throw system_error objects (cf. section 10.9) when encountering a low-level error condition.

20.2.1: The namespace std::this_thread

The namespace std::this_thread contains functions that are uniquely associated with the currently running thread.

Before using the namespace this_thread the <thread> header file must be included.

Inside the std::this_thread namespace several free functions are defined, providing information about the current thread or that can be used to control its behavior:

thread::id this_thread::get_id() noexcept:
returns an object of type thread::id that identifies the currently active thread of execution. For an active thread the returned id is unique in the sense that it maps 1:1 to the currently active thread, and is not returned by any other thread. If a thread is currently not running thread::id() is returned by the std::thread object's get_id member.
void yield() noexcept:
when a thread calls this_thread::yield() the current thread is briefly suspended, allowing other (waiting) threads to start.
void sleep_for(chrono::duration<Rep, Period> const &relTime) noexcept:
when a thread call this_thread::sleep_for(...) it is suspended for the amount of time that's specified in its argument. E.g.,
```
std::this_thread::sleep_for(std::chrono::seconds(5));
```
void sleep_until(chrono::time_point<Clock, Duration> const &absTime) noexcept:
when a thread calls this member it is suspended until the specified absTime is in the past. The next example has the same effect as the previous example:
```
// assume using namespace std
this_thread::sleep_until(chrono::system_clock().now() + chrono::seconds(5));
```
Conversely, the sleep_until call in the next example immediately returns:
```
this_thread::sleep_until(chrono::system_clock().now() - chrono::seconds(5));
```

20.2.2: The class std::thread

Multi threading in C++ starts off with objects of the class std::thread. Each object of this class handles a separate thread.

Before using Thread objects the <thread> header file must be included.

Thread objects can be constructed in various ways:

thread() noexcept:
The default constructor creates a thread object. As it receives no function to execute, it does not start a separate thread of execution. It is used, e.g., as a data member of a class, allowing class objects to start a separate thread at some later point in time;
thread(thread &&tmp) noexcept:
The move constructor takes ownership of the thread controlled by tmp, while tmp, if it runs a thread, loses control over its thread. Following this, tmp is in its default state, and the newly created thread is responsible for calling, e.g., join.

explicit thread(Fun &&fun, Args &&...args):

This member template (cf. section 22.1.3) expects a function (or functor) as its first argument. The function is immediately started as a separate thread. If the function (or functor) expects arguments, then these arguments can be passed to the thread's constructor immediately following its first (function) argument. Additional arguments are passed with their proper types and values to fun. Following the thread object's construction, a separately running thread of execution is started.
The notation Arg &&...args indicates that any additional arguments are passed as is to the function. The types of the arguments that are passed to the thread constructor and that are expected by the called function must match: values must be values, references must be reference, r-value references must be r-value references (or move construction must be supported). The following example illustrates this requirement:
     1: #include <iostream>
     2: #include <thread>
     3:
     4: using namespace std;
     5:
     6: struct NoMove
     7: {
     8:     NoMove() = default;
     9:     NoMove(NoMove &&tmp) = delete;
    10: };
    11:
    12: struct MoveOK
    13: {
    14:     int d_value = 10;
    15:
    16:     MoveOK() = default;
    17:     MoveOK(MoveOK const &) = default;
    18:
    19:     MoveOK(MoveOK &&tmp)
    20:     {
    21:         d_value = 0;
    22:         cout << "MoveOK move cons.\n";
    23:     }
    24: };
    25:
    26: void valueArg(int value)
    27: {}
    28: void refArg(int &ref)
    29: {}
    30: void r_refArg(int &&tmp)
    31: {
    32:     tmp = 100;
    33: }
    34: void r_refNoMove(NoMove &&tmp)
    35: {}
    36: void r_refMoveOK(MoveOK &&tmp)
    37: {}
    38:
    39: int main()
    40: {
    41:     int value = 0;
    42:
    43:     std::thread(valueArg,   value).join();
    44:     std::thread(refArg,     ref(value)).join();
    45:     std::thread(r_refArg,   move(value)).join();
    46:
    47: //  std::thread(refArg,     value);
    48:
    49:     std::thread(r_refArg,   value).join();
    50:     cout << "value after r_refArg: " << value << '\n';
    51:
    52: //  std::thread(r_refNoMove, NoMove());
    53:
    54:     NoMove noMove;
    55: //  std::thread(r_refNoMove, noMove).join();
    56:
    57:     MoveOK moveOK;
    58:     std::thread(r_refMoveOK, moveOK).join();
    59:     cout << moveOK.d_value << '\n';
    60: }
At lines 43 through 45 we see a value, reference, and and r-value reference being passed to a std::thread: with the functions running the threads expecting matching argument types.

Line 47 fails to compile, as a value argument doesn't match the reference expected by refArg. Note that this problem was solved in line 43 by using the std::ref function.

On the other hand lines 49 and 58 compile OK, as int values and class-types supporting move operations can be passed as values to functions expecting r-value references. In this case notice that the functions expecting the r-value references do not access the provided arguments (except for the actions performed by their move constructors), but use move construction to create temporary values or objects on which the functions operate.

Lines 52 and 55 won't compile as the NoMove struct doesn't offer a move constructor.

Be careful when passing local variables as arguments to thread objects: if the thread continues to run when the function whose local variables are used terminates, then the thread suddenly uses wild pointers or wild references, as the local variables no longer exist. To prevent this from happening (illustrated by the next example) do as follows:

pass an anonymous copy of the local variable as argument to the thread constructor, or

call join on the thread object to ensure that the thread has finished within the local variable's lifetime.
     1: #include <iostream>
     2: #include <thread>
     3: #include <string>
     4: #include <chrono>
     5:
     6: void threadFun(std::string const &text)
     7: {
     8:     for (size_t iter = 1; iter != 6; ++iter)
     9:     {
    10:         std::cout << text << '\n';
    11:         sstd::this_thread::sleep_for(std::chrono::seconds(1));
    12:     }
    13: }
    14:
    15: std::thread safeLocal()
    16: {
    17:     std::string text = "hello world";
    18:     return std::thread(threadFun, std::string(text));
    19: }
    20:
    21: int main()
    22: {
    23:     std::thread local(safeLocal());
    24:     std::cout << "safeLocal has ended\n";
    25:     local.join();
    26: }
In line 18 be sure not to call std::ref(text) instead of std::string(text).
If the thread cannot be created a std::system_error exception is thrown.
Since this constructor not only accepts functions but also function objects as its first argument, a local context may be passed to the function object's constructor. Here is an example of a thread receiving a function object using a local context:
    #include <iostream>
    #include <thread>
    #include <array>

    using namespace std;

    class Functor
    {
        array<int, 30> &d_data;
        int d_value;

        public:
            Functor(array<int, 30> &data, int value)
            :
                d_data(data),
                d_value(value)
            {}
            void operator()(ostream &out)
            {
                for (auto &value: d_data)
                {
                    value = d_value++;
                    out << value << ' ';
                }
                out << '\n';
            }
    };

    int main()
    {
        array<int, 30> data;
        Functor functor(data, 5);
        thread funThread(functor, ref(cout));
        funThread.join();
    };

The class std::thread does not provide a copy constructor.

The following members are available:

thread &operator=(thread &&tmp) noexcept:
If the operator's left-hand side operand (lhs) is a joinable thread, then terminate is called. Otherwise, tmp is assigned to the operator's lhs and tmp's state is changed to the thread's default state (i.e., thread()).

void detach():

Requires joinable (see below) to return true. The thread for which detach is called continues to run. The (e.g., parent) thread calling detach continues immediately beyond the detach-call. After calling object.detach(), `object' no longer represents the (possibly still continuing but now detached) thread of execution. It is the detached thread's implementation's responsibility to release its resources when its execution ends.
Since detach disconnects a thread from the running program, e.g., main no longer can wait for the thread's completion. As a program ends when main ends, its still running detached threads also stop, and a program may not properly finish all its threads, as demonstrated by the following example:
    #include <thread>
    #include <iostream>
    #include <chrono>

    void fun(size_t count, char const *txt)
    {
        for (; count--; )
        {
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
            std::cout << count << ": " << txt << std::endl;
        }
    }

    int main()
    {
        std::thread first(fun, 5, "hello world");
        first.detach();

        std::thread second(fun, 5, "a second thread");
        second.detach();

        std::this_thread::sleep_for(std::chrono::milliseconds(400));
        std::cout << "leaving" << std::endl;
    }
A detached thread may very well continue to run after the function that launched it has finished. Here, too, you should be very careful not to pass local variables to the detached thread, as their references or pointers will be undefined once the function defining the local variables terminates:
#include <iostream>
#include <thread>
#include <chrono>

using namespace std;
using namespace chrono;

void add(int const &p1, int const &p2)
{
    this_thread::sleep_for(milliseconds(200));
    cerr << p1 << " + " << p2 << " = " << (p1 + p2) << '\n';
}

void run()
{
    int v1 = 10;
    int v2 = 20;

//  thread(add, ref(v1), ref(v2)).detach();     // DON'T DO THIS
    thread(add, int(v1), int(v2)).detach();     // this is OK: own copies
}

void oops()
{
    int v1 = 0;
    int v2 = 0;
}

int main()
{
    run();
    oops();
    this_thread::sleep_for(seconds(1));
}

id get_id() const noexcept:
If the current object does not represent a running thread thread::id() is returned. Otherwise, the thread's unique ID (also obtainable from with the thread from this_thread::get_id()) is returned.
void join():
Requires joinable to return true. Blocks the thread calling join until the thread for which join is called has completed. Following its completion the object whose join member was called no longer represents a running thread, and its get_id member will return std::thread::id().
This member was used in several examples shown so far. As noted: when main ends while a joinable thread is still running, terminate is called, aborting the program.
bool joinable() const noexcept:
returns object.get_id() != id(), where object is the thread object for which joinable was called.
void swap(thread &other) noexcept:
The states of the thread object for which swap was called and other are swapped. Note that threads may always be swapped, even when their thread functions are currently being executed.
unsigned thread::hardware_concurrency() noexecpt:
This static member returns the number of threads that can run at the same time on the current computer. On a stand-alone multi-core computer it (probably) returns the number of cores.

Things to note:

When intending to define an anonymous thread it may appear not to start, unless you immediately also call join. E.g.,
```
void doSomething();
int main()
{
    thread(doSomething);        // nothing happens??
    thread(doSomething).join()  // doSomething is executed??
}
```
This similar to the situation we encountered in section 7.5: the first statement doesn't define an anonymous thread object at all. It simply defines the thread object doSomething. Consequently, compilation of the second statement fails, as there is no thread(thread &) constructor. When the first statement is omitted, the doSomething function is executed by the second statement. If the second statement is omitted, a default constructed thread object by the name of doSomething is defined.
A thread only starts after its construction has completed. This includes move constructions or move assignments. E.g., in a statement like
```
    thread object(thread(doSomething));
```
the move constructor is used to transfer control from an anonymous thread executing doSomething to the thread object. Only after object's construction has completed doSomething is started in the separate thread.
Exceptions thrown from the thread (e.g., by the function defining the thread's actions) are local to the executed thread. Either they must be caught by the executing thread (as each running thread has its own execution stack), or they can be passed to the starting thread using a packaged_task and a future (cf., respectively, sections 20.12 and 20.9).

A thread ends when the function executing a thread finishes. When a thread object is destroyed while its thread function is still running, terminate is called, aborting the program's end. Bad news: the destructors of existing objects aren't called and exceptions that are thrown are left uncaught. This happens in the following program as the thread is still active when main ends:

    #include <iostream>
    #include <thread>

    void hello()
    {
        while (true)
            std::cout << "hello world!\n";
    }

    int main()
    {
        std::thread hi(hello);
    }

There are several ways to solve this problem. One of them is discussed in the next section.

20.2.2.1: Static data and threads: std::thread_local

With multi-threaded programs the well-known distinction between global and local data is somewhat too coarse. For single- and multi-threaded programs alike, global data are available to all of the program's code, and local data are available to the function (or compound statement) in which the local data are defined. But multi-threaded programs may feel the need for an intermediate type of data, uniquely available to the different threads.

The thread_local keyword provides this intermediate data level. Global variables declared as thread_local are global within each individual thread. Each thread owns a copy of the thread_local variables, and may modify them at will. A thread_local variable in one thread is completely separated from that variable in another thread. Here is an example:

     1: #include <iostream>
     2: #include <thread>
     3:
     4: using namespace std;
     5:
     6: thread_local int t_value = 100;
     7:
     8: void modify(char const *label, int newValue)
     9: {
    10:     cout << label << " before: " << t_value << ". Address: " <<
    11:                                                     &t_value << endl;
    12:     t_value = newValue;
    13:     cout << label << " after: " << t_value << endl;
    14: }
    15:
    16: int main()
    17: {
    18:     thread(modify, "first", 50).join();
    19:     thread(modify, "second", 20).join();
    20:     modify("main", 0);
    21: }

At line 6 the thread_local variable t_value is defined. It is initialized to 100, and that becomes the initial value for each separately running thread;
In lines 8 through 14 the function modify is defined. It assigns a new value to t_value;
At lines 18 and 19 two threads are started, which are immediately joining the main thread again.
The main thread itself is also a thread, and it directly calls modify.

Running this program shows that each separate thread starts with t_value being 100, and then modifies it without affecting the values of t_value used by other threads.

Note that, although the t_value variables are unique to each thread, identical addresses may be shown for them. Since each thread uses its own stack, these variables may occupy the same relative locations within their respective stacks, giving the illusion that their physical addresses are identical.

20.2.2.2: Exceptions and join()

Once a thread starts and it isn't detached it must eventually join its starting (parent) thread, or the program aborts. Usually, once a thread has started the parent thread continues to do some work by itself:

    void childActions();
    void doSomeWork();

    void parent()
    {
        thread child(childActions);
        doSomeWork();
        child.join();
    }

However, maybe doSomeWork can't complete its work, and throws an exception, to be caught outside of parent. This, unfortunately, ends parent, and child.join() is missed. Consequently, the program aborts because of a thread that hasn't been joined.

Clearly, all exceptions must be caught, join must be called, and the exception must be rethrown. But parent cannot use a function try-block, as the thread object is already out of scope once execution reaches the matching catch-clause. So we get:

    void childActions();
    void doSomeWork();

    void parent()
    {
        thread child(childActions);
        try
        {
            doSomeWork();
            child.join();
        }
        catch (...)
        {
            child.join();
            throw;
        }
    }

This is ugly: suddenly the function's code is clobbered with a try-catch clause, as well as some unwelcome code-duplication.

This situation can be avoided using object based programming. Like, e.g., unique pointers, which use their destructors to encapsulate the destruction of dynamically allocated memory, we can use a comparable technique to encapsulate thread joining in an object's destructor.

By defining the thread object inside a class we're sure that by the time the our object goes out of scope, even if the childActions function throws an exception, the thread's join member is called. Here are the bare essentials of our JoinGuard class, providing the join-guarantee (using in-line member implementations for brevity):

     1: #include <thread>
     2:
     3: class JoinGuard
     4: {
     5:     std::thread d_thread;
     6:
     7:     public:
     8:         JoinGuard(std::thread &&threadObj)
     9:         :
    10:             d_thread(std::move(threadObj))
    11:         {}
    12:         ~JoinGuard()
    13:         {
    14:             if (d_thread.joinable())
    15:                 d_thread.join();
    16:         }
    17: };

At line 8 its only constructor starts: it receives a temporary thread object, which is moved, in line 10, to JoinGuard's d_thread data member.
When the JoinGuard object ceases to exist, its destructor (line 12) makes sure the thread is joined if it's still joinable (lines 14 and 15).

Here is an example how JoinGuard could be used:

     1: #include <iostream>
     2: #include "joinguard.h"
     3:
     4: void childActions();
     5:
     6: void doSomeWork()
     7: {
     8:     throw std::runtime_error("doSomeWork throws");
     9: }
    10:
    11: void parent()
    12: {
    13:     JoinGuard{std::thread{childActions}};
    14:     doSomeWork();
    15: }
    16:
    17: int main()
    18: try
    19: {
    20:     parent();
    21: }
    22: catch (std::exception const &exc)
    23: {
    24:     std::cout << exc.what() << '\n';
    25: }

At line 4 childActions is declared. Its implementation (not provided here) defines the child thread's actions.
The main function (lines 17 through 25) provides the function try-block to catch the exception thrown by parent;
The parent function defines (line 13) an anonymous JoinGuard, receiving an anonymous thread object. Anonymous objects are used, as the parent function doesn't need to access them anymore.
In line 14 doSomeWork is called, which throws an exception. This ends parent, but just before that JoinGuard's destructor makes sure that the child-thread has been joined.

20.3: Synchronization (mutexes)

Objects of mutex classes are used to protect shared data.

Before using mutexes the <mutex> header file must be included.

One of the key characteristics of multi-threaded programs is that threads may share data. Functions running as separate threads have access to all global data, and may also share the local data of their parent threads. However, unless proper measures are taken, this may easily result in data corruption, as illustrated by the following simulation of some steps that could be encountered in a multi-threaded program:

---------------------------------------------------------------------------
Time step:    Thread 1:     var        Thread 2:       description
---------------------------------------------------------------------------
    0                        5
    1           starts                                  T1 active
    2           writes var                              T1 commences writing
    3           stopped                                 Context switch
    4                                   starts          T2 active
    5                                   writes var      T2 commences writing
    6                       10          assigns 10      T2 writes 10
    7                                   stopped         Context switch
    8           assigns 12                              T1 writes 12
    9                       12
----------------------------------------------------------------------------

In this example, threads 1 and 2 share variable var, initially having the value 5. At step 1 thread 1 starts, and starts to write a value into var. However, it is interrupted by a context switch, and thread 2 is started (step 4). Thread 2 also wants to write a value into var, and succeeds until time step 7, when another context switch takes place. By now var is 10. However, thread 1 was also in the process of writing a value into var, and it is given a chance to complete its work: it assigns 12 to var in time step 8. Once time step 9 is reached, thread 2 proceeds on the (erroneous) assumption that var must be equal to 10. Clearly, from the point of view of thread 2 its data have been corrupted.

In this case data corruption was caused by multiple threads accessing the same data in an uncontrolled way. To prevent this from happening, access to shared data should be protected in such a way that only one thread at a time may access the shared data.

Mutexes are used to prevent the abovementioned kinds of problems by offering a guarantee that data are only accessed by the thread that could lock the mutex that is used to synchronize access to those data.

Exclusive data access completely depends on cooperation between the threads. If thread 1 uses mutexes, but thread 2 doesn't, then thread 2 may freely access the common data. Of course that's bad practice, which should be avoided.

It is stressed that although using mutexes is the programmer's responsibility, their implementation isn't: mutexes offer the necessary atomic calls. When requesting a mutex-lock the thread is blocked (i.e., the mutex statement does not return) until the lock has been obtained by the requesting thread.

Apart from the class std::mutex the class std::recursive_mutex is available. When a recursive_mutex is called multiple times by the same thread it increases its lock-count. Before other threads may access the protected data the recursive mutex must be unlocked again that number of times. Moreover, the classes std::timed_mutex and std::recursive_timed_mutex are available. Their locks expire when released, but also after a certain amount of time.

The members of the mutex classes perform atomic actions: no context switch occurs while they are active. So when two threads are trying to lock a mutex only one can succeed. In the above example: if both threads would use a mutex to control access to var thread 2 would not have been able to assign 12 to var, with thread 1 assuming that its value was 10. We could even have two threads running purely parallel (e.g., on two separate cores). E.g.:

-------------------------------------------------------------------------
Time step:    Thread 1:        Thread 2:        escription
-------------------------------------------------------------------------
    1         starts           starts           T1 and T2 active
    2         locks            locks            Both threads try to 
                                                lock the mutex
    3         blocks...        obtains lock     T2 obtains the lock,
                                                and T1 must wait
    4         (blocked)        processes var    T2 processes var,
                                                T1 still blocked
    5         obtains lock     releases lock    T2 releases the lock,
                                                and T1 immediately 
                                                obtains the lock
    6         processes var                     now T1 processes var
    7         releases lock                     T1 also releases the lock
-------------------------------------------------------------------------

Although mutexes can directly be used in programs, this rarely happens. It is more common to embed mutex handling in locking classes that make sure that the mutex is automatically unlocked again when the mutex lock is no longer needed. Therefore, this section merely offers an overview of the interfaces of the mutex classes. Examples of their use will be given in the upcoming sections (e.g., section 20.4).

All mutex classes offer the following constructors and members:

mutex() constexpr:
The default constexpr constructor is the only available constructor;
~mutex():
The destructor does not unlock a locked mutex. If locked it must explicitly be unlocked using the mutex's unlock member;
void lock():
The calling thread blocks until it owns the mutex. Unless lock is called for a recursive mutex a system_error is thrown if the thread already owns the lock. Recursive mutexes increment their internal lock count;
bool try_lock() noexcept:
The calling thread tries to obtain ownership of the mutex. If ownership is obtained, true is returned, otherwise false. If the calling thread already owns the lock true is also returned, and in this case a recursive mutex also increments its internal lock count;
void unlock() noexcept:
The calling thread releases ownership of the mutex. A system_error is thrown if the thread does not own the lock. A recursive mutex decrements its interal lock count, releasing ownership of the mutex once the lock count has decayed to zero;

The timed-mutex classes (timed_mutex, recursive_timed_mutex) also offer these members:

bool try_lock_for(chrono::duration<Rep, Period> const &relTime) noexcept:
The calling thread tries to obtain ownership of the mutex within the specified time interval. If ownership is obtained, true is returned, otherwise false. If the calling thread already owns the lock true is also returned, and in this case a recursive timed mutex also increments its internal lock count. The Rep and Duration types are inferred from the actual relTime argument. E.g.,
```
std::timed_mutex timedMutex;
timedMutex.try_lock_for(chrono::seconds(5));
```
bool try_lock_until(chrono::time_point<Clock, Duration> const &absTime) noexcept:
The calling thread tries to obtain ownership of the mutex until absTime has passed. If ownership is obtained, true is returned, otherwise false. If the calling thread already owns the lock true is also returned, and in this case a recursive timed mutex also increments its internal lock count. The Clock and Duration types are inferred from the actual absTime argument. E.g.,
```
std::timed_mutex timedMutex;
timedMutex.try_lock_until(chrono::system_clock::now() + chrono::seconds(5));
```

20.3.1: Initialization in multi-threaded programs

Before using the std::once_flag and the std::call_once function, introduced in this section, the <mutex> header file must be included.

In single threaded programs the initialization of global data not necessarily happens at the same point in code. An example is the initialization of the object of a singleton class (cf. Gamma et al. (1995), Design Patterns, Addison-Wesley). Singleton classes may define a single static pointer data member Singleton *s_object, pointing to the singleton's object, and may offer a static member instance, implemented something like this:

    Singleton &Singleton::instance()
    {
        return s_object ? 
                    s_object 
                : 
                    (s_object = new Singleton);
    }

With multi-threaded programs this approach immediately gets complex. For example, if two threads call instance at the same time, while s_object still equals 0, then both may call new Singleton, resulting in one dynamically allocated Singleton object becoming unreachable. Other threads, called after s_object was initialized for the first time, may either return a reference to that object, or may return a reference to the object initialized by the second thread. Not exactly the expected behavior of a singleton.

Mutexes (cf. section 20.3) can be used to solve these kinds of problems, but they result in some overhead and inefficiency, as the mutex must be inspected at each call of Singleton::instance.

When variables must dynamically be initialized, and the initialization should take place only once the std::once_flag type and the std::call_once function should be used.

The call_once function expects two or three arguments:

The first argument is a once_flag variable, keeping track of the actual initialization status. The call_once function simply returns if the once_flag indicates that initialization already took place;
The second argument is the address of a function which must be called only once. This function may be a free function or it may be the address of a class member function;
If the second argument is the address of a class member function, then the object for which the member function should be called must be provided as call_once's third argument.

A thread-safe implementation of the singleton's instance function can now easily be designed (using in-class implementations for brevity):

    class Singleton
    { 
        static std::once_flag s_once;
        static Singleton *s_singleton;
        ...
        public:
            static Singleton *instance()
            {
                std::call_once(s_once, []{s_singleton = new Singleton;} );
                return s_singleton;
            }
        ...
    };

However, there are additional ways to initialize data, even for multi-threaded programs:

First, suppose a constructor is declared with the constexpr keyword (cf. section 8.1.4.1), satisfying the requirements for constant initialization. In this case, a static object, initialized using that constructor, is guaranteed to be initialized before any code is run as part of the static initialization phase. This used by std::mutex, as it eliminates the possibility of race conditions when global mutexes are initialized.

Second, a static variable defined within a compound statement may be used (e.g., a static local variable within a function body). Static variables defined within a compound statement are initialized the first time the function is called at the point in the code where the static variable is defined. Here is an example:

        #include <iostream>

        struct Cons
        {
            Cons()
            {
                std::cout << "Cons called\n";
            }
        };
        void called(char const *time)
        {
            std::cout << time << "time called() activated\n";
            static Cons cons;
        }
        int main()
        {
            std::cout << "Pre-1\n";
            called("first");
            called("second");
            std::cout << "Pre-2\n";
            Cons cons;
        }
    /*
        Displays:
            Pre-1
            firsttime called() activated
            Cons called
            secondtime called() activated
            Pre-2
            Cons called
    */

This feature causes a thread to wait automatically if another thread is still initializing the static data (note that non-static data never cause problems, as non-static local variables only exist within their own thread of execution).

20.3.2: C++14: shared mutexes

The C++14 standard defines the type std::shared_mutex, available after including the <shared_mutex> header file.

The type std::shared_mutex is a shared mutex type. Shared mutex types behave like timed_mutex types and optionally have the characteristics described below.

> In this description, m denotes an object of a > mutex type, rel_type denotes an object of an instantiation of duration > (20.11.5), and abs_time denotes an object of an instantiation of time_point > (20.11.6).

The class shared_mutex provides a non-recursive mutex with shared ownership semantics, comparable to, e.g., the shared_ptr type. A program using shared_mutexes is undefined if:

it destroys a shared_mutex object owned by any thread;
a thread recursively attempts to gain ownership of a shared_mutex;
a thread terminates while owning a shared_mutex.

Shared mutex types provide a shared lock ownership mode. Multiple threads can simultaneously hold a shared lock ownership of a shared_mutex type of object. But no thread can hold a shared lock while another thread holds an exclusive lock on the same shared_mutex object, and vice-versa.

The type shared_mutex offers the following members:

void lock_shared():
Blocks the calling thread until shared ownership of the mutex can be obtained by the calling thread. An exception is thrown if the current thread already owns the lock, if it is not allowed to lock the mutex, or if the mutex is already locked and blocking is not possible;
void unlock_shared():
Releases a shared lock on the mutex held by the calling thread. Nothing happens if the current thread does not already own the lock;
bool try_lock_shared():
The current thread attempts to obtain shared ownership of the mutex without blocking. If shared ownership is not obtained, there is no effect and try_lock_shared immediately returns. Returns true if the shared ownership lock was acquired, false otherwise. An implementation may fail to obtain the lock even if it is not held by any other thread. Initially the calling thread may not yet own the mutex;
bool try_lock_shared_for(rel_time):
Attempts to obtain shared lock ownership for the calling thread within the relative time period specified by rel_time. If the time specified by rel_time is less than or equal to rel_time.zero(), the member attempts to obtain ownership without blocking (as if by calling try_lock_shared()). The member shall return within the time interval specified by rel_time only if it has obtained shared ownership of the mutex object. Returns true if the shared ownership lock was acquired, false otherwise. Initially the calling thread may not yet own the mutex;
bool try_lock_shared_until(abs_time):
Attempts to obtain shared lock ownership for the calling thread until the time specified by abs_time has passed. If the time specified by abs_time has already passed then the member attempts to obtain ownership without blocking (as if by calling try_lock_shared()). Returns true if the shared ownership lock was acquired, false otherwise. Initially the calling thread may not yet own the mutex;

20.4: Locks and lock handling

Locks are used to simplify the use of mutexes. Before locks can be used the <mutex> header file must be included.

Whenever threads share data, and at least one of the threads may change common data, mutexes should be used to prevent threads from using the same data synchronously.

Usually locks are released at the end of action blocks. This requires explicit calls to the mutexes' unlock function, with introduces comparable problems as we've seen with the thread's join member.

To simplify locking and unlocking two mutex wrapper classes are available:

std::lock_guard:
objects of this class offer the basic unlock-guarantee: their destructors call the member unlock of the mutexes they control;
std::unique_lock:
objects of this class offer a more extensive interface, allowing explicit unlocking and locking of the mutexes they control, while their destructors preserve the unlock-guarantee also offered by lock_guard;

The class lock_guard offers a limited, but useful interface:

lock_guard<Mutex>(Mutex &mutex):
when defining a lock_guard object the mutex type (e.g., std::mutex, std::timed_mutex, std::shared_mutex) is specified, and a mutex of the indicated type is provided as its argument. The construction blocks until the lock_guard object owns the lock. The lock_guard's destructor automatically releases the mutex lock.
lock_guard<Mutex>(Mutex &mutex, std::adopt_lock_t):
this constructor is used to transfer control over the mutex from the calling thread to the lock_guard. The mutex lock is released again by the lock_guard's destructor. At construction time the mutex must already be owned by the calling thread. Here is an illustration of how it can be used:
```
 1: void threadAction(std::mutex &mut, int &sharedInt)
 2: {
 3:     std::lock_guard<std::mutex> lg{mut, std::adopt_lock_t()};
 4:     // do something with sharedInt
 5: }
```
- At line 1 threadAction receives a reference to a mutex. Assume the mutex owns the lock;
- At line 3 control is transferred to the lock_guard. Even though we don't explicitly use the lock_guard object, an object should be defined to prevent the compiler from destroying an anonymous object before the function ends;
- When the function ends, at line 5, the mutex's lock is released by the lock_guard's destructor.
mutex_type:
in addition to the constructors and destructor, lock_guard<Mutex> types also define the type mutex_type: it is a synonym of the Mutex type that is passed to the lock_guard's constructor.

Here is a simple example of a multi-threaded program using lock_guards to prevent information inserted into cout from getting mixed.

    bool oneLine(istream &in, mutex &mut, int nr)
    {
       lock_guard<mutex> lg(mut);

        string line;
        if (not getline(in, line))
            return false;

        cout << nr << ": " << line << endl;

        return true;
    }

    void io(istream &in, mutex &mut, int nr)
    {
        while (oneLine(in, mut, nr))
            this_thread::yield();
    }

    int main(int argc, char **argv)
    {
        ifstream in(argv[1]);
        mutex ioMutex;

        thread t1(io, ref(in), ref(ioMutex), 1);
        thread t2(io, ref(in), ref(ioMutex), 2);
        thread t3(io, ref(in), ref(ioMutex), 3);

        t1.join();
        t2.join();
        t3.join();
    }

As with lock_guard, a mutex-type must be specified when defining objects of the class std::unique_lock. The class unique_lock is much more elaborate than the basic lock_guard class template. Its interface does not define a copy constructor or overloaded assignment operator, but it does define a move constructor and a move assignment operator. In the following overview of unique_lock's inteface Mutex refers to the mutex-type that is specified when defining a unique_lock:

unique_lock() noexcept:
the default constructor is not yet associated with a mutex object. It must be assigned a mutex (e.g., using move-assignment) before it can do anything useful;
explicit unique_lock(Mutex &mutex):
initializes a unique_lock with an existing Mutex object, and calls mutex.lock();
unique_lock(Mutex &mutex, defer_lock_t) noexcept:
initializes a unique_lock with an existing Mutex object, but does not call mutex.lock(). Call it by passing a defer_lock_t object as the constructor's second argument, e.g.,
```
unique_lock<mutex> ul(mutexObj, defer_lock_t())
```
unique_lock(Mutex &mutex, try_to_lock_t) noexcept:
initializes a unique_lock with an existing Mutex object, and calls mutex.try_lock(): the constructor won't block if the mutex cannot be locked;
unique_lock(Mutex &mutex, adopt_lock_t) noexcept:
initializes a unique_lock with an existing Mutex object, and assumes that the current thread has already locked the mutex;
unique_lock(Mutex &mutex, chrono::duration<Rep, Period> const &relTime) noexcept:
this constructor tries to obtain ownership of the Mutex object by calling mutex.try_lock_for(relTime). The specified mutex type must therefore support this member (e.g., it is a std::timed_mutex). It could be called like this:
```
std::unique_lock<std::timed_mutex> ulock(timedMutex, 
                                         std::chrono::seconds(5));
```
unique_lock(Mutex &mutex, chrono::time_point<Clock, Duration> const &absTime) noexcept:
this constructor tries to obtain ownership of the Mutex object by calling mutex.try_lock_until(absTime). The specified mutex type must therefore support this member (e.g., it is a std::timed_mutex). This constructor could be called like this:
```
std::unique_lock<std::timed_mutex> ulock(
                timedMutex, 
                std::chrono::system_clock::now() + std::chrono::seconds(5)
            );
```
void lock():
blocks the current thread until ownership of the mutex that is managed by the unique_lock is obtained. If no mutex is currently managed, then a system_error exception is thrown.
Mutex *mutex() const noexcept:
returns a pointer to the mutex object stored inside the unique_lock (a nullptr is returned if no mutex object is currently associated with the unique_lock object.)
explicit operator bool() const noexcept:
returns true if the unique_lock owns a locked mutex, otherwise false is returned;
unique_lock& operator=(unique_lock &&tmp) noexcept:
if the left-hand operand owns a lock, it will call its mutex's unlock member, whereafter tmp's state is transferred to the left-hand operand;
bool owns_lock() const noexcept:
returns true if the unique_lock owns the mutex, otherwise false is returned;
Mutex *release() noexcept:
returns a pointer to the mutex object that is associated with the unique_lock object, discarding that association;
void swap(unique_lock& other) noexcept:
swaps the states of the current unique_lock and other;
bool try_lock():
tries to obtain ownership of the mutex that is associated witg the unique_lock, returning true if this succeeds, and false otherwise. If no mutex is currently associated with the unique_lock object, then a system_error exception is thrown;
bool try_lock_for(chrono::duration<Rep, Period> const &relTime):
this member function tries to obtain ownership of the Mutex object managed by the unique_lock object by calling the mutex's try_lock_for(relTime) member. The specified mutex type must therefore support this member (e.g., it is a std::timed_mutex);
bool try_lock_until(chrono::time_point<Clock, Duration> const &absTime):
this member function tries to obtain ownership of the Mutex object managed by the unique_lock object by calling the mutex's mutex.try_lock_until(absTime) member. The specified mutex type must therefore support this member (e.g., it is a std::timed_mutex);
void unlock():
releases ownership of the mutex (or reduces the mutex's lock count). A system_error exception is thrown if the unique_lock object does not own the mutex.

In addition to the members of the classes std::lock_guard and std::unique_lock the functions std::lock and std::try_lock are available. These functions can be used to prevent deadlocks, the topic of the next section.

20.4.1: Deadlocks

A deadlock occurs when two locks are required to process data, but one thread obtains the first lock and another thread obtains the second lock. C++ defines the generic std::lock and std::try_lock functions that can be used to help preventing such situations.

Before these functions can be used the <mutex> header file must be included

In the following overview L1 &l1, ... represents one or more references to objects of lockable types:

void std::lock(L1 &l1, ...):
When the function returns locks were obtained on all li objects. If a lock could not be obtained for at least one of the objects, then all locks obtained so far are relased, even if the object for which no lock could be obtained threw an exception;
int std::try_lock(L1 &l1, ...):
This function calls the lockable objects' try_lock members. If all locks could be obtained, then -1 is returned. Otherwise the (0-based) index of the first argument which could not be locked is returned, releasing all previously obtained locks.

As an example consider the following little multi-threaded program: The threads use mutexes to obtain unique access to cout and to an int value. However, fun1 first locks cout (line 7), and then value (line 10); fun2 first locks value (line 16) and then cout (line 19). Clearly, if fun1 has locked cout fun2 can't obtain the lock until fun1 has released it. Unfortunately, fun2 has locked value, and the functions only release their locks when returning. But in order to access the information in value fun1 it must have obtained a lock on value, which it can't, as fun2 has already locked value: the threads are waiting for each other, and neither thread gives in.

     1: int value;
     2: mutex valueMutex;
     3: mutex coutMutex;
     4:
     5: void fun1()
     6: {
     7:     lock_guard<mutex> lg1(coutMutex);
     8:     cout << "fun 1 locks cout\n";
     9:
    10:     lock_guard<mutex> lg2(valueMutex);
    11:     cout << "fun 1 locks value\n";
    12: }
    13:
    14: void fun2()
    15: {
    16:     lock_guard<mutex> lg1(valueMutex);
    17:     cerr << "fun 2 locks value\n";
    18:
    19:     lock_guard<mutex> lg2(coutMutex);
    20:     cout << "fun 2 locks cout\n";
    21: }
    22:
    23: int main()
    24: {
    25:     thread t1(fun1);
    26:     fun2();
    27:     t1.join();
    28: }

A good recipe for avoiding deadlocks is to prevent nested (or multiple) mutex lock calls. But if multiple mutexes must be used, always obtain the locks in the same order. Rather than doing this yourself, std::lock and std::try_lock should be used whenever possible to obtain multiple mutex locks. These functions accept multiple arguments, which must be lockable types like lock_guard, unique_lock, or even a plain mutex. The previous deadlocking program, can be modified to call std::lock to lock both mutexes. In this example using one single mutex would also work, but the modified program now looks as similar as possible to the previous program. Note how in lines 10 and 21 a different ordering of the unique_locks arguments was used: it is not necessary to use an identical argument order when calling std::lock or std::try_lock.

     1: int value;
     2: mutex valueMutex;
     3: mutex coutMutex;
     4:
     5: void fun1()
     6: {
     7:     unique_lock<mutex> lg1(coutMutex, defer_lock);
     8:     unique_lock<mutex> lg2(valueMutex, defer_lock);
     9:
    10:     lock(lg1, lg2);
    11:
    12:     cout << "fun 1 locks cout\n";
    13:     cout << "fun 1 locks value\n";
    14: }
    15:
    16: void fun2()
    17: {
    18:     unique_lock<mutex> lg1(coutMutex, defer_lock);
    19:     unique_lock<mutex> lg2(valueMutex, defer_lock);
    20:
    21:     lock(lg2, lg1);
    22:
    23:     cout << "fun 2 locks cout\n";
    24:     cout << "fun 2 locks value\n";
    25: }
    26:
    27: int main()
    28: {
    29:     thread t1(fun1);
    30:     thread t2(fun2);
    31:     t1.join();
    32:     t2.join();
    33: }

20.4.2: C++14: shared locks

The C++14 standard defines the type std::shared_lock, available after including the <shared_mutex> header file.

An object of the type std::shared_lock controls the shared ownership of a lockable object within a scope. Shared ownership of the lockable object may be acquired at construction time or thereafter, and once acquired, it may be transferred to another shared_lock object. Objects of type shared_lock cannot be copied, but move construction and assignment is supported.

The behavior of a program is undefined if the contained pointer to a mutex (pm) has a non-zero value and the lockable object pointed to by pm does not exist for the entire remaining lifetime of the shared_lock object. The supplied mutex type must be a shared_mutex or a type having the same characteristics.

The type shared_lock offers the following constructors, destructor and operators:

shared_lock() noexcept:
The default constructor creates a shared_lock which is not owned by a thread and for which pm == 0;
explicit shared_lock(mutex_type &mut):
This constructor locks the mutex, calling mut.lock_shared(). The calling thread may not already own the lock. Following the construction pm == &mut, and the lock is owned by the current thread;
shared_lock(mutex_type &mut, defer_lock_t) noexcept:
This constructor assigns pm to &mut, but the calling thread does not own the lock;
shared_lock(mutex_type &mut, try_to_lock_t):
This constructor tries to locks the mutex, calling mut.try_lock_shared(). The calling thread may not already own the lock. Following the construction pm == &mut, and the lock may or may not be owned by current thread, depending on the return value of try_lock_shared;
shared_lock(mutex_type &mut, adopt_lock_t):
This constructor can be called if the calling thread has shared ownership of the mutex. Following the construction pm == &mut, and the lock is owned by the current thread;
shared_lock(mutex_type &mut, chrono::time_point<Clock, Duration> const &abs_time):
This constructor is a member template, where Clock and Duration are types specifying a clock and absolute time (cf. section 20.1). It can be called if the calling thread does not already own the mutex. It calls mut.try_lock_shared_until(abs_time). Following the construction pm == &mut, and the lock may or may not be owned by current thread, depending on the return value of try_lock_shared_until;
shared_lock(mutex_type &mut, chrono::duration<Rep, Period> const &rel_time):
This constructor is a member template, where Clock and Period are types specifying a clock and relative time (cf. section 20.1). It can be called if the calling thread does not already own the mutex. It calls mut.try_lock_shared_for(abs_time). Following the construction pm == &mut, and the lock may or may not be owned by current thread, depending on the return value of try_lock_shared_for;
shared_lock(shared_lock &&tmp) noexcept:
The move constructor transfers the information in tmp to the newly constructed shared_lock. Following the construction tmp.pm == 0 and tmp no longer owns the lock;
~shared_lock():
If the lock is owned by the current thread, pm->unlock_shared() is called;
shared_lock &operator=(shared_lock &&tmp) noexcept (The move assignment operator calls pm->unlock_shared and then transfers the information in tmp to the current shared_lock object. Following this tmp.pm == 0 and tmp no longer owns the lock;)
explicit operator bool () const noexcept:
Returns whether or not the shared_lock object owns the lock.

The following members are provided:

void lock():
Calls pm->lock_shared(), after which the current tread owns the shared lock. Exceptions may be thrown from lock_shared, and otherwise if pm == 0 or if the current thread already owns the lock;
mutex_type *mutex() const noexcept:
Returns pm;
mutex_type *release() noexcept:
Returns the previous value of pm, which is equal to zero after calling this member. Also, the current object no longer owns the lock;
void swap(shared_lock &other) noexcept:
Swaps the data members of the current and the other shared_lock objects. There is also a free member swap, a function template, swapping two shared_lock<Mutex> objects, where Mutex represents the mutex type for which the shared lock objects were instantiated: void swap(shared_lock<Mutex> &one, shared_lock<Mutex> &two) noexcept;
bool try_lock():
Calls pm->try_lock_shared(), returning this call's return value. Exceptions may be thrown from try_lock_shared, and otherwise if pm == 0 or if the current thread already owns the lock;
bool try_lock_for(const chrono::duration<Rep, Period>& rel_time):
A member template, where Clock and Period are types specifying a clock and relative time (cf. section 20.1). It calls mut.try_lock_shared_for(abs_time). Following the call the lock may or may not be owned by current thread, depending on the return value of try_lock_shared_until. Exceptions may be thrown from try_lock_shared_for, and otherwise if pm == 0 or if the current thread already owns the lock;
bool try_lock_until(const chrono::time_point<Clock, Duration>& abs_time):
A member template, where Clock and Duration are types specifying a clock and absolute time (cf. section 20.1). It calls mut.try_lock_shared_until(abs_time), returning its return value. Following the call the lock may or may not be owned by current thread, depending on the return value of try_lock_shared_until. Exceptions may be thrown from try_lock_shared_until, and otherwise if pm == 0 or if the current thread already owns the lock;
void unlock():
Unlocks the shared mutex lock, releasing its ownership. Throws an exception if the shared mutex was not owned by the current thread.

20.5: Event handling (condition variables)

This section introduces condition variables. Condition variables allow programs to synchronize threads using the states of data, rather than simply locking the access to data (which is realized using mutexes).

Before condition variables can be used the <condition_variable> header file must be included.

To start our discussion, consider a classic producer-consumer scenario: the producer generates items which are consumed by a consumer. The producer can only produce a certain number of items before its storage capacity has filled up and the client cannot consume more items than the producer has produced.

At some point the producer's storage capacity has filled to the brim, and the producer has to wait until the client has at least consumed some items, thereby creating space in the producer's storage. Similarly, the consumer cannot start consuming until the producer has at least produced some items.

Implementing this scenario only using mutexes (data locking) is not an attractive option, as merely using mutexes forces a program to implement the scenario using polling: processes must continuously (re)acquire the mutex's lock, determine whether they can perform some action, followed by the release of the lock. Often there's no action to perform, and the process is busy acquiring and releasing the mutex's lock. Polling forces threads to wait until they can lock the mutex, even though continuation might already be possible. The polling interval could be reduced, but that too isn't an attractive option, as that increases the overhead associated with handling the mutexes (also called `busy waiting').

Condition variables can be used to prevent polling. Threads can use condition variables to notify waiting threads that there is something for them to do. This way threads can synchronize on data values (states).

As data values may be modified by multiple threads, threads still need to use mutexes, but only for controlling access to the data. In addition, condition variables allow threads to release ownership of mutexes until a certain value has been obtained, until a preset amount of time has been passed, or until a preset point in time has been reached.

The prototypical setup of threads using condition variables looks like this:

consumer thread(s) act like this:

    lock the mutex
    while the required condition has not yet been attained (i.e., is false):
        wait until being notified 
         (this automatically releasing the mutex's lock).
    once the mutex's lock has been reacquired, and the required condition
    has been attained:
        process the data
    release the mutex's lock.

producer thread(s) act similarly:

    lock the mutex
    while the required condition has not yet been attained:
        do something to attain the required condition
    notify waiting threads (that the required condition has been attained)
    release the mutex's lock.

No matter which thread starts, the thread holding the mutex's lock will at some point release the lock, allowing the other process to (re)acquire it. If the consumer starts it immediately releases the lock once it enters its waiting state; if the producer starts it releases the lock once the condition is true.

This protocol hides a subtle initial synchronization requirement. The consumer will miss the producer's notification if it (i.e., the consumer) hasn't yet entered its waiting state. So waiting (consumer) threads should start before notifying (producer) threads. Once threads have started, no assumptions can be made anymore about the order in which any of the condition variable's members (notify_one, notify_all, wait, wait_for, and wait_until) are called.

Condition variables come in two flavors: objects of the class std::condition_variable are used in combination with objects of type unique_lock<mutex>. Because of optimizations which are available for this specific combination using condition_variables is somewhat more efficient than using the more generally applicable class std::condition_variable_any, which may be used with any (e.g., user supplied) lock type.

Condition variable classes (covered in detail in the next two sections) offer members like wait, wait_for, wait_until, notify_one and notify_all that may concurrently be called. The notifying members are always atomically executed. Execution of the wait members consists of three atomic parts:

the mutex is released, and the thread is suspended until its notification;
Once the notification has been received, the lock is reacquired
The wait state ends (and processing continues beyond the wait call).

So, returning from wait-members the previously waiting thread has reacquired the mutex's lock.

In addition to the condition variable classes the following free function and enum type is provided:

void std::notify_all_at_thread_exit(condition_variable &cond, unique_lock<mutex> lockObject):
once the current thread has ended, all other threads waiting on cond are notified. It is good practice to exit the thread as soon as possible after calling notify_all_at_thread_exit.
Waiting threads must verify that the thread they were waiting for has indeed ended. This is usually realized by first obtaining the lock on lockObject, followed by verifying that the condition they were waiting for is true and that the lock was not reacquired before notify_all_at_thread_exit was called.
std::cv_status:
the cv_status enum is used by several member functions of the condition variable classes (cf. sections 20.5.1 and 20.5.2):
```
namespace std
{
    enum class cv_status 
    { 
        no_timeout, 
        timeout 
    };
}
```

20.5.1: The class std::condition_variable

The class std::condition_variable merely offers a default constructor. No copy constructor or overloaded assignment operator is provided.

Before using the class condition_variable the <condition_variable> header file must be included.

The class's destructor requires that no thread is blocked by the thread destroying the condition_variable. So all threads waiting on a condition_variable must be notified before a condition_variable object's lifetime ends. Calling notify_all (see below) before a condition_variable's lifetime ends takes care of that, as the condition_variable's thread releases its lock of the mutex variable, allowing one of the notified threads to lock the mutex.

In the following member-descriptions a type Predicate indicates that a provided Predicate argument can be called as a function without arguments, returning a bool. Also, other member functions are frequently referred to. It is tacitly assumed that all member referred to below were called using the same condition variable object.

The class condition_variable supports several wait members, which block the thread until notified by another thread (or after a configurable waiting time). However, wait members may also spuriously unblock, without having reacquired the lock. Therefore, returning from wait members threads should verify that the required condition is actually true. If not, again calling wait may be appropriate. The next piece of pseudo code illustrates this scheme:

    while (conditionNotTrue())
        condVariable.wait(&uniqueLock);

The class condition_variable's members are:

void notify_one() noexcept:
one wait member called by other threads returns. Which one actually returns cannot be predicted.
void notify_all() noexcept:
all wait members called by other threads unblock their wait states. Of course, only one of them will subsequently succeed in reacquiring the condition variable's lock object.
void wait(unique_lock<mutex>& uniqueLock):
before calling wait the current thread must have acquired the lock of uniqueLock. Calling wait releases the lock, and the current thread is blocked until it has received a notification from another thread, and has reacquired the lock.
void wait(unique_lock<mutex>& uniqueLock, Predicate pred):
this is a member template, using the template header template <typename Predicate>. The template's type is automatically derived from the function's argument type and does not have to be specified explicitly.
Before calling wait the current thread must have acquired the lock of uniqueLock. As long as `pred' returns false wait(lock) is called.
cv_status wait_for(unique_lock<mutex> &uniqueLock, std::chrono::duration<Rep, Period> const &relTime):
this member is defined as a member template, using the template header template <typename Rep, typename Period>. The template's types are automatically derived from the types of the function's arguments and do not have to be specified explicitly. E.g., to wait for at most 5 seconds wait_for can be called like this:
```
cond.wait_for(&unique_lock, std::chrono::seconds(5));
```
This member returns when being notified or when the time interval specified by relTime has passed.
When returning due to a timeout, std::cv_status::timeout is returned, otherwise std::cv_status::no_timeout is returned.
Threads should verify that the required data condition has been met after wait_for has returned.
bool wait_for(unique_lock<mutex> &uniqueLock, chrono::duration<Rep, Period> const &relTime, Predicate pred):
this member is defined as a member template, using the template header template <typename Rep, typename Period, typename Predicate>. The template's types are automatically derived from the types of the function's arguments and do not have to be specified explicitly.
As long as pred returns false, the previous wait_for member is called. If the previous member returns cv_status::timeout, then pred is returned, otherwise true.
cv_status wait_until(unique_lock<mutex>& uniqueLock, chrono::time_point<Clock, Duration> const &absTime):
this member is defined as a member template, using the template header template <typename Clock, typename Duration>. The template's types are automatically derived from the types of the function's arguments and do not have to be specified explicitly. E.g., to wait until 5 minutes after the current time wait_until can be called like this:
```
cond.wait_until(&unique_lock, chrono::system_clock::now() +
                              std::chrono::minutes(5));
```
This function acts identically to the wait_for(unique_lock<mutex> &uniqueLock, chrono::duration<Rep, Period> const &relTime) member described earlier, but uses an absolute point in time, rather than a relative time specification.
This member returns when being notified or when the time interval specified by relTime has passed. When returning due to a timeout, std::cv_status::timeout is returned, otherwise std::cv_status::no_timeout is returned.
bool wait_until(unique_lock<mutex> &lock, chrono::time_point<Clock, Duration> const &absTime, Predicate pred):
this member is defined as a member template, using the template header template <typename Clock, typename Duration, typename Predicate>. The template's types are automatically derived from the types of the function's arguments and do not have to be specified explicitly.
As long as pred returns false, the previous wait_until member is called. If the previous member returns cv_status::timeout, then pred is returned, otherwise true.

Threads should verify that the required condition is true when wait-members of condition variables return.

20.5.2: The class std::condition_variable_any

Different from the class condition_variable the class std::condition_variable_any can be used with any (e.g., user supplied) lock type, and not just with the stl-provided unique_lock<mutex>.

Before using the class condition_variable_any the <condition_variable> header file must be included.

The functionality that is offered by condition_variable_any is identical to the functionality offered by the class condition_variable, albeit that the lock-type that is used by condition_variable_any is not predefined. The class condition_variable_any therefore requires the specification of the lock-type that must be used by its objects.

In the interface shown below this lock-type is referred to as Lock. Most of condition_variable_any's members are defined as member templates, defining a Lock type as one of its parameters. The requirements of these lock-types are identical to those of the stl-provided unique_lock, and user-defined lock-type implementations should provide at least the interface and semantics that is also provided by unique_lock.

This section merely presents the interface of the class condition_variable_any. As its interface offers the same members as condition_variable (allowing, where applicable, passing any lock-type instead of just unique_lock to corresponding members), the reader is referred to the previous section for a description of the semantics of the class members.

Like condition_variable, the class condition_variable_any only offers a default constructor. No copy constructor or overloaded assignment operator is provided.

Also, like condition_variable, the class's destructor requires that no thread is blocked by the current thread. This implies that all other (waiting) threads must have been notified; those threads may, however, subsequently block on the lock specified in their wait calls.

Note that, in addition to Lock, the types Clock, Duration, Period, Predicate, and Rep are template types, defined just like the identically named types mentioned in the previous section.

Assuming that MyMutex is a user defined mutex type, and that MyLock is a user defined lock-type (cf. section 20.4 for details about lock-types), then a condition_variable_any object can be defined and used like this:

    MyMutex mut;
    MyLock<MyMutex> ul(mut);
    condition_variable_any cva;

    cva.wait(ul);

These are the class condition_variable_any's members:

void notify_one() noexcept;
void notify_all() noexcept;
void wait(Lock& lock);
void wait(Lock& lock, Predicate pred);
cv_status wait_until(Lock& lock, const chrono::time_point<Clock, Duration>& absTime);
bool wait_until(Lock& lock, const chrono::time_point<Clock, Duration>& absTime, Predicate pred);
cv_status wait_for(Lock& lock, const chrono::duration<Rep, Period>& relTime);
bool wait_for(Lock& lock, const chrono::duration<Rep, Period>& relTime,) Predicate pred;

20.5.3: An example using condition variables

Condition variables are used to synchronize threads on the values of data, rather than on the mere access to data (for which plain mutex-objects can be used). Using condition variables, a thread simply sleeps until it is notified by another thread. In a producer-consumer type of program this is usually accomplished like this:

    consumer loop:
        - wait until there's an item in store,
            then reduce the number of stored items
        - remove the item from the store
        - increment the number of available storage locations
        - do something with the retrieved item

    producer loop:
        - produce the next item
        - wait until there's room to store the item,
            then reduce the number of available storage locations
        - store the item
        - increment the number of stored items

It is important that the two storage administrative tasks (registering the number of available items and available storage locations) are either performed by the client or by the producer. For the consumer `waiting' means:

Get a lock on the variable containing the actual count
As long as the count is zero: wait, releasing the lock until another thread has increased the count, then re-acquire the lock.
Reduce the count
Release the lock.

This scheme is implemented in a class Semaphore, offering members wait and notify_all. For a more extensive discussion of semaphores see Tanenbaum, A.S. and Austin, T. (2013) Structured Computer Organization, Pearson Prentice-Hall.

The data member containing the actual count is called d_semaphore. It is protected by mutex d_mutex. In addition a condition_variable d_condition is defined:

    mutable std::mutex d_mutex;
    std::condition_variable d_condition;
    size_t d_available;

The waiting process is implemented through its member function wait:

     1: void Semaphore::wait()
     2: {
     3:     std::unique_lock<std::mutex> lk(d_mutex);   // get the lock
     4:     while (d_available == 0)
     5:         d_condition.wait(lk);   // internally releases the lock
     6:                                 // and waits, on exit
     7:                                 // acquires the lock again
     8:     --d_available;              // dec. available
     9: }   // the lock is released

In line 5 d_condition.wait releases the lock. It waits until receiving a notification, and re-acquires the lock just before returning. Consequently, wait's code always has complete and unique control over d_semaphore.

What about notifying the a waiting thread? This is handled in lines 4 and 5 of the member function notify_all:

     1: void Semaphore::notify_all()
     2: {
     3:     std::lock_guard<std::mutex> lk(d_mutex);    // get the lock
     4:     if (d_available++ == 0)
     5:         d_condition.notify_all();   // use notify_one to notify one other
     6:                                     // thread
     7: }   // the lock is released

At line 4 d_semaphore is always incremented; by using a postfix increment it can simultaneously be tested for being zero. If it was initially zero then d_semaphore is now one. A thread waiting until d_semaphore exceeds zero may now continue. A waiting thread is notified by calling d_condition.notify_one. In situations where multiple threads are waiting `notify_all' can also be used.

Using the facilities of the class Semaphore whose constructor expects an initial value of its semaphore data member, the classic consumer-producer paradigm can now be implemented using multi-threading (A more elaborate example of the producer-consumer program is found in the yo/threading/examples/events.cc file in the C++ Annotations's source archive):

    Semaphore available(10);
    Semaphore filled(0);
    std::queue itemQueue;

    void consumer()
    {
        while (true)
        {
            filled.wait();
                // mutex lock the queue with multiple consumers
            size_t item = itemQueue.front();
            itemQueue.pop();
            available.notify_all();
            process(item);      // not implemented here
        }
    }

    void producer()
    {
        size_t item = 0;
        while (true)
        {
            ++item;
            available.wait();
                // mutex lock the queue with multiple consumers
            itemQueue.push(item);
            filled.notify_all();
        }
    }
    int main()
    {
        thread consume(consumer);
        thread produce(producer);

        consume.join();
        produce.join();
    }

20.6: Atomic actions: mutexes not required

Before using the facilities introduced in this section the <atomic> header file must be included.

When data are shared among multiple threads, data corruption is usually prevented using mutexes. To increment a simple int using this strategy code as shown below is commonly used:

    {
        lock_guard<mutex> lk(intVarMutex);
        ++intVar;
    }

The compound statement is used to limit the lock_guard's lifetime, so that intVar is only locked for a short little while.

This scheme is not complex, but at the end of the day having to define a lock_guard for every single use of a simple variable, and having to define a matching mutex for each simple variable is a bit annoying and cumbersome.

C++ offers a way out through the use of atomic data types. Atomic data types are available for all basic types, and also for (trivial) user defined types. Trivial types are (see also section 23.6.2) all scalar types, arrays of elements of a trivial type, and classes whose constructors, copy constructors, and destructors all have default implementations, and their non-static data members are themselves of trivial types.

The class template std::atomic<Type> is available for all built-in types, including pointer types. E.g., std::atomic<bool> defines an atomic bool type. For many types alternative somewhat shorter type names are available. E.g, instead of std::atomic<unsigned short> the type std::atomic_ushort can be used. Refer to the atomic header file for a complete list of alternate names.

If Trivial is a user-defined trivial type then std::atomic<Trivial> defines an atomic variant of Trivial: such a type does not require a separate mutex to synchronize access by multiple threads.

Objects of the class template std::atomic<Type> cannot directly be copied or assigned to each other. However, they can be initialized by values of type Type, and values of type Type can also directly be assigned to std::atomic<Type> objects. Moreover, since atomic<Type> types offer conversion operators returning their Type values, an atomic<Type> objects can also be assigned to or initialized by another atomic<Type> object using a static_cast:

    atomic<int> a1 = 5;
    atomic<int> a2(static_cast<int>(a1));

The class std::atomic<Type> provides several public members, shown below. Non-member (free) functions operating on atomic<Type> objects are also available.

The std::memory_order enumeration defines the following symbolic constants, which are used to specify ordering constraints of atomic operations:

memory_order_acq_rel: the operation must be a read-modify-write operation, combining memory_order_acquire and memory_order_release;
memory_order_acquire: the operation is an acquire operation. It synchronizes with a release operation that wrote the same memory location;
memory_order_consume: the operation is a consume operation on the involved memory location;
memory_order_relaxed: no ordering constraints are provided by the operation;
memory_order_release: the operation is a release operation. It synchronizes with acquire operations on the same location;
memory_order_sec_cst: the default memory order specification for all operations. Memmory storing operations use memory_order_release, memory load operations use memory_order_acquire, and read-modify-write operations use memory_order_acq_rel.

The memory order cannot be specified for the overloaded operators provided by atomic<Type>. Otherwise, most atomic member functions may also be given a final memory_order argument. Where this is not available it is explictly mentioned at the function's description.

Here are the standard available std::atomic<Type> member functions:

bool compare_exchange_strong(Type &currentValue, Type newValue) noexcept:
The value in the atomic object is compared to newValue using byte-wise comparisons. If equal true is returned, and newValue is stored in the atomic object; if unequal false is returned and the object's current value is stored in currentValue;
bool compare_exchange_weak(Type &oldValue, Type newValue) noexcept:
The value in the atomic object is compared to newValue using byte-wise comparisons. If equal true is returned, and newValue is stored in the atomic object; if unequal, or newValue cannot be atomically assigned to the current object false is returned and the object's current value is stored in currentValue;
Type exchange(Type newValue) noexcept:
The object's current value is returned, and newValue is assigned to the current object;
bool is_lock_free() const noexept:
If the operations on the current object can be performed lock-free true is returned, otherwise false. This member has no memory_order parameter;
Type load() const noexcept:
The object's value is returned;
operator Type() const noexcept:
The object's value is returned;
void store(Type newValue) noexcept:
NewValue is assigned to the current object. Note that the standard assignment operator can also be used.

In addition to the above members, integral atomic types `Integral' (essentially the atomic variants of all built-in integral types) also offer the following member functions:

Integral fetch_add(Integral value) noexcept:
Value is added to the object's value, and the object's value at the time of the call is returned;
Integral fetch_sub(Integral value) noexcept:
Value is subtracted from the object's value, and the object's value at the time of the call is returned;
Integral fetch_and(Integral mask) noexcept:
The bit-and operator is applied to the object's value and mask, assigning the resulting value to the currrent object. The object's value at the time of the call is returned;
Integral fetch_|=(Integral mask) noexcept:
The bit-or operator is applied to the object's value and mask, assigning the resulting value to the currrent object. The object's value at the time of the call is returned;
Integral fetch_^=(Integral mask) noexcept:
The bit-xor operator is applied to the object's value and mask, assigning the resulting value to the currrent object. The object's value at the time of the call is returned;
Integral operator++() noexcept:
The prefix increment operator, returning object's new value;
Integral operator++(int) noexcept:
The postfix increment operator, returning the object's value before it was incremented;
Integral operator--() noexcept:
The prefix decrement operator, returning object's new value;
Integral operator--(int) noexcept:
The postfix decrement operator, returning the object's value before it was decremented;
Integral operator+=(Integral value) noexcept:
Value is added to the object's current value and the object's new value is returned;
Integral operator-=(Integral value) noexcept:
Value is subtracted from the object's current value and the object's new value is returned;
Integral operator&=(Integral mask) noexcept:
The bit-and operator is applied to the object's current value and mask, assigning the resulting value to the currrent object. The object's new value is returned;
Integral operator|=(Integral mask) noexcept:
The bit-or operator is applied to the object's current value and mask, assigning the resulting value to the currrent object. The object's new value is returned;
Integral operator^=(Integral mask) noexcept:
The bit-xor operator is applied to the object's current value and mask, assigning the resulting value to the currrent object. The object's new value is returned;

Some of the free member functions have names ending in _explicit. The _explicit functions define an additional parameter `memory_order order', which is not available for the non-_explicit functions (e.g., atomic_load(atomic<Type> *ptr) and atomic_load_explicit(atomic<Type> *ptr, memory_order order))

Here are the free functions that are available for all atomic types:

bool std::atomic_compare_exchange_strong(_explicit)(std::atomic<Type> *ptr, Type *oldValue, Type newValue) noexept:
returns ptr->compare_exchange_strong(*oldValue, newValue);
bool std::atomic_compare_exchange_weak(_explicit)(std::atomic<Type> *ptr, Type *oldValue, Type newValue) noexept:
returns ptr->compare_exchange_weak(*oldValue, newValue);
Type std::atomic_exchange(_explicit)(std::atomic<Type> *ptr, Type newValue) noexept:
returns ptr->exchange(newValue);
void std::atomic_init(std::atomic<Type> *ptr, Type init) noexept:
Stores init non-atomically in *ptr. The object pointed to by ptr must have been default constructed, and as yet no member functions must have been called for it. This function has no memory_order parameter;
bool std::atomic_is_lock_free(std::atomic<Type> const *ptr) noexept:
returns ptr->is_lock_free(). This function has no memory_order parameter;
Type std::atomic_load(_explicit)(std::atomic<Type> *ptr) noexept:
returns ptr->load();
void std::atomic_store(_explicit)(std::atomic<Type> *ptr, Type value) noexept:
calls ptr->store(value).

In addition to the abovementioned free functions atomic<Integral> types also offer the following free member functions:

Integral std::atomic_fetch_add(_explicit)(std::atomic<Integral> *ptr, Integral value) noexcept:
returns ptr->fetch_add(value);
Integral std::atomic_fetch_sub(_explicit)(std::atomic<Integral> *ptr, Integral value) noexcept:
returns ptr->fetch_sub(value);
Integral std::atomic_fetch_and(_explicit)(std::atomic<Integral> *ptr, Integral mask) noexcept:
returns ptr->fetch_and(value);
Integral std::atomic_fetch_or(_explicit)(std::atomic<Integral> *ptr, Integral mask) noexcept:
returns ptr->fetch_or(value);
Integral std::atomic_fetch_xor(_explicit)(std::atomic<Integral> *ptr, Integral mask) noexcept:
returns ptr->fetch_xor(mask).

20.7: An example: threaded quicksort

The quicksort sorting algorithm (Hoare, 1962) is a well-known sorting algorithm. Given an array of n elements, it works like this:

Pick an element from the array, and partition the array with respect to this element (call it the pivot element). This leaves us with two (possibly empty) sub-arrays: one to the left of the pivot element, and one to the right of the pivot element;
Recursively perform quicksort on the left-hand sub-array;
Recursively perform quicksort on the right-hand sub-array.

To convert this algorithm to a multi-threaded algorithm appears to be be a simple task:

    void quicksort(Iterator begin, Iterator end)
    {
        if (end - begin < 2)            // less than 2 elements are left
            return;                     // and we're done

        Iter pivot = partition(begin, end); // determine an iterator pointing
                                            // to the pivot element

        thread lhs(quicksort, begin, pivot);// start threads on the left-hand
                                            // side sub-arrays
        thread rhs(quicksort, pivot + 1, end);  // and on the right-hand side
                                                // sub-arrays
        lhs.join();
        rhs.join();                         // and we're done
    }

Unfortunately, this translation to a multi-threaded approach won't work for reasonably large arrays because of a phenomenon called overpopulation: more threads are started than the operating system is prepared to give us. In those cases a Resource temporarily unavailable exception is thrown, and the program ends.

Overpopulation can be avoided by using a pool of workers, where each `worker' is a thread, which in this case is responsible for handling one (sub) array, but not for the nested calls. The pool of workers is controlled by a scheduler, receiving the requests to sort sub-arrays, and passing these requests on to the next available worker.

The main data structure of the example program developed in this section is a queue of std::pairs containing iterators of the array to be sorted (cf. Figure 22, the sources of the program are found in the annotation()'s yo/threading/examples/multisort directory). Two queues are being used: one queue is a task-queue, receiving the iterators of sub-arrays to be partitioned. Instead of immediately launching new threads (the lhs and rhs threads in the above example), the ranges to be sorted are pushed on the task-queue. The other queue is the work-queue: elements are moved from the task-queue to the work-queue, where they will be processed by one of the worker threads.

Figure 22 is shown here. — Figure 22: Data structure used for multi-threading quicksort

The program's main function starts the workforce, reads the data, pushes the arrays begin and end iterators on the task queue and then starts the scheduler. Once the scheduler ends the sorted array is displayed:

    int main()
    {
        workForce();            // start the worker threads
        readData();             // read the data into vector<int> g_data
        g_taskQ.push(           // prepare the main task
                    Pair(g_data.begin(), g_data.end())
                ); 
        scheduler();            // sort g_data
        display();              // show the sorted elements
    }

The workforce consists of a bunch of detached threads. Each thread represents a worker, implemented in the function void worker. Since the number of worker threads is fixed, overpopulation doesn't occur. Once the array has been sorted and the program stops these detached threads simply end:

    for (size_t idx = 0; idx != g_sizeofWorkforce; ++idx)
        thread(worker).detach();

The scheduler continues for as long as there are sub-arrays to sort. When this is the case the task queue's front element is moved to the work queue. This reduces the work queue's size, and prepares an assignment for the next available worker. The scheduler now waits until a worker is available. Once workers are available one of them is informed of the waiting assignment, and the scheduler waits for the next task:

    void scheduler()
    {
        while (newTask())
        {
            g_workQ.rawPushFront(g_taskQ);

            g_workforce.wait();           // wait for a worker to be available
            g_worker.notify_all();            // activate a worker
        }
    }

The function newTask simply checks whether the task queue is empty. If so, and none of the workers is currently busy sorting a sub-array then the array has been sorted, and newTask can return false. When the task queue is empty but a worker is still busy, it may be that new sub-array dimensions are going to be placed on the task queue by an active worker. Whenever a worker is active the Semaphore g_workforce's size is less than the size of the work force:

    bool wip()
    {
        return g_workforce.size() != g_sizeofWorkforce;
    }

    bool newTask()
    {
        bool done;

        unique_lock<mutex> lk(g_taskMutex);
        while ((done = g_taskQ.empty()) && wip())
            g_taskCondition.wait(lk);

        return not done;
    }

Each detached worker thread performs a continuous loop. In the loop it waits for a notification by the scheduler. Once it receives a notification it retrieves its assignment from the work queue, and partitions the sub-array specified in its assignment. Partitioning may result in new tasks. Once this has been completed the worker has completed its assignment: it increments the available workforce and notifies the scheduler that it should check whether all tasks have been performed:

    void worker()
    {
        while (true)
        {
            g_worker.wait();      // wait for action

            partition(g_workQ.popFront());
            g_workforce.notify_all();

            lock_guard<mutex> lk(g_taskMutex);
            g_taskCondition.notify_one();
        }
    }

Sub-arrays smaller than two elements need no partitioning. All larger sub-arrays are partitioned relative to their first element. The std::partition generic algorithm does this well, but if the pivot is itself an element of the array to partition then the pivot's eventual location is undetermined: it may be found anywhere in the series of elements which are at least equal to the pivot. The two required sub-arrays, however, can easily be constructed:

First call std::partition relative to an array's first element, partitioning the array's remaining elements, returning mid, pointing to the first element of the series of elements that are at least as large as the array's first element;
Then swap the array's first element with element to which mid - 1 points;
The two sub-arrays range from, respectively, array.begin() to mid - 1 (elements all smaller than the pivot), and from mid to array.end() (elements all at least as large as the pivot).

The two iterator pairs defining these two sub-arrays are thereupon added to the task queue, creating two new tasks to be dealt with by the scheduler:

    void partition(Pair const &range)
    {
        if (range.second - range.first < 2)
            return;

        auto rhsBegin = partition(range.first + 1, range.second,
                                bind2nd(less<int>(), *range.first));
        auto lhsEnd = rhsBegin - 1;

        swap(*range.first, *lhsEnd);

        pushTask(range.first, lhsEnd);
        pushTask(rhsBegin, range.second);
    }

20.8: Shared States

Just before a thread ends it may have produced some results. These results may have to to be communicated to other threads. In multi threaded programs several classes and functions can be used that produce shared states, making it easy to communicate results to other threads. Results could be values, objects or exceptions.

Objects that contain such shared states are called asynchronous return objects. However, due to the nature of multi threading, a thread may request the results of an asynchronous return object before these result are actually available. In those cases the requesting thread blocks, waiting for the results to become available. Asynchronous return objects offer wait and get members which, respectively, wait until the results have become available, and produce the asynchronous results once they are available. The phrase that is used to indicate that the results are available is `the shared state has been made ready'.

Shared states are made ready by asynchronous providers. Asynchronous providers are simply objects or functions providing results to shared states. Making a shared state ready means that an asynchronous provider

marks its shared state as being ready, and
unblocks any waiting threads (e.g., by allowing blocking members, like wait, to return).

Once a shared state has been made ready it contains a value, object, or exception which can be retrieved by objects having access to the shared state. While code is waiting for a shared state to become ready the value or exception that is going to be stored in the shared state may be computed. When multiple threads try to access the same shared state they must use synchronizing mechanisms (like mutexes, cf. section 20.3) to prevent access-conflicts.

Shared states use reference counting to keep track of the number of asynchronous return objects or asynchronous providers that hold references to them. These return objects and providers may release their references to these shared states (which is called `releasing the shared state). This happens when a return object or provider holds the last reference to the shared state, and the shared state is destroyed.

On the other hand, an asynchronous provider may also abandon its shared state. In that case the provider, in sequence,

stores an exception object of type std::future_error, holding the error condition std::broken_promise in its shared state;
makes its shared data ready; and
releases its shared data.

Objects of the class std::future (see the next section) are asynchronous return objects. They can be produced by the std::async (section 20.11) family of functions, and by objects of the classes std::packaged_task (section 20.12), and std::promise (section 20.13).

20.9: Asynchronous return objects: std::future

Condition variables allow threads to wait until data have obtained certain values. A thread may also have to wait until a sub-thread has finished when calling a sub-thread's join member.

Waiting may be unwelcome: instead of just waiting our thread might also be doing something useful. It might as well pick up the results produced by a sub-thread at some point in the future.

In fact, exchanging data among threads always poses some difficulties, as it requires shared variables, and the use of locks and mutexes to prevent data corruption. Rather than waiting and using locks it would be nice if some asynchronous task could be started, allowing the initiating thread (or even other threads) to pick up the result at some point in the future, when the results are needed, without having to worry about data locks or waiting times. For situations like these C++ provides the class std::future.

Before using the class std::future the <future> header file must be included.

Objects of the class template std::future harbor the results produced by asynchronously executed tasks. The class std::future is a class template. Its template type parameter specifies the type of the result returned by the asynchronously executed task. This type may be void.

On the other hand, the asynchronously executed task may throw an exception (ending the task). In that case the future object catches the exception, and rethrows it once its return value (i.e., the value returned by the asynchronously executed task) is requested.

In this section the members of the class template future are described. Future objects are commonly initialized through anonymous future objects returned by the factory function std::async or by the get_future members of the classes std::promise, and std::packaged_task (introduced in upcoming sections). Examples of the use of std::future objects are provided in those sections.

Some of future's members return a value of the strongly typed enumeration std::future_status. This enumeration defines three symbolic constants: future_status::ready, future_status::timeout, and future_status::deferred.

Error conditions are returned through std::future_error exceptions. These error conditions are represented by the values of the strongly typed enumeration std::future_errc (covered in the next section).

The class future itself provides the following constructors:

future():
The default constructor constructs an future object that does not refer to shared results. Its valid member returns false.
future(future &&tmp) noexcept:
The move constructor is available. Its valid member returns what tmp.valid() would haved returned prior to the constructor invocation. After calling the move constructor tmp.valid() returns false.

The class future does not offer a copy constructor or an overloaded assignment operator.

Here are the members of the class std::future:

future &operator=(future &&tmp):
The move assignment operator grabs the information from the tmp object; following this, tmp.valid() returns false.
std::shared_future<ResultType> share() &&:
Returns a std::shared_future<ResultType> (see section 20.10). After calling this function, the future's valid member returns false.
ResultType get():
First wait (see below) is called. Once wait has returned the results produced by the associated asynchronoust task are returned. With future<Type> specifications the returned value is the moved shared value if Type supports move assignment, otherwise a copy is returned. With future<Type &> specifications a Type & is returned, with future<void> specifications nothing is returned. If the shared value is an exception, it is thrown instead of returned. After calling this member the future object's valid member returns false.
bool valid() const:
Returns true if the (future) object for which valid is called refers to an object returned by an asynchronous task. If valid returns false, the future object exists, but in addition to valid only its destructor and move constructor can safely be called. When other members are called while valid returns false a std::future_error exception is thrown (having the value future_errc::no_state).
void wait() const:
The thread is blocked until the results produced by the associated asynchronous task are available.
std::future_status wait_for(chrono::duration<Rep, Period> const &rel_time) const:
This member template derives the template types Rep and Period from the actually specified duration (cf. section 20.1.2). If the results contain a deferred function nothing happens. Otherwise wait_for blocks until the results are available or until the amount of time specified by rel_time has expired. Possible return values are:
- future_status::deferred if the results contains a deferred function;
- future_status::ready if the results are avaiable;
- future_status::timeout if the function is returning because the amount of time specified by rel_time has expired.
future_status wait_until(chrono::time_point<Clock, Duration> const &abs_time) const:
This member template derives the template types Clock and Duration from the actually specified abs_time (cf. section 20.1.4). If the results contain a deferred function nothing happens. Otherwise wait_until blocks until the results are available or until the point in time specified by abs_time has expired. Possible return values are:
- future_status::deferred if the results contain a deferred function;
- future_status::ready if the results are available;
- future_status::timeout if the function is returning because the point in time specified by abs_time has expired.

The class std::future<ResultType> declares the following friends:

    std::promise<ResultType>

(sf. section 20.13), and

    template<typename Function, typename... Args>
        std::future<typename result_of<Function(Args...)>::type> 
        std::async(std::launch, Function &&fun, Args &&...args);

(cf. section 20.11).

20.9.1: The std::future_error exception and the std::future_errc enum

Members of the class std::future may return errors by throwing std::future_error exceptions. These error conditions are represented by the values of the strongly typed enumeration std::future_errc which defines the following symbolic constants:

broken_promise
Broken_promise is thrown when a future object was received whose value was never assigned by a promise or packaged_task. For example, an object of the class promise<int> should set the value of the future<int> object returned by its get_future member (cf. section 20.13), but if it doesn't do so, then a broken_promise exception is thrown, as illustrated by the following program:
```
 1: std::future<int> fun()
 2: {
 3:     return std::promise<int>().get_future();
 4: }
 5:
 6: int main()
 7: try
 8: {
 9:     fun().get();
10: }
11: catch (std::exception const &exc)
12: {
13:     std::cerr << exc.what() << '\n';
14: }
```
At line 3 a promise object is created, but its value is never set. Consequently, it `breaks its promise' to produce a value: when main tries to retrieve its value (in line 9) a std::futue_error exception is thrown containing the the future_errc::broken_promise value
future_already_retrieved
Future_already_retrieved is thrown when multiple attempts are made to retrieve the future object from, e.g., a promise or packaged_task object that (eventually) should be ready. For example:
```
 1: int main()
 2: {
 3:     std::promise<int> promise;
 4:     promise.get_future();
 5:     promise.get_future();
 6: }
```
Note that after defining the std::promise object in line 3 it has merely been defined: no value is ever assigned to its future. Even though no value is assigned to the future object, it is a valid object. I.e., after some time the future should be ready, and the future's get member should produce a value. Hence, line 4 succeeds, but then, in line 5, the exception is thrown as `the future has already been retrieved'.
promise_already_satisfied
Promise_already_satisfied is thrown when multiple attempts are made to assign a value to a promise object. Assigning a value or exception_ptr to the future of a promise object may happen only once. For example:
```
 1: int main()
 2: {
 3:     std::promise<int> promise;
 4:     promise.set_value(15);
 5:     promise.set_value(155);
 6: }
```
no_state
No_state is thrown when a member function (other than valid, see below) of a future object is called when its valid member returns false. This happens, e.g., when calling members of a default constructed future object. No_state is not thrown for future objects returned by the async factory function or returned by the get_future members of promise or packaged_task type of objects. Here is an example:
```
 1: int main()
 2: {
 3:     std::future<int> fut;
 4:     fut.get();
 5: }
```

The class std::future_error is derived from the class std::exception, and offers, in addition to the char const *what() const member also the member std::error_code const &code() const, returning an std::error_code object associated with the thrown exception.

20.10: Shared asynchronous return objects: std::shared_future

When a thread activates an asynchronous provider (e.g., a std::async) then the return value of the asynchronously called function becomes available in its activating thread through a std::future object. The future object cannot be used by another thread. If this is required (e.g., see this chapter's final section) the future object must be converted to a std::shared_future object.

Before using the class std::shared_future the <future> header file must be included.

Once a shared_future object is available, its get member (see below) can repeatedly be called to retrieve the results of the original future object. This is illustrated by the next small example:

     1: int main()
     2: {
     3:     std::promise<int> promise;
     4:     promise.set_value(15);
     5:
     6:     auto fut = promise.get_future();
     7:     auto shared1 = fut.share();
     8:
     9:     std::cerr << "Result: " << shared1.get() << '\n';
    10:               << "Result: " << shared1.get() << '\n';
    11:               << "Valid: " << fut.valid() << '\n';
    12:
    13:     auto shared2 = fut.share();
    14:
    15:     std::cerr << "Result: " << shared2.get() << '\n';
    16:               << "Result: " << shared2.get() << '\n';
    17: }

In lines 9 and 10 the promise's results are retrieved multiple times, but having obtained the shared_future in line 7, the original future object no longer has an associated shared state. Therefore, when another attempt is made (in line 13) to obtain the shared_future, a no associated state exception is thrown and the program aborts.

However, multiple copies of shared_future objects may co-exist. When multiple copies of shared_future objects exist (e.g. in different threads), the results of the associated asynchronous task are made ready (become available) at exactly the same moment in time.

The relationship between the classes future and shared_future resembles the relationship between the classes unique_ptr and shared_ptr: there can only be one instance of a unique_pointer, pointing to data, whereas there can be many instances of a shared_pointer, each pointing to the same data.

The effect of calling any member of a shared_future object for which valid() == false other than the destructor, the move-assignment operator, or valid is undefined.

The class shared_future supports the following constructors:

shared_future() noexcept
an empty shared_future object is constructed that does not refer to shared results. After using this constructor the object's valid member returns false.
shared_future(shared_future const &other)
a shared_future object is constructed that refers to the same results as other (if any). After using this constructor the object's valid member returns the same value as other.valid().
shared_future(shared_future<Result> &&tmp) noexcept
Effects: move constructs a shared_future object that refers to the results that were originally referred to by tmp (if any). After using this constructor the object's valid member returns the same value as tmp.valid() returned prior to the constructor invocation, and tmp.valid() returns false.
shared_future(future<Result> &&tmp) noexcept
Effects: move constructs a shared_future object that refers to the results that were originally referred to by tmp (if any). After using this constructor the object's valid member returns the same value as tmp.valid() returned prior to the constructor invocation, and tmp.valid() returns false.

The class's destructor destroys the shared_future object for which it is called. If the object for which the destructor is called is the last shared_future object, and no std::promise or std::packaged_task is associated with the results associated with the current object, then the results are also destroyed.

Here are the members of the class std::shared_future:

shared_future& operator=(shared_future &&tmp):
The move assignment operator releases the current opject's shared results, and move assigns tmp's results to the current object. After calling the move assignment operator the current object's valid member returns the same value as tmp.valid() returned prior to the invocation of the move assignment operator, and tmp.valid() returns false;
shared_future& operator=(shared_future const &rhs):
The assignment operator releases the current opject's shared results, and rhs's results are shared with the current object. After calling the assignment operator the current object's valid member returns the same value as tmp.valid();
Result const &shared_future::get() const:
(Specializations for shared_future<Result &> and shared_future<void> are also available). This member waits until the shared results are available, and subsequently returns Result const &. Note that access to the data stored in Results, accessed through get is not synchronized. It is the responsibility of the programmer to avoid race conditions when accessing Result's data. If Result holds an exception, it is thrown when get is called;
bool valid() const:
Returns true if the current object refers to shared results;
void wait() const:
Blocks until shared results are available (i.e., the associated asynchronous task has produced results);
future_status wait_for(const chrono::duration<Rep, Period>& rel_time) const:
(The template types Rep and Period normally are derived by the compiler from the actual rel_time specification.) If the shared results contain a deferred function (cf. section 20.11) nothing happens. Otherwise wait_for blocks until the results of the associated asynchronous task has produced results, or until the relative time specified by rel_time has expired. The member returns
- future_status::deferred if the shared results contain a deferred function;
- future_status::ready if the shared results are available;
- future_status::timeout if the function is returning because the amount of time specified by rel_time has expired;
future_status wait_until(const chrono::time_point<Clock, Duration>& abs_time) const:
(The template types Clock and Duration normally are derived by the compiler from the actual abs_time specification.) If the shared results contain a deferred function nothing happens. Otherwise wait_until blocks until the shared results are available or until the point in time specified by abs_time has expired. Possible return values are:
- future_status::deferred if the shared results contain a deferred function;
- future_status::ready if the shared results are available;
- future_status::timeout if the function is returning because the point in time specified by abs_time has expired.

20.11: Starting a new thread: std::async

In this section the function template std::async is covered. Async is used to start asynchronous tasks, returning values (or void) to the calling thread, which is hard to realize merely using the std::thread class.

Before using the function async the <future> header file must be included.

When starting a thread using the facilities of the class std::thread the initiating thread at some point commonly calls the thread's join method. At that point the thread must have finished or execution blocks until join returns. While this often is a sensible course of action, it may not always be: maybe the function implementing the thread has a return value, or it could throw an exception.

In those cases join cannot be used: if an exception leaves a thread, then your program ends. Here is an example:

     1: void thrower()
     2: {
     3:     throw std::exception();
     4: }
     5:
     6: int main()
     7: try
     8: {
     9:    std::thread subThread(thrower);
    10: }
    11: catch (...)
    12: {
    13:     std::cerr << "Caught exception\n";
    14: }

In line 3 thrower throws an exception, leaving the thread. This exception is not caught by main's try-block (as it is defined in another thread). As a consequence, the program terminates.

This scenario doesn't occur when std::async is used. Async may start a new asynchronous task, and the activating thread may retrieve the return value of the function implementing the asynchronous task or any exception leaving that function from a std::future object returned by the async function. Basically, async is called similarly to the way a thread is started using std::thread: it is passed a function and optionally arguments which are forwarded to the function.

Although the function implementing the asynchronous task may be passed as first argument, async first argument may also be a value of the strongly typed enumeration std::launch:

    enum class launch
    {
        async,
        deferred
    };

When passing launch::async the asynchronous task immediately starts; when passing launch::deferred the asynchronous task is deferred. When std::launch is not specified the default value

launch::async |
launch::deferred

is used, giving the implementation freedom of choice, usually resulting in deferring execution of the asynchronous task.

So, here is the first example again, this time using async to start the sub-thread:

     1: bool fun()
     2: {
     3:     return std::cerr << "    hello from fun\n";
     4: }
     5: int exceptionalFun()
     6: {
     7:     throw std::exception();
     8: }
     9:
    10: int main()
    11: try
    12: {
    13:     auto fut1 = std::async(std::launch::async, fun);
    14:     auto fut2 = std::async(std::launch::async, exceptionalFun);
    15:
    16:     std::cerr << "fun returned " << std::boolalpha << fut1.get() << '\n';
    17:     std::cerr << "exceptionalFun did not return " << fut2.get() << '\n';
    18: }
    19: catch (...)
    20: {
    21:     std::cerr << "caught exception thrown by exceptionalFun\n";
    22: }

Now the threads immediately start, but although the results are available around line 13, the thrown exception isn't terminating the program. The first thread's return value is made available in line 16, the exception thrown by the second thread is simply caught by main's try-block (line 19).

The function template async has several overloaded versions:

The basic form expects a function or functor as its first argument, returning a std::future holding the function's return value or exception thrown by the function:

    template <typename Function, class ...Args>
    std::future<
        typename std::result_of< Function(Args ...) >::type
    > std::async(Function &&fun, Args &&...args);

Alternatively, the first argument may be the address of a member function. In that case the (required) second argument is an object (or a pointer to an object) of that member function's class. Any remaining arguments are passed to the member function (see also the remarks below).

The first argument may also be a combination (using the bit_or operator) of the enumeration values of the std::launch enumeration:

    template <class Function, class ...Args>
    std::future<typename std::result_of<Function(Args ...)>::type> 
        std::async(std::launch policy, Function &&fun, Args &&...args);

If the first argument specifies std::launch values, the second argument may also be the address of a member function. In that case the (required) thirs argument is an object (or a pointer to an object) of that member function's class. Any remaining arguments are passed to the member function (see also the remarks below).

When calling async all arguments except for the std::launch argument must be references, pointers or move-constructible objects:

When a member function is specified, then the object for which the member function is called must be a named object, an anonymous object, or a pointer to a named object.
When a named object is passed to the async function template then copy construction is used to construct a copy of the argument which is then forwarded to the thread-launcher.
When an anonymous object is passed to the async function template then move construction is used to forward the anonymous object to the thread launcher.

Once the thread itself starts another move construction is used to construct an object for the duration of the thread. When a pointer to an object is passed, the sub-thread uses the object referred to by the pointer, and neither copy- nor move-construction is required. However, when using a pointer to an object the programmer should make sure that the object's lifetime exceeds the duration of the thread (note that this is not automatically guaranteed, as the asynchronous task may not actually start before the future's get member is called).

Because of the default std::launch::deferred | std::launch::async argument used by the basic async call it is likely that the function which is passed to async doesn't immediately start. The launch::deferred policy allows the implementor to defer its execution until the program explicitly asks for the function's results. Consider the following program:

     1: void fun()
     2: {
     3:     std::cerr << "    hello from fun\n";
     4: }
     5:
     6: std::future<void> asyncCall(char const *label)
     7: {
     8:     std::cerr << label << " async call starts\n";
     9:     auto ret = std::async(fun);
    10:     std::cerr << label << " async call ends\n";
    11:     return ret;
    12: }
    13:
    14: int main()
    15: {
    16:     asyncCall("First");
    17:     asyncCall("Second");
    18: }

Although async is called in line 9, the program's output may not show fun's output line when it is run. This is a result of the (default) use of lauch::deferred: the system simply defers fun's execution until requested, which doesn't happen. But the future object that's returned by async has a member wait. Once wait returns the shred state must be available. In other words: fun must have finished. Here is what happens when after line 9 the line ret.wait() is inserted:

    First async call starts
        hello from fun
    First async call ends
    Second async call starts
        hello from fun
    Second async call ends

Actually, evaluation of fun can be requested at the point where we need its results, maybe even after calling asyncCall, as shown in the next example:

     1: int main()
     2: {
     3:     auto ret1 = asyncCall("First");
     4:     auto ret2 = asyncCall("Second");
     5:
     6:     ret1.get();
     7:     ret2.get();
     8: }

Here the ret1 and ret2 std::future objects are created, but their fun functions aren't evaluated yet. Evaluation occurs at lines 6 and 7, resulting in the following output:

    First async call starts
    First async call ends
    Second async call starts
    Second async call ends
        hello from fun
        hello from fun

The std::async function template is used to start a thread, making its results available to the calling thread. On the other hand, we may only be able to prepare (package) a task (a thread), but may have to leave the completion of the task to another thread. Scenarios like this are realized through objects of the class std::package_task, which is the topic of the next section.

20.12: Preparing a task for execution: std::packaged_task

The class template std::packaged_task allows a program to `package' a function or functor and pass the package to a thread for further processing. The processing thread then calls the packaged function, passing it its arguments (if any). After completing the function the packaged_task's future is ready, allowing the program to retrieve the results produced by the function. Thus, functions and the results of function calls can be transferred between threads.

Before using the class template packaged_task the <future> header file must be included.

Before describing the class's interface, let's first look at an example to get an idea about how a packaged_task can be used. Remember that the essence of packaged_task is that part of your program prepares (packages) a task for another thread to complete, and that the program at some point needs the result of the completed task.

To clarify what's happening here, let's first look at a real-life analogon. Every now and then I make an appointment with my garage to have my car serviced. The `package' in this case are the details about my car: its make and type determine the kind of actions my garage performs when servicing it. My neighbor also has a car, which also needs to be serviced every now and then. This also results in a `package' for the garage. At the appropriate time me and my neighbor take our cars to the garage (i.e., the packages are passed to another thread). The garage services the cars (i.e., calls the functions stored in the packaged_tasks [note that the tasks differ, depending on the types of the cars]), and performs some actions that are associated with it (e.g., registering that my or my neighbor's car has been serviced, or order replacement parts). In the meantime my neighbor and I perform our own businesses (the program continues while a separate thread runs as well). But by the end of the day we'd like to use our cars again (i.e., get the results associated with the packaged_task). A common result in this example is the garage's bill, which we have to pay (the program obtains the packaged_task's results).

Here is a little C++ program illustrating the use of a packaged_task (assuming the required headers and using namespace std have been specified):

     1: mutex carDetailsMutex;
     2: condition_variable condition;
     3: string carDetails;
     4: packaged_task<size_t (std::string const &)> serviceTask;
     5:
     6: size_t volkswagen(string const &type)
     7: {
     8:     cout << "performing maintenance by the book for a " << type << '\n';
     9:     return type.size() * 75;            // the size of the bill
    10: }
    11:
    12: size_t peugeot(string const &type)
    13: {
    14:     cout << "performing quick and dirty maintenance for a " << type << '\n';
    15:     return type.size() * 50;             // the size of the bill
    16: }
    17:
    18: void garage()
    19: {
    20:     while (true)
    21:     {
    22:         unique_lock<mutex> lk(carDetailsMutex);
    23:         while (carDetails.empty())
    24:             condition.wait(lk);
    25:
    26:         cout << "servicing a " << carDetails << '\n';
    27:         serviceTask(carDetails);
    28:         carDetails.clear();
    29:     }
    30: }
    31:
    32: int main()
    33: {
    34:     thread(garage).detach();
    35:
    36:     while (true)
    37:     {
    38:         string car;
    39:         if (not getline(cin, car) || car.empty())
    40:             break;
    41:         {
    42:             lock_guard<mutex> lk(carDetailsMutex);
    43:             carDetails = car;
    44:         }
    45:         serviceTask =  packaged_task<size_t (string const &)>(
    46:                     car[0] == 'v' ? volkswagen : peugeot
    47:                 );
    48:         auto bill = serviceTask.get_future();
    49:         condition.notify_one();
    50:         cout << "Bill for servicing a " << car <<
    51:                                 ": EUR " << bill.get() << '\n';
    52:     }
    53: }

Lines 1-3 define the variables used for synchronization;
Line 4 defines a packaged_task: serviceTask is initialized with a function (or functor) expecting a string, returning a size_t;
Lines 6-10 and 12-16 define such functions: volkswagen and peugeot represent the tasks to perform when cars of the provided types come in for service; presumably they return the bill.
Lines 18-30 define the function void garage, defining the actions performed by the garage when cars come in for service. These actions are performed by a separate detached thread, starting in line 34. In a continuous loop it waits until it obtains a lock on the carDetailsMutex and carDetails is no longer empty. Then, at line 27, it passes carDetails to the packaged_task `serviceTask'. By itself this is not identical to calling the packaged_task's function, but eventually its function will be called. At this point the packaged_task receives its function's arguments, which it eventually will forward to its configured function. Finally, at line 28 it clears carDetails, thus preparing itself for the next request.
Lines 32-53 define main:
- First, at line 34 the anonymous detached thread running garage is started.
Then the program's main loop starts (lines 36-52):
- The main thread reads commands from the standard input until an empty or no line is received (lines 38-40).
- By convention the line's first letter starts the car's brand (volkswagen or peugeot), and the packaged_task, provided with the right servicing function, is constructed next (line 45).
- Then, at line 48 the results, stored in a future, are retrieved. Although at this point the future might not be ready, the future object itself is, and it is simply returned as the bill.
- Now we're ready to inform the garage that it can service a car: the garage is notified in line 49.
Anything may happen next: the program may perform any actions, but eventually it requests the results produced by the garage.
- The main thread obtains the results by calling bill.get() in line 51. If, by this time, the car is still being serviced, the bill isn't ready yet, and bill.get() blocks until it is, and the bill for servicing a car is shown.

Now that we've seen an example of a program using a packaged_task, let's have a look at its interface. Note that the class packaged_task is a class template: its template type parameter specifies the prototype of a function or function object implementing the task performed by the packaged_task object.

Constructors and destructor:

packaged_task() noexcept:
The default constructor constructs a packaged_task object which is not associated with a function or shared state;
explicit packaged_task(ReturnType(Args...) &&function):
A packaged_task is constructed for a function or functor expecting arguments of types Args..., and returning a value of type ReturnType. The packaged_task class template specifies ReturnType (Args...) as its template type parameter. The constructed object contains a shared state, and a (move constructed) copy of function.
Optionally an Allocator may be specified as second template type parameter, in which case the first two arguments are std::allocator_arg_t, Allocator const &alloc. The type std::allocator_arg_t is a type introduced to disambiguate constructor selections, and can simply be specified as std::allocator_arg_t().
This constructor may throw a std::bad_alloc exception or exceptions thrown by function's copy or move constructors
packaged_task(packaged_task &&tmp) noexcept:
The move constructor moves any existing shared state from tmp to the newly constructed object, removing the shared state from tmp.
~packaged_task():
The object's shared state (if any) is abandoned

Member functions:

future<ReturnType> get_future():
A std::future object haring the current object's shared state is returned. A future_error exception is thrown upon error, containing
- future_already_retrieved if get_future was already called on a packaged_task object containing the same shared state as the current object;
- no_state if the current object has no shared state.
Note: Any futures that share the object's shared state may access the result returned by the object's task.
void make_ready_at_thread_exit(Args... args):
Calls void operator()(Args... args) (see below) when the current thread exits, once all objects of thread storage duration associated with the current thread have been destroyed.
packaged_task &operator=(packaged_task &&tmp):
The move assignment operator first releases the current object's shared state (if available), after which the current object and tmp are swapped;
void operator()(Args... args):
The args arguments are forwarded to the current object's stored task. When the stored task returns its return value is stored in the current object's shared state. Otherwise any exception thrown by the task is stored in the object's shared state. Following this the object's shared state is made ready, and any threads blocked in a function waiting for the object's shared state to become ready are unblocked. A future_error exception is thrown upon error, containing
- promise_already_satisfied if the shared state has already been made ready;
- no_state if the current object does not have any shared state.
Calling this member synchronizes with calling any member function of a (shared_)future object that provides access to the packaged_task's results.
void reset():
Abandons any available shared state, initializing the current object to packaged_task(std::move(funct)), where funct is the object's stored task. This member may throw the following exceptions:
- bad_alloc if memory for the new shared state could not be allocated;
- any exception thrown by the move constructor of the task stored in the shared state;
- future_error with a no_state error condition if the current object contains no shared state.
void swap(packaged_task &other) noexcept:
The shared states and stored tasks of the current object and other are swapped.
bool valid() const noexcept:
Returns true if the current object contains a shared state, otherwise false is returned;

The following non-member (free) function operating on packaged_task objects is available:

void swap(packaged_task<ReturnType(Args...)> &lhs, packaged_task<ReturnType(Args...)> &rhs) noexcept
Calls lhs.swap(rhs)

20.13: The class `std::promise'

In addition to std::package_task and std::async the class template std::promise can be used to obtain the results from a separate thread.

Before using the class template promise the <future> header file must be included.

A promise is useful to obtain the results from another thread without further synchronization requirements. Consider the following program:

    void compute(int *ret)
    {
        *ret = 9;
    }

    int main()
    {
        int ret = 0;
        std::thread(compute2, &ret).detach();
        cout << ret << '\n';
    }

Chances are that this program shows the value 0: the cout statement is already executed before the detached thread has had a chance to complete its work. In this example that problem can easily be solved by using a non-detached thread, and using the thread's join member, but when multiple threads are used that requires named threads and as many join calls. Instead, using a promise might be preferred:

     1: void compute1(promise<int> &ref)
     2: {
     3:     ref.set_value(9);
     4: }
     5:
     6: int main()
     7: {
     8:     std::promise<int> p;
     9:     std::thread(compute, ref(p)).detach();
    10:
    11:     cout << p.get_future().get() << '\n';
    12: }

In this example again a detached thread is used, but its results are kept for future reference in a promise object, instead of directly being assigned to a final destination variable. The promise object contains a future object holding the computed value. The future's get member blocks until the future has been made ready, at which point the result becomes available. By then the detached thread may or may not yet have been completed. If it already completed its work then get immediately returns, otherwise there will be a slight delay.

Promises can be useful when implementing a multi threaded version of some algorithm without having to use additional synchronization statements. As an example consider matrix multiplications. Each element of the resulting product matrix is computed as the inner product of two vectors: the inner product of a row of the left-hand matrix operand and a column of the right-hand matrix operand becomes element [row][column] of the resulting matrix. Since each element of the resulting matrix can independently be computed from the other elements, a multi threaded implementation is well possible. In the following example the function innerProduct (lines 4..11) leaves its result in a promise object:

     1: int m1[2][2] = {{1, 2}, {3, 4}};
     2: int m2[2][2] = {{3, 4}, {5, 6}};
     3:
     4: void innerProduct(promise<int> &ref, int row, int col)
     5: {
     6:     int sum = 0;
     7:     for (int idx = 0; idx != 2; ++idx)
     8:         sum += m1[row][idx] * m2[idx][col];
     9:
    10:     ref.set_value(sum);
    11: }
    12:
    13: int main()
    14: {
    15:     promise<int> result[2][2];
    16:
    17:     for (int row = 0; row != 2; ++row)
    18:     {
    19:         for (int col = 0; col != 2; ++col)
    20:             thread(innerProduct, ref(result[row][col]), row, col).detach();
    21:     }
    22:
    23:     for (int row = 0; row != 2; ++row)
    24:     {
    25:         for (int col = 0; col != 2; ++col)
    26:             cout << setw(3) << result[row][col].get_future().get();
    27:         cout << '\n';
    28:     }
    29: }

Each inner product is computed by a separate (anonymous and detached) thread (lines 17..21), which starts as soon as the run-time system allows it to start. By the time the threads have finished the resulting inner products can be retrieved from the promises' futures. Since futures' get members block until their results are actually available, the resulting matrix can simply be displayed by calling those members in sequence (lines 23..28).

So, a promise allows us to use a thread to compute a value (or exception, see below), which value may then be collected by another thread at some future point in time. The promise remains available, and as a consequence further synchronization of the threads and the program starting the threads is not necessary. When the promise object contains an exception, rather than a value, its future's get member rethrows the stored exception.

Here is the class promise's interface. Note that the class promise is a class template: its template type parameter ReturnType specifies the template type parameter of the std::future that can be retrieved from it.

Constructors and destructor:

promise():
The default constructor constructs a promise object containing a shared state. The shared state may be returned by the member get_future (see below), but that future has not yet been made ready;
promise(promise &&tmp) noexcept:
The move constructor constructs a promise object, transferring the ownership of tmp's shared state to the newly constructed object. After the object has been constructed, tmp no longer contains a shared state;
~promise():
The object's shared state (if any) is abandoned;

Member functions:

std::future<ReturnType> get_future():
A std::future object sharing the current object's shared state is returned. A future_error exception is thrown upon error, containing
- future_already_retrieved if get_future was already called on a packaged_task object containing the same shared state as the current object;
- no_state if the current object has no shared state.
Note: Any futures that share the object's shared state may access the result returned by the object's task;
promise &operator=(promise &&rhs) noexcept:
The move assignment operator first releases the current object's shared state (if available), after which the current object and tmp are swapped;
void promise<void>::set_value():
See below, at the last set_value member's description;
void set_value(ReturnType &&value):
See below, at the last set_value member's description;
void set_value(ReturnType const &value):
See the next member function's description;
void set_value(ReturnType &value):
The argument (value) is atomically stored in the shared state, which is then also made ready. A future_error exception is thrown upon error, containing
- promise_already_satisfied if the shared state has already been made ready;
- no_state if the current object does not have any shared state.
Alternatively, any exception thrown by value's move or copy constructor may be thrown;
void set_exception(std::exception_ptr obj):
Exception_ptr obj (cf. section 20.13.1) is atomically stored in the shared state, making that state ready. A future_error exception is thrown upon error, containing
- promise_already_satisfied if the shared state has already been made ready;
- no_state if the current object does not have any shared state;
void set_exception_at_thread_exit(exception_ptr ptr):
The exception pointer ptr is stored in the shared state without immediately making that state ready. The state becomes ready when the current thread exits, once all objects of thread storage duration which are associated with the ending thread have been destroyed. A future_error exception is thrown upon error, containing
- promise_already_satisfied if the shared state has already been made ready;
- no_state if the current object does not have any shared state;
void set_value_at_thread_exit():
See below, at the last set_value_at_thread_exit member's description;
void set_value_at_thread_exit(ReturnType &&value):
See below, at the last set_value_at_thread_exit member's description;
void set_value_at_thread_exit(ReturnType const &value):
See the next set_value_at_thread_exit member's description;
void set_value_at_thread_exit(ReturnType &value):
Stores value in the shared state without immediately making that state ready. The state becomes ready when the current thread exits, once all objects of thread storage duration which are associated with the ending thread have been destroyed. A future_error exception is thrown upon error, containing
- promise_already_satisfied if the shared state has already been made ready;
- no_state if the current object does not have any shared state;
void swap(promise& other) noexcept:
The shared states (if any) of the current object and other are exchanged.

The following non-member (free) function operating on promise objects is available:

void swap(promise<ReturnType> &lhs, promise<ReturnType> &rhs) noexcept:
Calls lhs.swap(rhs)

20.13.1: Exception propagation: std::exception_ptr

Std::promise's member set_exception does not expect an std::exception argument, but an object of the class std::exception_ptr. In this section we take a closer look at the class exception_ptr.

Before using the class exception_ptr the <future> header file must be included.

The class an exception_ptr's default constructor initializes it to a null-pointer. In the following code snippet the variable isNull is set to true:

    std::exception_ptr obj;
    bool isNull =  obj == nullptr && obj == 0;

The class exception_ptr provides copy and move constructors as well as copy and move assignment operators.

Two exception_ptr objects can be compared for equality. They are equal if they refer to the same exception. Move assignment transfers the exception referred to by the right-hand side operand to the left-hand side operand, and turns the right-hand side operand into a null pointer.

There is no published method directly retrieving the exception to which an exception_ptr object refers. However, there are some free functions constructing or handling exception_ptr objects:

std::exception_ptr std::current_exception() noexcept:
An exception_ptr object is returned referring to the currently handled exception (or a copy of the currently handled exception, or a default constructed exception_ptr object if no current exception is available). This function can also be called when a default exception catcher is used. E.g., assuming that obj refers to an available std::promise object, then the following code snippet assigns the exception caught by default catch clause to obj:
```
    ...
    catch (...)
    {
        obj.set_exception(std::current_exception());
    }
```
The exception referred to by current_exception does not have to be an object of the class std::exception. Any type of object or value thrown as an exception is retrieved as an exception_ptr by current_exception. The exception referred to by an exception_ptr object remains valid for at least as long as there exists an exception_ptr object that refers to it. Calling current_exception twice in a row then the two returned exception_ptr objects may or may not refer to the same exception object.
std::exception_ptr make_exception_ptr(Type value) noexcept:
This function template constructs an exception_ptr from a value of any type which is passed as its argument. Type does not necessarily have to be an std::exception. The constructed exception_ptr could, e.g., be assigned to a std::promise. When the promise's future's get member is subsequently called (possibly from within another thread) the exception will be thrown. Here are some examples, showing how values of different types can be passed as arguments to make_exception_ptr, and showing how the eventually constructed exception_ptr is assigned to the obj, which is assumed to be of a std::promise type:
```
    auto ptr = make_exception_ptr(exception());
    ptr = make_exception(string("hello world"));
    ptr = make_exception(12)
```
;
obj.set_exception(make_exception_ptr(ptr); )
void std::rethrow_exception(exception_ptr obj):
The exception to which obj refers is thrown. Note: obj cannot be a nullptr.

20.14: An example: multi-threaded compilations

In this section another program is developed. This section's example program illustrates the use of packaged_tasks.

Like the multi-threaded quicksort example a worker pool is used. However, in this example the workers in fact do not know what their task is. In the current example the tasks happens to be identical, but diffent tasks might as well have been used, without the need to update the workers.

The program uses a class Task containing a command-specification (d_command, and a task specification (d_task) (cf. Figure 23, the sources of the program are found in the yo/threading/examples/multicompile directory) of the C++ Annotations).

Figure 23 is shown here. — Figure 23: Data structure used for the multi-threading compilation

Like before main first starts its workforce as detached threads. Following this, the compilation jobs are performed by yet another detached thread. Eventually, the results of the compilation jobs are handled by results:

    int main()
    {
        workforce();                    // start the worker threads
        thread(jobs).detach();          // start working on the jobs
        results();                      // handle the results.
    }

The jobs-thread reads the names of the files to compile from the standard input stream, and passes them on to dispatch for further handling by the workers. Lines 7 and 8 are executed at the end, and are a safeguard against empty input. The safeguard is discussed below, at newResult's description.

     1: void jobs()
     2: {
     3:     string line;
     4:     while (getline(cin, line) && dispatch(line))
     5:         ;
     6:
     7:     g_ready = true;
     8:     g_resultCond.notify_one();
     9: }

Next the dispatcher. It ignores empty lines. Also, if a compilation has failed by the time the dispatcher is called, processing stops (lines 6-7). Otherwise, the dispacher waits for an available worker, prepares a new task, and notifies a worker to handle it:

    bool dispatch(string const &line)
    {
        if (line.empty())
            return true;

        if (g_done.load())
            return false;

        g_workforce.wait();
        newTask(line);
        g_worker.notify_all();
        return true;
    }

The function newTask prepares the program for the next task. First a Task object is created. Task contains the name of the file to compile, and a packaged_task. It encapsulates all activities that are associated with a packaged_task. Here is its (in-class) definition:

     1: typedef packaged_task<Result (string const &fname)> PackagedTask;
     2:
     3: class Task
     4: {
     5:     string d_command;
     6:     PackagedTask d_task;
     7:
     8:     public:
     9:         Task()  = default;
    10:
    11:         Task(string const &command, PackagedTask &&tmp)
    12:         :
    13:             d_command(command),
    14:             d_task(move(tmp))
    15:         {}
    16:
    17:         void operator()()
    18:         {
    19:             d_task(d_command);
    20:         }
    21:
    22:         shared_future<Result> result()
    23:         {
    24:             return d_task.get_future().share();
    25:         }
    26: };

Note (lines 22-25) that result returns a shared_future. Since the dispatcher runs in a different thread than the one processing the results, the futures created by the dispatcher must be shared with the futures required by the function processing the results. Hence the shared_futures returned by Task::result.

Once a Task object has been constructed its shared_future is pushed on the result queue. Although the actual results aren't available by this time, the result function (see below) is notified that something has been pushed on its queue. Additionally, the Task itself is pushed on the task queue, where a worker may retrieve it:

    typedef packaged_task<Result (string const &fname)> PackagedTask;

    class Task
    {
        string d_command;
        PackagedTask d_task;

        public:
            Task()  = default;

            Task(string const &command, PackagedTask &&tmp)
            :
                d_command(command),
                d_task(move(tmp))
            {}

            void operator()()
            {
                d_task(d_command);
            }

            shared_future<Result> result()
            {
                return d_task.get_future().share();
            }
    };

    void pushResultQ(shared_future<Result> const &sharedResult)
    {
        lock_guard<mutex> lk(g_resultQMutex);
        g_resultQ.push(sharedResult);
        g_ready = true;
        g_resultCond.notify_one();
    }

The workers have a simple task: retrieve the next task from the task queue's front, then perform that task. Whatever happens inside the tasks themselves is of no concern: the worker thread eventually ends when the program ends:

    void worker()
    {
        Task task;

        while (true)
        {
            g_worker.wait();

            g_taskQ.popFront(task);
            task();

            g_workforce.notify_all();
        }
    }

This completes the description of how tasks are handled. The task itself remains to be described. In the current example program source files are being compiled. The compilation command is passed to the constructor of a CmdFork object, which starts the compiler as a child process. The result of the compilation is retrieved via its childExit (returning the compiler's exit code) and childOutput (returning any textual output produced by the compiler) members. If compilation fails, the exit value won't be zero. In this case no further compilation tasks will be initiated (lines 11 and 12; the implementation of the class CmdFork is available from the C++ Annotations' yo/threading/examples/cmdfork directory). Here is the function compile:

     1: Result compile(string const &line)
     2: {
     3:     string command("/usr/bin/g++ -Wall --std=c++11 -c " + line);
     4:
     5:     CmdFork cmdFork(command);
     6:     cmdFork.fork();
     7:
     8:     Result ret {cmdFork.childExit() == 0,
     9:                 line + "\n" + cmdFork.childOutput()};
    10:
    11:     if (not ret.ok)
    12:         g_done = true;
    13:
    14:     return ret;
    15: }

The results function continues for as long as there are results. Once results are available they are displayed. Whenever a compilation has failed the Result's d_ok member is false and results ends:

    void results()
    {
        while (newResult())
        {
            Result const &result = g_resultQ.front().get();

            cerr << result.display;
            if (not result.ok)
                return;

            g_resultQ.pop();
        }
    }

The function newResult controls results' while-loop. It returns true when there are some results available in the result queue. When no filenames are presented at the standard input stream or when no result has as yet been pushed on the result queue newResult waits. It also waits when there are no results available in the results queue, but at least one worker thread is busy. Whenever a result is pushed on the result queue, and also once the input stream has been processed newResult is notified. Following the notification newResults returns, and results either ends or shows the results appearing at te result queue's front:

    bool newResult()
    {
        unique_lock<mutex> lk(g_resultQMutex);

        while
        (
            (
                g_resultQ.empty()
                &&
                g_workforce.size() != g_sizeofWorkforce
            )
            ||
            not g_ready.load()
        )
            g_resultCond.wait(lk);

        return not g_resultQ.empty();
    }