Chapter 5: The `string' Data Type

Don't hesitate to send in feedback: send an e-mail if you like the C++ Annotations; if you think that important material was omitted; if you find errors or typos in the text or the code examples; or if you just feel like e-mailing. Send your e-mail to Frank B. Brokken.

Please state the document version you're referring to, as found in the title (in this document: 10.3.0) and please state chapter and paragraph name or number you're referring to.

All received mail is processed conscientiously, and received suggestions for improvements are usually processed by the time a new version of the Annotations is released. Except for the incidental case I will normally not acknowledge the receipt of suggestions for improvements. Please don't interpret this as me not appreciating your efforts.

C++ offers many solutions for common problems. Most of these facilities are part of the Standard Template Library or they are implemented as generic algorithms (see chapter 19).

Among the facilities C++ programmers have developed over and over again are those manipulating chunks of text, commonly called strings. The C programming language offers rudimentary string support. C's NTBS is the foundation upon which an enormous amount of code has been built. (An NTBS, or null-terminated byte string, also called an NTB string, is a character sequence whose highest-addressed element with defined content has the value zero, the terminating null character; no other character in the sequence has the value zero.)

To process text C++ offers a std::string type. In C++ the traditional C library functions manipulating NTB strings are discouraged in favor of using string objects (the functions remain available and are not formally deprecated by the standard). Many problems in C programs are caused by buffer overruns, boundary errors and allocation problems that can be traced back to improperly using these traditional C string library functions. Many of these problems can be prevented using C++ string objects.

Actually, string objects are class type variables, and in that sense they are comparable to stream objects like cin and cout. In this section the use of string type objects is covered. The focus is on their definition and their use. When using string objects the member function syntax is commonly used:

    stringVariable.operation(argumentList)

For example, if string1 and string2 are variables of type std::string, then

    string1.compare(string2)

can be used to compare both strings.

In addition to the common member functions the string class also offers a wide variety of operators, like the assignment (=) and the comparison operator (==). Operators often result in code that is easy to understand and their use is generally preferred over the use of member functions offering comparable functionality. E.g., rather than writing

    if (string1.compare(string2) == 0)

the following is generally preferred:

    if (string1 == string2)

To define and use string-type objects, sources must include the header file <string>. To merely declare the string type the header can be included.

In addition to std::string, the header file string defines the following string types:

5.1: Operations on strings

Some of the operations that can be performed on strings return indices within the strings. Whenever such an operation fails to find an appropriate index, the value string::npos is returned. This value is a symbolic value of type string::size_type, which is an unsigned integral type (in practice the same type as std::size_t, so it may be wider than an unsigned int).

All string members accepting string objects as arguments also accept char const * (NTBS) arguments. The same usually holds true for operators accepting string objects.

Some string-members use iterators. Iterators are formally introduced in section 18.2. Member functions using iterators are listed in the next section (5.2), but the iterator concept itself is not further covered by this chapter.

Strings support a large variety of members and operators. A short overview listing their capabilities is provided in this section, with subsequent sections offering a detailed discussion. The bottom line: C++ strings are extremely versatile and there is hardly a reason for falling back on the C library to process text. C++ strings handle all the required memory management, and thus memory-related problems, the #1 source of problems in C programs, can be prevented when C++ strings are used.

Strings do come at a price, though. The class's extensive capabilities have also turned it into a beast. It's hard to learn and master all its features and in the end you'll find that not all that you expected is actually there. For example, std::string doesn't offer case-insensitive comparisons. But in the end it isn't even as simple as that. It is there, but it is somewhat hidden and at this point in the C++ Annotations it's too early to look into that hidden corner. Instead, realize that the C library and its POSIX extensions do offer useful functions that can be used as long as we're aware of their limitations and are able to avoid their traps. So for now, to perform a traditional case-insensitive comparison of the contents of two std::string objects str1 and str2 the following will do (strcasecmp is a POSIX function declared in <strings.h>, not a standard C function; it returns 0 when the two strings are equal ignoring case):

    strcasecmp(str1.c_str(), str2.c_str());

Strings support the following functionality:

5.2: A std::string reference

In this section the string members and string-related operations are referenced. The subsections cover, respectively, the string's initializers, iterators, operators, and member functions. The following terminology is used throughout this section:

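Both opos and apos (offsets into the string object and into the argument, respectively) must refer to existing offsets, or an exception (cf. chapter 10) is generated. In contrast, an and on (the numbers of characters to use from the argument and from the object) may exceed the number of available characters, in which case only the available characters are considered.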

Many members declare default values for on, an and apos. Some members declare default values for opos. The default offset value is 0; the default value of on and an is string::npos, which can be interpreted as `the required number of characters to reach the end of the string'.

With members starting their operations at the end of the string object's contents proceeding backwards, the default value of opos is the index of the object's last character, with on by default equal to opos + 1, representing the length of the substring ending at opos.

In the overview of member functions presented below it may be assumed that all these parameters accept default values unless indicated otherwise. Of course, the default argument values cannot be used if a function requires additional arguments beyond the ones otherwise accepting default values.

Some members have overloaded versions expecting an initial argument of type char const *. But even if that is not the case the first argument can always be of type char const * where a parameter of std::string is defined.

Several member functions accept iterators. Section 18.2 covers the technical aspects of iterators, but these may be ignored at this point without loss of continuity. Like apos and opos, iterators must refer to existing positions and/or to an existing range of characters within the string object's contents.

All string-member functions computing indices return the predefined constant string::npos on failure.

The C++14 standard offers the s literal suffix to indicate that a std::string constant is intended when a string literal (like "hello world") is used. When string literals are used in the context of std::string objects, this literal suffix is hardly ever required, but it may come in handy when using the auto keyword. E.g., auto str = "hello world"s defines std::string str, whereas it would have been a char const * if the literal suffix had been omitted.

5.2.1: Initializers

After defining string objects they are guaranteed to be in a valid state. At definition time string objects may be initialized using one of the following string constructors:

5.2.2: Iterators

See section 18.2 for details about iterators. As a quick introduction to iterators: an iterator acts like a pointer, and pointers can often be used in situations where iterators are requested. Iterators usually come in pairs, defining a range of entities. The begin-iterator points to the first entity, the end-iterator points just beyond the last entity of the range. Their difference is equal to the number of entities in the iterator-range.

Iterators play an important role in the context of generic algorithms (cf. chapter 19). The class std::string defines the following iterator types:

5.2.3: Operators

String objects may be manipulated by member functions but also by operators. Using operators often results in more natural-looking code. Where an operator offers the same functionality as a member function, the operator is practically always preferred.

The following operators are available for string objects (in the examples `object' and `argument' refer to existing std::string objects).

5.2.4: Member functions

The std::string class offers many member functions as well as additional non-member functions that should be considered part of the string class. All these functions are listed below in alphabetic order.

The symbolic value string::npos is defined by the string class. It represents `index-not-found' when returned by member functions returning string offset positions. Example: when calling `object.find('x')' (see below) on a string object not containing the character 'x', npos is returned, as the requested position does not exist.

The final 0-byte used in C strings to indicate the end of an NTBS is not considered part of a C++ string, and so find returns npos, rather than length(), when looking for 0 in a string object containing the characters of a C string.

Here are the standard functions that operate on objects of the class string. When a parameter of type size_t is mentioned it may be interpreted as a parameter of type string::size_type, but without defining a default argument value. The type size_type should be read as string::size_type. With size_type the default argument values mentioned in section 5.2 apply. All quoted functions are member functions of the class std::string, except where indicated otherwise.

5.2.5: Conversion functions

Several string conversion functions are available operating on or producing std::string objects. These functions are listed below in alphabetic order. They are not member functions, but class-less (free) functions declared in the std namespace. The <string> header file must be included before they can be used.