C++: Insert or Update a Map

A C++ STL map is used to store and manage key-value pairs. The elements are maintained in the sorted order of the keys. The keys in a map cannot be changed. Either a new key-value pair can be added or the value of an already existing key-value pair can be changed. These operations are supported by 2 interfaces: insert() and the subscript operator []. These 2 interfaces differ slightly and it is important to be aware how since it has implications on correctness and efficiency.

Consider a map:

#include <map>
using namespace std;

typedef map<KeyType, ValType> KVMap;

KVMap kvmap;

To insert a new key-value pair to the map use:

kvmap.insert( KVMap::value_type( key0, val0 ) );

It is important to note that if key0 already exists in the map, this operation silently fails, returning a bool value of false! This might seem obvious, put another way, the insert operation cannot be used to update an element of the map.

If the key does not already exist in the map, this is the most convenient and most efficient method to insert a key-value pair into the map.

To update the value of an existing element of the map, use the subscript operator:

kvmap[ key0 ] = val1;

This interface is very convenient, but the user needs to be acutely aware of how this works:

  1. The subscript operator [] uses the key0 to search for that element in the map. If the element exists, a reference to its value is returned. If the element does not exist, a new element with the given key and a value constructed using the default constructor is used.
  2. The assignment operator assigns the value to the reference returned by the subscript operator [].

What all of this means is that the subscript operator is both convenient and efficient to update the value of an already existing element in the map. Since the value is constructed by default once and then assigned again, it is not an efficient way to insert a key-value pair to the map!

The subscript operator can be dangerous when used without assigning any value! If the map does not have an element with key99, the following statement creates a new element with key99 in the map:

kvmap[ key99 ];
// or ...
cout << kvmap[ key99 ];

Inserting and updating elements in a map has these few potholes, so drive carefully when using them! :-)

C++: List

A STL list in C++ provides the features of a doubly-linked list. Using the list is similar to using any other STL container, with iterators. Here is a simple example that illustrates the common list operations:

#include <list>
using namespace std;

typedef list<int>               IntList;
typedef IntList::iterator       IListIter;
typedef IntList::const_iterator	IListCIter;

int main()
{
    IntList ilist;                                  // Empty list
    IntList ilist1( 100 );                          // List with 100 elements of 0
    IntList ilist2( 100, 3 );                       // List with 100 elements of 3
    IntList ilist3( ilist2.begin(), ilist2.end() ); // List copied from previous list

    // Create list: 10->20->30
    ilist.push_back( 10 );
    ilist.push_back( 20 );
    ilist.push_back( 30 );

    ilist.size();  // 3
    ilist.empty(); // false

    ilist.front(); // 10
    ilist.back();  // 30

    ilist.insert( ilist.begin(), 99 );          // 99 added, 99->10->20->30
    ilist.erase( ilist.begin() );               // 99 erased, 10->20->30
    ilist.erase( ilist.begin(), ilist.end() );	// Entire list erased

    return 0;
}

C++: STL Stack

The C++ STL provides a simple stack. This is not a STL container, but is rather called a container adapter. That is, it is a simple interface built upon a real STL container.

Using STL stack is super-easy. Push elements using push(). To get the top element call top() and to pop the top element (without returning it) use pop(). So, typically top() and pop() would be used together. empty() is used to check if stack is empty.

#include <stack>
using namespace std;

stack<int> istack;
istack.push( 10 );
int i = istack.top(); // 10
istack.pop();
bool isEmpty = istack.empty(); // true

Visual C++: Iterator Checks and Slow STL Code

Visual C++ turns on a lot of checking on STL iterators when compiled in Debug mode. This is very helpful since any mistake with an iterator leads immediately to an exception with a helpful exception message. This can catch latent bugs much much earlier while programming.

However, this huge amount of security is sometimes not tolerable by everyone. For certain applications the debug mode with all the iterator checking turned on can be extremely slow, to the point of being useless. I faced a similar problem and so had to investigate to find the reason for the slow behaviour. I discovered that there are 2 mutually exclusive checking mechanisms in the STL of Visual C++: checked iterators and iterator debugging.

Checked Iterators

Checked iterators is controlled by the preprocessor definition _SECURE_SCL. SCL is Microsoft jargon for Standard C++ Library. So, this is a feature by Microsoft to provide some amount of minimal security on the usage of STL iterators. The overhead of _SECURE_SCL is so low that by default it is ON for both Release and Debug modes. So, _SECURE_SCL is rarely the cause of your program being slow. It should be left at its default options.

Iterator Debugging

Iterator debugging is a whole another beast. It involves a lot more intensive checking on the validity of iterators. It does this both when iterators are dereferenced and when containers change internally. This was implemented by Dinkumware, who are the creators of the STL implementation that Visual C++ ships with.

By default, iterator debugging is turned ON for Debug mode and OFF for Release mode. Typically it is not slow for Debug mode, but if the C++ code has a lot of loops over STL containers and uses iterators heavily, then it can make everything very slow. How slow can it get? This is the difference I saw in one of my programs compiled in Debug mode:

Iterator Debugging ON: 51.83 sec
Iterator Debugging OFF: 0.27 sec

Yes, iterator debugging was a jaw-dropping 192 times slower! Not just that, at ~52 seconds of execution, this program was unusable for my purpose!

Iterator debugging is controlled by the preprocessor definition _HAS_ITERATOR_DEBUGGING. To turn off iterator debugging, add _HAS_ITERATOR_DEBUGGING=0 to the Preprocessor Definitions in C++ → Preprocessor section of the project properties.

References:

  • STL Iterator Debugging and Secure SCL video on Channel 9 by Stephan T. Lavavej. A must watch video for anyone using STL on Visual C++. Stephan explains both Secure SCL and Iterator Debugging and how they are implemented internally in the library. Also explained is when and which of these options to turn ON or OFF.

C++: STL Vector Capacity

The STL vector is a dynamic array. When it grows its internal memory is reallocated and the old content is copied over to the new space. Because of this, when the vector capacity increases the iterators, references and pointers to its elements become invalid.

The capacity increase factor of a vector depends on the STL implementation. The current capacity of a vector can be found by calling its capacity() method. By running a simple program (like below), one can figure out the factor. I found that this factor is 1.5 for Visual C++ 2010, which uses the STL implementation from Dinkumware and it is 2.0 on G++ 3.4.4 running under Cygwin.

#include <iostream>
#include <vector>
using namespace std;

int main()
{
	vector<int> ivec;
	for ( int i = 0; i < 1000; ++i )
	{
		const int before = ivec.capacity();
		ivec.push_back( i );
		const int after = ivec.capacity();

		if ( before != after )
			cout << "Capacity: " << after << endl;
	}

	return 0;
}

C++ STL: Inserting Vector into Set

Initializing a set with a vector during its construction is easy:

std::set<int> fooSet( fooVec.begin(), fooVec.end() );

Inserting the elements of a vector into an already existing set takes a bit more machinery:

std::copy( fooVec.begin(), fooVec.end(), std::inserter( fooSet, fooSet.end() ) );

std::copy should be familiar. It inserts the elements of a source container into a destination container. However, it requires the destination container to have enough space for the copied elements. To get around this hassle, the typical solution is to use insert iterators. The commonly used std::back_inserter and std::front_inserter do not work on sets. So, we resort to the std::inserter. But, this requires a position as the second parameter. Thankfully, for associative containers like sets, the second parameter is only a hint, so we can pass any position (like fooSet.end() or fooSet.end()).

Replace vector with any sequence container (list or deque) and set with any associative container (map), and the above operations will work the same on them.

C++ STL: Push Back Array into Container

Pushing the elements of an array into the back of a STL container, in a single statement, is easy using the STL std::copy algorithm. This algorithm relies on the destination having enough space to hold the source elements. So, a convenient way to apply it on a container without allocating the required space is to use a std::back_inserter. This works on all containers that have a push_back method: vector, list and deque.

#include <algorithm>
#include <iterator>
std::copy( fooArray, fooArray + ARRAY_SIZE, std::back_inserter( fooVector ) );

Notes from Effective STL

I recently read Effective STL by Scott Meyers. My review of the book can be found here. Below are my notes from the book.

  • One guideline that is missing in the book: Do not extend the STL containers. It is possible, but brings about nothing but misery.
  • Item 1: Choose your containers with care

    The standard sequence containers are: vector, list, deque, string
    The standard associative containers are: set, multiset, map, multimap

    string is typedef of basic_string<char>
    wstring is typedef of basic_string<wchar_t>

    Scott does not mention this, but I find this useful because when you get compilation errors on string, they will mention basic_string<> and not string.

  • Item 2: Beware of the illusion of container-independent code

    This item mentions the most important trick to cut the verbosity of STL: Typedef everything possible.

    For example:
    class Foo;
    typedef std::vector<Foo> FooVec;
    typedef FooVec::iterator FooVecIter;
    
  • Item 4: Call empty() instead of checking size() against zero

    empty() is a constant time operation on all containers. size() can sometimes be a linear time operation on certain containers, like std::list.
  • Item 5: Prefer range member functions to their single element counterparts

    Instead of doing:
    for (int i = 0; i < MAX; ++i)
        vec.push_back(arr[i]);
    

    Try this:

    std::copy( arr, arr, std::back_inserter(vec) );
    

    For an example usage of std::back_inserter see this post.

  • Item 9: Choose carefully among erasing options

    To me, this is STL’s biggest gotcha! std::remove() does not remove elements from the container.

    Use the erase-remove idiom to achieve actual removal:
    fooVec.erase( std::remove(), fooVec.end() );
    
  • Item 14: Use reserve to avoid unnecessary reallocations
  • Item 16: Know how to pass vector and string data to legacy APIs

    if ( !fooVec.empty() )  // This is important!
        someCFunctionCall( &v[0], v.size() );
    
  • Item 18: Avoid using vector<bool>

    It is a pseudo-container with a compressed representation of bools. Avoid using std::vector<bool>, use std::deque<bool> instead. It offers everything that the former does.
  • Item 21: Always have comparison functions return false for equal values
  • Item 23: Consider replacing associative containers with sorted vectors

    If the container is used in a phased manner, first phase only for insertions and the second phase only for lookups, then a sequence container might offer better performance than an associative container.
  • Item 27: Use distance and advance to convert a container’s const_iterator to iterator

    STLContainer v;
    STLContainer::const_iterator ci = SomeFunction();
    STLContainer::iterator i( v.begin() );
    std::advance( i, std::distance<STLContainer::const_iterator>( i, ci ));
    
  • Item 28: Understand how to use a reverse_iterator’s base iterator

    The base() iterator of a reverse_iterator points 1 element in front of the reverse_iterator position. Use with care.
  • Item 29: Consider istreambuf_iterator for character-by-character input

    Useful for unformatted input from files:

    // Read file to string
    std::ifstream iFile("haha.txt");
    std::string fileData( std::istreambuf_iterator<char>( iFile ) ), std::istreambuf_iterator<char>() );
    
  • Item 30: Make sure destination ranges are big enough

    This usually leads to bugs. Instead, whenever possible use one of the inserters: std::inserter, std::back_inserter or std::front_inserter.

    For an example usage of std::back_inserter see this post.
  • Item 31: Know your sorting options

    Ordered by performance, best to worst:
    partition
    stable_partition
    nth_element
    partial_sort
    sort
    stable_sort
  • Item 34: Note which algorithms expect sorted ranges

    These are:
    binary_search
    lower_bound
    upper_bound
    equal_range
    set_union
    set_intersection
    set_difference
    set_symmetric_difference
    merge
    inplace_merge
    includes
  • Item 37: Use accumulate or for_each to summarize ranges
  • Item 39: Make predicates pure functions

    Predicate is a function that returns bool.
    Pure function is a function whose return value depends only on its input parameters. It does not have any side effects.
  • Item 40: Make functor classes adaptable

    Inherit functors from std::unary_function or std::binary_function so that the necessary types are defined nicely for you.

    Defining functors as classes or structs is purely a matter of style. STL uses structs. Some may find it better since there is no need to declare it as public.
  • Item 43: Prefer algorithm calls to hand written loops
  • Item 44: Prefer member functions to algorithms with the same names
  • Item 45: Distinguish among count, find, binary_search, lower_bound, upper_bound and equal_range
  • Item 46: Consider function objects instead of functions as algorithm parameters

    Passing functors is actually faster than passing pointer to a function! :-)