==============================================================================
C-Scene Issue #2
Const Correctness in C++
Chad Loder
==============================================================================

Const Correctness in C++

Introduction

A popular USENET joke goes:
In C, you merely shoot yourself in the foot.

In C++, you accidentally create a dozen instances of yourself and shoot them all in the foot. Providing emergency medical care is impossible, because you can't tell which are bitwise copies and which are just pointing at others and saying, "That's me, over there."

While it is true that using the object-oriented features of C++ requires more thought (and hence, more opportunities to make mistakes), the language provides features that can help you create more robust and bug-free applications. One of these features is const, the use of which I will address in this article.

Used properly with classes, const augments data-hiding and encapsulation with compile-time checking; violations of const cause compile-time errors, which can save you a lot of grief (from side-effects and other accidental modifications of data). Some C++ programmers believe const-correctness is a waste of time. I disagree - while it takes time to use const, that cost is almost always smaller than the debugging time it saves. Furthermore, using const requires you to think about your code and its possible applications in more detail, which is a good thing. When you get used to writing const-correctly, it takes less time - this is a sign that you have achieved a state of enlightenment. Hopefully this article will help put you on the path of eternal bliss.

The Many Faces of Const

Like most keywords in C++, the const modifier has many shades of meaning, depending on context. Used to modify variables, const (not surprisingly) makes it illegal to modify the variable after its initialization. For example:
int x = 4;         // a normal variable that can be modified
x = 10;            // legal

const int y = 2;   // const var can be initialized, not modified thereafter
y = 10;            // error - cannot modify const variable

Thus, const can replace #define as a way to give names to manifest constants. Since preprocessor macros don't provide strong compile-time type checking, it is better to use const than #define. Moreover, some debugging environments will display the symbol which corresponds to a const value, but for #define constants, they will only display the value.
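
As a small sketch of the difference (the names MAX_USERS_MACRO, kMaxUsers, g_slots and SlotCount are invented for this illustration), a const integer initialized with a constant expression can do the jobs a #define constant usually does, such as sizing an array, while still having a type and obeying scope:

```cpp
#include <cstddef>

// Preprocessor constant: plain text substitution, no type, no scope.
#define MAX_USERS_MACRO 64

// Typed constant: the compiler knows it is a std::size_t.
const std::size_t kMaxUsers = 64;

// A const integer initialized with a constant expression may be used
// as an array bound, just like the macro.
int g_slots[kMaxUsers];

// Returns the number of elements in g_slots.
std::size_t SlotCount()
{
   return sizeof(g_slots) / sizeof(g_slots[0]);
}
```

Both spellings give the same array size here; the const version is simply the one the compiler and the debugger know by name.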

The const keyword is more involved when used with pointers. A pointer is itself a variable which holds a memory address of another variable - it can be used as a "handle" to the variable whose address it holds. Note that there is a difference between "a read-only handle to a changeable variable" and a "changeable handle to a read-only variable".


int anInt = 0;
int someOtherIntVar = 0;

const int x = 1;            // constant int (a const must be initialized)
x = 2;                      // illegal - can't modify x

const int* pX = &anInt;     // changeable pointer to constant int
*pX = 3;                    // illegal - can't use pX to modify an int
pX = &someOtherIntVar;      // legal - pX can point somewhere else

int* const pY = &anInt;     // constant pointer to changeable int
*pY = 4;                    // legal - can use pY to modify an int
pY = &someOtherIntVar;      // illegal - can't make pY point anywhere else

const int* const pZ = &anInt;   // const pointer to const int
*pZ = 5;                    // illegal - can't use pZ to modify an int
pZ = &someOtherIntVar;      // illegal - can't make pZ point anywhere else


Const With Pointers and Type-Casting

A pointer to a const object can be initialized with a pointer to an object that is not const, but not vice versa.
int y;
const int* pConstY = &y;  // legal - but can't use pConstY to modify y
int* pMutableY = &y;      // legal - can use pMutableY to modify y
*pMutableY = 42;

In the above code, all you're really saying is that you can't use pConstY as a handle to modify the data it points to. If y is not const, then you can safely modify y via another pointer, pMutableY for instance. Pointing at y with a const int* does not make y const, it just means that you can't change y using that pointer. If y is const, however, forcing the compiler to let you mess with its value can yield strange results. Although you should never write code that does this, you can play tricks on the compiler and try to modify const data. All you need to do is somehow put the address of the const int into a normal int* that you can use to modify the const int.
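
To make the "handle" idea concrete, here is a minimal sketch (ReadOnly and AliasDemo are names I made up for it). The int is changed through the non-const pointer, and the pointer-to-const simply sees the new value:

```cpp
// Reads an int through a pointer-to-const; it cannot modify *p.
int ReadOnly(const int* p)
{
   return *p;
}

// Shows that y can still change while a const int* points at it.
int AliasDemo()
{
   int y = 0;
   const int* pConstY = &y;   // read-only view of y
   int* pMutableY = &y;       // read-write view of the same int

   *pMutableY = 42;           // fine: y itself is not const

   return ReadOnly(pConstY);  // the read-only view sees the new value
}
```

Nothing here is undefined, because y was never declared const; only one of the two views of it is read-only.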

C++ does not allow you to circumvent const easily, because there is no standard (implicit) conversion from a pointer to a const type to a pointer to the corresponding non-const type: you cannot assign a const int* to a normal int* without an explicit cast. Any sort of conversion, including an unsafe one, can be requested with an explicit type cast, but without one the type-system in C++ will not allow you to put the address of const data into a pointer to non-const data.

For example, try to put the address of x, which is const, into a normal int* so you can use it to modify the data:


const int x = 1;         // x cannot be modified

const int* pX = &x;      // pX is the address of a const int
                         // and can't be used to change an int

*pX = 4;                 // illegal - can't use pX to change an int

int* pInt;       // address of normal int
pInt = pX;       // illegal - cannot convert from const int* to int*

Nor will the compiler let you take the address of a const variable and store it in a pointer to non-const data using the address-of operator (&), for the same reason:
int *pInt;   // address of a normal int
pInt = &x;   // illegal - cannot convert from const int* to int*

The address-of operator returns a pointer to the variable; if the variable is a const int, it returns a const int*. If the variable is an int, & returns an int*. C++ makes it difficult to get a pointer to this data which can be used to modify it.
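
If your compiler supports C++11 or later (an assumption; decltype and static_assert did not exist when this article was written), you can ask the compiler to confirm these types directly. The variable and function names below are made up for the sketch:

```cpp
#include <type_traits>

const int g_constInt = 1;
int g_plainInt = 2;

// decltype(&obj) names the pointer type that the address-of operator yields.
static_assert(std::is_same<decltype(&g_constInt), const int*>::value,
              "address of a const int is a const int*");
static_assert(std::is_same<decltype(&g_plainInt), int*>::value,
              "address of a plain int is an int*");

// A const int* can hold either address; an int* could hold only the second.
const int* PickReadOnly(bool wantConst)
{
   return wantConst ? &g_constInt : &g_plainInt;
}
```

If either static_assert were wrong, the file would fail to compile, so the checks cost nothing at run time.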

The const keyword can't keep you from purposely shooting yourself in the foot. Using explicit type-casting, you can freely blow off your entire leg, because while the compiler helps prevent accidental errors, it lets you make errors on purpose. Casting allows you to "pretend" that a variable is a different type. For example, C programmers learn early on that the result of dividing an integer by an integer is always an integer:


int x = 37;
int y = 8;

double quotient = x / y;   // classic mistake: int / int is truncated to 4
cout << quotient << endl;  // prints "4"

double quotient2 = (double)x / y;   // cast x to double first, so the division is not truncated
cout << quotient2 << endl;          // prints "4.625"
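
The same conversion can be written with the C++ named cast static_cast, which is easier to search for than an old-style cast. This is only a sketch; TruncatedQuotient and RealQuotient are hypothetical helper names:

```cpp
// Integer division: the fractional part is discarded before the
// result is converted to double.
double TruncatedQuotient(int x, int y)
{
   return x / y;
}

// Converting one operand first makes the division happen in floating point.
double RealQuotient(int x, int y)
{
   return static_cast<double>(x) / y;
}
```

With x = 37 and y = 8, the first function returns 4.0 and the second returns 4.625.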

With casting, you can force the compiler to let you put the address of a const int variable into a normal int*. Remember that const int* and int* are, in fact, separate types. So you can cast from a const int* to a normal int* and use the pointer to try to modify the data. If the data was declared const, however, the result is undefined: the compiler is free to store constants wherever it wants (including non-writeable memory). This means that it might work, it might do nothing, or it might crash your program.

The following code is a good illustration of how to mess yourself up with forced casting:


const int x = 4;           // x is const, it can't be modified
const int* pX = &x;        // you can't modify x through the pX pointer

cout << x << endl;         // prints "4"

int* pX2 = (int *)pX;      // explicitly cast pX as an int*
*pX2 = 3;                  // result is undefined

cout << x << endl;        // who knows what it prints?

On my system, this code compiles and runs without crashing, but x does not appear to be changed by the assignment through pX2; it outputs '4' both times.

However, when you look at it more closely, strange things are happening. When you run the code, the output (from cout or printf) seems to show that x doesn't change in the second assignment. But when you step through the code, the debugger shows you that x does, in fact, change. So what is happening? If x changes, then why doesn't the output statement reflect this change?

Often in such bizarre situations, it is a good idea to look at the assembler code that was produced. In Visual C++, compile with the /Fa"filename.asm" option to output the assembler with the corresponding lines of code into a file so you can look at it. Don't panic if you don't know much about assembler - if you know how arguments are pushed onto the stack, it's really quite easy to see what's happening.


ASSEMBLER OUTPUT                       C++ CODE
Mov   eax, DWORD PTR _pX$[ebp]         int* pX2 = (int *)pX;
Mov   DWORD PTR _pXX$[ebp], eax
Mov   eax, DWORD PTR _pXX$[ebp]        *pX2 = 3;
Mov   DWORD PTR [eax], 3
Push  OFFSET FLAT:?endl@@.........     cout << x << endl;
Push  4

The important line is "Push 4". The assembler code shows that instead of pushing the value of x onto the stack for the output call, the compiler pushes the literal constant 4. The compiler assumes that since you declared x as const and initialized it as 4, it is free to optimize by pushing the literal constant 4 onto the stack rather than having to read x to get its value. This is a valid optimization, and happens in Visual C++ even with all optimization turned off. This code would work fine if we did not declare x as const. We could use a const int* to point at a non-const int, and have no trouble.

The Const_cast Operator

The above example is indicative of bad C++ casting manners. Another way to write functionally equivalent code is to use the const_cast operator to remove the const-ness from the const int*. The result is the same:


const int x = 4;      // x is const, it can't be modified
const int* pX = &x;   // you can't modify x through the pX pointer

cout << x << endl;    // prints "4"

int* pX2 = const_cast < int* > (pX);   // explicitly cast pX as non-const

*pX2 = 3;           // result is undefined
cout << x << endl;   // who knows what it prints?

Although this is a naughty example, it's a good idea to use the const_cast operator when you really must remove const. The const_cast operator is more specific than normal type-casts because it can only be used to change the const-ness (or volatile-ness) of a type, and trying to change the type in other ways is a compile error. For instance, say that you changed x in the old-style cast version of the above example to a double and changed pX to a const double*. The code would still compile, but pX2 would be treating the storage of a double as an int. The result is undefined, and the code would certainly be confusing. Also, if you were using user-defined classes instead of numeric types, the code would still compile, but it would very likely crash your program. If you use const_cast, you can be sure that the compiler will only let you change the const-ness of a variable, and never its type.
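
For contrast, here is a sketch of the one case where casting away const is well-defined: the object itself was never declared const, and only the pointer's view of it was. ModifyNonConstThroughCast is a name invented for this example:

```cpp
// Well-defined: the object is not const, only the pointer's view of it is.
int ModifyNonConstThroughCast()
{
   int x = 4;                         // note: NOT declared const
   const int* pX = &x;                // read-only view of x

   int* pX2 = const_cast<int*>(pX);   // remove const from the view
   *pX2 = 3;                          // OK, x was never const

   // double* pBad = const_cast<double*>(pX);   // compile error: const_cast cannot change the type

   return x;
}
```

The function returns 3. The commented-out line shows the kind of mistake that an old-style cast would silently accept.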

Const Storage and String Literals

Another example of using pointers to play around with const storage is when you try to use a char* to modify a string literal. In C++, the compiler allows the use of string literals to initialize character arrays. A string literal consists of zero or more characters surrounded by double quotation marks ("). A string literal represents a sequence of characters that, taken together, form a null-terminated string. The compiler creates static storage space for the string, null-terminates it, and puts the address of this space into the char* variable. The type of a literal string is an array of const chars.

The C++ standard (section lex.string) states:


1 A  string  literal  is  a  sequence  of  characters  (as  defined   in
  _lex.ccon_) surrounded by double quotes, optionally beginning with the
  letter L, as in "..." or L"...".  A string literal that does not begin
  with  L  is  an  ordinary string literal, also referred to as a narrow
  string literal.  An ordinary string literal has type "array of n const
  char"  and  static storage duration (_basic.stc_), where n is the size
  of the string as defined below, and  is  initialized  with  the  given
  characters.   A string literal that begins with L, such as L"asdf", is
  a wide string literal.  A wide string literal has  type  "array  of  n
  const wchar_t" and has static storage duration, where n is the size of
  the string as defined below, and is initialized with the given charac-
  ters.

2 Whether  all  string  literals  are  distinct  (that is, are stored in
  nonoverlapping objects)  is  implementation-defined.   The  effect  of
  attempting to modify a string literal is undefined.

In the following example, the compiler automatically puts a null-character at the end of the literal string of characters "Hello world". It then creates a storage space for the resulting string - this is an array of const chars. Then it puts the starting address of this array into the szMyString variable. We will try to modify this string (wherever it is stored) by accessing it via an index into szMyString. This is a Bad Thing; the standard does not say where the compiler puts literal strings. They can go anywhere, possibly in some place in memory that you shouldn't be modifying.
char* szMyString = "Hello world.";   
szMyString[3] = 'q';         // undefined, modifying static buffer!!!

In James Coplien's book, Advanced C++ Programming Styles & Idioms, I came across the following code (p. 400):
char *const a = "example 1";   // a const pointer to (he claims) non-const data
a[8] = '2';         // Coplien says this is OK, but it's actually undefined

Both of these examples happen to work on my system, but you shouldn't rely on this kind of code to function correctly. Whether or not the literal strings you point to are explicitly declared const, you shouldn't try to modify them, because the standard states that they are in fact const.
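
If you do need a string you can modify, a safer approach is to initialize a character array from the literal; the array gets its own writable copy of the characters. A minimal sketch (ModifiedCopy is a hypothetical name, and std::string is used only to hand the result back):

```cpp
#include <string>

// Returns a modified copy of the text; the literal itself is never written to.
std::string ModifiedCopy()
{
   // An array initialized from a literal gets its own writable copy
   // of the characters, including the terminating '\0'.
   char szMyString[] = "Hello world.";
   szMyString[3] = 'q';               // fine: modifies the array, not the literal

   return std::string(szMyString);
}
```

When you only need to read the literal, point at it with a const char* so the compiler can catch accidental writes.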

If you've been paying attention, you'll remember that the type-system in C++ will not allow you to put the address of const data into a pointer to non-const data without using explicit type casts, because there is no standard conversion between const types and types that are not const. Example:


   const char constArray[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
   char nonConstArray[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
   char* pArray1 = constArray;           // illegal
   char* pArray2 = nonConstArray;        // legal

If, as the standard says, an ordinary string literal has type "array of n const char", then the following line of code should cause an error just like the above example:
   // should be illegal - converts array of 6 const char to char*
   char* pArray = "Hello";

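Of course, this code is a common idiom, and compilers accept it. This is an inconsistency in the language: the conversion from a string literal to a non-const char* is allowed as a special case, because a great deal of older C and C++ code would break if the rule were applied strictly. (The C++98 standard kept this conversion but marked it deprecated, and C++11 removed it, so newer compilers warn about this line or reject it.) The standards people are reluctant to break old code, because it would mean a decrease in the popularity of the language.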

Notice item 2 in the above quote from the language standard: literal strings don't have to be distinct. This means that it is legal for implementations to use string pooling, where all equal string literals are stored at the same place. For example, the help in Visual C++ states:

"The /GF option causes the compiler to pool strings and place them in read-only memory. By placing the strings in read-only memory, the operating system does not need to swap that portion of memory. Instead, it can read the strings back from the image file. Strings placed in read-only memory cannot be modified; if you try to modify them, you will see an Application Error dialog box. The /GF option is comparable to the /Gf option, except that /Gf does not place the strings in read-only memory. When using the /Gf option, your program must not write over pooled strings. Also, if you use identical strings to allocate string buffers, the /Gf option pools the strings. Thus, what was intended as multiple pointers to multiple buffers ends up as multiple pointers to a single buffer."
To test this, you can write a simple program as follows:
#include <stdio.h>

int main()
{
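   char* szFirst = "Literal String";
   char* szSecond = "Literal String";

   szFirst[3] = 'q';   // undefined behavior, done on purpose for this test

   // %d is not a portable way to print a pointer; %p with a (void*) cast
   // is, but its output format would differ from what is shown below
   printf("szFirst (%s) is at %d, szSecond (%s) is at %d\n",
         szFirst, szFirst, szSecond, szSecond);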

   return 0;
}

On my system, this program outputs:
szFirst (Litqral String) is at 4266616, szSecond (Litqral String) is at 4266616
Sure enough. Although there was only one change, since string pooling was activated, both char* variables pointed to the same buffer. The output reflects this.

Const and Data-Hiding

It is often useful to use const variables when you have private data in a class, but you want to easily access the data outside of the class without changing it. For example:
class Person
{
   public:
      Person(char* szNewName)
      {
         // make a copy of the string
         m_szName = _strdup(szNewName);
      };

      ~Person() { free(m_szName); };   // _strdup allocates with malloc, so use free

   private:
      
      char* m_szName; 
};

Now, what if I wanted to easily print out the person's name? I could do the following:
class Person
{
   public:
      Person(char* szNewName)
      {
         // make a copy of the string
         m_szName = _strdup(szNewName);
      };

      ~Person() { free(m_szName); };   // _strdup allocates with malloc, so use free

      void PrintName()
      {
         cout << m_szName << endl;
      };

   private:
      
      char* m_szName; 
};

Now I can call Person::PrintName() and it will print the name out to the console. There is a design problem with this code, however. It builds dependencies on the iostream libraries and the console I/O paradigm right into the Person class. Since a Person inherently has nothing to do with console I/O, one shouldn't tie the class to it. What if you want to print out the name in a Windows or X-Windows application? You'd need to change your class, and that reeks.

So, we can do something like the following:


class Person
{
   public:
      Person(char* szNewName)
      {
         // make a copy of the string
         m_szName = _strdup(szNewName);
      };

      ~Person() { free(m_szName); };   // _strdup allocates with malloc, so use free

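      void GetName(char *szBuf, const size_t nBufLen)
      {
         if (nBufLen == 0)
            return;
         // strncpy does not terminate the copy if the name is too long,
         // so ensure null termination explicitly
         strncpy(szBuf, m_szName, nBufLen - 1);
         szBuf[nBufLen - 1] = '\0';
      };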
   private:
      char* m_szName; 
};
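
One detail worth spelling out about bounded copies like this: strncpy does not write a terminator when the source is at least as long as the count, so the caller-supplied buffer has to be terminated explicitly. Here is a standalone sketch of the idea; CopyName is a hypothetical free function, not part of the Person class:

```cpp
#include <cstddef>
#include <cstring>

// Copies at most nBufLen - 1 characters of szSrc into szBuf and always
// terminates the result. Does nothing if there is no buffer to write to.
void CopyName(char* szBuf, const std::size_t nBufLen, const char* szSrc)
{
   if (szBuf == 0 || nBufLen == 0)
      return;

   std::strncpy(szBuf, szSrc, nBufLen - 1);
   szBuf[nBufLen - 1] = '\0';   // strncpy alone does not guarantee this
}
```

Called with a five-character buffer and the name "Fred Jones", this leaves "Fred" in the buffer. A caller can also pass an ordinary local array such as char szTheName[256], which needs no new or delete at all.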

Now we can print the name out by doing something like this:
Person P("Fred Jones");
char* szTheName = new char[256];
P.GetName(szTheName, 256);
cout << szTheName << endl;

Wow, three lines of code just to print out a name. And I bet you didn't even notice that we forgot to delete the dynamic memory for szTheName! There must be a better way to do this. Why don't we just return a pointer to the string?
class Person
{
   public:
      Person(char* szNewName)
      {
         // make a copy of the string
         m_szName = _strdup(szNewName);
      };

      ~Person() { free(m_szName); };   // _strdup allocates with malloc, so use free

      char* GetName()
      {
         return m_szName;
      };

   private:
      
      char* m_szName; 
};

With this, you can print out the name in one line:
Person P("Fred Jones");
cout << P.GetName() << endl;

Much shorter, but as you may have noticed, the m_szName variable is private inside the Person class! What's the point of declaring it as private if you're going to pass out non-const pointers to it? What if you wrote a buggy print function that modified what it was printing?
// this function is only supposed to print szString, but it
// writes through the pointer and changes the caller's data
void MyBuggyPrint(char* szString)
{
   // bug: "tidy up" the string in place instead of working on a copy
   szString[0] = '?';

   cout << szString << endl;
}

Person P("Fred Jones");
MyBuggyPrint(P.GetName());

The MyBuggyPrint function is handed a pointer straight into the Person object's private storage, and it writes through that pointer. This results in two related problems. First, after the call P's name is "?red Jones", even though m_szName is private and no member function of Person changed it. Second, nothing in the declaration of MyBuggyPrint or GetName warns the caller that this can happen. Note that the pointer itself is passed by value, so pointing szString somewhere else inside the function would not affect P.m_szName; the damage comes from modifying the data that the pointer points to.

Fortunately, the const keyword comes in handy in situations like this. At this point, I'm sure some readers will object that if you write your code correctly, you won't need to protect yourself from your own mistakes - "You can either buy leaky pens and wear a pocket protector, or just buy pens that don't leak, period." While I agree with this philosophy, it is important to remember that when you're writing code, you're not buying pens - you're manufacturing pens for other people to stick in their pockets. Using const helps in manufacturing quality pens that don't leak.


class Person
{
   public:
      Person(char* szNewName)
      {
         // make a copy of the string
         m_szName = _strdup(szNewName);
      };

      ~Person() { free(m_szName); };   // _strdup allocates with malloc, so use free

      const char* const GetName()
      {
         return m_szName;
      };

   private:
      
      char* m_szName; 
};

Person P("Fred Jones");

MyBuggyPrint(P.GetName());   // error! Can't convert const char* const to char*


This time, we're returning a const char* const from the class, which means that you can't modify what the pointer points to. (The second const, on the pointer itself, is harmless but has no effect on a return value, because the caller receives its own copy of the pointer.) Now your code won't even compile, because your MyBuggyPrint function expects a char*.

This brings up an interesting point. If you wrote your code this way, you'd have to go back and rewrite your MyBuggyPrint function to take a const char* const (hopefully fixing it in the process). This is a pretty inefficient way to code, so remember that you should use const as you go - don't try to make everything const correct after the fact. As you're writing a function like MyBuggyPrint, you should think "Hmmm...do I need to modify what the pointer points to? No...do I need to point the pointer somewhere else? No...so I will use a const char* const argument." Once you start thinking like this, it's easy to do, and it will keep you honest; once you start using const correctness, you have to use it everywhere.

With this philosophy, we could further modify the above example by having the Person constructor take a const char* const, instead of a char*. We could also further modify the GetName member function. We can declare it as:


class Person
{
   public:
      Person(const char* const szNewName)
      {
         // make a copy of the string
         m_szName = _strdup(szNewName);
      };

      ~Person() { free(m_szName); };   // _strdup allocates with malloc, so use free

      const char* const GetName() const
      {
         return m_szName;
      };

   private:
      
      char* m_szName; 
};

Declaring a member function as const tells the compiler that the member function will not modify the object's data and will not invoke other member functions that are not const. The compiler won't take you at your word; it will check to make sure that you really don't modify the data. You can call a const member function for either a const or a non-const object, but you can't call a non-const member function for a const object (because it could modify the object).

If we declare GetName() as a const member function, then the following code is legal:


void PrintPerson(const Person* const pThePerson)
{
   cout << pThePerson->GetName() << endl;   // OK
}

// a const reference is an alias through which the object cannot be modified
void PrintPerson2(const Person& thePerson)
{
   cout << thePerson.GetName() << endl;   // OK
}

But if we don't declare it as const, then the code won't even compile.
void PrintPerson(const Person* const pThePerson)
{
   // error - non-const member function called
   cout << pThePerson->GetName() << endl;
}

void PrintPerson2(const Person& thePerson)
{
   // error - non-const member function called
   cout << thePerson.GetName() << endl;
}
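
Putting the pieces together, here is a self-contained sketch of a const-correct Person that should compile on any standard compiler. It is not the exact class from the listings above: I have swapped _strdup for new[] and strcpy (so that delete[] is the matching release), disabled copying to keep the sketch short, and added a hypothetical helper called NameOf:

```cpp
#include <cstring>
#include <string>

class Person
{
   public:
      explicit Person(const char* const szNewName)
         : m_szName(new char[std::strlen(szNewName) + 1])
      {
         std::strcpy(m_szName, szNewName);   // make a private copy
      }

      ~Person() { delete[] m_szName; }       // matches new[] above

      // const member function: callable on const and non-const objects
      const char* GetName() const { return m_szName; }

   private:
      // copying is not needed for this sketch, so it is disabled
      Person(const Person&);
      Person& operator=(const Person&);

      char* m_szName;
};

// Accepts a const reference, so it may only call const member functions.
std::string NameOf(const Person& thePerson)
{
   return thePerson.GetName();
}
```

Removing the trailing const from GetName makes NameOf stop compiling, which is exactly the error described above.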

Remember that non-static member functions take as their implicit first parameter a pointer called this, which points to a specific instance of the object. The this pointer is always const - you cannot make this point to anything else (in earlier versions of C++, this was legal).

A const member function in class Person would take a const class Person* const (const pointer to const Person) as its implicit first argument, whereas a non-const member function in class Person would take a class Person* const (const pointer to changeable Person) as its first argument.
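
Because the type of this differs between the two kinds of member function, a class can overload a member function on const alone, and the compiler picks the version that matches the object. A minimal sketch, using a made-up class called Buffer:

```cpp
class Buffer
{
   public:
      Buffer() : m_value(0) {}

      // Chosen for non-const objects: this is Buffer* const
      int& Value() { return m_value; }

      // Chosen for const objects: this is const Buffer* const
      const int& Value() const { return m_value; }

   private:
      int m_value;
};

// Writes through the non-const overload, then reads the value back
// through the const overload.
int OverloadDemo()
{
   Buffer b;
   b.Value() = 7;                 // non-const overload returns int&

   const Buffer& view = b;
   // view.Value() = 8;           // compile error: const overload returns const int&
   return view.Value();           // const overload, reads the 7 stored above
}
```

The two overloads have the same name and parameter list; only the const-ness of the implicit this argument tells them apart.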

The Mutable Storage Specifier

What if you wanted to have a const member function which did an expensive calculation and returned the result? It would be nice to be able to cache this result and avoid recalculation for subsequent calls to the function. But since it's a const member function, you can't store the cached result inside the class, because to do so, you'd have to modify a member variable (thereby violating const).

You could make a fake this pointer using explicit casting:


class MyData
{
   public:
      /*
      the first time, do calculation, cache result in m_lCache, and set
      m_bCacheValid to TRUE. In subsequent calls, if m_bCacheValid is TRUE
      then return m_lCache instead of recalculating
      */

      long ExpensiveCalculation() const
      {
         if (FALSE == m_bCacheValid)
         {
            MyData* fakeThis = const_cast<MyData*>(this);
            fakeThis->m_bCacheValid = TRUE;
            fakeThis->m_lCache = ::SomeFormula(m_internalData);
         }
         return m_lCache;
      };

      // change internal data and set m_bCacheValid to FALSE to force recalc next time
      void ChangeData()
      {
         m_bCacheValid = FALSE;
         m_internalData = ::SomethingElse();
      };

   private:

      data m_internalData;
      long m_lCache;
      bool m_bCacheValid;
            
};

This works, but it's somewhat ugly and unintuitive. The mutable storage specifier was added for this reason. A mutable member variable can be modified even by const member functions. With mutable, you can distinguish between "abstract const", where the user cannot tell that anything has been changed inside the class, and "concrete const", where the implementation will not modify anything, period. This caching of results is a perfect example of abstract const-ness. Anyone calling the const member function will not know or care whether the result has been cached or recalculated. For example:
class MyData
{
   public:
      /*
      the first time, do calculation, cache result in m_lCache, and set
      m_bCacheValid to TRUE. In subsequent calls, if m_bCacheValid is TRUE
      then return m_lCache instead of recalculating
      */

      long ExpensiveCalculation() const
      {
         if (FALSE == m_bCacheValid)
         {
            m_bCacheValid = TRUE;
            m_lCache = ::SomeFormula(m_internalData);
         }
         return m_lCache;
      };

      // change data and set m_bCacheValid to FALSE to force recalc next time
      void ChangeData()
      {
         m_bCacheValid = FALSE;
         m_internalData = ::SomethingElse();
      };

   private:

      data m_internalData;
      mutable long m_lCache;
      mutable bool m_bCacheValid;
            
};
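
Since the listing above depends on types and functions that are not shown (data, SomeFormula, SomethingElse), here is a self-contained sketch of the same caching idea that can be compiled and run. The class name Squarer, the squaring "calculation", and the m_nCalcCount counter are all stand-ins I invented so the effect of the cache can be observed:

```cpp
class Squarer
{
   public:
      explicit Squarer(long lInput)
         : m_lInput(lInput), m_lCache(0), m_bCacheValid(false), m_nCalcCount(0) {}

      // Abstractly const: callers always see the same answer for the same input.
      long ExpensiveCalculation() const
      {
         if (!m_bCacheValid)
         {
            m_lCache = m_lInput * m_lInput;   // stand-in for the expensive work
            m_bCacheValid = true;
            ++m_nCalcCount;
         }
         return m_lCache;
      }

      // Non-const: changes the input and invalidates the cache.
      void ChangeData(long lInput)
      {
         m_lInput = lInput;
         m_bCacheValid = false;
      }

      // How many times the calculation actually ran.
      int CalcCount() const { return m_nCalcCount; }

   private:
      long m_lInput;
      mutable long m_lCache;
      mutable bool m_bCacheValid;
      mutable int  m_nCalcCount;   // instrumentation for this sketch only
};
```

Calling ExpensiveCalculation twice in a row on the same object runs the calculation once; after ChangeData, the next call runs it again. The function can also be called on a const Squarer, which is the whole point of mutable.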

References

This paper represents a synthesis and compilation of information from the following sources:
C Scene Official Web Site : http://cscene.oftheinter.net
C Scene Official Email : cscene@mindless.com
This page is Copyright © 1997 By C Scene. All Rights Reserved