Pointer Address Assignment

Pointers

In earlier chapters, variables have been explained as locations in the computer's memory which can be accessed by their identifier (their name). This way, the program does not need to care about the physical address of the data in memory; it simply uses the identifier whenever it needs to refer to the variable.

For a C++ program, the memory of a computer is like a succession of memory cells, each one byte in size, and each with a unique address. These single-byte memory cells are ordered in a way that allows data representations larger than one byte to occupy memory cells that have consecutive addresses.

This way, each cell can be easily located in the memory by means of its unique address. For example, the memory cell with address N always follows immediately after the cell with address N-1 and precedes the one with address N+1, and is exactly one thousand cells after the cell with address N-1000 and exactly one thousand cells before the one with address N+1000.

When a variable is declared, the memory needed to store its value is assigned a specific location in memory (its memory address). Generally, C++ programs do not actively decide the exact memory addresses where their variables are stored. Fortunately, that task is left to the environment where the program is run: generally, an operating system that decides the particular memory locations at runtime. However, it may be useful for a program to be able to obtain the address of a variable during runtime in order to access data cells that are at a certain position relative to it.

Address-of operator (&)

The address of a variable can be obtained by preceding the name of a variable with an ampersand sign (&), known as the address-of operator.

When such an expression appears on the right-hand side of an assignment, the variable on the left-hand side receives the address of the named variable: by preceding the name of the variable with the address-of operator (&), we are no longer assigning the content of the variable itself, but its address.

The actual address of a variable in memory cannot be known before runtime, but let's assume, in order to help clarify some concepts, that a given variable is placed during runtime at some particular memory address.

In this case, consider a code fragment made of three statements: the first assigns a value to that variable, the second assigns the address of that variable to a second variable, and the third assigns the content of the first variable to a third one.

After the first statement, the variable (whose address in memory we have assumed) holds the assigned value.

After the second statement, the second variable holds the address of the first one, that is, the address we have assumed.

Finally, the third statement assigns the value contained in the first variable to the third. This is a standard assignment operation, as already done many times in earlier chapters.

The main difference between the second and third statements is the appearance of the address-of operator (&).
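The three statements described above can be sketched as follows. This is only an illustration: the names myvar, foo and bar and the value 25 are assumptions, and the `int*` declaration used for foo relies on the syntax covered later under "Declaring pointers".

```cpp
#include <cassert>

// Hypothetical names (myvar, foo, bar) and an arbitrary value (25).
// Returns true when the three statements behave as described in the text.
bool address_of_demo() {
    int myvar = 25;      // first statement: store a value in myvar
    int* foo = &myvar;   // second statement: foo receives the address of myvar
    int bar = myvar;     // third statement: bar receives a copy of the value

    assert(foo == &myvar);   // foo holds an address, not 25
    assert(bar == 25);       // bar holds the value itself
    return foo == &myvar && bar == myvar;
}
```

The actual address stored in foo differs from run to run, which is why the sketch compares it against `&myvar` rather than against a fixed number.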

The variable that stores the address of another variable (like in the previous example) is what in C++ is called a pointer. Pointers are a very powerful feature of the language that has many uses in lower level programming. A bit later, we will see how to declare and use pointers.

Dereference operator (*)

As just seen, a variable which stores the address of another variable is called a pointer. Pointers are said to "point to" the variable whose address they store.

An interesting property of pointers is that they can be used to access the variable they point to directly. This is done by preceding the pointer name with the dereference operator (*). The operator itself can be read as "value pointed to by".

Therefore, continuing with the previous example, a statement that assigns a dereferenced pointer to another variable could be read as "the variable is equal to the value pointed to by the pointer". The statement would actually assign the value stored in the pointed-to variable, since the pointer holds that variable's address.

It is important to clearly differentiate the two forms: the pointer name on its own refers to the address it stores, while the pointer name preceded by an asterisk refers to the value stored at that address. The only difference between the two expressions is whether or not the dereference operator is included.



The address-of and dereference operators are thus complementary:
  • & is the address-of operator, and can be read simply as "address of"
  • * is the dereference operator, and can be read as "value pointed to by"

Thus, they have sort of opposite meanings: an address obtained with & can be dereferenced with *.

Earlier, we performed two assignment operations: one stored a value in a variable, and the other stored the address of that variable in a pointer.

Right after these two statements, four comparisons would all give true as result: the variable equals the value assigned to it; the address of the variable equals the address we assumed for it; the pointer equals that same address; and the dereferenced pointer equals the value stored in the variable.

The first expression is quite clear, considering the assignment operation performed on the variable. The second one uses the address-of operator (&), which returns the address of the variable. The third one is somewhat obvious, since the second expression was true and the pointer was assigned exactly that address. The fourth expression uses the dereference operator (*), which can be read as "value pointed to by", and the value pointed to by the pointer is indeed the value stored in the variable.

So, after all that, you may also infer that for as long as the address stored in the pointer remains unchanged, dereferencing the pointer yields the same value as the variable itself.
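As a small sketch of these comparisons (the names myvar and foo and the value 25 are assumptions made for illustration; real addresses vary between runs, so the address checks compare against `&myvar` instead of a fixed number):

```cpp
#include <cassert>

// Returns true when every comparison discussed in the text holds.
bool expressions_hold() {
    int myvar = 25;
    int* foo = &myvar;

    bool value_ok   = (myvar == 25);     // value assigned to the variable
    bool address_ok = (foo == &myvar);   // pointer equals the variable's address
    bool deref_ok   = (*foo == 25);      // "value pointed to by foo"
    bool round_trip = (*foo == myvar);   // holds while foo keeps pointing to myvar

    assert(*&myvar == myvar);            // & followed by * gives the variable back
    return value_ok && address_ok && deref_ok && round_trip;
}
```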



Declaring pointers

Due to the ability of a pointer to directly refer to the value that it points to, a pointer has different properties depending on the data type it points to. Once dereferenced, the type needs to be known. And for that, the declaration of a pointer needs to include the data type the pointer is going to point to.

The declaration of pointers follows this syntax: the data type pointed to, followed by an asterisk (*) and the name of the pointer. This type is not the type of the pointer itself, but the type of the data the pointer points to.

Suppose three pointers are declared, each one intended to point to a different data type. In fact, all of them are pointers and all of them are likely going to occupy the same amount of space in memory (the size in memory of a pointer depends on the platform where the program runs). Nevertheless, the data to which they point neither occupy the same amount of space nor are of the same type. Therefore, although these three variables are all pointers, they actually have different types, depending on the type they point to.

Note that the asterisk (*) used when declaring a pointer only means that it is a pointer (it is part of its type compound specifier), and should not be confused with the dereference operator seen a bit earlier, which is also written with an asterisk (*). They are simply two different things represented with the same sign.

Consider an example with two integer variables, firstvalue and a second one, and a single pointer.

Notice that even though neither variable is directly set any value in the program, both end up with a value set indirectly through the use of the pointer. This is how it happens:

First, the pointer is assigned the address of firstvalue using the address-of operator (&). Then, the value pointed to by the pointer is assigned a new value. Because, at this moment, the pointer is pointing to the memory location of firstvalue, this in fact modifies the value of firstvalue.

In order to demonstrate that a pointer may point to different variables during its lifetime in a program, the example repeats the process with the second variable and that same pointer.
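A minimal sketch of this process might look as follows. Only firstvalue is named in the text; secondvalue, mypointer, the Values struct and the values 10 and 20 are assumptions chosen for illustration.

```cpp
#include <cassert>

// Small holder so both results can be returned to the caller.
struct Values {
    int firstvalue;
    int secondvalue;
};

Values set_through_pointer() {
    int firstvalue = 0;        // zero-initialised here only to avoid
    int secondvalue = 0;       // reading indeterminate values by accident
    int* mypointer;

    mypointer = &firstvalue;   // point to firstvalue
    *mypointer = 10;           // writes 10 into firstvalue
    assert(firstvalue == 10);

    mypointer = &secondvalue;  // same pointer, now pointing to secondvalue
    *mypointer = 20;           // writes 20 into secondvalue

    return Values{firstvalue, secondvalue};
}
```

Neither variable is assigned directly after its initialisation; both final values arrive through `*mypointer`.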

In more elaborate examples, each assignment operation can be read aloud by replacing ampersands (&) by "address of", and asterisks (*) by "value pointed to by".

Notice that expressions with pointers may appear both with and without the dereference operator (*). The meaning of an expression using the dereference operator (*) is very different from one that does not. When this operator precedes the pointer name, the expression refers to the value being pointed to, while when a pointer name appears without this operator, it refers to the value of the pointer itself (i.e., the address of what the pointer is pointing to).

Another thing that may call your attention is a line that declares two pointers in a single statement.

Notice that such a declaration needs an asterisk (*) for each pointer, in order for both to have a pointer type. This is required because the asterisk applies to the individual name it precedes, not to the whole statement. If, instead, the asterisk were written only before the first name, the first variable would indeed be a pointer, but the second would be of the plain pointed-to type. Spaces do not matter at all for this purpose. But anyway, simply remembering to put one asterisk per pointer is enough for most pointer users interested in declaring multiple pointers per statement. Or even better: use a different statement for each variable.
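The difference can be checked at compile time. In this sketch the names p1, p2, q1 and q2 and the pointed-to type int are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <type_traits>

int* p1, * p2;   // one asterisk per name: p1 and p2 are both int*
int* q1, q2;     // only q1 is a pointer; q2 is a plain int

static_assert(std::is_same<decltype(p1), int*>::value, "p1 is a pointer to int");
static_assert(std::is_same<decltype(p2), int*>::value, "p2 is a pointer to int");
static_assert(std::is_same<decltype(q1), int*>::value, "q1 is a pointer to int");
static_assert(std::is_same<decltype(q2), int>::value,  "q2 is a plain int");

// Uses the variables so the difference is also visible at run time.
int use_declarations() {
    q2 = 5;          // q2 stores an integer directly
    q1 = &q2;        // q1 stores the address of q2
    p1 = p2 = q1;    // p1 and p2 can both store addresses
    assert(p1 == &q2);
    return *p2;      // value reached through the pointer
}
```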

Pointers and arrays

The concept of arrays is related to that of pointers. In fact, arrays work very much like pointers to their first elements, and, actually, an array can always be implicitly converted to a pointer of the proper type. For example, consider two declarations: an array of 20 elements and a pointer to the same element type.

Assigning the array to the pointer would be valid.

After that, the pointer and the array would be equivalent and would have very similar properties. The main difference is that the pointer can be assigned a different address, whereas the array can never be assigned anything, and will always represent the same block of 20 elements. Therefore, the reverse assignment, from the pointer to the array, would not be valid.

In expressions, pointers and arrays support the same set of operations, with the same meaning for both. The main difference is that pointers can be assigned new addresses, while arrays cannot.

In the chapter about arrays, brackets ([]) were explained as specifying the index of an element of the array. Well, in fact these brackets are a dereferencing operator known as the offset operator. They dereference the variable they follow just as * does, but they also add the number between brackets to the address being dereferenced.

Indexing with the offset operator, and adding the same number to the address and then dereferencing the result with *, are therefore two equivalent and valid expressions, not only if the operand is a pointer, but also if it is an array. Remember that the name of an array can be used just like a pointer to its first element.
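The equivalence between the offset operator and pointer arithmetic can be sketched like this (the names a and p and the array size are assumptions chosen for illustration):

```cpp
#include <cassert>

int array_offset_demo() {
    int a[6] = {0, 0, 0, 0, 0, 0};
    int* p = a;            // the array converts to a pointer to its first element
    // a = p;              // would not compile: an array cannot be assigned

    a[5] = 1;              // offset operator applied to the array
    assert(*(a + 5) == 1); // same element through arithmetic on the array name

    *(p + 5) = 2;          // same element through arithmetic on the pointer
    assert(p[5] == 2);     // brackets work on pointers too

    return a[5];           // the array saw the write made through p
}
```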

Pointer initialization

Pointers can be initialized to point to specific locations at the very moment they are defined, for instance with the address of a variable declared just before.

The resulting state of variables is the same as after declaring the pointer first and then assigning the address to it in a separate statement.

When pointers are initialized, what is initialized is the address they point to, never the value being pointed to. Therefore, such an initialization shall not be confused with declaring the pointer and then assigning the address to the dereferenced pointer, which anyway would not make much sense (and is not valid code).

The asterisk (*) in the pointer declaration only indicates that it is a pointer; it is not the dereference operator. Both things just happen to use the same sign: *. As always, spaces are not relevant, and never change the meaning of an expression.

Pointers can be initialized either to the address of a variable (such as in the case above), or to the value of another pointer (or array).
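A short sketch of these initialization forms, with hypothetical names (myvar, foo, bar, arr, first) chosen for illustration:

```cpp
#include <cassert>

bool initialization_demo() {
    int myvar = 0;
    int* foo = &myvar;      // initialised with the address of a variable
    int* bar = foo;         // initialised with the value of another pointer

    int arr[3] = {1, 2, 3};
    int* first = arr;       // initialised from an array: points to arr[0]

    // The initialiser sets the address, not the pointed-to value, so the
    // following two-step version would be a different thing (and is ill-formed):
    //   int* bad;
    //   *bad = &myvar;

    assert(*first == 1);
    return bar == &myvar && first == &arr[0];
}
```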



Pointer arithmetics

To conduct arithmetical operations on pointers is a little different than to conduct them on regular integer types. To begin with, only addition and subtraction operations are allowed; the others make no sense in the world of pointers. But both addition and subtraction have a slightly different behavior with pointers, according to the size of the data type to which they point.

When fundamental data types were introduced, we saw that types have different sizes. For example: char always has a size of 1 byte, while other types are generally larger than that, the exact size of these being dependent on the system. For example, let's imagine that in a given system, three different types take 1 byte, 2 bytes, and 4 bytes, respectively.

Suppose now that we define three pointers in this compiler, one for each of these types, and that we know that they point to the memory locations 1000, 2000, and 3000, respectively.

Therefore, if we increment each pointer once, the first, as one would expect, would contain the value 1001. But not so obviously, the second would contain the value 2002, and the third would contain 3004, even though they have each been incremented only once. The reason is that, when adding one to a pointer, the pointer is made to point to the following element of the same type, and, therefore, the size in bytes of the type it points to is added to the pointer.

This is applicable both when adding and subtracting any number to a pointer. It would happen exactly the same if, instead of using the increment operator, we wrote each increment as an assignment of the pointer plus one.
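The effect can be measured rather than assumed. The following sketch uses a hypothetical helper, step_in_bytes, that reports how many bytes a pointer advances when 1 is added to it; the actual numbers depend on the sizes of the types on the system at hand.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of bytes a pointer to T moves when 1 is added.
template <typename T>
std::ptrdiff_t step_in_bytes() {
    T items[2] = {};
    T* p = items;
    T* next = p + 1;   // points to the following element of the same type
    assert(next == &items[1]);
    // Viewing both addresses as char* makes the difference count bytes.
    return reinterpret_cast<char*>(next) - reinterpret_cast<char*>(p);
}
```

For any object type T, `step_in_bytes<T>()` equals `sizeof(T)`; for char that is always 1.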



Regarding the increment (++) and decrement (--) operators, they both can be used as either prefix or suffix of an expression, with a slight difference in behavior: as a prefix, the increment happens before the expression is evaluated, and as a suffix, the increment happens after the expression is evaluated. This also applies to expressions incrementing and decrementing pointers, which can become part of more complicated expressions that also include dereference operators (*). Remembering operator precedence rules, we can recall that postfix operators, such as increment and decrement, have higher precedence than prefix operators, such as the dereference operator (*). Therefore, writing p for an arbitrary pointer, the expression *p++ is equivalent to *(p++). What it does is increase the value of p (so it now points to the next element), but because ++ is used as postfix, the whole expression is evaluated as the value pointed to originally by the pointer (the address it pointed to before being incremented).

Essentially, these are the four possible combinations of the dereference operator with both the prefix and suffix versions of the increment operator (the same being applicable also to the decrement operator):

  • *p++ is the same as *(p++): increment the pointer, and dereference the unincremented address
  • *++p is the same as *(++p): increment the pointer, and dereference the incremented address
  • ++*p is the same as ++(*p): dereference the pointer, and increment the value it points to
  • (*p)++: dereference the pointer, and post-increment the value it points to

A typical, but not so simple, statement involving these operators is *p++ = *q++, with p and q again standing for arbitrary pointers.

Because ++ has a higher precedence than *, both p and q are incremented, but because both increment operators (++) are used as postfix and not prefix, the value assigned to *p is *q before both p and q are incremented. And then both are incremented. It would be roughly equivalent to assigning *q to *p first, and then incrementing p and q in two separate statements.

As always, parentheses reduce confusion by making expressions more legible.
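The four combinations and the copy idiom can be traced on a small array. This is a sketch under stated assumptions: the names p, q, data, source and target, and all values, are chosen for illustration.

```cpp
#include <cassert>

// Applies the four combinations in turn and returns the sum of the
// four values read, so a caller can check the result.
int increment_combinations() {
    int data[3] = {10, 20, 30};
    int* p = data;

    int a = *p++;    // same as *(p++): reads data[0], then p points to data[1]
    int b = *++p;    // same as *(++p): p moves to data[2], then reads it
    int c = ++*p;    // same as ++(*p): data[2] becomes 31, and 31 is read
    int d = (*p)++;  // reads 31, then data[2] becomes 32

    assert(p == &data[2]);
    assert(data[2] == 32);
    return a + b + c + d;   // 10 + 30 + 31 + 31
}

// Copies one element with *p++ = *q++ and reports how far p advanced.
int copy_one_and_advance() {
    int source[2] = {7, 8};
    int target[2] = {0, 0};
    const int* q = source;
    int* p = target;

    *p++ = *q++;     // target[0] receives 7, then both pointers advance

    assert(target[0] == 7 && target[1] == 0);
    assert(q == &source[1]);
    return static_cast<int>(p - target);
}
```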

Pointers and const

Pointers can be used to access a variable by its address, and this access may include modifying the value pointed to. But it is also possible to declare pointers that can access the pointed value to read it, but not to modify it. For this, it is enough to qualify the type pointed to by the pointer as const.

A pointer declared this way can point to a variable, but it points to it in a const-qualified manner, meaning that it can read the value pointed to, but it cannot modify it. Note also that the address of a non-const variable has a pointer-to-non-const type, yet it can be assigned to a pointer to const. This is allowed: a pointer to non-const can be implicitly converted to a pointer to const. But not the other way around! As a safety feature, pointers to const are not implicitly convertible to pointers to non-const.

One of the use cases of pointers to const elements is as function parameters: a function that takes a pointer to non-const as parameter can modify the value passed as argument, while a function that takes a pointer to const as parameter cannot.

Note that a function of the second kind uses pointers that point to constant elements. These pointers point to constant content they cannot modify, but they are not constant themselves: i.e., the pointers can still be incremented or assigned different addresses, although they cannot modify the content they point to.
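As a sketch of the two kinds of parameter, the following pair of functions walks over a range of integers. The function names increment_all and sum_all and their signatures are assumptions made for illustration.

```cpp
#include <cassert>

// Takes pointers to non-const int: it may modify the elements.
void increment_all(int* start, int* stop) {
    assert(start != nullptr && stop != nullptr);
    for (int* current = start; current != stop; ++current) {
        ++(*current);          // writing through the pointer is allowed
    }
}

// Takes pointers to const int: it may read the elements but not modify them.
int sum_all(const int* start, const int* stop) {
    assert(start != nullptr && stop != nullptr);
    int total = 0;
    for (const int* current = start; current != stop; ++current) {
        total += *current;     // reading is allowed
        // *current = 0;       // would not compile: the pointed value is const
    }
    return total;              // note that current itself could still be incremented
}
```

An `int` array can be passed to both functions, because `int*` converts implicitly to `const int*`.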

And this is where a second dimension to constness is added to pointers: pointers can also be themselves const. And this is specified by appending const to the pointed type (after the asterisk).

The syntax with const and pointers is definitely tricky, and recognizing the cases that best suit each use tends to require some experience. In any case, it is important to get constness with pointers (and references) right sooner rather than later, but you should not worry too much about grasping everything if this is the first time you are exposed to the mix of const and pointers. More use cases will show up in coming chapters.

To add a little bit more confusion to the syntax of const with pointers, the const qualifier can either precede or follow the pointed type, with the exact same meaning.

As with the spaces surrounding the asterisk, the order of const in this case is simply a matter of style. This chapter uses a prefix const, as for historical reasons this seems to be more widespread, but both are exactly equivalent. The merits of each style are still intensely debated on the internet.
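The combinations of const on the pointed type and on the pointer itself can be laid out side by side. The names target and p1 to p5 are hypothetical, and the compile-time checks use std::is_same from the standard `<type_traits>` header.

```cpp
#include <cassert>
#include <type_traits>

int target = 0;

int*             p1 = &target;  // non-const pointer to non-const int
const int*       p2 = &target;  // non-const pointer to const int
int* const       p3 = &target;  // const pointer to non-const int
const int* const p4 = &target;  // const pointer to const int
int const*       p5 = &target;  // const after the type: same type as p2

static_assert(std::is_same<decltype(p2), decltype(p5)>::value,
              "const int* and int const* are the same type");
static_assert(!std::is_same<decltype(p2), decltype(p3)>::value,
              "pointer to const differs from const pointer");

// Writes through the pointers that allow it and reads through the others.
int const_combinations() {
    *p1 = 1;         // allowed: pointed value is non-const
    *p3 = 2;         // allowed: only the pointer p3 itself is const
    // *p2 = 3;      // would not compile: pointed value is const
    // p3 = nullptr; // would not compile: the pointer is const
    assert(*p2 == 2 && *p5 == 2);
    return *p4;      // reading is always allowed
}
```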

Pointers and string literals

As pointed out earlier, string literals are arrays containing null-terminated character sequences. In earlier sections, string literals have been inserted directly into output streams, and used to initialize strings and arrays of characters.

But they can also be accessed directly. String literals are arrays of the proper array type to contain all their characters plus the terminating null-character, with each of the elements being of type const char (as literals, they can never be modified).

A declaration that initializes a pointer to const char with a string literal creates an array with the literal's characters, and then assigns a pointer to its first element to the declared pointer. Suppose, for instance, that the literal is stored at the memory locations that start at address 1702.

Note that the pointer then contains the value 1702, and not the first character, nor the whole string, although 1702 indeed is the address of both of these.

The pointer points to a sequence of characters. And because pointers and arrays behave essentially in the same way in expressions, the pointer can be used to access the characters in the same way arrays of null-terminated character sequences are. For example, the fifth element of the array can be obtained either by adding 4 to the pointer and dereferencing the result, or by applying the offset operator with index 4.

Both expressions have the same value (the fifth element of the array).
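A sketch of this access, with an assumed pointer name foo and an assumed literal "hello":

```cpp
#include <cassert>

char fifth_character() {
    const char* foo = "hello";      // points to the first character, 'h'

    assert(*foo == 'h');            // the pointer holds an address; *foo is the character
    assert(*(foo + 4) == foo[4]);   // arithmetic and offset operator agree
    assert(foo[5] == '\0');         // the literal ends with a null character

    return foo[4];                  // fifth element of the array
}
```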

Pointers to pointers

C++ allows the use of pointers that point to pointers, which, in turn, point to data (or even to other pointers). The syntax simply requires an asterisk (*) for each level of indirection in the declaration of the pointer.

For example, a variable can be pointed to by a pointer, which is in turn pointed to by a pointer to pointer, each of the three being stored at its own memory location.

The new thing in such an example is the last variable, which is a pointer to a pointer, and can be used in three different levels of indirection, each one of which corresponds to a different value:

  • used on its own, it has a pointer-to-pointer type, and its value is the address of the intermediate pointer
  • dereferenced once (*), it has a pointer type, and its value is the address of the final variable
  • dereferenced twice (**), it has the type of the final variable, and its value is the value stored in that variable
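The three levels can be written out as follows. The names a, b and c, the type char and the value 'z' are assumptions chosen for illustration.

```cpp
#include <cassert>

char through_two_levels() {
    char a = 'z';
    char* b = &a;      // pointer to a
    char** c = &b;     // pointer to the pointer b

    assert(c == &b);   // c   has type char**, value: address of b
    assert(*c == &a);  // *c  has type char*,  value: address of a
    return **c;        // **c has type char,   value: content of a
}
```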

void pointers

The void type of pointer is a special type of pointer. In C++, void represents the absence of type. Therefore, void pointers are pointers that point to a value that has no type (and thus also an undetermined length and undetermined dereferencing properties).

This gives void pointers great flexibility, by being able to point to any data type, from an integer value or a float to a string of characters. In exchange, they have a great limitation: the data pointed to by them cannot be directly dereferenced (which is logical, since we have no type to dereference to), and for that reason, any address in a void pointer needs to be transformed into some other pointer type that points to a concrete data type before being dereferenced.

One of its possible uses may be to pass generic parameters to a function.

sizeof is an operator integrated in the C++ language that returns the size in bytes of its argument. For non-dynamic data types, this value is a constant. Therefore, for example, sizeof(char) is 1, because char always has a size of one byte.
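A minimal sketch of a generic parameter passed through a void pointer. The function name increase, its parameters, and the choice to dispatch on the size of the argument are assumptions for illustration; dispatching on size only works here because the sketch handles just two types of different sizes.

```cpp
#include <cassert>
#include <cstddef>

// Increments the object that data points to, choosing the concrete
// type from the size passed by the caller.
void increase(void* data, std::size_t psize) {
    assert(data != nullptr);
    if (psize == sizeof(char)) {
        char* pchar = static_cast<char*>(data);  // give the address a concrete type
        ++(*pchar);
    } else if (psize == sizeof(int)) {
        int* pint = static_cast<int*>(data);
        ++(*pint);
    }
}
```

A caller passes the address and size of the object, for example `increase(&value, sizeof(value))`.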

Invalid pointers and null pointers

In principle, pointers are meant to point to valid addresses, such as the address of a variable or the address of an element in an array. But pointers can actually point to any address, including addresses that do not refer to any valid element. Typical examples of this are uninitialized pointers and pointers to nonexistent elements of an array.

Neither kind of pointer points to an address known to contain a value, but merely creating one does not cause an error. In C++, pointers are allowed to take any address value, no matter whether there actually is something at that address or not. What can cause an error is to dereference such a pointer (i.e., actually accessing the value it points to). Dereferencing such a pointer causes undefined behavior, ranging from an error during runtime to accessing some random value.

But, sometimes, a pointer really needs to explicitly point to nowhere, and not just an invalid address. For such cases, there exists a special value that any pointer type can take: the null pointer value. This value can be expressed in C++ in two ways: either with an integer value of zero, or with the nullptr keyword.

Pointers initialized in either way are null pointers, meaning that they explicitly point to nowhere, and they actually compare equal: all null pointers compare equal to other null pointers. It is also quite usual to see the defined constant NULL used in older code to refer to the null pointer value.

NULL is defined in several headers of the standard library, and is defined as an alias of some null pointer constant value (such as 0 or nullptr).

Do not confuse null pointers with void pointers! A null pointer is a value that any pointer can take to represent that it is pointing to "nowhere", while a void pointer is a type of pointer that can point to somewhere without a specific type. One refers to the value stored in the pointer, and the other to the type of data it points to.
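The three spellings of the null pointer value can be compared directly. The names p, q, r and v are hypothetical; NULL is taken from the standard `<cstddef>` header.

```cpp
#include <cassert>
#include <cstddef>

bool null_pointers_compare_equal() {
    int* p = 0;          // null pointer from the integer value zero
    int* q = nullptr;    // null pointer from the keyword
    int* r = NULL;       // null pointer from the older defined constant

    void* v = nullptr;   // a void pointer can be null too: type and value are
                         // independent properties
    assert(v == nullptr);

    return p == q && q == r && p == nullptr;
}
```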

Pointers to functions

C++ allows operations with pointers to functions. The typical use of this is for passing a function as an argument to another function. Pointers to functions are declared with the same syntax as a regular function declaration, except that the name of the function is enclosed between parentheses () and an asterisk (*) is inserted before the name.

A pointer declared this way, for instance to a function that has two parameters of a given type, can be directly initialized to point to any function with a matching signature.
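A sketch of passing functions through pointers. All names here (addition, subtraction, operation, functocall, minus_fn) and the int signature are assumptions made for illustration.

```cpp
#include <cassert>

int addition(int a, int b) { return a + b; }
int subtraction(int a, int b) { return a - b; }

// Calls whichever function is passed in through the pointer parameter.
int operation(int x, int y, int (*functocall)(int, int)) {
    assert(functocall != nullptr);
    return (*functocall)(x, y);
}

// A pointer to a function taking two ints, directly initialised
// to point to subtraction.
int (*minus_fn)(int, int) = subtraction;
```

A function name can be passed directly, as in `operation(7, 5, addition)`, or through a pointer variable, as in `operation(20, 12, minus_fn)`.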


In computer science, a pointer is a programming language object, whose value refers to (or "points to") another value stored elsewhere in the computer memory using its memory address. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on the indexed page.

Pointers to data significantly improve performance for repetitive operations such as traversing strings, lookup tables, control tables and tree structures. In particular, it is often much cheaper in time and space to copy and dereference pointers than it is to copy and access the data to which the pointers point.

Pointers are also used to hold the addresses of entry points for called subroutines in procedural programming and for run-time linking to dynamic link libraries (DLLs). In object-oriented programming, pointers to functions are used for binding methods, often using what are called virtual method tables.

A pointer is a simple, more concrete implementation of the more abstract reference data type. Several languages support some type of pointer, although some have more restrictions on their use than others. While "pointer" has been used to refer to references in general, it more properly applies to data structures whose interface explicitly allows the pointer to be manipulated (arithmetically via pointer arithmetic) as a memory address, as opposed to a magic cookie or capability where this is not possible. Because pointers allow both protected and unprotected access to memory addresses, there are risks associated with using them, particularly in the latter case. Primitive pointers are often stored in a format similar to an integer; however, attempting to dereference or "look up" a pointer whose value was never a valid memory address would cause a program to crash. To alleviate this potential problem, as a matter of type safety, pointers are considered a separate type parameterized by the type of data they point to, even if the underlying representation is an integer. Other measures may also be taken (such as validation and bounds checking) to verify that the pointer variable contains a value that is both a valid memory address and within the numerical range that the processor is capable of addressing.

History

Harold Lawson is credited with the 1964 invention of the pointer.[2] In 2000, Lawson was presented the Computer Pioneer Award by the IEEE “[f]or inventing the pointer variable and introducing this concept into PL/I, thus providing for the first time, the capability to flexibly treat linked lists in a general-purpose high level language”.[3] According to the Oxford English Dictionary, the word pointer first appeared in print as a stack pointer in a technical memorandum by the System Development Corporation.

Formal description

In computer science, a pointer is a kind of reference.

A data primitive (or just primitive) is any datum that can be read from or written to computer memory using one memory access (for instance, both a byte and a word are primitives).

A data aggregate (or just aggregate) is a group of primitives that are logically contiguous in memory and that are viewed collectively as one datum (for instance, an aggregate could be 3 logically contiguous bytes, the values of which represent the 3 coordinates of a point in space). When an aggregate is entirely composed of the same type of primitive, the aggregate may be called an array; in a sense, a multi-byte word primitive is an array of bytes, and some programs use words in this way.

In the context of these definitions, a byte is the smallest primitive; each memory address specifies a different byte. The memory address of the initial byte of a datum is considered the memory address (or base memory address) of the entire datum.

A memory pointer (or just pointer) is a primitive, the value of which is intended to be used as a memory address; it is said that a pointer points to a memory address. It is also said that a pointer points to a datum [in memory] when the pointer's value is the datum's memory address.

More generally, a pointer is a kind of reference, and it is said that a pointer references a datum stored somewhere in memory; to obtain that datum is to dereference the pointer. The feature that separates pointers from other kinds of reference is that a pointer's value is meant to be interpreted as a memory address, which is a rather low-level concept.

References serve as a level of indirection: A pointer's value determines which memory address (that is, which datum) is to be used in a calculation. Because indirection is a fundamental aspect of algorithms, pointers are often expressed as a fundamental data type in programming languages; in statically (or strongly) typed programming languages, the type of a pointer determines the type of the datum to which the pointer points.

Use in data structures

When setting up data structures like lists, queues and trees, it is necessary to have pointers to help manage how the structure is implemented and controlled. Typical examples of pointers are start pointers, end pointers, and stack pointers. These pointers can either be absolute (the actual physical address or a virtual address in virtual memory) or relative (an offset from an absolute start address ("base") that typically uses fewer bits than a full address, but will usually require one additional arithmetic operation to resolve).

Relative addresses are a form of manual memory segmentation, and share many of its advantages and disadvantages. A two-byte offset, containing a 16-bit, unsigned integer, can be used to provide relative addressing for up to 64 kilobytes of a data structure. This can easily be extended to 128K, 256K or 512K if the address pointed to is forced to be aligned on a half-word, word or double-word boundary (but requiring an additional "shift left" bitwise operation, by 1, 2 or 3 bits, in order to adjust the offset by a factor of 2, 4 or 8, before its addition to the base address). Generally, though, such schemes are a lot of trouble, and for convenience to the programmer absolute addresses (and underlying that, a flat address space) are preferred.

A one byte offset, such as the hexadecimal ASCII value of a character (e.g. X'29') can be used to point to an alternative integer value (or index) in an array (e.g. X'01'). In this way, characters can be very efficiently translated from 'raw data' to a usable sequential index and then to an absolute address without a lookup table.

Use in control tables

Control tables that are used to control program flow usually make extensive use of pointers. The pointers, usually embedded in a table entry, may, for instance, be used to hold the entry points to subroutines to be executed, based on certain conditions defined in the same table entry. The pointers can however be simply indexes to other separate, but associated, tables comprising an array of the actual addresses or the addresses themselves (depending upon the programming language constructs available). They can also be used to point to earlier table entries (as in loop processing) or forward to skip some table entries (as in a switch or "early" exit from a loop). For this latter purpose, the "pointer" may simply be the table entry number itself and can be transformed into an actual address by simple arithmetic.

Architectural roots

Pointers are a very thin abstraction on top of the addressing capabilities provided by most modern architectures. In the simplest scheme, an address, or a numeric index, is assigned to each unit of memory in the system, where the unit is typically either a byte or a word – depending on whether the architecture is byte-addressable or word-addressable – effectively transforming all of memory into a very large array. The system would then also provide an operation to retrieve the value stored in the memory unit at a given address (usually utilizing the machine's general purpose registers).

In the usual case, a pointer is large enough to hold more addresses than there are units of memory in the system. This introduces the possibility that a program may attempt to access an address which corresponds to no unit of memory, either because not enough memory is installed (i.e. beyond the range of available memory) or because the architecture does not support such addresses. The first case may, in certain platforms such as the Intel x86 architecture, be called a segmentation fault (segfault). The second case is possible in the current implementation of AMD64, where pointers are 64 bits long and addresses only extend to 48 bits. Pointers must conform to certain rules (canonical addresses), so if a non-canonical pointer is dereferenced, the processor raises a general protection fault.

On the other hand, some systems have more units of memory than there are addresses. In this case, a more complex scheme such as memory segmentation or paging is employed to use different parts of the memory at different times. The last incarnations of the x86 architecture support up to 36 bits of physical memory addresses, which were mapped to the 32-bit linear address space through the PAE paging mechanism. Thus, only 1/16 of the possible total memory may be accessed at a time. Another example in the same computer family was the 16-bit protected mode of the 80286 processor, which, though supporting only 16 MB of physical memory, could access up to 1 GB of virtual memory, but the combination of 16-bit address and segment registers made accessing more than 64 KB in one data structure cumbersome.

In order to provide a consistent interface, some architectures provide memory-mapped I/O, which allows some addresses to refer to units of memory while others refer to device registers of other devices in the computer. There are analogous concepts such as file offsets, array indices, and remote object references that serve some of the same purposes as addresses for other types of objects.

Uses

Pointers are directly supported without restrictions in languages such as PL/I, C, C++, Pascal, FreeBASIC, and implicitly in most assembly languages. They are primarily used for constructing references, which in turn are fundamental to constructing nearly all data structures, as well as in passing data between different parts of a program.

In functional programming languages that rely heavily on lists, pointers and references are managed abstractly by the language using internal constructs like cons.

When dealing with arrays, the critical lookup operation typically involves a stage called address calculation which involves constructing a pointer to the desired data element in the array. In other data structures, such as linked lists, pointers are used as references to explicitly tie one piece of the structure to another.

Pointers are used to pass parameters by reference. This is useful if the programmer wants a function's modifications to a parameter to be visible to the function's caller. This is also useful for returning multiple values from a function.
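As a minimal sketch of returning multiple values through pointer parameters, the hypothetical function div_mod below writes a quotient and a remainder into locations supplied by the caller and uses its ordinary return value only to report success.

```c
/* Divides a by b, storing the quotient and remainder through the pointers.
 * Returns 0 on success, or -1 if b is zero (outputs are left untouched). */
int div_mod(int a, int b, int *quot, int *rem)
{
    if (b == 0)
        return -1;
    *quot = a / b;   /* visible to the caller after the call */
    *rem  = a % b;
    return 0;
}
```

A caller would declare two int variables and pass their addresses, for example div_mod(17, 5, &q, &r).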

Pointers can also be used to allocate and deallocate dynamic variables and arrays in memory. Since a variable will often become redundant after it has served its purpose, it is a waste of memory to keep it, and therefore it is good practice to deallocate it (using the original pointer reference) when it is no longer needed. Failure to do so may result in a memory leak (where available free memory gradually, or in severe cases rapidly, diminishes because of an accumulation of numerous redundant memory blocks).

C pointers

The basic syntax to define a pointer is:[4]

    int *ptr;

This declares ptr as the identifier of an object of the following type:

  • pointer that points to an object of type int

This is usually stated more succinctly as "ptr is a pointer to int."

Because the C language does not specify an implicit initialization for objects of automatic storage duration,[5] care should often be taken to ensure that the address to which ptr points is valid; this is why it is sometimes suggested that a pointer be explicitly initialized to the null pointer value, which is traditionally specified in C with the standardized macro NULL:[6]

    int *ptr = NULL;

Dereferencing a null pointer in C produces undefined behavior,[7] which could be catastrophic. However, many implementations simply halt execution of the program in question, usually with a segmentation fault.

However, initializing pointers unnecessarily could hinder program analysis, thereby hiding bugs.

In any case, once a pointer has been declared, the next logical step is for it to point at something:

    int a = 5;
    int *ptr = NULL;

    ptr = &a;

This assigns the value of the address of a to ptr. For example, if a is stored at memory location 0x8130, then the value of ptr will be 0x8130 after the assignment. To dereference the pointer, an asterisk is used again:

    *ptr = 8;

This means take the contents of ptr (which is 0x8130), "locate" that address in memory and set its value to 8. If a is later accessed again, its new value will be 8.

This example may be clearer if memory is examined directly. Assume that a is located at address 0x8130 in memory and ptr at 0x8134; also assume this is a 32-bit machine such that an int is 32 bits wide. The following is what would be in memory after the following code snippet is executed:

    int a = 5;
    int *ptr = NULL;

    Address    Contents
    0x8130     0x00000005
    0x8134     0x00000000

(The NULL pointer shown here is 0x00000000.) Assigning the address of a to ptr:

    ptr = &a;

yields the following memory values:

    Address    Contents
    0x8130     0x00000005
    0x8134     0x00008130

Then by dereferencing ptr by coding:

    *ptr = 8;

the computer will take the contents of ptr (which is 0x8130), 'locate' that address, and assign 8 to that location, yielding the following memory:

    Address    Contents
    0x8130     0x00000008
    0x8134     0x00008130

Clearly, accessing a will yield the value of 8 because the previous instruction modified the contents of a by way of the pointer ptr.

C arrays

In C, array indexing is formally defined in terms of pointer arithmetic; that is, the language specification requires that array[i] be equivalent to *(array + i).[8] Thus in C, arrays can be thought of as pointers to consecutive areas of memory (with no gaps),[8] and the syntax for accessing arrays is identical to that which can be used to dereference pointers. For example, an array can be declared and used in the following manner:

    int array[5];      /* Declares 5 contiguous integers */
    int *ptr = array;  /* Arrays can be used as pointers */
    ptr[0] = 1;        /* Pointers can be indexed with array syntax */
    *(array + 1) = 2;  /* Arrays can be dereferenced with pointer syntax */
    *(1 + array) = 2;  /* Pointer addition is commutative */
    2[array] = 4;      /* Subscript operator is commutative */

This allocates a block of five integers and names the block array, which acts as a pointer to the block. Another common use of pointers is to point to dynamically allocated memory from malloc, which returns a consecutive block of memory of no less than the requested size that can be used as an array.

While most operators on arrays and pointers are equivalent, the result of the sizeof operator differs. In this example, sizeof(array) will evaluate to 5*sizeof(int) (the size of the array), while sizeof(ptr) will evaluate to sizeof(int*), the size of the pointer itself.
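The sizeof difference matters most once an array has been passed to a function, where only a pointer arrives. The sketch below illustrates this; the macro ARRAY_LEN and both function names are hypothetical and not part of any standard header.

```c
#include <stddef.h>

/* Element count of a true array; only meaningful where the array type
 * itself (not a pointer to its first element) is in scope. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

size_t count_in_scope(void)
{
    int array[5];
    return ARRAY_LEN(array);   /* sizeof sees the whole array */
}

size_t size_seen_by_callee(int *ptr)
{
    return sizeof(ptr);        /* sizeof sees only the pointer */
}
```

Because the callee cannot recover the element count from the pointer, C functions that take arrays conventionally take a separate length argument.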

Default values of an array can be declared like:

    int array[5] = {2, 4, 3, 1, 5};

If array is located in memory starting at address 0x1000 on a 32-bit little-endian machine then memory will contain the following (values are in hexadecimal, like the addresses):

            0   1   2   3
    1000    2   0   0   0
    1004    4   0   0   0
    1008    3   0   0   0
    100C    1   0   0   0
    1010    5   0   0   0

Represented here are five integers: 2, 4, 3, 1, and 5. These five integers occupy 32 bits (4 bytes) each with the least-significant byte stored first (this is a little-endian CPU architecture) and are stored consecutively starting at address 0x1000.

The syntax for C with pointers is:

  • array means 0x1000;
  • array + 1 means 0x1004: the "+ 1" means to add the size of 1 int, which is 4 bytes;
  • *array means to dereference the contents of array. Considering the contents as a memory address (0x1000), look up the value at that location (0x0002);
  • array[i] means element number i, 0-based, of array, which is translated into *(array + i).

The last example is how to access the contents of array. Breaking it down:

  • array + i is the memory location of the (i)th element of array, starting at i=0;
  • *(array + i) takes that memory address and dereferences it to access the value.

C linked list

Below is an example definition of a linked list in C.

    /* the empty linked list is represented by NULL
     * or some other sentinel value */
    #define EMPTY_LIST  NULL

    struct link {
        void        *data;  /* data of this link (generic pointer) */
        struct link *next;  /* next link; EMPTY_LIST if there is none */
    };

This pointer-recursive definition is essentially the same as the reference-recursive definition from the Haskell programming language:

    data Link a = Nil
                | Cons a (Link a)

Nil is the empty list, and Cons a (Link a) is a cons cell of type a with another link also of type a.

The definition with references, however, is type-checked and does not use potentially confusing signal values. For this reason, data structures in C are usually dealt with via wrapper functions, which are carefully checked for correctness.
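The wrapper-function approach might look like the following sketch. It repeats the struct link definition so that it stands alone; the function names list_push, list_length and list_free are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

struct link {
    void        *data;   /* data of this link */
    struct link *next;   /* next link; NULL if there is none */
};

/* Prepends a link holding data. Returns the new head, or NULL if the
 * allocation failed (the caller's old head is still valid in that case). */
struct link *list_push(struct link *head, void *data)
{
    struct link *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->data = data;
    node->next = head;
    return node;
}

/* Counts the links by following next pointers until the NULL sentinel. */
size_t list_length(const struct link *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Frees every link; the pointed-to data is not owned and is left alone. */
void list_free(struct link *head)
{
    while (head != NULL) {
        struct link *next = head->next;  /* read before freeing */
        free(head);
        head = next;
    }
}
```

Callers only ever touch the list through these functions, which keeps the NULL-sentinel checks in one place.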

Pass-by-address using pointers

Pointers can be used to pass variables by their address, allowing their value to be changed. For example, consider the following C code:

    /* a copy of the int n can be changed within the function without affecting the calling code */
    void passByValue(int n) {
        n = 12;
    }

    /* a pointer to m is passed instead. No copy of m itself is created */
    void passByAddress(int *m) {
        *m = 14;
    }

    int main(void) {
        int x = 3;

        /* pass a copy of x's value as the argument */
        passByValue(x);
        // the value was changed inside the function, but x is still 3 from here on

        /* pass x's address as the argument */
        passByAddress(&x);
        // x was actually changed by the function and is now equal to 14 here

        return 0;
    }

Dynamic memory allocation

In some programs, the required memory depends on what the user may enter. In such cases the programmer needs to allocate memory dynamically. This is done by allocating memory on the heap rather than on the stack, where variables usually are stored. (Variables can also be stored in CPU registers, but that is another matter.) Dynamically allocated memory can only be accessed through pointers; unlike ordinary variables, such blocks cannot be given names.

Pointers are used to store and manage the addresses of dynamically allocated blocks of memory. Such blocks are used to store data objects or arrays of objects. Most structured and object-oriented languages provide an area of memory, called the heap or free store, from which objects are dynamically allocated.
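As a small sketch of memory whose size is only known at run time, the hypothetical function make_sequence below allocates an array of n integers on the heap. The only handle to the block is the returned pointer, and the caller is assumed to release it with free.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocates n ints on the heap and fills them with 0, 1, ..., n-1.
 * Returns NULL if n is zero or the allocation fails.
 * The caller owns the block and must pass it to free(). */
int *make_sequence(size_t n)
{
    if (n == 0)
        return NULL;

    int *seq = calloc(n, sizeof *seq);  /* calloc checks n * size overflow */
    if (seq == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++)
        seq[i] = (int)i;
    return seq;
}
```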

The example C code below illustrates how structure objects are dynamically allocated and referenced. The standard C library provides the function malloc() for allocating memory blocks from the heap. It takes the size of an object to allocate as a parameter and returns a pointer to a newly allocated block of memory suitable for storing the object, or it returns a null pointer if the allocation failed.

    #include <stdlib.h>  /* malloc, free */
    #include <string.h>  /* memset, strlen, strcpy */

    /* Parts inventory item */
    struct Item {
        int    id;    /* Part number */
        char  *name;  /* Part name   */
        float  cost;  /* Cost        */
    };

    /* Allocate and initialize a new Item object */
    struct Item *make_item(const char *name) {
        struct Item *item;

        /* Allocate a block of memory for a new Item object */
        item = (struct Item *)malloc(sizeof(struct Item));
        if (item == NULL)
            return NULL;

        /* Initialize the members of the new Item */
        memset(item, 0, sizeof(struct Item));
        item->id = -1;
        item->name = NULL;
        item->cost = 0.0;

        /* Save a copy of the name in the new Item */
        item->name = (char *)malloc(strlen(name) + 1);
        if (item->name == NULL) {
            free(item);
            return NULL;
        }
        strcpy(item->name, name);

        /* Return the newly created Item object */
        return item;
    }

The code below illustrates how memory objects are dynamically deallocated, i.e., returned to the heap or free store. The standard C library provides the function free() for deallocating a previously allocated memory block and returning it back to the heap.

    /* Deallocate an Item object */
    void destroy_item(struct Item *item) {
        /* Check for a null object pointer */
        if (item == NULL)
            return;

        /* Deallocate the name string saved within the Item */
        if (item->name != NULL) {
            free(item->name);
            item->name = NULL;
        }

        /* Deallocate the Item object itself */
        free(item);
    }

Memory-mapped hardware

On some computing architectures, pointers can be used to directly manipulate memory or memory-mapped devices.

Assigning addresses to pointers is an invaluable tool when programming microcontrollers. Below is a simple example declaring a pointer of type int and initialising it to a hexadecimal address, in this example the constant 0x7FFF:

    int *hardware_address = (int *)0x7FFF;

In the mid 80s, using the BIOS to access the video capabilities of PCs was slow. Applications that were display-intensive typically used to access CGA video memory directly by casting the hexadecimal constant 0xB8000 to a pointer to an array of 80 unsigned 16-bit int values. Each value consisted of an ASCII code in the low byte, and a colour in the high byte. Thus, to put the letter 'A' at row 5, column 2 in bright white on blue, one would write code like the following:

    #define VID ((unsigned short (*)[80])0xB8000)

    void foo(void) {
        VID[4][1] = 0x1F00 | 'A';
    }
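Code that talks to device registers usually goes through volatile pointers so that the compiler does not cache or drop the accesses. The sketch below assumes a hypothetical 32-bit status register with a "ready" flag in bit 0; on real hardware the pointer would come from casting a device-specific address, whereas here any uint32_t variable can stand in for the register.

```c
#include <stdint.h>

/* Hypothetical register layout: bit 0 signals "device ready". */
#define STATUS_READY 0x01u

/* Sets the given bits in a device register through a volatile pointer. */
void reg_set_bits(volatile uint32_t *reg, uint32_t mask)
{
    *reg |= mask;   /* read-modify-write that the compiler must perform */
}

/* Returns nonzero if the READY bit is set in the register. */
int reg_is_ready(const volatile uint32_t *reg)
{
    return (*reg & STATUS_READY) != 0;
}
```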

Typed pointers and casting

In many languages, pointers have the additional restriction that the object they point to has a specific type. For example, a pointer may be declared to point to an integer; the language will then attempt to prevent the programmer from pointing it to objects which are not integers, such as floating-point numbers, eliminating some errors.

For example, in C, suppose one pointer is declared as a pointer to int and another as a pointer to char. Assigning the int pointer directly to the char pointer would yield a compiler warning of "assignment from incompatible pointer type" under GCC, because the two pointers were declared with different types. To suppress the compiler warning, it must be made explicit that the assignment is intended by typecasting it, that is, by casting the integer pointer to a char pointer before assigning it.
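A minimal sketch of such an explicit cast is shown below. The function name nonzero_bytes is hypothetical; it views an int through an unsigned char pointer, a conversion C permits for inspecting an object's bytes.

```c
#include <stddef.h>

/* Counts the nonzero bytes in the object representation of an int. */
size_t nonzero_bytes(const int *value)
{
    /* Explicit cast: without it, the compiler would warn about
     * assigning between incompatible pointer types. */
    const unsigned char *bytes = (const unsigned char *)value;
    size_t count = 0;

    for (size_t i = 0; i < sizeof *value; i++)
        if (bytes[i] != 0)
            count++;
    return count;
}
```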

A 2005 draft of the C standard requires that casting a pointer derived from one type to one of another type should maintain the alignment correctness for both types (6.3.2.3 Pointers, par. 7):[9]

    char *external_buffer = "abcdef";
    int *internal_data;

    internal_data = (int *)external_buffer;  // UNDEFINED BEHAVIOUR if "the resulting pointer
                                             // is not correctly aligned"

In languages that allow pointer arithmetic, arithmetic on pointers takes into account the size of the type. For example, adding an integer number to a pointer produces another pointer that points to an address that is higher by that number times the size of the type. This allows us to easily compute the address of elements of an array of a given type, as was shown in the C arrays example above. When a pointer of one type is cast to another type of a different size, the programmer should expect that pointer arithmetic will be calculated differently. In C, for example, if an array starts at 0x2000 and sizeof(int) is 4 bytes whereas sizeof(char) is 1 byte, then adding 1 to an int pointer to the start of the array yields 0x2004, but adding 1 to a char pointer to the same address yields 0x2001. Other risks of casting include loss of data when "wide" data is written to "narrow" locations (e.g. storing the value 65537 into a one-byte char element), unexpected results when bit-shifting values, and comparison problems, especially with signed vs unsigned values.
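The effect of the pointed-to type on pointer arithmetic can be measured directly. The sketch below (with hypothetical function names) returns the byte distance covered by adding 1 to an int pointer and to a char pointer.

```c
#include <stddef.h>

/* Byte distance between p and p + 1 when p is a pointer to int. */
size_t int_stride(void)
{
    int pair[2];
    int *p = pair;
    /* Converting to char * lets the difference be measured in bytes. */
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Byte distance between p and p + 1 when p is a pointer to char. */
size_t char_stride(void)
{
    char pair[2];
    char *p = pair;
    return (size_t)((p + 1) - p);
}
```

int_stride() yields sizeof(int) (4 on the machine assumed in the text), while char_stride() always yields 1.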

Although it is impossible in general to determine at compile-time which casts are safe, some languages store run-time type information which can be used to confirm that these dangerous casts are valid at runtime. Other languages merely accept a conservative approximation of safe casts, or none at all.

Making pointers safer

As a pointer allows a program to attempt to access an object that may not be defined, pointers can be the origin of a variety of programming errors. However, the usefulness of pointers is so great that it can be difficult to perform programming tasks without them. Consequently, many languages have created constructs designed to provide some of the useful features of pointers without some of their pitfalls, also sometimes referred to as pointer hazards. In this context, pointers that directly address memory (as used in this article) are referred to as raw pointers, by contrast with smart pointers or other variants.

One major problem with pointers is that as long as they can be directly manipulated as a number, they can be made to point to unused addresses or to data which is being used for other purposes. Many languages, including most functional programming languages and recent imperative languages like Java, replace pointers with a more opaque type of reference, typically referred to as simply a reference, which can only be used to refer to objects and not manipulated as numbers, preventing this type of error. Array indexing is handled as a special case.

A pointer which does not have any address assigned to it is called a wild pointer. Any attempt to use such uninitialized pointers can cause unexpected behavior, either because the initial value is not a valid address, or because using it may damage other parts of the program. The result is often a segmentation fault, storage violation or wild branch (if used as a function pointer or branch address).

In systems with explicit memory allocation, it is possible to create a dangling pointer by deallocating the memory region it points into. This type of pointer is dangerous and subtle because a deallocated memory region may contain the same data as it did before it was deallocated but may be then reallocated and overwritten by unrelated code, unknown to the earlier code. Languages with garbage collection prevent this type of error because deallocation is performed automatically when there are no more references in scope.

Some languages, like C++, support smart pointers, which use a simple form of reference counting to help track allocation of dynamic memory in addition to acting as a reference. In the absence of reference cycles, where an object refers to itself indirectly through a sequence of smart pointers, these eliminate the possibility of dangling pointers and memory leaks. Delphi strings support reference counting natively.

The Rust programming language introduces a borrow checker, pointer lifetimes, and an optimisation based around optional types for null pointers to eliminate pointer bugs, without resorting to a garbage collector.

Null pointer

A null pointer has a value reserved for indicating that the pointer does not refer to a valid object. Null pointers are routinely used to represent conditions such as the end of a list of unknown length or the failure to perform some action; this use of null pointers can be compared to nullable types and to the Nothing value in an option type.
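As an illustration of a null pointer signalling "no result", the hypothetical search function below returns a pointer to the matching element or NULL when there is none, so the caller must test the result before dereferencing it.

```c
#include <stddef.h>

/* Returns a pointer to the first element equal to key,
 * or NULL if no element matches. */
int *find_int(int *items, size_t count, int key)
{
    for (size_t i = 0; i < count; i++)
        if (items[i] == key)
            return &items[i];
    return NULL;   /* failure is reported through the null pointer */
}
```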

Autorelative pointer

An autorelative pointer is a pointer whose value is interpreted as an offset from the address of the pointer itself; thus, if a data structure has an autorelative pointer member that points to some portion of the data structure itself, then the data structure may be relocated in memory without having to update the value of the autorelative pointer.[10]

The cited patent also uses the term self-relative pointer to mean the same thing. However, the meaning of that term has been used in other ways:

  • to mean an offset from the address of a structure rather than from the address of the pointer itself;[citation needed]
  • to mean a pointer containing its own address, which can be useful for reconstructing in any arbitrary region of memory a collection of data structures that point to each other.[11]
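The first meaning, an offset measured from the pointer member's own address, can be sketched in C as follows. The structure and function names are hypothetical, and the offset is stored as a ptrdiff_t rather than a real pointer type.

```c
#include <stddef.h>

struct record {
    ptrdiff_t name_off;  /* byte offset from &name_off to the name field */
    char      name[16];
};

/* Stores the distance from the offset member itself to the name field. */
void record_init(struct record *r)
{
    r->name_off = (char *)r->name - (char *)&r->name_off;
}

/* Resolves the autorelative pointer against its own current address. */
char *record_name(struct record *r)
{
    return (char *)&r->name_off + r->name_off;
}
```

Because the offset is relative to the member's own location, a byte-for-byte copy of the structure resolves to the name inside the copy without any fix-up.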

Based pointer

A based pointer is a pointer whose value is an offset from the value of another pointer. This can be used to store and load blocks of data, assigning the address of the beginning of the block to the base pointer.[12]
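In C, which has no built-in based pointers, the idea can be approximated by storing offsets and resolving them against a base address on each use. The helper name based_resolve is hypothetical.

```c
#include <stddef.h>

/* Resolves a based pointer: an offset interpreted relative to base. */
void *based_resolve(void *base, size_t offset)
{
    return (char *)base + offset;
}
```

If a block is saved and later loaded at a different address, the stored offsets remain valid; only the base pointer needs to be updated.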

Multiple indirection

In some languages, a pointer can reference another pointer, requiring multiple dereference operations to get to the original value. While each level of indirection may add a performance cost, it is sometimes necessary in order to provide correct behavior for complex data structures. For example, in C it is typical to define a linked list in terms of an element that contains a pointer to the next element of the list:

    struct element {
        struct element *next;
        int             value;
    };

    struct element *head = NULL;

This implementation uses a pointer to the first element in the list as a surrogate for the entire list. If a new value is added to the beginning of the list, head has to be changed to point to the new element. Since C arguments are always passed by value, using double indirection allows the insertion to be implemented correctly, and has the desirable side-effect of eliminating special case code to deal with insertions at the front of the list:

    // Given a sorted list at *head, insert the element item at the first
    // location where all earlier elements have lesser or equal value.
    void insert(struct element **head, struct element *item) {
        struct element **p;  // p points to a pointer to an element
        for (p = head; *p != NULL; p = &(*p)->next) {
            if (item->value <= (*p)->value)
                break;
        }
        item->next = *p;
        *p = item;
    }

    // Caller does this:
    insert(&head, item);

In this case, if the value of item is less than that of head, the caller's head is properly updated to the address of the new item.

A basic example is in the argv argument to the main function in C (and C++), which is given in the prototype as char **argv; this is because the variable argv itself is a pointer to an array of strings (an array of arrays), so *argv is a pointer to the 0th string (by convention the name of the program), and **argv is the 0th character of the 0th string.
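The two levels of indirection in an argv-style array can be walked explicitly. The sketch below uses a hypothetical helper total_length and relies on the array ending with a null pointer, as argv does.

```c
#include <stddef.h>
#include <string.h>

/* Sums the lengths of the strings in a NULL-terminated array of
 * strings laid out like argv. */
size_t total_length(char **strings)
{
    size_t total = 0;
    /* p is a pointer to a pointer to char; *p is one string. */
    for (char **p = strings; *p != NULL; p++)
        total += strlen(*p);
    return total;
}
```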

Function pointer

In some languages, a pointer can reference executable code, i.e., it can point to a function, method, or procedure. A function pointer will store the address of a function to be invoked. While this facility can be used to call functions dynamically, it is often a favorite technique of virus and other malicious software writers.

    int sum(int n1, int n2) {   // Function with two integer parameters returning an integer value
        return n1 + n2;
    }

    int main(void) {
        int a = 1, b = 2, x, y;  // a and b are given example values so that reading them is well defined
        int (*fp)(int, int);     // Function pointer which can point to a function like sum
        fp = &sum;               // fp now points to function sum
        x = (*fp)(a, b);         // Calls function sum with arguments a and b
        y = sum(a, b);           // Calls function sum with arguments a and b
        return 0;
    }
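A common everyday use of function pointers in C is passing a comparison callback to the standard library's qsort. In the sketch below, compare_ints and sort_ints are hypothetical names; only qsort itself comes from <stdlib.h>.

```c
#include <stdlib.h>

/* Comparison callback in the form qsort expects: negative, zero, or
 * positive depending on the order of the two pointed-to ints. */
static int compare_ints(const void *lhs, const void *rhs)
{
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;
    return (a > b) - (a < b);   /* avoids overflow of a - b */
}

/* Sorts an int array in ascending order; qsort calls back through
 * the function pointer for every comparison. */
void sort_ints(int *items, size_t count)
{
    qsort(items, count, sizeof items[0], compare_ints);
}
```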

Dangling pointer

A dangling pointer is a pointer that does not point to a valid object and consequently may make a program crash or behave oddly. In the Pascal or C programming languages, pointers that are not specifically initialized may point to unpredictable addresses in memory.

The following example code shows a dangling pointer:

    int func(void) {
        char *p1 = malloc(sizeof(char)); /* (undefined) value of some place on the heap */
        char *p2;       /* dangling (uninitialized) pointer */
        *p1 = 'a';      /* This is OK, assuming malloc() has not returned NULL. */
        *p2 = 'b';      /* This invokes undefined behavior */
    }

Here, p2 may point to anywhere in memory, so performing the assignment *p2 = 'b'; can corrupt an unknown area of memory or trigger a segmentation fault.
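One defensive idiom, sketched here with a hypothetical helper name, is to reset a pointer to NULL as soon as its memory is freed, so that later code can test it instead of using a stale address. This protects only the one pointer variable passed in; any other copies of the address remain dangling.

```c
#include <stdlib.h>

/* Frees the block *pp points to and clears the caller's pointer. */
void free_and_clear(char **pp)
{
    if (pp == NULL)
        return;
    free(*pp);    /* free(NULL) is a harmless no-op */
    *pp = NULL;   /* the caller's pointer no longer dangles */
}
```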

Back pointer

In doubly linked lists or tree structures, a back pointer held on an element 'points back' to the item referring to the current element. These are useful for navigation and manipulation, at the expense of greater memory use.
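A minimal doubly linked list sketch follows, with hypothetical names throughout. The prev member is the back pointer: it costs one extra pointer per node but lets a node be unlinked without searching for its predecessor.

```c
#include <stddef.h>

struct dnode {
    int           value;
    struct dnode *next;
    struct dnode *prev;   /* back pointer to the preceding node */
};

/* Inserts node directly after pos, keeping both directions consistent. */
void dnode_insert_after(struct dnode *pos, struct dnode *node)
{
    node->prev = pos;
    node->next = pos->next;
    if (pos->next != NULL)
        pos->next->prev = node;
    pos->next = node;
}

/* Unlinks node in constant time using its back pointer. */
void dnode_unlink(struct dnode *node)
{
    if (node->prev != NULL)
        node->prev->next = node->next;
    if (node->next != NULL)
        node->next->prev = node->prev;
    node->next = NULL;
    node->prev = NULL;
}
```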

Pointer declaration syntax overview

These pointer declarations cover most variants of pointer declarations. Of course it is possible to have triple pointers, but the main principles behind a triple pointer already exist in a double pointer.

    char cff [5][5];    /* array of arrays of chars */
    char *cfp [5];      /* array of pointers to chars */
    char **cpp;         /* pointer to pointer to char ("double pointer") */
    char (*cpf) [5];    /* pointer to array(s) of chars */
    char *cpF();        /* function which returns a pointer to char(s) */
    char (*CFp)();      /* pointer to a function which returns a char */
    char (*cfpF())[5];  /* function which returns pointer to an array of chars */
    char (*cpFf[5])();  /* an array of pointers to functions which return a char */

The () and [] have a higher priority than *. [13]

Wild branch

Where a pointer is used as the address of the entry point to a program or start of a function which doesn't return anything and is also either uninitialized or corrupted, if a call or jump is nevertheless made to this address, a "wild branch" is said to have occurred. The consequences are usually unpredictable and the error may present itself in several different ways depending upon whether or not the pointer is a "valid" address and whether or not there is (coincidentally) a valid instruction (opcode) at that address. The detection of a wild branch can present one of the most difficult and frustrating debugging exercises since much of the evidence may already have been destroyed beforehand or by execution of one or more inappropriate instructions at the branch location. If available, an instruction set simulator can usually not only detect a wild branch before it takes effect, but also provide a complete or partial trace of its history.

Simulation using an array index

It is possible to simulate pointer behavior using an index to a (normally one-dimensional) array.

Primarily for languages which do not support pointers explicitly but do support arrays, the array can be thought of and processed as if it were the entire memory range (within the scope of the particular array) and any index to it can be thought of as equivalent to a general purpose register in assembly language (that points to the individual bytes but whose actual value is relative to the start of the array, not its absolute address in memory). Assuming the array is, say, a contiguous 16 megabyte character data structure, individual bytes (or a string of contiguous bytes within the array) can be directly addressed and manipulated using the name of the array with a 31 bit unsigned integer as the simulated pointer (this is quite similar to the C arrays example shown above). Pointer arithmetic can be simulated by adding or subtracting from the index, with minimal additional overhead compared to genuine pointer arithmetic.

It is even theoretically possible, using the above technique together with a suitable instruction set simulator, to simulate any machine code or the intermediate (byte code) of any processor/language in another language that does not support pointers at all (for example Java / JavaScript). To achieve this, the binary code can initially be loaded into contiguous bytes of the array for the simulator to "read", interpret and act on entirely within the memory of the same array. If necessary, to completely avoid buffer overflow problems, bounds checking can usually be performed by the compiler (or if not, hand coded in the simulator).
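The index-as-pointer idea can be sketched even in C, where it is not strictly needed. Below, a linked list lives entirely inside one array and its links are array indices; the names pool, NIL, cell_push and cell_sum are hypothetical, and NIL (-1) plays the role of the null pointer.

```c
#define POOL_SIZE 8
#define NIL (-1)          /* stands in for the null pointer */

struct cell {
    int value;
    int next;             /* index of the next cell, or NIL */
};

static struct cell pool[POOL_SIZE];  /* the simulated "memory" */
static int pool_used = 0;            /* next free slot */

/* "Allocates" a cell and prepends it to the list starting at head.
 * Returns the new head index, or NIL if the pool is full. */
int cell_push(int head, int value)
{
    if (pool_used >= POOL_SIZE)
        return NIL;
    int idx = pool_used++;
    pool[idx].value = value;
    pool[idx].next = head;
    return idx;
}

/* Follows the index links and sums the stored values. */
int cell_sum(int head)
{
    int sum = 0;
    for (int i = head; i != NIL; i = pool[i].next)
        sum += pool[i].value;
    return sum;
}
```

Every access goes through pool[i], so a bounds check on i is enough to keep the simulated pointers inside the array.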

Support in various programming languages

Ada

Ada is a strongly typed language where all pointers are typed and only safe type conversions are permitted. All pointers are by default initialized to null, and any attempt to access data through a null pointer causes an exception to be raised. Pointers in Ada are called access types. Ada 83 did not permit arithmetic on access types (although many compiler vendors provided for it as a non-standard feature), but Ada 95 supports “safe” arithmetic on access types via the package System.Storage_Elements.

BASIC

Several old versions of BASIC for the Windows platform had support for STRPTR() to return the address of a string, and for VARPTR() to return the address of a variable. Visual Basic 5 also had support for OBJPTR() to return the address of an object interface, and for an ADDRESSOF operator to return the address of a function. The types of all of these are integers, but their values are equivalent to those held by pointer types.

Newer dialects of BASIC, such as FreeBASIC or BlitzMax, have exhaustive pointer implementations, however. In FreeBASIC, arithmetic on ANY pointers (equivalent to C's void*) is treated as though the ANY pointer was a byte width. ANY pointers cannot be dereferenced, as in C. Also, casting between ANY and any other type's pointers will not generate any warnings.

    dim as integer f = 257
    dim as any ptr g = @f
    dim as integer ptr i = g
    assert(*i = 257)
    assert( (g + 4) = (@f + 1) )

C and C++

In C and C++ pointers are variables that store addresses and can be null. Each pointer has a type it points to, but one can freely cast between pointer types (but not between a function pointer and an object pointer). A special pointer type called the “void pointer” allows pointing to any (non-function) object, but is limited by the fact that it cannot be dereferenced directly (it shall be cast). The address itself can often be directly manipulated by casting a pointer to and from an integral type of sufficient size, though the results are implementation-defined and may indeed cause undefined behavior; while earlier C standards did not have an integral type that was guaranteed to be large enough, C99 specifies the uintptr_t typedef name defined in <stdint.h>, but an implementation need not provide it.
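Where uintptr_t is available, a round trip through it might be sketched as follows (the function name round_trip is hypothetical). C99 specifies that a valid pointer to void converted to uintptr_t and back compares equal to the original; what the intermediate integer value looks like is implementation-defined.

```c
#include <stdint.h>

/* Converts a pointer to an integer and back again. */
void *round_trip(void *p)
{
    uintptr_t bits = (uintptr_t)p;  /* integer wide enough for a void * */
    return (void *)bits;            /* compares equal to the original p */
}
```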

C++ fully supports C pointers and C typecasting. It also supports a new group of typecasting operators to help catch some unintended dangerous casts at compile-time. Since C++11, the C++ standard library also provides smart pointers (unique_ptr, shared_ptr and weak_ptr) which can be used in some situations as a safer alternative to primitive C pointers. C++ also supports another form of reference, quite different from a pointer, called simply a reference or reference type.

Pointer arithmetic, that is, the ability to modify a pointer's target address with arithmetic operations (as well as magnitude comparisons), is restricted by the language standard to remain within the bounds of a single array object (or just after it), and will otherwise invoke undefined behavior. Adding or subtracting from a pointer moves it by a multiple of the size of its datatype. For example, adding 1 to a pointer to 4-byte integer values will increment the pointer's pointed-to byte-address by 4. This has the effect of incrementing the pointer to point at the next element in a contiguous array of integers, which is often the intended result. Pointer arithmetic cannot be performed on void pointers because the void type has no size, and thus the pointed address cannot be added to, although gcc and other compilers will perform byte arithmetic on void* as a non-standard extension, treating it as if it were char *.

Pointer arithmetic provides the programmer with a single way of dealing with different types: adding and subtracting the number of elements required instead of the actual offset in bytes. (Pointer arithmetic with char * pointers uses byte offsets, because sizeof(char) is 1 by definition.) In particular, the C definition explicitly declares that the syntax a[n], which is the n-th element of the array a, is equivalent to *(a + n), which is the content of the element pointed to by a + n. This implies that n[a] is equivalent to a[n], and one can write, e.g., a[3] or 3[a] equally well to access the fourth element of an array a.

While powerful, pointer arithmetic can be a source of computer bugs. It tends to confuse novice programmers, forcing them into different contexts: an expression can be an ordinary arithmetic one or a pointer arithmetic one, and sometimes it is easy to mistake one for the other. In response to this, many modern high-level computer languages (for example Java) do not permit direct access to memory using addresses. Also, the safe C dialect Cyclone addresses many of the issues with pointers. See C programming language for more discussion.

The void pointer, or void*, is supported in ANSI C and C++ as a generic pointer type. A pointer to void can store the address of any object (not function), and, in C, is implicitly converted to any other object pointer type on assignment, but it must be explicitly cast if dereferenced. K&R C used char* for the “type-agnostic pointer” purpose (before ANSI C).

    int x = 4;
    void* p1 = &x;
    int* p2 = p1;       // void* implicitly converted to int*: valid C, but not C++
    int a = *p2;
    int b = *(int*)p1;  // when dereferencing inline, there is no implicit conversion

C++ does not allow the implicit conversion of void* to other pointer types, even in assignments. This was a design decision to avoid careless and even unintended casts, though most compilers only output warnings, not errors, when encountering other casts.

    int x = 4;
    void* p1 = &x;
    int* p2 = p1;                     // this fails in C++: there is no implicit conversion from void*
    int* p3 = (int*)p1;               // C-style cast
    int* p4 = static_cast<int*>(p1);  // C++ cast

In C++, there is no void& (reference to void) to complement void* (pointer to void), because references behave like aliases to the variables they point to, and there can never be a variable whose type is void.

Pointer 'a' pointing to the memory address associated with variable 'b'. In this diagram, the computing architecture uses the same address space and data primitive for both pointers and non-pointers; this need not be the case.
