Exploration: Ownership-2

Introduction

Module 2 includes an exploration on Memory Allocation in which we discussed how each process is allocated its own memory and how that memory is organized into various segments. Once the process has been allocated its memory, the process needs to manage this memory. Programming languages differ in how programs written in that language manage the memory allocated to them. In this exploration, we first contrast different approaches that programming languages take towards memory management. We then study Rust’s approach towards memory management and the safety benefits of this approach.

Approaches to Memory Management

At a high level, there are three approaches to memory management.

1. Full Programmer Control

Some languages give the programmer extensive control over memory management. C is a primary example of this approach. The functions malloc and calloc allow a C program to dynamically allocate memory on the heap. Memory allocated on the heap is not deallocated until the program calls the function free on it. Thus programmers have complete control over when memory is allocated and deallocated on the heap.

Furthermore, C pointers provide direct access to memory, and pointer arithmetic can be used to manipulate memory addresses. C compilers place very few restrictions on the use of pointers. C's support for fast access to memory and control over memory management are major reasons why C is widely used for writing system software. However, if these powerful features are not used correctly and safely, they open the door to dangerous bugs in C code.

Exercise: Memory Related Bugs in C Code

The following C program contains three bugs related to the use of memory and pointers. Review the code comments and run the program to understand these bugs. Then, guided by the code comments, fix these three bugs.


2. Garbage Collection

To eliminate bugs due to improper memory management, some languages provide automatic memory management via garbage collection. The basic idea is that the language run-time is responsible for determining when an object is no longer accessible to any code in the program and then reclaiming the memory allocated to that object. The specifications of some programming languages mandate that every implementation of the language provide garbage collection; examples include Java and Go. Other languages, such as Python, do not mandate garbage collection, but actual implementations of the language, such as CPython, perform garbage collection.

But the safety provided by garbage collection is not free; it comes with performance penalties. The language run-time must keep track of which objects are still accessible and must not be collected, and which objects are no longer accessible so that their memory can be reclaimed. Additionally, since the timing of when memory is freed is no longer under the programmer's control, the performance of a program with garbage collection can be somewhat unpredictable.

3. Programmer Control With Restrictions Enforced By the Compiler

Rust takes a different approach to memory management, providing memory safety without the overhead of garbage collection. Rust achieves this by restricting, at compile time, how programs can use pointers and pass arguments.

Ownership Rules for Memory Management in Rust

The central feature for memory management in Rust is called ownership. There are three ownership rules that Rust enforces:

  1. A variable owns its value.
  2. When the variable, i.e., the owner of a value, goes out of scope, the value is dropped.
  3. At any given time there is only one owner of a value.
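
As a small illustration of these rules, here is a sketch (not code from this exploration) in which take_ownership is a hypothetical helper. The commented-out line shows a use that the compiler would reject, because ownership of the String has already moved:

```rust
// Hypothetical helper: it takes ownership of its String argument. When the
// parameter `s` goes out of scope at the end of the function, the String is
// dropped and its heap buffer is freed (rule 2).
fn take_ownership(s: String) -> usize {
    s.len()
}

fn main() {
    // Rule 1: the variable `greeting` owns this String value.
    let greeting = String::from("Hello");

    // Rule 3: this assignment moves ownership to `moved`, so the value still
    // has exactly one owner and `greeting` may no longer be used.
    let moved = greeting;
    // println!("{greeting}"); // rejected: error[E0382]: borrow of moved value

    // Passing `moved` by value moves ownership into the function.
    let len = take_ownership(moved);
    println!("length was {len}");
}
```

Uncommenting the println! line makes the program fail to compile, which is how these rules are enforced without any run-time cost.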

Let us now look at the details of memory management in Rust to understand these concepts.

Memory Allocation

For string literals and constants, the contents are known at compile time. Rust hardcodes these values in a section of the executable binary file.
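
One visible consequence of literals living in the binary is their type, &'static str: a reference that stays valid for the whole run of the program. In this sketch, default_name and MAX_NAME_LEN are hypothetical names chosen only for illustration:

```rust
// Hypothetical constant: its value is fixed at compile time.
const MAX_NAME_LEN: usize = 32;

// Hypothetical function returning a reference to a string literal. This is
// allowed because the literal's bytes are stored in the executable, not in
// this function's stack frame, so the reference is valid for the whole program.
fn default_name() -> &'static str {
    "Anonymous"
}

fn main() {
    let name = default_name();
    println!("{name} (names may be at most {MAX_NAME_LEN} bytes)");
}
```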

For variables of a type whose size is known at compile time, Rust allocates memory entirely on the stack. For example, variables of the following types are allocated on the stack:

  • All integer types
  • All floating-point types
  • Boolean type
  • Character type char
  • Tuples, if all of their elements are of types that are allocated on the stack
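
Because these sizes are fixed at compile time, they can be queried with std::mem::size_of. In the sketch below, primitive_sizes is a hypothetical helper; the tuple's exact size is chosen by the compiler (it may include padding), so it is printed rather than stated:

```rust
use std::mem::size_of;

// Hypothetical helper: returns the size in bytes of several fixed-size types.
fn primitive_sizes() -> [(&'static str, usize); 4] {
    [
        ("i32", size_of::<i32>()),
        ("f64", size_of::<f64>()),
        ("bool", size_of::<bool>()),
        ("char", size_of::<char>()),
    ]
}

fn main() {
    for (name, bytes) in primitive_sizes() {
        println!("{name}: {bytes} bytes");
    }
    // A tuple of fixed-size types is itself fixed-size; the compiler chooses
    // its layout, which may include padding.
    println!("(i32, bool): {} bytes", size_of::<(i32, bool)>());
}
```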

To understand memory allocation and deallocation on the heap, let us examine the String type in Rust. Memory for a variable of type String cannot be completely allocated on the stack as the size of a string may not always be known at compile time. As an example, consider the following snippet of code where we read a user’s input into a variable of type String and thus cannot know the size of the string at compile time:

use std::io;
fn main() {
    println!("Please enter your name:");
    let mut username = String::new(); // empty String; grows to fit the input
    io::stdin().read_line(&mut username).expect("Failed to read line");
}

Here is how Rust allocates memory for variables of type String:

  • Stack memory is allocated for three fixed-size values that together represent a String internally:
    • A pointer to a buffer allocated on the heap (see below)
    • A length value which is the number of bytes currently stored in the buffer
    • A capacity value which is the size of the buffer in bytes
  • Heap memory is allocated for the buffer in which the string’s data is stored.
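
The three stack values can be observed through methods available on a String: as_ptr, len, and capacity (the fields themselves are private). In this sketch, len_and_capacity is a hypothetical helper. The exact capacity, and the stack size of a String (three machine words with current compilers), are implementation details rather than documented guarantees:

```rust
use std::mem::size_of;

// Hypothetical helper: reads the length and capacity of a String.
// It takes &String (not &str) because capacity() is a String method.
fn len_and_capacity(s: &String) -> (usize, usize) {
    (s.len(), s.capacity())
}

fn main() {
    let message = String::from("Hello");

    // The three fixed-size values stored on the stack, read through methods.
    let ptr = message.as_ptr(); // address of the heap buffer
    let (len, capacity) = len_and_capacity(&message);
    println!("ptr = {ptr:?}, len = {len}, capacity = {capacity}");

    // Size of the stack part of a String: three machine words with current
    // compilers (24 bytes on a 64-bit target), not a documented guarantee.
    println!("size_of::<String>() = {}", size_of::<String>());
}
```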

If we append text, such as a string literal, to the string, its length grows. If the new length would exceed the capacity, Rust automatically increases the capacity by reallocating the string's heap buffer with a larger size. This ensures that the length never exceeds the capacity.
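
This growth can be watched by recording the length and capacity after each append. The sketch uses a hypothetical helper, track_growth; how much the capacity grows at each reallocation is an implementation detail of the standard library, so only the invariant that the length never exceeds the capacity is relied on:

```rust
// Hypothetical helper: appends `piece` to an empty String `times` times and
// records (length, capacity) after each append.
fn track_growth(piece: &str, times: usize) -> Vec<(usize, usize)> {
    let mut s = String::new();
    let mut history = Vec::new();
    for _ in 0..times {
        s.push_str(piece); // may reallocate the heap buffer with a larger size
        // Invariant maintained by String: length never exceeds capacity.
        assert!(s.len() <= s.capacity());
        history.push((s.len(), s.capacity()));
    }
    history
}

fn main() {
    for (len, capacity) in track_growth("ab", 10) {
        println!("len = {len:2}, capacity = {capacity:2}");
    }
}
```

With current standard library versions the capacity roughly doubles when it has to grow, which keeps the number of reallocations small, but that growth policy is not guaranteed.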

Contrast this automatic heap allocation for a growing string in Rust with what a programmer must do in C to handle a growing string. If a variable in a C program holds one string and we now want it to hold a longer string, we need to explicitly request heap memory for the longer string.

Example

Let us look at two programs, one in Rust and the other in C, to understand how memory is allocated by Rust and by C when we want to increase the size of a string.

Let us first consider the following Rust program:

In the program, we declare three variables, str_len, str_capacity and message. All three of these variables are mutable.

  • The variables str_len and str_capacity have an integer data type. Since integer types have a fixed size, the memory for these two variables is allocated on the stack.
  • The variable message is of type String.
    • Memory is allocated on the stack for message to hold the three fixed size values that internally represent a String.
    • Initially, heap memory is allocated for a buffer to hold the 5 characters 'H', 'e', 'l', 'l', 'o'.
    • Later in the program, when we append the literal " World!" to message, Rust automatically allocates a buffer with enough capacity to hold the 12 characters now in the string.
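
The Rust program itself is not reproduced in this text, so the following is only a sketch of what it might look like, reconstructed from the description above. The names print_message, message, str_len and str_capacity come from the text; having print_message return the final length and capacity is an assumption added so the result can be checked:

```rust
// Sketch of the described program. Returning (str_len, str_capacity) is an
// assumption made for checking; the integers are copied out, while `message`
// is still dropped when the function returns.
fn print_message() -> (usize, usize) {
    // `message` owns a heap buffer initially holding the 5 bytes of "Hello".
    let mut message = String::from("Hello");
    // Integers have a fixed size, so these live on the stack.
    let mut str_len = message.len();
    let mut str_capacity = message.capacity();
    println!("{message}: len = {str_len}, capacity = {str_capacity}");

    // If the appended text does not fit in the current capacity, String
    // reallocates its heap buffer with a larger size, keeping the old contents.
    message.push_str(" World!");
    str_len = message.len();
    str_capacity = message.capacity();
    println!("{message}: len = {str_len}, capacity = {str_capacity}");

    (str_len, str_capacity)
    // `message` goes out of scope here, so its heap buffer is freed automatically.
}

fn main() {
    print_message();
}
```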

Here is a program in C with similar functionality:

  • The memory for the variable str_len is allocated on the stack.
  • But we are responsible for calling malloc to dynamically allocate the heap memory that message points to.
    • We first call malloc so that message can hold the string "Hello".
    • When we want message to hold a longer string, we call malloc again to allocate a bigger buffer.

Memory Deallocation

One of the three ownership rules states that when the owner of a value goes out of scope, the value is dropped. Rust applies this rule automatically, whether a value lives entirely on the stack or also owns memory on the heap: as soon as a variable goes out of scope, its value is dropped, and any heap memory owned by that value, such as a String's buffer, is freed.
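
To see exactly when values are dropped, a type can implement the standard Drop trait. In this sketch, Noisy and drop_order are hypothetical names: each Noisy value records its name in a shared log when it is dropped, and the last step uses std::mem::drop (available in the prelude as drop) to end ownership before the end of the scope:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical type used only to make drops visible: when Rust drops a Noisy
// value, its Drop implementation appends the value's name to a shared log.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Hypothetical helper: creates values in nested scopes and returns the order
// in which they were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _outer = Noisy { name: "outer", log: Rc::clone(&log) };
        {
            let _inner = Noisy { name: "inner", log: Rc::clone(&log) };
        } // `_inner` goes out of scope here, so it is dropped first
        let _second = Noisy { name: "second", log: Rc::clone(&log) };
    } // `_second`, then `_outer`: reverse order of declaration
    let early = Noisy { name: "early", log: Rc::clone(&log) };
    drop(early); // takes ownership and drops the value right away
    let order = log.borrow().clone();
    order
}

fn main() {
    println!("{:?}", drop_order());
}
```

Running it prints ["inner", "second", "outer", "early"]: each value is dropped when its owner goes out of scope, and values declared in the same scope are dropped in reverse order of declaration.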

Example

In the Rust program from the previous example, the three variables go out of scope when the function print_message returns. At that point, their values are automatically dropped. This means that the buffer allocated on the heap for message is also freed when print_message returns.

Contrast this with the C program. Here the memory for str_len is reclaimed automatically when the function's stack frame is removed from the stack. However, we must explicitly call free on both buffers we allocated for message. If we did not, that memory would not be released until the program ended, and we would have a memory leak.

Summary

In this exploration, we studied how Rust provides memory management via the concept of ownership. Rust enforces the ownership rules by refusing to compile programs that violate them. We explained the rule that every value has an owner and the rule that a value is automatically dropped when its owner goes out of scope. In the next exploration, we will study the third ownership rule, which requires that each value have only one owner at a time.
