There are two types in Rust that could colloquially be called a string: the string slice (str) and String. Below we explore how they work, how they are related, and how they differ.
String slice (str)
A string slice (str) is a UTF-8 encoded sequence of bytes of a fixed length. The length is fixed at run time but unknown at compile time. Because the compiler does not know the size, a str value cannot be stored on the stack directly and, for example, cannot be used as the type of a local variable. Hence we work with string slices through a reference, &str: the size of a reference is known at compile time, so the reference itself can live on the stack.
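The difference between the size of the reference and the size of the data behind it can be checked with std::mem::size_of and std::mem::size_of_val. The following is a minimal sketch; the function names str_ref_size and str_data_size are made up for illustration.

```rust
use std::mem::{size_of, size_of_val};

/// Size of a `&str` reference itself. This is known at compile time:
/// the reference stores a pointer and a length, i.e. two machine words.
pub fn str_ref_size() -> usize {
    size_of::<&str>()
}

/// Size in bytes of the `str` data behind a reference.
/// This can only be answered at run time, by reading the stored length.
/// (`size_of::<str>()` does not compile, because `str` is unsized.)
pub fn str_data_size(s: &str) -> usize {
    size_of_val(s)
}
```

On a 64-bit target str_ref_size() returns 16 no matter which string the reference points to, while str_data_size("Hi") returns 2 and str_data_size("Hi there") returns 8.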
So, intuitively, we can think of &str as a reference to some UTF-8 encoded sequence of bytes of a fixed length. Another analogy is to think of it as a view into a block of memory. In practice, we can even think of it as a read-only view of that data, even though that is not exactly correct (&mut str is also possible in some cases, though it is rather unusual).
The underlying data can be allocated in static storage, on the heap, or on the stack. Let's consider each case separately:
// [Example 1]:
let s1: &str = "Hi";
// [Example 2]:
let u2: String = String::from("Hi");
let s2: &str = &u2[..];
// [Example 3]:
let mut u3: String = String::from("Hi");
let s3: &mut str = &mut u3[..];
s3.make_ascii_lowercase();
assert_eq!(s3, "hi");
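// [Example 4]:
// Bind the array to a local variable so that it really lives on the stack
// (a borrowed array literal may be placed in static storage instead).
let a4: [u8; 2] = [b'H', b'i'];
let u4: &[u8] = &a4;
let s4: &str = std::str::from_utf8(u4).unwrap();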
In [Example 1], we have a string literal Hi stored in static storage, which basically means that it is hard-coded directly into the final binary. It follows that the data is immutable in this case.
In [Example 2], the u2 variable owns String data allocated on the heap. We then take an immutable view (&str) of that String data.
In [Example 3], we instead take a mutable view of the data located on the heap and then call a mutating method, make_ascii_lowercase. It is quite uncommon to work with &mut str, and there is a reason for that. Since a slice's length is fixed, a mutating operation must not change the size of the slice. For instance, we should not be able to replace a 1-byte ASCII character with a 2-byte UTF-8 character, because that would overflow the slice. Nor should we be able to apply operations that would shrink the slice. Both situations would violate safety guarantees, which is why only a limited number of safe mutating functions is available for string slices, such as make_ascii_lowercase in our example. This function operates only on 1-byte ASCII characters and never changes the length of the slice, so it is safe.
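To see that such an in-place mutation leaves the length untouched and skips multi-byte characters, here is a small sketch using the sibling method make_ascii_uppercase from the standard library. The helper name uppercase_ascii_in_place is hypothetical.

```rust
/// Copies `text` into a `String`, upper-cases its ASCII letters in place
/// through a `&mut str` view, and returns the result together with the
/// byte length before and after the mutation.
pub fn uppercase_ascii_in_place(text: &str) -> (String, usize, usize) {
    let mut owned = String::from(text);
    let len_before = owned.len();

    // Mutable view of the whole heap buffer.
    let view: &mut str = &mut owned[..];
    // Only bytes in the ASCII range are rewritten; each stays one byte long.
    view.make_ascii_uppercase();
    let len_after = view.len();

    (owned, len_before, len_after)
}
```

Called with "h\u{e9}llo" (the letter é takes two bytes in UTF-8), the function returns "H\u{e9}LLO" with a length of 6 bytes both before and after: the ASCII letters change, the two-byte character is left alone, and the slice never grows or shrinks.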
In [Example 4], the s4 string slice is an immutable view of the byte array behind u4, which is allocated on the stack (an array has a size known at compile time, so a local array can live on the stack). This example demonstrates that a string slice can be a view of data on the stack, provided that data has a known size and is valid UTF-8.
In the examples above, each string slice referenced the whole of the original string. But one of the main benefits of slices is that we can take a view of just a part of a string, without allocating a new string. For example:
let s: String = String::from("Hi there");
let s1: &str = &s[3..8];
assert_eq!("there", s1);
Here the s1 variable is bound to a reference to the "there" portion of the original string.
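One way to convince ourselves that no new string is allocated is to compare addresses: the sub-slice should point into the same buffer as the original string, just a few bytes further in. The helper below is a sketch with a made-up name, and it assumes that part really was sliced out of whole.

```rust
/// Returns how many bytes into `whole` the slice `part` begins.
/// Assumption: `part` was obtained by slicing `whole`; for unrelated
/// strings the subtraction is meaningless and may overflow.
pub fn subslice_offset(whole: &str, part: &str) -> usize {
    part.as_ptr() as usize - whole.as_ptr() as usize
}
```

For s = String::from("Hi there") and s1 = &s[3..8], subslice_offset(&s, s1) returns 3: the slice starts three bytes into the buffer that s already owns.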
String
Earlier we defined the string slice (str) as a UTF-8 encoded sequence of bytes of a fixed length. In contrast, String can be defined as a UTF-8 encoded vector of bytes of a dynamic length. Since a String can grow as needed, the size of its content cannot be known at compile time, which is why the actual data is allocated on the heap.
Consider the following code:
let s: String = String::from("Hi");
let s1: String = s;
let s2: &str = &s1[..];
The variable s is bound not to the actual string content Hi (which is allocated on the heap), but to a small data structure on the stack that contains a pointer to the string data together with its length and capacity.
When we assign s to another variable s1, the actual string content Hi is not copied. Instead, ownership moves: s1 is bound to a copy of the stack data structure, which still points to the original Hi on the heap, and s can no longer be used.
And when we create the string slice s2, it also references the same Hi string allocated on the heap.
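That all three names end up referring to a single heap buffer can be checked by comparing raw addresses. This is only a sketch; the function name buffer_is_shared is hypothetical.

```rust
/// Repeats the three bindings from the text and reports whether
/// `s1` and `s2` point at the heap buffer originally owned by `s`.
pub fn buffer_is_shared() -> (bool, bool) {
    let s: String = String::from("Hi");
    let heap_ptr = s.as_ptr(); // address of the heap buffer, taken before the move

    let s1: String = s;        // moves ownership; the heap data is not copied
    let s2: &str = &s1[..];    // borrows the same buffer

    (s1.as_ptr() == heap_ptr, s2.as_ptr() == heap_ptr)
}
```

The function returns (true, true). Trying to use s after the line that moves it into s1 would be rejected by the compiler.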
Note that none of s, s1 and s2 is a plain pointer. A String value holds a pointer together with the length and capacity of the string, and a &str holds a pointer together with the length of the slice. Such pointer-plus-metadata structures are sometimes called "fat pointers" (though some people would argue the term is not correct in the String case: strictly speaking it applies to references such as &str, while String is an owning structure).
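The extra information shows up in the size of these values. The sketch below counts machine words for a String, a &str and, for comparison, an ordinary reference &u8. The function name is made up, and the exact layout of String is an implementation detail of the standard library rather than a language guarantee, so treat the first number as an observation about current toolchains.

```rust
use std::mem::size_of;

/// Sizes of `String`, `&str` and `&u8`, measured in machine words.
pub fn handle_sizes_in_words() -> (usize, usize, usize) {
    let word = size_of::<usize>();
    (
        size_of::<String>() / word, // pointer + length + capacity
        size_of::<&str>() / word,   // pointer + length
        size_of::<&u8>() / word,    // pointer only
    )
}
```

With current compilers this returns (3, 2, 1).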
And String values can grow as needed:
let mut s: String = String::from("Hi");
s.push_str("!!!");
assert_eq!(s, "Hi!!!");
Appending another string to the end of the original one might require allocating new memory if the result would exceed the currently allocated capacity (recall the capacity field of the String data structure described earlier).
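The relationship between length and capacity can be observed with String::with_capacity, len and capacity. The helper below is a sketch with a hypothetical name. It deliberately reports the numbers instead of assuming a particular growth strategy, since the allocator is free to reserve more than was requested.

```rust
/// Builds a `String` with just enough room for `initial`, appends `extra`,
/// and returns (capacity before the append, final length, final capacity).
pub fn push_and_report(initial: &str, extra: &str) -> (usize, usize, usize) {
    // Guarantees a capacity of at least `initial.len()` bytes.
    let mut s = String::with_capacity(initial.len());
    s.push_str(initial);
    let cap_before = s.capacity();

    // If `extra` does not fit into the spare capacity, the buffer is reallocated.
    s.push_str(extra);

    (cap_before, s.len(), s.capacity())
}
```

For push_and_report("Hi", "!!!") the final length is 5, and the final capacity is always at least 5, whatever the capacity was before the append.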
Conclusion
In this article we scratched the surface of strings in Rust: we explored the string slice (str) and String types. It is a common belief that strings in Rust are more complex than in some other programming languages, but in practice they turn out to be more transparent, flexible and safe. Thanks for reading!