String Types in Rust 2016 / 03 / 27· Rust ·Reading time: 10 minutes
Rust has many pairs of string types, each with a distinct purpose. Understanding those purposes helps with understanding Rust itself.
This seems to be a common issue, so let’s talk about the different string types in the Rust programming language. In this post I’m going to explain the organization of Rust’s string types with String
and str
as examples, then get into the lesser-used string types—CString
, CStr
, OsString
, OsStr
, PathBuf
, and Path
—and how the Cow
container can make working with Rust strings easier.
This was originally published on a different site.This is a reproduction of a post of mine which was originally published on a now-defunct programming blog called “Suspect Semantics”. I am reproducing it here to give me something to refer to, as there have been times where I wanted to point people to this article. I don’t have any intent of bringing the original site back in full, but at least some of the content was good and is still reasonably correct and relevant to be worth preserving.
The most important thing to understand is that string types in Rust come in pairs, which I’ll call the “owned” variety and “slice” variety (the term “variety” is being used here to mean roughly “set of types”, to make clear that “owned” and “slice” are not actual Rust types, but a notion of organization for Rust’s string types). The “owned” variety of strings—String
, CString
, OsString
, and PathBuf
—have ownership over their contents (hence the name!), and can grow or shrink. The “slice” variety of strings—str
, CStr
, OsStr
, and Path
—are views into some collection of characters. There’s also the Cow
wrapper type, which can make working with the two varieties of strings easier while retaining good performance characteristics.
“Owned” vs. “Slice” varieties of Strings
String
and str
are, by far, the most common string types in Rust, so I will use them to illustrate the difference between the two varieties of string types. String
(the “owned” variety of string type) is a wrapper for a heap-allocated buffer of UTF-8–encoded unicode codepoints. str
(the “slice” variety of string type) is a buffer of UTF-8–encoded unicode codepoints that may be on the heap or in the program memory itself.
When you create a string literal in Rust, it is by default of type &str
. There are two things this could mean, depending on how that buffer was created.
“Slice” on the Heap
If the reference is taken from a String
, it will be a reference to the contents of the String
’s internal buffer, which is on the heap. Handing out these references is a common pattern for effectively using “owned” variety strings in Rust. For convenience’s sake, all references to “owned” variety strings coerce to references to their “slice” variety equivalent. That is, &String
becomes &str
. It is considered good practice to use the latter type for function parameters taking a reference to a string.
“Slice” in Program Memory
If the buffer is a string literal, with or without an explicit lifetime, then the buffer will be located in program memory. In fact, all string literals have the 'static
lifetime by default, meaning they can be safely referenced from anywhere in the program. Note though that having a buffer with a 'static
lifetime is not the same as declaring a static
variable. A buffer with a 'static
lifetime may be safely referenced from anywhere, but must still be passed to functions explicitly; it is not globally visible. A variable must be declared with the static
keyword to be globally visible.
There is some nuance here though. When returning a &str
from a function, the lifetime of the reference must be tied to either a single input lifetime, or a method taking either &self
or &mut self
. In the first case, the lifetime of the returned reference will be the lifetime of the single input reference. In the second case, the lifetime will be the lifetime of the reference to self
.
If your function does not fit either of these cases, rustc
will return an error, as in the following example:
fn get_string() -> &str {
"A string!"
}
fn main() {
let string = get_string();
println!("{}", string);
}
You can avoid the lifetime elision problems entirely by explicitly annotating the function in one of the following ways:
fn get_string_static() -> &'static str {
"A string!"
}
fn get_string_function_call<'a>() -> &'a str {
"A string!"
}
fn main() {
let string_1 = get_string_static();
let string_2 = get_string_function_call();
println!("{}", string_1);
println!("{}", string_2);
}
So what do you do if you want to return a string from a function without giving it a 'static
lifetime, or want to return a string whose contents aren’t known at compile time? You use the “owned” variety of string type, in this case: String
.
Why “Owned” Strings Exist
The “owned” varities of strings are allocated on the heap, and do not suffer the same limitations of the “slice” varities. They may be freely moved around without safety issues, handing out slices of their internal buffer as needed. The “slice” varities do not have the same guarantees, and so can’t be used as freely.
If you want to avoid dealing with ownership and borrowing for strings, you can just turn every “slice” variety into an “owned” variety via the to_owned()
method (provided by the ToOwned
trait, which all “owned” varities of string types implement), and only work with the “owned” varities, but doing so would incur performance penalties when you unecessarily allocate string buffers on the heap, and would be considered poor Rust style. Instead, it’s best to develop an understanding of when the two varities of strings are needed.
Another important thing to note is that because the “owned” varities of strings abstract away the underlying buffer, they can grow or shrink, possibly allocating a new underlying buffer and copying their contents to this new buffer. The “slice” varities of strings cannot be resized, as they may not even be on the heap.
The “slice” variety strings can only be accessed via what’s called a “fat pointer.” This is because slices are “dynamically-sized types,” meaning they do not carry information about their own length. They are simply some collection of contiguous memory. A “fat pointer” to a slice stores both a pointer to the memory in question and the length of the data stored at that memory location. This is all handled automatically by Rust, but it means that the “slice” variety of strings are interacted with via references, rather than being handled directly. For more detail about dynamically-sized types, check out “The Rustonomicon,” which covers them in detail.
The String Type Pairs
The pairs of string types are differentiated from each other by the guarantees provided by the underlying buffer.
String
and str
: String
and str
are guaranteed to be valid UTF-8 encoded Unicode strings. If you’re wondering why UTF-8 is the standard encoding for Rust strings, check out the Rust FAQ’s answer to that question. CString
and CStr
: CString
and CStr
are guaranteed to be compatible with C strings, and are usually used in FFI code. OsString
and OsStr
: OsString
and OsStr
are guaranteed to be platform-native strings (that is, they use the encoding of the current platform) that can be cheaply converted into String
and str
types. They are usually used when interacting directly with the operating system. PathBuf
and Path
: PathBuf
and Path
are wrappers around OsString
and OsStr
that provide convenient methods for operating on paths according to the rules of the current system. They are usually used when interacting with system paths.
There are times when working with strings in Rust that you want to cleanly abstract over the two varities of string types. Maybe you want a container that may hold either “owned” variety or “slice” variety strings, or you want to use slices except for cases where an “owned” variety is absolutely necessary. In these situations, use Cow
.
Cow
(which stands for “Clone on Write”) is a container that can take in a “slice” variety of string type, and only convert that “slice” variety into an “owned” variety when absolutely necessary (when something attempts to write to the slice). This has the advantage of reducing the complexity of reasoning about ownership, while also helping keep unecessary allocations down. It is not a panacea, but it can be very helpful sometimes.
Conclusion
That was a relatively quick explanation of string types in Rust. Hopefully that helped to clarify a bit of why the different string types exist, and why they are useful. As a final reference, here is a table of the different types:
Updated (3/28/2016):
- This post has been updated based on corrections from CryZe92 and Ilogiq on the Rust subreddit. Thanks to both of them for the help!
Updated (3/29/2016):
- Replaced the word “sort” with “variety” based on feedback from rkangel on Hacker News.
- Clarified the wording of the unicode explanation, based on feedback from nothrabannosir, also on Hacker News.
- Fixed a typo based on feedback from respeccing on Twitter.
- Reworded the transition into the explanation of “owned” string types, based on feedback from bjzaba from the Rust subreddit.
Updated (3/31/2016):
- Replaced “three” with “two,” fixing an ommission from earlier changes.