/* What You Must Know About Rust I've written a bunch of Rust programs in order to learn it, but I've distilled the most important points in the following program, which I've tried to make as clear and simple as possible. I expect you to study this program and understand why the commented out code would produce errors. Go through each line in main and study the comments. Try uncommenting the lines of code that would cause compiler errors and study the compiler's error messages. The best way to understand this is to experiment with it. Get all resources from rust-lang.org. Rust is installed on the Linux VM but you should do a 'rustup update' to update to the latest verions. Do rustc wymk.rs to compile and ./wymk (add .exe on pc) to run. Rust's ability to offer memory safety without a garbage collector rests on principles called OWNERSHIP, BORROWING and LIFETIME. Data types in Rust can be divided into two categories: those that can be copied and those that can't be. Specifically, a data type can be copied if it implements the 'Copy' trait (a trait is similar to an interface). A type can implement this trait only if it is of fixed size and is contiguous in memory (and therefore can be completely allocated on the stack). Data that can be copied include primitive types for numerical values (i32, u32, f64, etc), booleans, and string literals or "slices" (rust has different types of strings, some cannot be copied). If you define a struct (or enum) that contains only :Copy-able values, then that struct can also be copied (you can implement, and often automatically derive the Copy trait for it). For example, Rust arrays must have their size known at compile time, and so an array of :Copy values can itself be copied. However, if some part of your data structure contains pointers to heap-allocated data, such as in the case of a linked list or tree, then it cannot be copied. In such a case, you must understand the Rust rules of OWNERSHIP: each such data structure is 'owned' by a name (a variable), which is often allocated on the stack. When this variable goes out of scope, any heap-allocated data it refers to is also deallocated. Ownership is always unique, so the data cannot be deallocated twice. At runtime, the LIFETIME of a value is how long it's guaranteed to be in memory. At compile time, the lifetime of a variable is at most the static scope in which it exists. But it can also end when the data it owns is "moved" to another owner: this happens during variable assignment, as well as when passing the variable to a function, and returning one from a function. During such an operation, if a data type is :Copy, then a copy will be produced, otherwise it is "moved". **You cannot use a value once it's moved.** If you're familiar with "move semantics" of smart pointers in modern C++, these concepts are similar: but there are two key differences: 1. In Rust, move semantics is enforced at *compile time* 2. Rust has a "borrow checker", the most dreaded and loved part of Rust. You can avoid transfers of ownership by creating BORROWS of the values. You might think that these are just pointers, but their compile-time meaning is much more than that. The lifetime of a borrow cannot be greater than that of the value that it borrows from (regardless of whether the value can be copied). An immediate consequence of this restriction is that you can't return a borrow (pointer) to a local variable of a function, which is something a lot of new C programmers will try to do: int* f() { int x=1; return &x; } // this compiles in C In this program, I use fixed arrays of i32 (32-bit signed) integers, which are :Copy, and heap-allocated vectors of i32's, which are not :Copy to illustrate these essential concepts of Rust. Please proceed to 'main' for details. ALL THE COMMENTED-OUT LINES OF CODE WOULD CAUSE COMPILER ERRORS. */ // Compiler directives to avoid certain warnings // we'll have a lot of these because of commented-out code: #![allow(unused_variables)] #![allow(unused_mut)] #![allow(unused_assignments)] #![allow(dead_code)] #![allow(path_statements)] fn main() ////////// main { let mut fixeda = [2,4,6]; // A statically fixed-sized array of integers... let mut fixedb = fixeda; // can be copied (:Copy trait) fixeda[0] = 9; // fixeda can still be used because it has its own copy println!("fixedb[0] is still {}",fixedb[0]); // 2, as you'd expect. // fixeda and fixedb are both allocated on the stack. //// If all you have are :Copy types, then Rust can mostly look like any //// other language (but the rules concerning borrowing still apply). // If something does not :Copy, that's when it gets interesting... let mut vct1 = vec![1,2,3]; //a vector is heap-allocated, size can change. vct1 = vec![2,4,6]; // previous vector [1,2,3] deallocated vct1.push(8); // adds new element, expand size of vector let mut vct2 = vct1; //a transfer of ownership or "MOVE" occurs, NOT A COPY. //let mut vct3 = vct1; // COMPILER ERROR: "value used after move" // We can't use vct1 at all after it's been transferred ("moved") to a new // owner, because otherwise, when both vct2 and vct1 go out of scope, the // same data will get deallocated twice (seg fault in C). The lifetime of // vct1 terminates here. However, the variable is still declared and you // can bind it to something else: vct1 = vec![8,9,10]; // move something back into vct1 // Ownership is also transferred when you pass a value to a function, let mut vct3 = f(vct2); //... and when a value is returned //println!("Do I still have access to vct2? {}",vct2[0]); // NO! // So how can I do anything with a variable if gets "moved" so easily? // I can BORROW it with a reference (alias), without taking ownership: let rv3 = &vct3; // equivalent to let ref rv3 = vct3; rv3 is a borrow. println!("{} and {}",rv3[4], vct3[4]); // vct3 is still alive, and for x in &vct3 { println!("{} ",x); } // can be borrowed more than once // Borrows can be created explicitly or implicitly. When you declare // rv3 to be a & to something, it's an explicit borrow. However, // when you write vct3[4], you're implicitly creating a borrow. // Think of vct3[4] as vct->[4] or (&vct).[4]. // Rust borrows are sometimes called references, but they're much // more flexible than C++ references. They can be reassigned. //// But the lifetime of a borrowed reference does not outlive its lender: let mut vct4 = vct3; // data "moved out" of vct3 //println!("Can I cheat and still use the borrow? {}",rv3[4]); // NO WAY! ///// Here is another thing, which can cause you some headache when first // learning rust: when you have a borrow of a non-Copy value, don't // to try to dereference it as it would try to create a temporary copy // of the value, which of course it could not: let b4 = &vct4; //let vct5 = *b4; // "move occurs ... does not implement the Copy trait" ///////////////// Mutable and Immutable Borrows: //////////////////// ///// There are also two categories of borrows: mutable and immutable, // the ones above were all immutable. A mutable borrow can both // access and change a mutable structure. YOU CANNOT MIX A MUTABLE // BORROW WITH OTHER BORROWS (immutable or mutable) of the same value. // Their lifetimes cannot intersect. Why? One reason is that // the mutable ref invalidates the immutable status of the immutable // borrows. In the case of vectors, destructive operations on a vector // can cause the underlying memory allocated for the vector to change // (it could be copied to somewhere else in memory due to resizing). // Try the equivalent of the following code in any other language: let mut v:Vec = vec![1,2,3,4]; //for x in &v {v.push(*x);} //*x is ok here because i32 is :Copy. // This code will not compile in Rust because the borrow &v is alive // for the duration of the for-loop while there's a mutable borrow // of v, implicitly created by v.push, that must be alive at the same // time. Do the equivalent experiment in C#/Java and you will see a // runtime error, but the code will compile. Do it in C++ and ... // well, why don't you find out for yourself. /* ========== +++++++++++++++++ The fundamental reason for restricting a language is that it gives us more *invariants* that we can assume about successfully compiled programs. These invariants provide properties including memory and type safety as well as greater opportunities for optimization, parallelization, fault tolerance, etc. The typed lambda calculus gives us many such invariants, such as statelessness and the Church-Rosser property. We can write purely functional programs and such programs will be more parallelizable, transportable, and fault tolerant (running it anywhere and any amount of times will give the same result). This is the appeal of functional programming and why it's becoming more prominent, with most languages supporting the paradigm to some degree. But most of these languages are not *pure* because there's too much of a disconnect between lambda calculus and actual hardware. That is to say that *zero-cost abstraction* is hard to obtain if you have *too much* abstraction. Most "functional" languages, including F# and Scala, still allow us to easily write programs that are not consistent with lambda calculus. Take in particular, the Church-Rosser property, which says that we can evaluate an expression any way we want and it will give us the same result at the end: this means we have the greatest freedom in finding the *optimal* way to run a program. Unfortunately we can't guarantee this property often enough in practice. let x = 2: let f = (lambda y. (x = x*0; y;)): let g = (lambda y. (x = x+1; y; x;)): ( (g (f 1)); ) This little program (in the language "mongoose") would return 1 under call-by-value (if f is called first) and 0 under call-by-name (if g is called first). This program is clearly not admissible in lambda calculus (typed or untyped) as it violates the Church-Rosser property. What is the source of the impurity? It's easy to point to destructive assignment as the culprit: in a language without destructive assignment (Haskell, Elm), this example would not be possible. But let's try to write it in Rust, which allows destructive assignment: it's hardly possible to implement all algorithms efficiently without it. The syntax of lambda expression lambda x.E in Rust is |x|{E} (similar to Ruby's syntax). A proper closure that not only captures but changes free variables have type FnMut in Rust. */ { let mut x = 2; let mut f = |y:i32|{x=x*0; y }; let mut g = |y:i32|{x=x+1; y; x }; // Note that f and g are both "mut" because the closure itself // changes each time it's called. ////So far it compiles because we didn't call f and so the compiler // deduced that the lifetime of f has already ended. But in fact, // f and g CANNOT BOTH EXIST because each makes a mutable borrow of x. //println!("g(f(1)): {}", g(f(1))); // example not possible in Rust*** println!("g(1)==g(1): {}", g(1)==g(1)); // prints false println!("x is now {}",x); // prints x is now 4 } /* **The fact that Rust allows destructive assignment does not by itself mean that there's a loss of the Church-Rosser property.** It's not just destructive assignment but destructive assignment from two different borrows. Uncomment the call to g(f(1)) and look at the compiler error messages, which will include: let mut f = |y:i32|{x=x*0; y }; | ------- - first borrow occurs due to use of `x` in closure | | | first mutable borrow occurs here let mut g = |y:i32|{x=x+1; y; x }; | ^^^^^^^ - second borrow occurs due to use of `x` in closure println!("g(f(1)): {}", g(f(1))); // example not possible in Rust | - first borrow later used here But g by itself might also be a problem: surely we want g(1)==g(1) to be true? Actually, that can be explained by the little 'mut' found in the definition of g: when we called g(1), it in fact *changed* g. So g is changed to a different function each time it's called since the x it refers to changed. So a destructive assignment is made to g as well as x. We should not expect g(1)==g(1) just as we should not expect x==x+1. Let's look at this problem another way. Not all destructive assignments are equally "destructive." We know that many loops with "i++" at the end can be rewritten as tail-recursive functions without destructive assignment. This means that at least some loops are actually admissible as purely functional programs, does it not? Of course, not all loops will be admissible: not the one above that tries to iterate over a vector while changing it. The infamous "x=x+1" can be explained in the following way in lambda calculus: Suppose we have a program with the following structure: lambda x. ... let f = (lambda x.E): ... f(x+1) ... We know that there are actually two x's in the program, one bound by the outer lambda and one bound by the inner one. We can alpha-convert one of them to something else. When implementing this program, we should allocate different memory locations for the two x's. But what if THERE IS ONLY ONE OCCURRENCE of the outer-bound x in the entire term. That is, what if the x in f(x+1) is the only place that it occurs. In that case, we can make an optimization (similar to tail-recursion optimization) to REUSE the memory for the outer x to store the inner x when evaluating E. The x inside E "becomes" x+1. However, if the outer x also occurred elsewhere (in the ...) then such an optimization cannot be made: the destruction of the outer x clearly must invalidate all other references to x, EXCEPT ONE. The optimization - the mutation - does not compromise the laws of lambda calculus **if there are no other references to what's being mutated.** The mutation of a value must invalidate all existing borrows of that value. */ let mut u = 2; let bu = &u; // an explicit borrow u += 1; // mutates u. think of this as making an implicit mut borrow //println!("{}",bu); // compiler error -bu invalidated by mutation. /* Rust is still not pure lambda calculus (it has I/O). But Rust have greater invariants of the kind that we want from pure programs. When using Rust, especially as a replacement for C++, it will have the look and feel of an imperative language with loops, references and destructive assignment. In fact, Rust doesn't even guarantee tail-recursive optimization in all cases. But despite how it looks, it is actually closer to pure typed lambda calculus than many "functional" programming languages, including F# and Scala. */ // One more example along these lines. Take the following Python function // def reverse(A,B): // if len(A) != len(B): return // for i in range(0,len(B)): B[i] = A[len(A)-1-i] // Can you determine, looking at just the static code, what's the effect // of calling the above function on a pair of lists in python? // I'll bet some of you got it wrong. Consider: // v = [1,2,3,4,5] // reverse(v,v) # What's v now? It's not [5,4,3,2,1] but [5,4,3,4,5] // It's not just that such situations might cause confusion for the // programmer, but not being able to determine the behavior of a function // statically may cost us opportunities to optimize the function (either // by ourselves or by the compiler's static analysis capabilities). // Rust would not compile the equivalent call to reverse(v,v) as it // would mean passing a mutable reference along with another reference // to the same structure. The Rust version of reverse (below) can ONLY // do one thing: copy the reverse of A into B. let mbv = &mut vct4; // creates an mutable borrow of vct4 (defined earlier), mbv[1] += 1; // with which you can mutate the borrowed structure. //reverse(mbv,mbv); // ERROR: creates two mut borrows ra,rb inside reverse let mut vct5 = vec![0;5]; //create vector of 5 zeros; reverse(mbv, &mut vct5); // mbv is still borrowing vct4 for x in &vct5 {print!("{} ",x);} println!(""); // vct4 in reverse // Note that the &mut vct5 created by the function call was a temporary, // and is no longer used in the rest of main, so it's ok to create another // borrow of vct5 afterwards. // Also note that these borrowing rules applies to both :Copy and // non- :Copy types: // let mut x = 1; let px = &mut x; let px2 = &x; *px += 1; //won't compile // let b = {let x =1; &x}; // "borrowed value does not live long enough" // the {}'s limit the lifetime of x. ///////// QUIZ! // Let's see if you have learned these concepts well enough. Look at the // g function that's commented out at the end. Will it compile if // uncommented? How about the function h? // Want to know the answer to the quiz? COMPILE IT AND SEE FOR YOURSELF! //////////////////////// NOW FOR A BAD PART ////////////////////////// // I've tried so far to explain Rust using examples that are as simple as // I can make them. But the truth is that Rust's ownership and borrowing // rules will have a major impact on how you write programs. You have // to think much harder than you normally would with a language that // relies on a garbage collector. ///// Moving out of a larger structure: // You can assign one vector to another vector, which moves ownership of // the entire vector. But you can't move a value from inside the vector: let mut sv:Vec = vec!["abc".to_string(), "xyz".to_string()]; // here we created a vector of "owned strings", which are string that are // allocated on the heap. Such strings do not :Copy. //let s0 = sv[0]; // This kind of move is not allowed, as it would leave a hole in the vector. // You can call sv.remove(0), but that would require shifting values. // To refer to an owned object inside a vector, you can assign a ref: let sb:&String = &sv[0]; // or, you can "swap" some other value of the same type into the vector: let mut s1 = String::new(); // empty string std::mem::swap(&mut s1, &mut sv[1]); // only if both s1 and sv are "mut" ///// Here's an even more nasty surprise: Consider the following // (mbv is still a live mutable borrow of vector vct4): mbv[0] = mbv[mbv.len()-1]; //no problem, sets first value to the last value // we are using a single mutable borrow to access then change the // vector that mbv is borrowing. No conflicts. But now consider: //mbv[mbv.len()-1] = mbv[0]; // Won't compile, but why? // What's wrong with the above? It's simply trying to change the // last value of the vector; it's something you can write without // any problems in other languages. You have to really think // hard to understand the compiler's error message: "immutable borrow // occurs here ... mutable borrow later used here". It may look like // we're not doing anything different from the previous line, which // compiled. But there's a difference between using the expression // mbv[mbv.len()-1] as an 'rvalue' (on the right-hand side of =) than // as an 'lvalue' (on the left side of =). rvalues are temporary: they // are created then destroyed, while lvalues are 'names': they express // ownership and persist after the assignment. The two uses of // mbv in mbv[mbv.len()-1] on the right-hand side of = are both immutable // uses: they only access and do not modify the vector. Once they are used // they can be forgotten: they do not interfere with the mutable use of // mbv[0] on the left side of =. But when mbv[mbv.len()-1] is an lvalue // on the left side of =, which must persist after the assignment, // there is a mutable use of mbv when we change the vector, which cannot // coexist with the other use of mbv in 'mbv.len()'. // It's not hard to get around these restrictions once you understand them: let lastindex = mbv.len()-1; mbv[lastindex] = mbv[0]; // Perhaps in the future the Rust compiler can be refined to recognize // some safe exceptions to its rules, to make life slightly easier for // programmers, but understanding the above example carefully is useful. // Rust's restrictions on ownership and borrowing are sometimes too rigid // that compromises must be made. This is especially true if you want to // create a recursive and mutable data structure. There are limits to // what static analysis can achieve (ultimately because of the // undecidability of the Halting Problem). Thus there are times when // you will have to use the RefCell smart pointer (which I tried to // simulate in C++) and Ref, RefMut in place of regular borrows. Their // usage sacrifices compile-time memory checks and only give you errors // if you violate the borrowing rules at runtime. There are also times // when even these might not be enough, and which is why Rust does // permit you to write 'unsafe' code, as long as it's done in isolation. // An important concept in high-level programming languages is // referential transparency, which basically means 'what you see // is what you get'. This becomes much harder to achieve with Rust. // If lastindex is the same as mbv.len()-1, then why isn't mbv[lastindex] // the sames as mbv[mbv.len()-1]? That's a reasonable question, and // it's one reason we can call Rust a 'low-level' language. Here, // 'high' and 'low' are just technical terms and do not represent any // qualitative judgment. If you're a typical applications-oriented // programmer, then you probably should just stick with languages with // a garbage collector. But if you're interested in systems-level // programming and are as persistent as I am, then learning Rust will // benefit you immensely. Even if you continue to use C/C++, it will // make you a much better programmer in those languages. }//main fn f(v:Vec) -> Vec // a function that takes and returns ownership { //ownership of value passed-in moves when it's implicitly assigned to v. // do stuff with v ... let mut v2 = v; // must define local mut var in order to change things v2.push(100); return v2; // transer ownership of local var } // a function that takes borrows, and where rb must be mutable: fn reverse(ra:&Vec, rb:&mut Vec) // actual ra can be mut or immutable { if ra.len() != rb.len() {return;} for i in 0..rb.len() { rb[i] = ra[ra.len()-1-i]; } } /* // Does the following compile if uncommented? fn g(x:i32) -> &mut i32 //calculates a number and returns a mutable reference { let cubex = x*x*x; return &mut cubex; // Type checks ok, doesn't it? } */ /* How about this function? Will it compile? fn h() { let mut v = vec![1,2,3,4]; let bv = &v; // immutable borrow { let mv = &mut v; // a mutable borrow mv[0] = 7; } let bv2 = &v; // another immutable borrow for x in bv2 { print!("{} ",x); } println!(""); } */ /////////////// // Educational Program by Chuck Liang, Hofstra University