Using OCaml

CSci 5106: Programming Languages

The typical mode in which we run OCaml when developing programs is an interactive one: when we start OCaml up, it gets into what we might call a "read-analyze-evaluate-display" loop. In this interaction loop you can define functions and ask for expressions that use these functions to be evaluated; the "analyze" part of the description refers to the fact that there may be errors in what you present to OCaml, in which case it has to figure these errors out and tell you about them so that you can fix them and try again.

Simple interactions with OCaml

Let us get a bit more specific to understand these notions better. To start OCaml up, you would type ocaml at the command level. OCaml will then print some introductory text and present you with a prompt:

lalit:~/teaching/umn/CSCI5106/code> ocaml OCaml version 4.05.0 #

Note the # in the last line above. This is OCaml's prompt: it signals that OCaml is ready for you to present it expressions that it will analyze and calculate for you.

Perhaps the simplest kind of expression that you can type in is an arithmetic one. An example interaction of this kind may be the following:

# 2 + 3 ;; - : int = 5 #

What has happened here is that the user has typed in the expression 2 + 3, OCaml has calculated the value of the expression as 5, displayed this and it has then become ready for another expression to be input.

A few fine points to note in the above example:

  1. The user typically needs to signify the end of the expression he/she wants evaluated. At the interaction level, this is done by typing two semicolons at the end; this is why 2 + 3 is followed by these two semicolons.
  2. Before OCaml evaluates an expression, it checks if it is type correct. OCaml is what we call a strongly typed language: what this means is that OCaml must know the type of an expression before it even thinks of computing it. In this case, OCaml was able to determine that the type of the expression it has been given is int (for integer) and it displays this to you. OCaml can often infer the type of expressions, freeing you from having to be explicit about every little detail. However, don't let this lull you into complacency: types determine the very structure of a program so you should always be thinking about them even if you do not write them out explicitly.
  3. Once OCaml has figured out the expression is syntactically okay and that it also type-checks, it goes ahead and computes the expression and displays the result to you together with the type it has figured out for the expression. This is what you see in the second line above, that line is what OCaml causes to be displayed.
  4. Having completed this action, OCaml returns to the beginning of the loop, i.e., it is ready for the next expression or interaction, something that it signals by displaying the prompt again.

Lets pause for a moment here to consider a common kind of "mistake" one might make: what happens if you forget to type the two semicolons to end your input to OCaml? Not a problem: OCaml will think that you are continuing your input on to the next line so you can type the semicolons there. Similarly, you can spread your input across two lines. The following examples show these two kinds of interactions:

# 2 + 3 ;; - : int = 5 # 2 + 3;; - : int = 5 #

Binding identifiers to expressions

In any significant programming task, you will typically want to build up what you want to compute in pieces: associate names with particular expressions, define functions that you can later use over these expressions, etc, and then eventually combine all of these to realize a complex computation. OCaml allows you to realize this style of programming essentially by giving you a mechanism to associate names with expressions. This is what is known as a let declaration.

Simple let declarations

The simplest form for a let declaration is the following:

let <name> = <exp> ;;

Generally, when we write something within angle brackets (as we have done in writing <name> for example), we mean it to stand for a category of expressions. Here, <name> stands for a token that is recognized as a identifier (we won't get into the details here of what constitutes such a token, we will use simple examples for which things will be clear even without such an explanation) and <exp> stands for OCaml expressions of the kind we have already seen.

Here is an example of the use of such a declaration:

# let five = 2 + 3 ;; val five : int = 5 # five ;; - : int = 5 #

In this interaction, we have first asked OCaml to associate the name five with the result of evaluating the expression 2 + 3. OCaml tells us that it has done this through the display on the second line. Then we have checked that it has actually done this by asking it to evaluate the identifier five. OCaml responds in the way we have understood earlier.

Binding a name to a function expression

Let declarations can also be used to associate function expressions with names. The structure of such a declaration is the following

let <fnname> <argname-1> ... <argname-n> = <exp>

We will understand various aspects of function expressions later in the course. For the moment, it is enough to understand the following: what such a binding does is associate a function with the identifier used for <fnname>. The function is something that essentially produces a value using <exp> after replacing <argname-1>,...,<argname-n> in it by given values.

That was a mouthful! But the concept is very simple, especially when you see it in an example. Here is one.

# let plus x y = x + y ;; val plus : int -> int -> int = <fun> #

What the declaration has done is associate the name plus with a function that takes two integers and produces an integer. This is what OCaml is telling us in the display. We can now use the function in further expressions:

# plus 2 3 ;; - : int = 5 #

Binding a name to a recursive function expression

The function above was quite simple. Often we will want to write more complex functions. One particular way in which the function might be complicated is that it may be recursive, i.e., it may use itself in its definition. We can still use a let declaration to bind such a function to a name but OCaml wants us to tell it first that we will be using the name in the definition itself. One way to do this is to use the keyword rec in the let declaration. Here is an example.

# let rec fact n = if (n = 0) then 1 else n * (fact (n - 1)) ;; val fact : int -> int = <fun> #

Suggested Exercise: Try defining fact by using a let declaration without the rec and try to understand the error message that OCaml will give you in this case.

Once we have built up a few associations of expressions with names, we can combine them into a larger expression as the following example shows.

# fact (plus five 1) ;; - : int = 720 #

Interacting with OCaml through program files

Up to this point, we have constructed all the expressions, associations of names with expressions, etc., directly through an interaction with OCaml. This may work for simple expressions but is quite difficult to sustain when the program gets more complicated. More importantly, it does not leave a record of the expressions that you have constructed once you close the interaction session. The last is a serious problem: usually, what you want to do is construct a program and check it through an interaction and then, once you have it working, you want to reuse all that code in other programs that you construct.

The way to realize the kind of interaction we really want is to construct the program separately in a data file so that it available beyond an interaction session. Then we "load" all the expressions, let declarations, etc., in the data file into an interaction session using a use directive. This way, the program persists beyond an interaction but what it contains can be used in any given interaction.

The structure of a use directive is the following.

#use <filename>

This tells OCaml to load what is contained in the indicated file into the interactive session. Note that the initial # is part of what you have to type, i.e. it is part of the directive itself. It is in fact what tells OCaml that what follows is a directive and should not be confused with the name of an expression. (We will see an example of one more directive in this introduction shortly.)

Here is a concrete example of the described process in the context of the few declarations we have considered. Let us supposed that we have put the following lines in a file called myfirst.ml.

let five = 2 + 3 let plus x y = x + y let rec fact n = if (n = 0) then 1 else n * (fact (n - 1))

We can then load these declarations into an interaction session and use them as shown below.

# #use "myfirst.ml";; val five : int = 5 val plus : int -> int -> int = <fun> val fact : int -> int = <fun> # fact (plus five 1) ;; - : int = 720

Pay careful attention to the response that OCaml displays as it loads the file. It is actually processing the declarations in the file one at a time and displaying what it would have done if the declaration had been typed in at the interaction level. You must examine what OCaml displays carefully each time you interact with it this way to convince yourself that it has accepted all the declarations. In particular, OCaml will stop processing your declarations at the first error it finds, which means that your entire collection could be useless unless you can see that all of the declarations have been accepted without errors.

A few further points about the example above:

  1. Observe the extension .ml that I used for the file containing the OCaml declarations. This is the extension for OCaml programs by convention and you will find it useful to stick to it: there are several useful tools that people have built that function "automatically" on files with this extension and you will eventually want to benefit from these tools.
  2. You might have observed that I did not terminate each declaration with two semicolons in the file I created. OCaml does not need these because it can figure out the end of the declaration in other ways when these are contained in a file. You can use the semicolons in the file if you want, but I personally find them to be unnecessary clutter and I therefore never use them in a file.

Structured data and type inference (again)

The examples we have looked at above used data of a simple, atomic kind: they were all integers. Almost as soon as you get started with programming, you want to deal with more complicated, structured data. One example of such data are lists. As we shall discuss in class, these come in two kinds: an empty list or a structured object that has a head element and a tail which is itself a list. OCaml provides a built-in treatment of lists. The empty list is written as [] and the non-empty list is written as h::t where h is the desired head element and t is the desired tail list. OCaml also provides us match expression for differentiating between the different kinds of expressions of the same type. The following function binding that defines the append function in OCaml illustrates the use of this expression.

let rec app l1 l2 = match l1 with | [] -> l2 | (h::l1) -> h :: app l1 l2

The idea here is that when this function is used on a particular list l1, the match expression will check what kind of list it is and will pick one of the cases accordingly.

Let us assume that this definition is put into a file called lists.ml. Then consider the following interaction with OCaml:

# #use "lists.ml";; val app : 'a list -> 'a list -> 'a list = #

As expected, OCaml has processed the contents of the file and figured out that it contains a binding for app. More impressively, it has inferred a fairly non-trivial type for app. We will talk about these things more carefully as the course progresses, but, quickly, the expression 'a list represents a list of elements of type 'a for any chosen value for a. Thus, OCaml has figured out that app is a function that takes two lists of the same type of elements as input (they must have the same type of elements because the scope of the 'a is the entire type expression shown) and it produces a list of elements of that type. The same type inference procedure is used in OCaml as in SML and Haskell and we will understand how it works in due course. For now, just be wowed by the fact that these languages can infer non-trivial types and focus on understanding what the type they show you mean.

OCaml has a more convenient way of showing lists of many elements. For example, you can write

[1; 2; 3]

for the expression

(1 :: (2 :: (3 :: [])))

This is like in SML, except that a semicolon is used as the separator rather than a comma. Also, all these languages allow users to define their own structured data types and they automatically extend matching and type inference to such data. These are things we will study and understand in detail later in the course.

Quitting OCaml

When you are finished with your OCaml session, you would want to say "good bye" to OCaml gracefully. You can do this in one of two ways.

You can use the quit directive as shown below.

# #quit;; gopalan@rishabh (~/teaching/umn/CSCI2041) %

Notice that, like the use directive, this directive also starts with a #.

The other way to end an OCaml session is to type the end of input character that happens to be a ^D. This appears a bit abrupt and also harsh but I think OCaml has gotten used to it—we have been parting company with it this way for several years now.