OCaml functor example : a progress bar

Reading time ~7 minutes

Introduction

Python has an amazing progress bar library tqdm. In a for loop or a map statement, tqdm allows the user to know how much time is needed for the loop to end. Though simple, this happens to be very useful when performing treatment on a dataset with hundreds of thousands of lines.

76%|████████████████████████████         | 7568/10000 [00:33<00:10, 229.00it/s]
What tdqm does


[##############-------------------------] 38.00% ending in : 3.21s
What I would be happy with

Regarding machine learning pipelines, Python does a great job. However, it lacks the power of compiler. First, pre-treatment of the data is sometimes done using functions written in pure Python, not particularly fast.

But more important, imagine the following pipeline in OCaml:

let new_text = my_text 
|> Str.split
|> List.map fix_spelling
|> remove_stopwords
|> List.map vectorize
|> FloatArray.average

Maybe these functions do not really exist, but a sentence (my_text) is splitted in a list of words, the spelling of each word is fixed (List.map fix_spelling), the stopwords are removed, the words are vectorized (with something like word2vec) and at last, the list of vectorized words is averaged. It is pretty clear that we input a sentence (a string) and return a float array.

Now imagine that for some reason, the pipeline is misspelled in something like this:

let new_text = my_text 
|> Str.split
|> List.map fix_spelling
|> List.map vectorize
|> remove_stopwords
|> FloatArray.average

The compiler would immediatly complain that remove_stopwords does not apply to float array list (but was expecting a string list). On the other hand, depending on the implementation, it may take a while for the same program in Pyhton to realize that some steps cannot be performed.

Therefore, I believe that OCaml could be an amazing fit for machine learning: the computing intensive libraries (random forests, SVMs…) can be implemented in another language (which is the case with scikit-learn) and the pipeline would be much more robust in OCaml.

As I love to use tqdm in Python, I wanted an OCaml equivalent.

OCaml functors

A functor is simply a module, which accepts another module as a parameter. Let’s look at the following module signature, a finite collection. It has a map function, which transforms every element and a length function. Note that length could be inferred from map and a ref. However, modules such as Seq allow infinite collections: they have a map function, but cannot evaluate the length of a sequence.

module type FINITECOLLECTION = sig
  
  type 'a t
  val map : ('a -> 'b) -> 'a t -> 'b t 
  val length : 'a t -> int

end

Given this signature we can now define a module which will operate on finite collections. Without paying much attention to the contents of map, simply note that the only functions called in map are the ones specified in the signature: Collection.length and Collection.map.

(** Creates a new map function, reporting the progress in the terminal *)
module Prokope(Collection : FINITECOLLECTION) = struct

  (** Applies f to a finite collection l, while reporting progress in the terminal *)
  let map f l =
  
    let open ProkopeUtils in

    let starting_time = Unix.gettimeofday() in
    let wrapped_f = progress_wrapper default_progress_bar (Collection.length l) starting_time f in
    
    let res = add_indices_to_collection Collection.map l
    |> Collection.map wrapped_f in
    
    print_string "\n";
    res

end

Now, we would like to be able to use this new map method on array and list types. For this, we just have to reproduce the signature of the FINITECOLLECTION. The following does it.

module UsualArray  = struct 
  type 'a t = 'a array
  include Array
end

module Array = Prokope(UsualArray)


module UsualList  = struct 
  type 'a t = 'a list
  include List
end


module List = Prokope(UsualList)

Now this function can simply be invoked appending Prokope before

let dummy a = 
  ignore (Unix.select [] [] [] 0.1);
  a

let double x = 2 * x 

let id x = x

let () =
  let sample = List.init 50 id in
  assert(List.map double sample = Prokope.List.map double sample);
  let _ = Prokope.List.map dummy sample in
  ()

And you would see:

[#######################################] 100.00% ending in : 0.00s
[##############-------------------------] 38.00% ending in : 3.21s

The implementation

A couple of tricks

A couple of things to know is that OCaml may not flush to the standard output as you may expect. But this can be forced by calling

flush stdout;

Now to write a line on the actual line of the terminal, the only thing to do is to call

print_string ("\r"^(progress_bar_string config percentage)^"ending in : "^remaining_time_str^"s");

You may find more informations on \n and \r on StackOverflow though.

The implementation

I decided to call this library Prokope. Maybe on opam someday ;)

Prokope utils

Let’s start with the helpers needed to display and configure a progress bar.

prokopeUtils.ml

(** Various helpers for Prokope *)

(** Graphical parameters for the progress bar *)
type progress_bar_config = {
  width: int;
  cell_filled: string;
  cell_empty: string
}


(** The width of the terminal *)
let default_width =  
  match Terminal_size.get_columns () with  
  | Some(a) -> a
  | None -> 80 


(** A default progress bar *)
let default_progress_bar = {
  width = default_width; 
  cell_filled = "#";
  cell_empty = "-"
}
  

(** For a collection 'a t returns a collection of type (int, 'a) t *)
let add_indices_to_collection map collection =
  
  let index = ref (-1) in

  let add_index_to_element e =
    index := !index + 1;
    (!index, e)
  in
  
  map add_index_to_element collection


(** Repeats the string s n times *)
let repeat_string s n = 
  List.init n (fun _ -> ()) 
  |> List.fold_left (fun e _ -> s^e) ""


(** Returns the string represented by the progress bar configuration and the percentage *)
let progress_bar_string config percentage =
  
  let gauge_string n_steps inner_width =
    let remaining_steps = inner_width - n_steps in
    "["^
    (repeat_string config.cell_filled n_steps)^
    (repeat_string config.cell_empty remaining_steps)^
    "]"
  in

  let inner_width = max (config.width - 40) 0 in
  
  let n_steps = int_of_float (percentage *. (float_of_int inner_width)) in
  
  let rounded_percentage_str = Printf.sprintf "%.2f" (100. *. percentage) in
  
  (gauge_string n_steps inner_width)^" "^rounded_percentage_str^"% "


(** For a fonction ('a -> 'b) returns a function ( ('a, int) -> 'b )
 * "aware" of its starting time and operations to perform *)
let progress_wrapper config list_length starting_time f a =

  let i, x = a in
  
  let percentage = (float_of_int i +. 1.) /. (float_of_int list_length) in
  
  let time_taken = (Unix.gettimeofday() -. starting_time) in
  
  let speed = (float_of_int i) /. time_taken in
  
  let remaining_time_str = (float_of_int (list_length - i)) /. speed
    |> Printf.sprintf "%.2f" 
  in
  
  print_string ("\r"^(progress_bar_string config percentage)^"ending in : "^remaining_time_str^"s");
  
  flush stdout;
  
  f x 

The functor

Why a functor ? To enable a user to generate a progress bar for every finite collection ! Imagine that your data is stored on a binary tree for some reason. Having a map and a length function over trees is trivial (and it may also belong to your tree module). Calling Prokope(MyTree) would therefore return a new module, for which map shows a progress bar.

prokope.ml

(** User intended functor *)

(** Any collection whose elements can be counted and mapped.
 Note that length has to be implemented (this disqualifies
 modules such as Seq). *)
module type FINITECOLLECTION = sig
  
  type 'a t
  val map : ('a -> 'b) -> 'a t -> 'b t 
  val length : 'a t -> int

end


(** Creates a new map function, reporting the progress in the terminal *)
module Prokope(Collection : FINITECOLLECTION) = struct

  (** Applies f to a finite collection l, while reporting progress in the terminal *)
  let map f l =
  
    let open ProkopeUtils in

    let starting_time = Unix.gettimeofday() in
    let wrapped_f = progress_wrapper default_progress_bar (Collection.length l) starting_time f in
    
    let res = add_indices_to_collection Collection.map l
    |> Collection.map wrapped_f in
    
    print_string "\n";
    res

end


module UsualArray  = struct 
  type 'a t = 'a array
  include Array
end

module Array = Prokope(UsualArray)


module UsualList  = struct 
  type 'a t = 'a list
  include List
end


module List = Prokope(UsualList)

Testing and displaying

run.ml

let dummy a = 
  ignore (Unix.select [] [] [] 0.1);
  a

let double x = 2 * x 

let id x = x

let () =
  let sample = List.init 50 id in
  assert(List.map double sample = Prokope.List.map double sample);
  let _ = Prokope.List.map dummy sample in
  ()

run.sh

ocamlfind opt -package unix,terminal_size -linkpkg -o run.byte prokopeUtils.ml prokope.ml run.ml 

./run.byte

Hope you liked it! If you feel you may need it some day, or have improvements suggestions, please let me know, I would improve it and publish it!

Estimating the parameters of a CEV Process

The CEV ProcessIn mathematical finance, the CEV or constant elasticity of variance model is a stochastic volatility model which was devel...… Continue reading