AI with OCaml : the tic tac toe game

In the ever-evolving landscape of artificial intelligence, the pursuit of perfection in game-playing algorithms has been an enduring quest. One of the earliest and simplest games to captivate the minds of AI enthusiasts is Tic-Tac-Toe. Although this classic game might seem elementary, it serves as a testing ground for cutting-edge AI strategies. In this article, we embark on a journey into the world of OCaml, a functional programming language, and its application in crafting an unbeatable Tic-Tac-Toe-playing AI.

Me affronting my machine

I am playing the red cross

At the heart of our AI lies the powerful Min-Max strategy, a method that has revolutionized the way machines approach decision-making in two-player games. With OCaml’s elegance and expressiveness, we delve into the intricacies of this strategy, exploring how it enables our AI to not just play, but dominate, the age-old game of Tic-Tac-Toe. Whether you’re a seasoned programmer, a curious AI enthusiast, or simply a Tic-Tac-Toe aficionado, this article promises to shed light on the inner workings of AI-driven game strategy and the wonders that OCaml brings to the table. So, let’s embark on this exciting journey and unlock the secrets behind a Tic-Tac-Toe AI that can never be beaten.

Let’s start with the definition of the pieces:

type piece = Cross | Circle

let piece_opposite = function
  | Cross -> Circle
  | Circle -> Cross

let add_color color s = "\027[3" ^ color ^ "m" ^ s ^ "\027[39m"

let piece_option_to_string = function
  | Some(Cross) -> add_color "1" "X"
  | Some(Circle) -> add_color "6" "0"
  | None -> " "

We will represent the board as a piece option array array. The option is None if the board is empty at a specific position, otherwise, it is Some(piece)

We also want to render the board so that a human can easily play :)

type board = piece option array array

let board_at board i j = board.(i).(j)

let board_init _ =
  Array.init 3 (fun _ -> Array.make 3 None)

let board_copy b =
  let res = board_init () in
  let n,p = 3, 3 in
  for i = 0 to n - 1 do
    for j = 0 to p - 1 do
      res.(i).(j) <- b.(i).(j)
  done ;

let board_place board piece i j =
  let board' = board_copy board in
  let () = board'.(i).(j) <- Some(piece) in

let board_transpose b =
  let res = board_init () in
  let n,p = 3, 3 in
  for i = 0 to n - 1 do
    for j = 0 to p - 1 do
      res.(j).(i) <- b.(i).(j)
  done ;

let board_print b =
  let print_separator () =
    print_string "+" ;
    Array.iter (fun _ -> print_string "-") b.(0) ;
    print_string "+\n"

  let print_row r =
    print_string "|" ; piece_option_to_string r |> Array.fold_left ( ^ ) "" |> print_string ;
    print_string "|\n"

  print_separator () ;
  Array.iter print_row b ;
  print_separator ()

So, we only have pieces and boards so far, time for some game logic.

let has_won piece board =

  let winning_line = Array.for_all (fun x -> x = Some(piece)) in
  let is_main_diagonal_winning = Array.for_all (fun x -> x = Some(piece)) [| board.(0).(0); board.(1).(1); board.(2).(2) |] in
  let is_other_diagonal_winning = Array.for_all (fun x -> x = Some(piece)) [| board.(0).(2); board.(1).(1); board.(2).(0) |] in

  Array.exists winning_line board 
  || Array.exists winning_line (board_transpose board)
  || is_main_diagonal_winning
  || is_other_diagonal_winning

let has_lost piece board =
  has_won (piece_opposite piece) board

let winning_board_cross = [|
  [|Some(Cross); None; None|];
  [|Some(Cross); Some(Circle); None|];
  [|Some(Cross); Some(Circle); None|];

let winning_board_circle = [|
  [|Some(Cross); None; Some(Circle)|];
  [|Some(Cross); Some(Circle); None|];
  [|Some(Circle); Some(Circle); None|];

let empty_board = board_init ()

let () =  assert (has_won Cross winning_board_cross)
let () =  assert (has_won Circle winning_board_circle)
let () =  assert (has_lost Circle winning_board_cross)
let () =  assert (has_lost Cross winning_board_circle)
let () =  assert (not (has_lost Cross empty_board) && not (has_lost Circle empty_board))
let () =  assert (not (has_won Cross empty_board) && not (has_won Circle empty_board))
let possible_moves board =
  let all_moves = List.init 3 (fun i -> List.init 3 (fun j -> (i,j))) |> List.flatten in
  List.filter (fun p -> board_at board (fst p) (snd p) |> Option.is_none) all_moves

let _ = assert (List.length (possible_moves winning_board_cross) = 4)

let is_game_over board = 
  has_won Cross board 
  || has_won Circle board
  || Array.for_all (Array.for_all Option.is_some) board

let () = assert (is_game_over winning_board_cross)
let () = assert (is_game_over winning_board_circle)

The evaluation function is a critical component of an AI-powered Tic-Tac-Toe player utilizing the Min-Max strategy. Its primary role is to assess the current game state and assign a numerical value that reflects the desirability of that state for the AI player. This numerical value serves as a basis for the Min-Max algorithm to make informed decisions about which moves to choose during the game.

let eval player board =
  if has_won player board then
  else if has_lost player board then
    ~- 1

let () = assert ((eval Cross winning_board_cross) = 1)
let () = assert ((eval Circle winning_board_cross) = ~-1)
let () = assert ((eval Circle winning_board_circle) = 1)
let () = assert ((eval Cross winning_board_circle) = ~-1)

In the context of the Min-Max algorithm, the evaluation function often works in tandem with recursive calls. As the Min-Max algorithm explores possible future game states through a tree of possibilities, the evaluation function helps assess these states at different depths. The AI calculates scores for each state, and these scores are propagated up the tree to make informed decisions.

Let’s define a tree

type 'a tree =
  | Node of 'a * 'a tree list
  | Leaf

let tree_to_list tree =
  let rec aux = function
    | Leaf ->
    | Node (n, children) ->
        n :: List.fold_left List.append [] ( aux children)
  aux tree

And the tree of moves.

let make_moves_tree max_depth board player = 

  let rec aux depth board player =
    let moves = possible_moves board in
    match moves, depth with 
    | [], _ -> Node(board, [Leaf])
    | _, 0 -> Node(board, [Leaf])
    | l, d -> if (is_game_over board) then
        Node(board, [Leaf])
                  (fun m -> aux (d - 1) (board_place board player (fst m) (snd m)) (piece_opposite player) ) 
  aux max_depth board player

Now we are in a position to implement the evaluation function, taking into account future states. As the min max name suggests, we will need the following two functions:

let list_max l = List.fold_left Int.max Int.min_int l

let list_min l = List.fold_left Int.min Int.max_int l

let () = assert (list_min [1;3;2] = 1)
let () = assert (list_max [1;3;2] = 3)

And we are now in a position to have a full evaluation function!

let evaluate_board max_depth board player =

  let tree = make_moves_tree max_depth board player in

  let rec aux d tree = match tree with
    | Node(b, [Leaf]) -> eval player b
    | Node(b, l) -> 
      if (d mod 2 = 0) then list_max ( (aux (d+1)) l)
      else list_min ( (aux (d+1)) l)
    | Leaf -> failwith "Should not happen"

  in aux 0 tree

let () = assert ((evaluate_board 1 winning_board_cross Cross) = 1)
let () = assert ((evaluate_board 1 winning_board_cross Circle) = ~-1)

Finding the best move is just a matter of minimizing the opponent evaluation function:

let find_best_move max_depth board player =
  let moves = possible_moves board in
  let possible_boards = (fun m -> board_place board player (fst m) (snd m)) moves in
  let scores = (fun b -> evaluate_board max_depth b (piece_opposite player)) possible_boards in
  let moves_and_scores = List.combine moves scores in
  let best_move_and_score = List.sort (fun x y -> (snd x) (snd y)) moves_and_scores
                           |> List.hd in
  best_move_and_score |> fst

let almost_winning_board = [|
  [|Some(Cross); None; None|];
  [|Some(Cross); None; None|];
  [|None; Some(Circle); Some(Circle)|];

let () = assert ((find_best_move 2 almost_winning_board Circle) = (2,0))
let () = assert ((find_best_move 2 almost_winning_board Cross) = (2,0))

And the playing loop:

let max_depth = 9

let rec play board player =
  let () = board_print board in

  if (is_game_over board) then begin
      print_string "Game over!\n" ;
        if has_won Cross board then 
          print_string "You won!\n"
        else if has_won Circle board then
          print_string "Computer won!\n"
          print_string "Draw!"
    match player with
    | Cross -> 
      let () = print_string "\nEnter move...\n" in
      let () = flush stdout in
      let command = input_line stdin |> int_of_string in
      let i, j = (command / 3), (command mod 3) in
      let board' = board_place board Cross i j in
      play board' Circle

    | Circle ->
      let i, j = find_best_move max_depth board Circle in
      let board' = board_place board Circle i j in
      play board' Cross

let () = play empty_board Cross
Add arguments to Python decorators

Python decorator are a convenient way to wrap a function with another one. Per example, when timing a function, it is nice to call the timer before and after without having to rewrite the same timer.start() and timer.stop() calls everywhere.

Among other nice features of decorators, there are a couple of them which I want to stress here:

  • they apply to classes,
  • they accept arguments.

Here, I want to implement a printer for my classes showing exactly which class has called some of its methods and when. Ideally, the colors should be specified for each class, so that it makes it easy to monitor the execution of the code when various objects called different methods.

This is something I often face when dealing with machine learning pipelines: they consist of many objects (blend of models, feature pipelines) and it is hard to know exactly who is running an when, and identify bottlenecks at a glance.

Class decorators

Applying a decorator to a class allows to add method on classes directly, a little bit like inheritance. It drove my puzzled: why would one prefer a class decorator to inheritance ? See per example this question. The thing is that inheritance actually has a meaning in terms of your object: a cat is an animal, therefore, inheritance makes sense. On the other hand, if you want to add generic helpers that do not really fit into your object definitions, a class decorator is preferable.

import datetime

def class_printer(cls):

      def print_with_time(self, s):
          now_str ="%m/%d/%y, %h:%m:%s")
          print(f"[{type(self).__name__} - {now_str}] {s}")

      setattr(cls, 'print_with_time', print_with_time)

      return cls

class a:

    def __init__(self):

    def run(self):
        self.print_with_time("run method called")

if __name__ == "__main__":
    a = a()

And it outputs:

[A - 03/08/2022, 11:20:44] run method called

Great! Now, every time we need this specific printer, we don’t have much to do. Time to improve it!

Passing arguments to a decorator


import datetime

class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKCYAN = '\033[96m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'

def color_class_printer(color):
    def class_printer(cls):

        def print_with_time(self, s):
            now_str ="%m/%d/%Y, %H:%M:%S")
            print(f"[{color}{type(self).__name__} - {now_str}{bcolors.ENDC}] {s}")

        setattr(cls, 'print_with_time', print_with_time)

        return cls
    return class_printer
Unique elements in a list OCaml

Unique elements in a list in OCaml

First, one should note that there is a function List.sort_uniq which is proposed in OCaml. However, it requires the implementation of a comparison function (like > for integers of float numbers) and this comparison it not always obvious to implement (think about a list of arrays per example).

Instead, the following snippet will do the job!


let unique l =

  let rec aux l acc =
    match l with
    | [] ->
        List.rev acc
    | h :: t ->
        if List.mem h acc then aux t acc else aux t (h :: acc)
  aux l []

And we can run it in the toplevel:

# let unique l =

  let rec aux l acc =
    match l with
    | [] ->
        List.rev acc
    | h :: t ->
        if List.mem h acc then aux t acc else aux t (h :: acc)
  aux l []                  ;;
val unique : 'a list -> 'a list = <fun>

# unique [1;2;3] ;;
- : int list = [1; 2; 3]

# unique [1;2;3;2;4;1];;
- : int list = [1; 2; 3; 4]

Why the List.rev

The List.rev may be seen as a useless operation. It just makes sure that the elements are returned in their order of first occurence.

In order to be closer to the standard terminology used in the language, one could maybe implement two functions: unique and stable_uniq.

let unique l =

  let rec aux l acc =
    match l with
    | [] ->
    | h :: t ->
        if List.mem h acc then aux t acc else aux t (h :: acc)

let stable_unique l =

  let rec aux l acc =
    match l with
    | [] ->
        List.rev acc
    | h :: t ->
        if List.mem h acc then aux t acc else aux t (h :: acc)
  aux l []

List.mem and List.memq

Note that I used List.mem but you should be aware of the following differences (from the docs):

val mem : ‘a -> ‘a list -> bool

mem a set is true if and only if a is equal to an element of set.

val memq : ‘a -> ‘a list -> bool

Same as List.mem, but uses physical equality instead of structural equality to compare list elements.

List intersection in OCaml

Intersection of two lists in OCaml

List intersection is a simple operation that is often needed, unfortunately, it is not implemeted directly in OCaml (at least, in 4.12). Note that with large collections, sets are a better container for intersections. However, when you know you are working with a limited number of items and that performance should not be an issue, it is perfectly fine to intersect lists.

The following one liner will do the job:

let intersection l1 l2 = List.filter (fun e -> List.mem e l2) l1 

That’s it. ````List.mem``` makes sure that an element is a __mem__ber of a list, and we filter according to this condition.

You can test it in the toplevel! I recommend using it for these simple functions. It does not replace a proper unit testing, but if you are working quickly on a prototype, nobody will blame you ;)

#  let intersection l1 l2 = List.filter (fun e -> List.mem e l2) l1  ;;
val intersection : 'a list -> 'a list -> 'a list = <fun>
# intersection [1;2;3] [4] ;;
- : int list = []
# intersection [1;2;3] [1] ;;
- : int list = [1]

Intersection of a list of list

Let’s leverage the ````fold_left``` function ! It makes sense to define the intersection of a single list as the list itself.

let lists_intersection lists = match lists with
  | [] -> []
  | [h] -> h
  | h::t -> List.fold_left intersection h t

And we can test it quickly as well:

# let lists_intersection lists = match lists with                     
  | [] -> []
  | [h] -> h
  | h::t -> List.fold_left intersection h t      ;;
val lists_intersection : 'a list list -> 'a list = <fun>
# lists_intersection [[1;2;3]; [1]];;   
- : int list = [1]
# lists_intersection [[1;2;3]; [1;2]; [1;2]];;
- : int list = [1; 2]

It works!

Best books on Fermat's last theorem

This article was updated on 26th february 2023 to add a summary and new books.


I haven’t published many articles these days, the main reason being that I got attracted into the history of Fermat’s last theorem. This theorem states that the equation:

Only has trivial integer solutions (i.e. one of the elements is 0) for . This problem fascinated mathematicians for more than three centuries before it was finally solved! Indeed, it is easy to see that for , the triple does the job.

Regarding the “Maths contents”, do not be scared, all books are a good fit for motivated undergrads in mathematics. Those with one star are even more accessible.

Book Maths contents Topics Interest
Invitation to the Mathematics of Fermat-Wiles ** Approach to Wiles proof without leaving historical approaches ***
Fermat’s Last Theorem: A Genetic Introduction to Algebraic Number Theory ** Exposes the theory of algebraic integers, Kummer’s proof for regular prime and many historical details ***
Fermat’s Last Theorem for Amateurs * Many partial historical results related to FLT presented in great details. ***
13 Lectures on Fermat’s last theorem ** A lot of partial historical results related to FLT presented without details. **
Fermat a-t-il démontré son grand théorème ? L’hypothèse « Pascal » (in French) * A historical speculation around a possible proof of Fermat and a detailed analysis of his correspondance. **
Fermat last theorem None General introduction and history **
Le théorème de Fermat : son histoire by E. Nogues ** A history of research around FLT, written at the beginning of the 20th century! *


Top 3

We can decompose the history of the main approaches of Fermat’s last theorem in three epochs: the arithmetical one (Sophie Germain, Cauchy, Legendre, Pellet…), the algebraic integers epoch (Kummer, Mirimanoff…) and the modular form (Wiles) epoch. Obviously, this misses many other approaches (sieves, analysis…) but it covers a large part of the historical publications.

These three books would cover these epochs and I strongly recommended all three of them to have a beautiful journey into the study of FLT.

Invitation to the Mathematics of Fermat-Wiles by Yves Hellegouarch

This book aims to present the proof given by Wiles. Obviously, some details will not be presented, but this is an amazing introduction. Topics presented contains introction topic (the case ), Kummer’s proof… Soon enough, the author jumps to Elliptic Functions, Elliptic Curves and Modular Forms which are essential.

It contains a lot of exercises and clear proofs, if there is only one book to read on the topic (and you are not afraid of mathematical details), this is probably the best one to read!

Fermat’s Last Theorem: A Genetic Introduction to Algebraic Number Theory

This one is a detailed study of Kummer’s approach. Ribenboim books only scratch the surface of this proof while here, (almost) the whole book is dedicated to give a detailed proof of Kummer’s result. Some beautiful relations between Fermat’s equation, the Zeta function and Bernoulli numbers are prestend in a very clear way, with various numeric examples. Besides, the author also dug into the archives of the French academy of science, yielding thrilling historical notes about sessions where members wrongly speculated about their possible proofs.

Fermat’s Last Theorem for Amateurs by Paulo Ribenboim

This book presents many special cases of proofs of FLT with many details. Though the original problem is to find solutions in the set of integers, the author propose to study this equation in other sets as well: Gaussian integers (), -adic numbers (they truely are a hidden gem of number theory)… As the title states, it is for amateurs though this is presented as your usual math book: with theorem and proofs. Note that however, an emphasis is put on examples.

Other books

13 Lectures on Fermat’s last theorem by Paulo Ribenboim

This one is interesting: it has been written in 1979, before Fermat’s last theorem was actually proven. It summarizes many of the efforts put into trying to prove FLT and features some of the proofs. The historical context is essential, as many of the theorems are presented with notes regarding their history.

A large part of the book is devoted to the fascinating proof by Kummer and its limitations. Many other results with various importance are also presented: estimates, equivalence of FLT with other statements… One I found particularly interesting is:

  1. The equation has only the trivial solutions in
  2. For every non-zero, the polynomial is irreducible over

A minor regret is that many of the statements are not proved and left “as exercises” with no clue regarding their level of difficulty nor hints. It is particularly interesting to read it with Fermat’s Last Theorem for Amateurs as some of the proofs not present in this book are in the other.

However, it is remains an amazing introduction if you do not want to dig into all the details but instead are just looking for a nice overview of the history, with some details.

Fermat last theorem by Simon Singh

Requires absolutely no mathematical background. It is a very nice introduction to the problem, mostly focusing on the history of (and around) Fermat’s last theorem. Some mathematical details, along with various interesting puzzles are presented in the text and in the appendix. Really nice if you want to know what this theorem is about, and why so important it became, without digging into the mathematical details.

Fermat a-t-il démontré son grand théorème ? L’hypothèse « Pascal » (in French) by Laurent Hua and Jean Rousseau

This book is split in two parts: the first one is purely historical and does not require to know much about mathematics. The authors present the possibility that Fermat may have proved his theorem. Though the consensus seems to claim the opposite, proving than Fermat did not prove his theorem has never been done!

The first part is a thorough analysis of Fermat’s correspondance with other mathematicians of his time and it does become moving, especially towards the end. This analysis is so detailed that you even get a sense of Fermat’s sense of humor in some of the letters presented!

The second one is more mathematical, but a high school student could understand most of it. It gives what could be the starting point of a proof with the tools known by Fermat at his time. Unfortunately, it does not go too far, but the approach is interesting!

Le théorème de Fermat : son histoire by E. Nogues

Written at the beginning of the 20th century. It consists in translatiobns (in French) of all important articles regarding Fermat’s last theorem at this time. It is particularly intersting if you want to dig into the details of the history of the theorem.

Extra readings

In some of the books, you will need (on top of usual algebra, basic number theory and analysis tools) to know more advanced topics, such as Galois theory. To that end Galois Theory Through Exercises by Juliusz Brzeziński was my best read on this topic. It has a chapter completely dedicated to cyclotomic fields, which are an object commonly used in the books above!

Number Theory 1: Fermat’s Dream, by Kazuya Kato, Nobushige Kurokawa, Takeshi Saito this is actually the book that drove me into the study of Fermat’s Last Theorem. It is really well written, with a lot of figures, exercises and corrections.

Not read yet

The books below are my to read list. If you have any opinion on them, let me know in the comments!

Three Lectures on Fermat’s Last Theorem by Louis Joel Mordell

Mordell is a name that I now met many times! I want to read it mostly for its historical value, and to know what he had to say about this theorem, a century and a half (almost) ago!

A Course in Arithmetic by Jean-Pierre Serre

Please note that all the links above are affiliate links. However, having read these books, I am confident about the quality of my recommendations!

Benchmark Fossil Demand Forecasting Challenge


Zindi is hosting the Fossil Demand Forecasting Challenge, where competitors have to predict the amount of units sold for various products.

Note that the rules state that the metric to optimize is not is usual squared error, but instead, the absolute error:

The evaluation metric for this challenge is Mean Absolute Error.

All the models relying on the minimization of least squares (usual regressions, random forests with default parameters) are likely to perform poorly since they will return the mean over subsambles, while minimizing the absolute error returns the mean of the sample.

In a mathematical language:

A simple benchmark

With that knowledge, the benchmark below simply returns, for each product, the median of units sold over the year 2021. The score should be around 192xxx

import numpy as np
import pandas as pd
import random

train = pd.read_csv("../raw_data/Train.csv")
sku_names = train["sku_name"].unique()
train["year_month"] = train["year"].astype(
    str) + "/" + train["month"].astype(str)
train["date"] = pd.to_datetime(train["year_month"])
train_recent = train[train["date"] >= "2021/01"]

medians = train_recent.groupby("sku_name")["sellin"].median().to_dict()

test = pd.read_csv("../raw_data/Test.csv")
sku_names_test = test["sku_name"].unique()

missing = {}
for sku_name_test in sku_names_test:
    missing[sku_name_test] = 0

test["Target"] = test["sku_name"].replace(medians).replace(missing).astype(int)

test["Item_ID"] = test["sku_name"] + "_" + \
    test["month"].astype(str) + "_" + test["year"].astype(str)
test[["Item_ID", "Target"]].to_csv("./submission_.csv", index=False)
Minimize L1 penalty with (univariate) linear regression


Most of the regression problems I dealt with focused on minimizing the L2 norm of the difference between the predictions of a model and the true value. However, from a business point of view, it may make more sense to minimize directly the L1 norm.

Indeed, using a least square approach will mainly penalize large errors. But if the cost of an error is constant (once again, from the business perspective), then, the L1 penalty makes sense as well.

However, note that the estimator may be vastly different, if we want to use a constant model, the values of the intercept differ, one being the mean, the other one, the median.

More formally:

So far, so good. However, things become complex quite quickly: a theoretical advantage of L2 penalty is that it makes the penalty differientiable, and we enjoy many closed formulas for various problems relying on L2 penatlies.


I will focus on implementing the univariate case, without intercept. Including intercept or multivariate case relies on a (much) more complex optimization.

Calling we note that is a convex function, whose derivative is not continuous.

It derivative, where it is defined, is:

Given that is convex, must be monotonic. Besides and .

Therefore, we will look for a zero of using dichotomy.


As detailed above, one we have the penalty and the dichotomy algorithm implemented, there is nothing else to do:

import numpy as np

def dichotomy(n, f, l, r):

    c = (l + r) / 2
    if n ==0:
        return c

        if f(c) < 0 :
            return dichotomy(n-1, f, c, r)
            return dichotomy(n-1, f, l, c)

def penalty(x, y, b):
    return -np.sum(x * np.sign(y - b * x))

class L1Regressor:

    def __init__(self, n_iterations=20):
        self.n_iterations = n_iterations
        self.b = None

    def fit(self, x, y):
        ratios = y / x
        l, r = np.min(ratios), np.max(ratios)
        self.b = dichotomy(self.n_iterations, lambda b: penalty(x, y, b), l, r)
        return self


If we append the following:

if __name__ == "__main__":

    import matplotlib.pyplot as plt

    x = np.random.rand(100)
    y = x * 0.5 + 0.1 * np.random.normal(size=100)

    slope = L1Regressor().fit(x, y).b

    plt.scatter(x, y)
    plt.plot(x, x*slope, c = 'r')

We obtain:

L1 regression

Comparison with L2

We can add some noise to the observations and we expect to have a more robust regression with L1 penalty.

Below are plotted the two slopes: in red, the L1 penalty is minimized, in green, the L2 penalty.

L1 regression vs L2

    from sklearn.linear_model import LinearRegression

    x = np.random.rand(100)
    y = x * 0.5 + 0.1 * np.random.normal(size=100)

    x[:10] = 0.9 + 0.1 * np.random.rand(10)
    y[:10] = 2 + x[:10] * 0.5 + np.random.normal(size=10)

    slope = L1Regressor().fit(x, y).b
    slopel2 = LinearRegression(fit_intercept=False).fit(x.reshape(-1,1), y).coef_[0]

    plt.scatter(x, y)
    plt.plot(x, x*slope, c = 'r')
    plt.plot(x, x*slopel2, c = 'g')
Random number generation in Cython


In one of my programs, I had to perform (a lot of) random sampling from Python lists. So much that it ended up being my bottleneck.

Without going in too much details, the function was mostly generating random numbers and accessing elements in lists. I gave it a simple Cython try with something along these lines:

import random
import cython

def sample(int n_sampling, l):
    a = []
    for _ in range(n_sampling):
    return a

def rando(int n_sampling, l):
    a = []
    for _ in range(n_sampling):
    return a

Cython usually needs the following setup code:

from setuptools import setup
from Cython.Build import cythonize

    ext_modules = cythonize("fast_sampler.pyx")

And the following command will build the code:

python build_ext --inplace

The timing results:

python -mtimeit -s"import fast_sampler" "fast_sampler.sample(10000,[1,2,3])"
5000 loops, best of 5: 61.8 usec per loop

Accessing elements in a loop seems quick enough.

python -mtimeit -s"import fast_sampler" "fast_sampler.rando(10000,[1,2,3])"
50 loops, best of 5: 5.77 msec per loop

However, the calls to random.randrange() seem to be the bottleneck.

If we add this cimport statement, we can directly call rand()

from libc.stdlib cimport rand

def rando_c(int n_sampling, l):

    a = []
    for _ in range(n_sampling):
        a.append(rand() % 3)
    return a

And finally:

python -mtimeit -s"import fast_sampler" "fast_sampler.rando_c(10000,[1,2,3])"
2000 loops, best of 5: 104 usec per loop

Which brings a 50x speedup!

What about the seed ?

Usually, it is a good practice to add these lines in a Python code when dealing with random number generation (to ensure reproducibility):


By default, rand() always returns the same numbers, as long as you do not call srand() before, so you do not have to worry about them any more ! (At least, not in this part of your code).

Vim for datascience

There are plenty of tutorials here and there to have Python and vim interact beautifully. The point of this one is to provide some simple lines to add to you .vimrc file without to worry too much about installing more (vim) packages. Having myself struggled to implement this line, I will provide some explanations about the meaning of each of them.

If you have more tricks for your common datascience tasks in vim, let me know, I will add them!



Here are the thing you can do with the following settings:

  • Associate the common import to keypresses,
  • Preview the contents of a csv file in a vim pane,
  • Format JSON files with jq or python files with autopep8,
  • Quickly add print() statements,
  • Fix the usual copy paste into vim (bonus).

If you are familiar with vim, you will know that you can do pretty much everything with a sequence of keypresses. Recording this keypresses and mapping them to another key just factors everything you want to do ;)


Python packages: pandas, autopep8, numpy Packages: jq.

Data preview

The function in action

I start with the hardest but most satisfying command:

autocmd FileType python map <C-F9> va"y:!python -c "import pandas as pd; df=pd.read_csv('<C-R>"', nrows=5); print(df)" > tmp.tmp<CR>:sv tmp.tmp<CR>:resize 8<CR>

It will show the first five lines of the .csv file in the quotes surrounding the cursor in a new vim pane.


autocmd FileType python is just saying that the mapping which follows will only apply to python files. This avoids accidental application to other languages.

map <C-F9> means map Ctrl + F9 to the following sequence of keypresses

va"y is a way to tell vim :

  • select v

  • around a

  • quotes "

  • copy y (to register)

:! allows to execute vim commands in your actual terminal

python -c "import pandas as pd; df=pd.read_csv('<C-R>"', nrows=5); print(df)" now we are doing one line python, the only trick here is the <C-R> which refers to vim clipboard (or register), so what we stored when “pressing” va"y.

> tmp.tmp<CR>:sv tmp.tmp<CR>:resize 8<CR> outputs the Python print statement to a tmp file (tmp.tmp) which in turn is opened by vim (with :sv)

Beautifying files


This one needs autopep8 installed. Otherwise, it will just remove everything in the file you are editing…

autocmd FileType python map <F4> :!autopep8 --in-place --aggressive %<CR>

It will format your Python scripts using the autopep8 guidelines.


This one needs to have jq installed. It is a tool to manipulate JSON files easily and I strongly recommend using it.

autocmd FileType json map <F4> :%! jq .<CR>

Upon pressing <F4> it will ident your file beautifully.



If I want to execute quickly the script I am working on, these two lines enable to do it (whether I am in visual or edit mode)

autocmd FileType python map <F5> :wall!<CR>:!python %<CR>
autocmd FileType python imap <F5> <Esc>:wall!<CR>:!python %<CR>

It is ideal when you are used to test your classes like this:

from collections import defaultdict

class MarkovLikelihood:

    def __init__(self, alpha):
        self.alpha_ = alpha
        self.transition_counts_ = defaultdict(lambda: 0)
        self.start_counts = defaultdict(lambda: 1)

    def fit(self, sentences):
        for sentence in sentences:
        return self
    def update_(self, sentence):
        words = sentence.split(' ')
        for w1, w2 in self.pairwise_(words):
            self.transition_counts_[f"{w1}_{w2}"] += 1
            self.start_counts[w1] += 1

    def pairwise_(self, iterable):
        a = iter(iterable)
        return zip(a, a)

    def predict(self, sentence):
        res = 1
        words = sentence.split(' ')
        n = len(words)
        for w1, w2 in self.pairwise_(words):
            res *= (self.transition_counts_[f"{w1}_{w2}"] + self.alpha_) / self.start_counts[w1]

        return res
if __name__ == "__main__":

    ml = MarkovLikelihood(0.5)
    sentences = [ 
        "I ate dinner.",
        "We had a three-course meal.",
        "Brad came to dinner with us.",
        "He loves fish tacos.",
        "In the end, we all felt like we ate too much.",
        "We all agreed; it was a magnificent evening."]

    res = ml.predict("dinner with tacos")
    res = ml.predict("I love tennis")


The following two lines allow to have the most common imports with a couple of keypresses:

autocmd FileType python map <C-F10> ggiimport pandas as pd<CR>import numpy as np<CR>np.random.seed(0)<CR><Esc>
autocmd FileType python map <C-F11> ggiimport matplotlib.pyplot as plt<CR>import seaborn as sns<CR><Esc>

Will add the following to the Python file you are working on. Note that gg makes sure to place the cursor at the top of the file first.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

Quick print

autocmd FileType python map <C-p> viwyoprint(<Esc>pA)<Esc>

if your cursor is on a word (my_variable), it will simply append a print(my_variable) statement below the current line. Useful for debugging.

Fixing copy and paste

" Automatically set paste mode in Vim when pasting in insert mode
let &t_SI .= "\<Esc>[?2004h"
let &t_EI .= "\<Esc>[?2004l"

inoremap <special> <expr> <Esc>[200~ XTermPasteBegin()

function! XTermPasteBegin()
  set pastetoggle=<Esc>[201~
  set paste
    return ""
Random Greedy Forest tutorial


A not so famous algorithm

Regularized Greedy Forest is a quite recent algorithm in machine learning, the article “Learning Nonlinear Functions Using Regularized Greedy Forest” has been published in 2014.

If we want to compare it to gradient boosting, which seems to have been studied back in 1999, it took a while before this algorithm received its many reliable and fast implementations (xgboost, catboost, LightGBM).

It seems to be a very good candidate in terms of performance. However, as we will see in the parameters section, it requires some tuning (regularization and number of leaves can be application critical).


As stated by the authors, Regularized Greedy Forest can achieve better performance than gradient boosting approaches:

In contrast to these traditional boosting algorithms that treat a tree learner as a black box, the method we propose directly learns decision forests via fully-corrective regularized greedy search using the underlying forest structure. Our method achieves higher accuracy and smaller models than gradient boosting on many of the datasets we have tested on.

And if you are skeptical about benchmarks proposed by the authors, check out the following posts, related to data science competitions:

On top of that, I noted that on some dataset I have comparable performance between xgboost and RGF after a careful tuning of the parameters for both the models.

How does it work ?

The algorithm

As for gradient boosting and random forests, the idea is to train a collection of decision trees. However, the key difference is that you are allowed to modify previously trained trees and weights attributed to each tree if that improves the performance of the overall model.

To make things more clear:

  • In the case of random forests, all the trees are trained simultaneously and regardless of each other performance.
  • In the case of gradient boosting, a new tree is trained on the residuals of the previous trees.
  • In the case of random greedy forest, things are more complicated :) At each step we may either start a new tree, or split an existing leaf node. Then, the weights of each leaf are adjusted, to optimize the loss function.

To put it in equations, when we try to learn some objective function what we do is to solve this type of optimization program.

The space of functions where lives being quite large, we usually rely on heuristics to make the above problem solvable in an acceptable amount of time.

Per example, in the case of gradient boosting, we solve a series of training of decision trees, with the following induction:


So that each step is as simple as training a decision tree, and the training time is, roughly, the number of trees times the training time of a tree.

In the case of the Regularized Greedy Forest, the procedure is as follows:

  • Fix the weights, and change the structure of the forest (which changes basis functions) so that the loss $Q(F)$ is reduced the most.
  • Fix the structure of the forest, and change the weights so that lossQ(F) is minimized

Cross validating a random greedy forest

The cross validation is usual, here are the roles of the different parameters.

Parameters description

The parameters are sorted by order of importance in terms of their impact on the accuracy of the model.

max_leaf : the total number of leaves in the forest. Note that given the training procedure, we never have to specify the total number of trees needed in the forest. Beware, the larger this parameter, the longer the training. By default, the value is 1000 for RGFClassifier and 500 RGFRegressor.

l2 : the penalty. This parameter has to be tuned to obtain a good performance. By default, the value is 0.1 but smaller values usually improve performance.

n_tree_search : (1 by default) Number of trees to be searched for the nodes to split.

algorithm one of (“RGF”, “RGF_Opt”, “RGF_Sib”), where the algorithm are the following:

  • RGF: RGF with L2 regularization on leaf-only models.
  • RGF Opt: RGF with min-penalty regularization.
  • RGF Sib: RGF with min-penalty regularization with the sum-to-zero sibling constraints.

By default, the algorithm is “RGF” for both RGFClassifier() and RGFRegressor().

reg_depth : Must be no smaller than 1.0. Meant for being used with algorithm="RGF_Opt"|"RGF_Sib". A larger value penalizes deeper nodes more severely.

loss one of ("LS", "Expo", "Log", "Abs"), by default this is LS for regressions and Log for classification.

  • LS: Square loss,
  • Expo: Exponential loss,
  • Log: Logistic loss,
  • Abs: Absolute error loss.

n_iter : Number of iterations of coordinate descent to optimize weights. If None, 10 is used for loss=”LS” and 5 for loss=”Expo” or “Log”. Not critical to improve accuracy.

Classification only

calc_prob : One of (“sigmoid”, “softmax”), by default “sigmoid”. I guess it will not affect accuracy or roc_auc, but may affect logloss.



Fortunately, the implementation comes with a scikit learn interface, therefore you can use the usual .fit() and .transform() methods, so there is not much to say about how to use it. Per example, the following will cross validate a random greedy forest.

from sklearn import datasets
from sklearn.utils.validation import check_random_state
from sklearn.model_selection import StratifiedKFold, cross_val_score
from rgf.sklearn import RGFClassifier

iris = datasets.load_iris()
rng = check_random_state(0)
perm = rng.permutation( =[perm] =[perm]

rgf = RGFClassifier(max_leaf=400,

n_folds = 3

rgf_scores = cross_val_score(rgf,

rgf_score = sum(rgf_scores)/n_folds
print('RGF Classifier score: {0:.5f}'.format(rgf_score))


Learning Nonlinear Functions Using Regularized Greedy Forest is the main article on the topic.

The RGF implementation for the basic implementation, and for the sparse implementation (FastRGF).