Best books on Fermat's last theorem

I haven’t published many articles these days, the main reason being that I got drawn into the history of Fermat’s last theorem. This theorem states that the equation:

$$x^n + y^n = z^n$$

only has trivial integer solutions (i.e. one of the elements is 0) for $n \geq 3$. This problem fascinated mathematicians for more than three centuries before it was finally solved!

Here is the collection of books I have read so far, ordered from the least mathematically oriented to the toughest.

Fermat's Last Theorem by Simon Singh

Requires absolutely no mathematical background. It is a very nice introduction to the problem, mostly focusing on the history of (and around) Fermat’s last theorem. Some mathematical details, along with various interesting puzzles, are presented in the text and in the appendix. Really nice if you want to know what this theorem is about, and why it became so important, without digging into the mathematical details.

Fermat a-t-il démontré son grand théorème ? L’hypothèse « Pascal » (in French) by Laurent Hua and Jean Rousseau

This book is split in two parts: the first one is purely historical and does not require much mathematical knowledge. The authors present the possibility that Fermat may have proved his theorem. Though the consensus seems to claim the opposite, proving that Fermat did not prove his theorem has never been done!

The first part is a thorough analysis of Fermat’s correspondence with other mathematicians of his time, and it does become moving, especially towards the end. This analysis is so detailed that you even get a sense of Fermat’s humor in some of the letters presented!

The second part is more mathematical, but a high school student could understand most of it. It gives what could be the starting point of a proof with the tools known in Fermat’s time. Unfortunately, it does not go very far, but the approach is interesting!

Fermat’s Last Theorem for Amateurs by Paulo Ribenboim

This book presents many special cases of proofs of FLT in great detail. Though the original problem is to find solutions in the set of integers, the author proposes to study the equation in other sets as well: Gaussian integers ($\mathbb{Z}[i]$), $p$-adic numbers (they truly are a hidden gem of number theory)… As the title states, it is for amateurs, though it is presented as your usual math book: with theorems and proofs. Note, however, that an emphasis is put on examples.

Invitation to the Mathematics of Fermat-Wiles by Yves Hellegouarch

This book aims to present the proof given by Wiles. Obviously, some details are left out, but it is an amazing introduction. Topics covered include introductory material (the easier cases), Kummer’s proof… Soon enough, the author jumps to elliptic functions, elliptic curves and modular forms, which are essential.

It contains a lot of exercises and clear proofs. If there is only one book to read on the topic (and you are not afraid of mathematical details), this is probably the best one!

13 Lectures on Fermat’s last theorem by Paulo Ribenboim

This one is interesting: it was written in 1979, before Fermat’s last theorem was actually proven. It summarizes many of the efforts put into trying to prove FLT and features some of the proofs. The historical context is essential, and many of the theorems are presented with notes regarding their history.

A large part of the book is devoted to the fascinating proof by Kummer and its limitations. Many other results of varying importance are also presented: estimates, equivalences of FLT with other statements… One equivalence I found particularly interesting links the following two statements:

  1. The equation $x^n + y^n = z^n$ has only the trivial solutions in $\mathbb{Q}$.
  2. For every non-zero $a \in \mathbb{Q}$, a certain polynomial depending on $a$ is irreducible over $\mathbb{Q}$.

A minor regret is that many of the statements are not proved and are left “as exercises”, with no hints nor any indication of their difficulty. It is particularly interesting to read it alongside Fermat’s Last Theorem for Amateurs, as some of the proofs missing from this book can be found in the other.

Extra readings

In some of the books, you will need (on top of the usual algebra, basic number theory and analysis tools) more advanced topics, such as Galois theory. To that end, Galois Theory Through Exercises by Juliusz Brzeziński was my best read on this topic. It has a chapter completely dedicated to cyclotomic fields, which are an object commonly used in the books above!
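For reference, the $n$-th cyclotomic field is simply the field obtained by adjoining a primitive $n$-th root of unity to the rationals:

$$\mathbb{Q}(\zeta_n), \qquad \zeta_n = e^{2i\pi/n}.$$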

Number Theory 1: Fermat’s Dream, by Kazuya Kato, Nobushige Kurokawa and Takeshi Saito, is actually the book that drove me into the study of Fermat’s Last Theorem. It is really well written, with a lot of figures, exercises and corrections.

Not read yet

The books below are on my to-read list. If you have any opinion on them, let me know in the comments!

Three Lectures on Fermat’s Last Theorem by Louis Joel Mordell

Mordell is a name I have now met many times! I want to read it mostly for its historical value, and to know what he had to say about this theorem (almost) a century and a half ago!

A Course in Arithmetic by Jean-Pierre Serre

Please note that all the links above are affiliate links. However, having read these books, I am confident about the quality of my recommendations!

Benchmark Fossil Demand Forecasting Challenge

Introduction

Zindi is hosting the Fossil Demand Forecasting Challenge, where competitors have to predict the number of units sold for various products.

Note that the rules state that the metric to optimize is not the usual squared error but, instead, the absolute error:

The evaluation metric for this challenge is Mean Absolute Error.

All the models relying on the minimization of least squares (usual regressions, random forests with default parameters) are likely to perform poorly, since minimizing the squared error leads to predicting the mean over subsamples, whereas the absolute error is minimized by the median of the sample.

In mathematical language:

$$\arg\min_c \mathbb{E}\left[(Y - c)^2\right] = \mathbb{E}[Y], \qquad \arg\min_c \mathbb{E}\left[\,\lvert Y - c \rvert\,\right] = \operatorname{median}(Y).$$
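If you want a quick numerical sanity check of this fact, the snippet below (using a skewed sample, so that the mean and the median differ) shows that the constant minimizing the mean absolute error is the median, not the mean:

import numpy as np

rng = np.random.default_rng(0)
sample = rng.exponential(size=10_000)  # skewed: mean != median

candidates = np.linspace(sample.min(), sample.max(), 2001)
maes = [np.abs(sample - c).mean() for c in candidates]

print(candidates[np.argmin(maes)])  # close to...
print(np.median(sample))            # ...the median,
print(sample.mean())                # far from the mean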

A simple benchmark

With that knowledge, the benchmark below simply returns, for each product, the median of units sold over the year 2021. The score should be around 192xxx.

import numpy as np
import pandas as pd
import random

random.seed(0)
np.random.seed(0)


train = pd.read_csv("../raw_data/Train.csv")
sku_names = train["sku_name"].unique()

# build a proper date column from the year and month columns
train["year_month"] = train["year"].astype(
    str) + "/" + train["month"].astype(str)
train["date"] = pd.to_datetime(train["year_month"])

# keep only the observations from 2021
train_recent = train[train["date"] >= "2021/01"]

# median of units sold, per product, over 2021
medians = train_recent.groupby("sku_name")["sellin"].median().to_dict()

test = pd.read_csv("../raw_data/Test.csv")
sku_names_test = test["sku_name"].unique()

# default prediction of 0 for products absent from the training data
missing = {}
for sku_name_test in sku_names_test:
    missing[sku_name_test] = 0

# map each product to its median; the skus left untouched by the first
# replace (unseen during training) are then mapped to 0
test["Target"] = test["sku_name"].replace(medians).replace(missing).astype(int)

test["Item_ID"] = test["sku_name"] + "_" + \
    test["month"].astype(str) + "_" + test["year"].astype(str)
test[["Item_ID", "Target"]].to_csv("./submission_.csv", index=False)

Random number generation in Cython

Problem

In one of my programs, I had to perform (a lot of) random sampling from Python lists. So much that it ended up being my bottleneck.

Without going into too much detail, the function was mostly generating random numbers and accessing elements in lists. I gave it a simple Cython try, with something along these lines:

import random
import cython

@cython.boundscheck(False)
def sample(int n_sampling, l):
    a = []
    for _ in range(n_sampling):
        a.append(l[0])
    return a

@cython.boundscheck(False)
def rando(int n_sampling, l):
    a = []
    for _ in range(n_sampling):
        a.append(random.randrange(3))
    return a

Cython usually needs the following setup code:

from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules = cythonize("fast_sampler.pyx")
)

And the following command will build the code:

python setup.py build_ext --inplace

The timing results:

python -mtimeit -s"import fast_sampler" "fast_sampler.sample(10000,[1,2,3])"
5000 loops, best of 5: 61.8 usec per loop

Accessing elements in a loop seems quick enough.

python -mtimeit -s"import fast_sampler" "fast_sampler.rando(10000,[1,2,3])"
50 loops, best of 5: 5.77 msec per loop

However, the calls to random.randrange() seem to be the bottleneck.

If we add this cimport statement, we can directly call the C rand() function:

from libc.stdlib cimport rand

@cython.boundscheck(False)
def rando_c(int n_sampling, l):

    a = []
    for _ in range(n_sampling):
        a.append(rand() % 3)
    return a

And finally:

python -mtimeit -s"import fast_sampler" "fast_sampler.rando_c(10000,[1,2,3])"
2000 loops, best of 5: 104 usec per loop

Which brings a 50x speedup!

What about the seed?

Usually, it is a good practice to add these lines in a Python code when dealing with random number generation (to ensure reproducibility):

random.seed(0)
np.random.seed(0)

By default, rand() always returns the same sequence of numbers, as long as you do not call srand() first, so you do not have to worry about reproducibility any more! (At least, not in this part of your code.)
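If you ever do want to control the seed explicitly (for example to get different sequences across runs), srand() can be cimported the same way. A minimal Cython sketch, to be added to the same .pyx file as above:

from libc.stdlib cimport rand, srand

def rando_c_seeded(unsigned int seed, int n_sampling):
    # seed the C generator once, then sample as before
    srand(seed)
    a = []
    for _ in range(n_sampling):
        a.append(rand() % 3)
    return a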

Vim for datascience

There are plenty of tutorials here and there on making Python and vim interact beautifully. The point of this one is to provide some simple lines to add to your .vimrc file, without worrying too much about installing more (vim) packages. Having struggled myself to implement these lines, I will provide some explanations about the meaning of each of them.

If you have more tricks for your common datascience tasks in vim, let me know, I will add them!

Introduction

Summary

Here are the things you can do with the following settings:

  • Associate the common import to keypresses,
  • Preview the contents of a csv file in a vim pane,
  • Format JSON files with jq or python files with autopep8,
  • Quickly add print() statements,
  • Fix the usual copy paste into vim (bonus).

If you are familiar with vim, you will know that you can do pretty much everything with a sequence of keypresses. Recording these keypresses and mapping them to another key just factors out everything you want to do ;)

Requirements

Python packages: pandas, autopep8, numpy. System packages: jq.

Data preview

The function in action

I start with the hardest but most satisfying command:

autocmd FileType python map <C-F9> va"y:!python -c "import pandas as pd; df=pd.read_csv('<C-R>"', nrows=5); print(df)" > tmp.tmp<CR>:sv tmp.tmp<CR>:resize 8<CR>

It will show the first five lines of the .csv file whose path lies in the quotes surrounding the cursor, in a new vim pane.

Details

autocmd FileType python is just saying that the mapping which follows will only apply to python files. This avoids accidental application to other languages.

map <C-F9> means map Ctrl + F9 to the following sequence of keypresses

va"y is a way to tell vim:

  • select v

  • around a

  • quotes "

  • copy y (to register)

:! allows you to execute a command in your actual terminal from vim

python -c "import pandas as pd; df=pd.read_csv('<C-R>"', nrows=5); print(df)": now we are writing one-line Python; the only trick here is the <C-R>" which inserts the contents of vim's clipboard (or register), i.e. what we stored when “pressing” va"y.

> tmp.tmp<CR>:sv tmp.tmp<CR>:resize 8<CR> outputs the Python print statement to a temporary file (tmp.tmp), which in turn is opened by vim (with :sv) in a pane resized to 8 lines.

Beautifying files

Python

This one needs autopep8 installed. Otherwise, it will just remove everything in the file you are editing…

autocmd FileType python map <F4> :!autopep8 --in-place --aggressive %<CR>

It will format your Python scripts using the autopep8 guidelines.

JSON

This one needs to have jq installed. It is a tool to manipulate JSON files easily and I strongly recommend using it.

autocmd FileType json map <F4> :%! jq .<CR>

Upon pressing <F4>, it will indent your file beautifully.

Python

Execution

If I want to quickly execute the script I am working on, these two lines enable me to do it (whether I am in normal or insert mode):

autocmd FileType python map <F5> :wall!<CR>:!python %<CR>
autocmd FileType python imap <F5> <Esc>:wall!<CR>:!python %<CR>

It is ideal when you are used to testing your classes like this:

import itertools
from collections import defaultdict

class MarkovLikelihood:

    def __init__(self, alpha):
        self.alpha_ = alpha
        self.transition_counts_ = defaultdict(lambda: 0)
        self.start_counts = defaultdict(lambda: 1)

    def fit(self, sentences):
        for sentence in sentences:
            self.update_(sentence)
        return self
    
    def update_(self, sentence):
        words = sentence.split(' ')
        for w1, w2 in self.pairwise_(words):
            self.transition_counts_[f"{w1}_{w2}"] += 1
            self.start_counts[w1] += 1

    def pairwise_(self, iterable):
        # overlapping pairs: (w0, w1), (w1, w2), ... as needed for
        # bigram transition counts (zip(a, a) would yield disjoint pairs)
        a, b = itertools.tee(iterable)
        next(b, None)
        return zip(a, b)

    def predict(self, sentence):
        res = 1
        words = sentence.split(' ')
        for w1, w2 in self.pairwise_(words):
            res *= (self.transition_counts_[f"{w1}_{w2}"] + self.alpha_) / self.start_counts[w1]

        return res
    
if __name__ == "__main__":


    ml = MarkovLikelihood(0.5)
    sentences = [ 
        "I ate dinner.",
        "We had a three-course meal.",
        "Brad came to dinner with us.",
        "He loves fish tacos.",
        "In the end, we all felt like we ate too much.",
        "We all agreed; it was a magnificent evening."]

    ml.fit(sentences)

    res = ml.predict("dinner with tacos")
    print(res)
    res = ml.predict("I love tennis")
    print(res)

Imports

The following two lines allow you to insert the most common imports with a couple of keypresses:

autocmd FileType python map <C-F10> ggiimport pandas as pd<CR>import numpy as np<CR>np.random.seed(0)<CR><Esc>
autocmd FileType python map <C-F11> ggiimport matplotlib.pyplot as plt<CR>import seaborn as sns<CR><Esc>

They will add the following to the Python file you are working on. Note that gg makes sure to place the cursor at the top of the file first.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(0)

Quick print

autocmd FileType python map <C-p> viwyoprint(<Esc>pA)<Esc>

If your cursor is on a word (my_variable), it will simply append a print(my_variable) statement below the current line. Useful for debugging.

Fixing copy and paste

" Automatically set paste mode in Vim when pasting in insert mode
" https://coderwall.com/p/if9mda/automatically-set-paste-mode-in-vim-when-pasting-in-insert-mode
let &t_SI .= "\<Esc>[?2004h"
let &t_EI .= "\<Esc>[?2004l"

inoremap <special> <expr> <Esc>[200~ XTermPasteBegin()

function! XTermPasteBegin()
  set pastetoggle=<Esc>[201~
  set paste
  return ""
endfunction

Regularized Greedy Forest tutorial

Introduction

A not so famous algorithm

Regularized Greedy Forest is a quite recent algorithm in machine learning; the article “Learning Nonlinear Functions Using Regularized Greedy Forest” was published in 2014.

If we compare it to gradient boosting, which seems to have been first studied back in 1999, it also took a while before that algorithm received its many reliable and fast implementations (xgboost, catboost, LightGBM).

It seems to be a very good candidate in terms of performance. However, as we will see in the parameters section, it requires some tuning (regularization and number of leaves can be application critical).

Performance

As stated by the authors, Regularized Greedy Forest can achieve better performance than gradient boosting approaches:

In contrast to these traditional boosting algorithms that treat a tree learner as a black box, the method we propose directly learns decision forests via fully-corrective regularized greedy search using the underlying forest structure. Our method achieves higher accuracy and smaller models than gradient boosting on many of the datasets we have tested on.

And if you are skeptical about benchmarks proposed by the authors, check out the following posts, related to data science competitions:

On top of that, I noted that on some datasets, I had comparable performance between xgboost and RGF after a careful tuning of the parameters of both models.

How does it work?

The algorithm

As for gradient boosting and random forests, the idea is to train a collection of decision trees. However, the key difference is that you are allowed to modify previously trained trees and weights attributed to each tree if that improves the performance of the overall model.

To make things more clear:

  • In the case of random forests, all the trees are trained simultaneously and independently of each other's performance.
  • In the case of gradient boosting, a new tree is trained on the residuals of the previous trees.
  • In the case of regularized greedy forest, things are more complicated :) At each step, we may either start a new tree or split an existing leaf node. Then, the weights of each leaf are adjusted to optimize the loss function.

To put it in equations, when we try to learn some objective function, what we do is solve this type of optimization program:

$$\hat{F} = \arg\min_{F \in \mathcal{F}} Q(F),$$

where $Q(F)$ is a loss functional, typically a sum of pointwise losses plus a regularization term. The space of functions $\mathcal{F}$ where $F$ lives being quite large, we usually rely on heuristics to make the above problem solvable in an acceptable amount of time.

For example, in the case of gradient boosting, we solve a series of decision tree trainings, with the following induction:

$$F_m = F_{m-1} + \alpha_m h_m,$$

where $h_m$ is a decision tree fitted to the residuals of $F_{m-1}$, so that each step is as simple as training a decision tree, and the training time is, roughly, the number of trees times the training time of a tree.

In the case of the Regularized Greedy Forest, the procedure is as follows:

  • Fix the weights, and change the structure of the forest (which changes the basis functions) so that the loss $Q(F)$ is reduced the most.
  • Fix the structure of the forest, and change the weights so that the loss $Q(F)$ is minimized. The second step is sketched below.
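To make the second, “fully-corrective” step concrete: with the forest structure fixed, the basis functions are fixed, and for the squared loss with an L2 penalty the optimal weights solve a ridge regression. A minimal sketch, assuming a matrix G of basis-function outputs (an illustration of the idea, not the library's actual implementation):

import numpy as np

def refit_weights(G, y, l2=0.1):
    # G: (n_samples, n_basis) matrix with G[i, j] = g_j(x_i), y: targets;
    # closed-form ridge solution (G'G + l2*I)^{-1} G'y
    n_basis = G.shape[1]
    return np.linalg.solve(G.T @ G + l2 * np.eye(n_basis), G.T @ y)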

Cross validating a regularized greedy forest

The cross validation is the usual one; here are the roles of the different parameters.

Parameters description

The parameters are sorted by order of importance in terms of their impact on the accuracy of the model.

max_leaf : the total number of leaves in the forest. Note that, given the training procedure, we never have to specify the total number of trees in the forest. Beware: the larger this parameter, the longer the training. By default, the value is 1000 for RGFClassifier and 500 for RGFRegressor.

l2 : the L2 penalty. This parameter has to be tuned to obtain good performance. By default, the value is 0.1, but smaller values usually improve performance.

n_tree_search : (1 by default) Number of trees to be searched for the nodes to split.

algorithm : one of (“RGF”, “RGF_Opt”, “RGF_Sib”), where the algorithms are the following:

  • RGF: RGF with L2 regularization on leaf-only models.
  • RGF Opt: RGF with min-penalty regularization.
  • RGF Sib: RGF with min-penalty regularization with the sum-to-zero sibling constraints.

By default, the algorithm is “RGF” for both RGFClassifier() and RGFRegressor().

reg_depth : Must be no smaller than 1.0. Meant for being used with algorithm="RGF_Opt"|"RGF_Sib". A larger value penalizes deeper nodes more severely.

loss : one of ("LS", "Expo", "Log", "Abs"); by default, this is LS for regression and Log for classification.

  • LS: Square loss,
  • Expo: Exponential loss,
  • Log: Logistic loss,
  • Abs: Absolute error loss.

n_iter : Number of iterations of coordinate descent to optimize weights. If None, 10 is used for loss=”LS” and 5 for loss=”Expo” or “Log”. Not critical to improve accuracy.

Classification only

calc_prob : One of (“sigmoid”, “softmax”), by default “sigmoid”. I guess it will not affect accuracy or roc_auc, but may affect logloss.
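To tie these together, here is what an instantiation using the parameters discussed above could look like (the values are illustrative starting points only, not recommendations):

from rgf.sklearn import RGFRegressor

# a sketch combining the parameters described above
reg = RGFRegressor(max_leaf=500,      # total leaves in the forest
                   l2=0.01,           # smaller than the 0.1 default
                   algorithm="RGF",   # plain L2-regularized variant
                   loss="Abs",        # absolute error loss
                   n_iter=10)         # coordinate descent iterations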

Benchmarks

Code

Fortunately, the implementation comes with a scikit-learn interface, so you can use the usual .fit() and .predict() methods, and there is not much to say about how to use it. For example, the following will cross validate a regularized greedy forest.

from sklearn import datasets
from sklearn.utils.validation import check_random_state
from sklearn.model_selection import StratifiedKFold, cross_val_score
from rgf.sklearn import RGFClassifier

iris = datasets.load_iris()
rng = check_random_state(0)
perm = rng.permutation(iris.target.size)
iris.data = iris.data[perm]
iris.target = iris.target[perm]

rgf = RGFClassifier(max_leaf=400,
                    algorithm="RGF_Sib",
                    test_interval=100,
                    verbose=True)

n_folds = 3

rgf_scores = cross_val_score(rgf,
                             iris.data,
                             iris.target,
                             cv=StratifiedKFold(n_folds))

rgf_score = rgf_scores.mean()
print('RGF Classifier score: {0:.5f}'.format(rgf_score))

References

Learning Nonlinear Functions Using Regularized Greedy Forest is the main article on the topic.

The RGF repository hosts the basic implementation, as well as the sparse one (FastRGF).

Does gradient boosting overfit?

What is overfitting?

Overfitting is, roughly, what happens when you “train your model too much”. In that case, you achieve very good training accuracy, while the test accuracy is usually poor. Think about a one dimensional regression: you may either fit a straight line, or draw a curve that passes through every point. Which one will generalize best?

For some models, specific parameters control the balance between “passing through every point” and “drawing a straight line”. In the case of gradient boosting, this will be the number of trees; in the case of neural networks, the number of iterations of the gradient descent; in the case of support vector machines, a combination of parameters…

A quite common figure regarding overfitting is the following:

[Overfitting figure]

It applies mostly to neural networks, where the abscissa represents the number of epochs, the blue line the training loss and the red line the validation loss.

The same question applies to gradient boosting, where the number of trees is quite critical and could replace the abscissa in the graph above; see for example this question. Some people seem to claim that it should not overfit (just as random forests do not overfit).
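Before running the large benchmark below, the curve above can be reproduced for gradient boosting in a few lines. A minimal sketch using scikit-learn's GradientBoostingClassifier instead of xgboost, since its staged_predict_proba() exposes the prediction after each tree:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=500, random_state=0)
gbm.fit(X_train, y_train)

# log loss after each additional tree, on the train and test sets;
# plotted against the tree index, these reproduce the figure above
train_curve = [log_loss(y_train, p) for p in gbm.staged_predict_proba(X_train)]
test_curve = [log_loss(y_test, p) for p in gbm.staged_predict_proba(X_test)]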

Simulations

A cool python library

While looking for benchmark data, I found pmlb, which stands for Penn Machine Learning Benchmark. Basically, it enables you to download various real-world datasets.

More info can be found in the project page.

Method

I used xgboost, for which I simply increased the number of trees.

for n_trees in [10, 20, 50, 100, 200, 1000, 1500, 2000, 3000, 4000, 10000, 30000]:
    for max_depth in [5]:
        for learning_rate in [0.001]:
            yield {
                    "name": "XGBClassifier",
                    "parameters": {
                        "n_estimators": n_trees,
                        "max_depth": max_depth,
                        "learning_rate": learning_rate,
                        }
              }

On the following datasets:

['heart_statlog', 'hepatitis', 'horse_colic', 'house_votes_84', 'hungarian', 'hypothyroid', 'ionosphere', 'iris', 'irish', 'kr_vs_kp', 'krkopt', 'labor', 'led24', 'led7', 'letter', 'lupus', 'lymphography', 'magic']

The performance of each model was then evaluated using a 5-fold cross validation for the following metrics: ["roc_auc", "accuracy", "neg_log_loss"].

Does xgboost overfit?

The graphs below seem to say that increasing the number of trees may harm the performance of the model. However, in some cases, even very large numbers of trees are beneficial to the model.

Graphs

[Figures: accuracy, log loss and ROC AUC for xgboost]

Comparison with random forest

However, the comparison with random forests is needed to understand what was at stake above:

[Figures: accuracy, log loss and ROC AUC, compared with random forests]

Code

from pmlb import fetch_data
import pandas as pd
import numpy as np
from sklearn.model_selection import cross_validate
from model_builder import model_builder


def run(model_parameters, dataset_key, metrics):
    model = model_builder(model_parameters)
    X, y = fetch_data(dataset_key, return_X_y=True)
    if len(np.unique(y)) != 2:
        print("Problem is not binary")
        return None

    res = cross_validate(model, X, y, scoring=metrics)

    row = {
            "fit_time": np.mean(res["fit_time"]),
            "n": X.shape[0],
            "p": X.shape[1]
            }
    for metric in metrics:
        row[metric] = np.mean(res[f"test_{metric}"])
        row[f"{metric}_std"] = np.std(res[f"test_{metric}"])
    return row


def benchmark():

    for n_trees in [10, 20, 50, 100, 200, 1000, 1500, 2000, 3000, 4000, 10000, 30000]:
        for max_depth in [5]:
            for learning_rate in [0.001]:
                yield {
                        "name": "XGBClassifier",
                        "parameters": {
                            "n_estimators": n_trees,
                            "max_depth": max_depth,
                            "learning_rate": learning_rate,
                            }
                    }

dataset_keys = ['heart_statlog', 'hepatitis', 'horse_colic', 'house_votes_84', 'hungarian', 'hypothyroid', 'ionosphere', 'iris', 'irish', 'kr_vs_kp', 'krkopt', 'labor', 'led24', 'led7', 'letter', 'lupus', 'lymphography', 'magic']
rows = []

for dataset_key in dataset_keys:
    for model in benchmark():
        row = run(model, dataset_key, ["roc_auc", "accuracy", "neg_log_loss"])
        if row is not None:
            row["model"] = model["name"]
            row["data"] = dataset_key
            row = {**row, **model["parameters"]}
            print(row)
            rows.append(row)

output_data = pd.DataFrame(rows)
output_data.to_csv("./xgb_overfit.csv", index=False)

And the graphs can be produced easily using the nice .plot() methods proposed by pandas and matplotlib.

import pandas as pd
import matplotlib.pyplot as plt

for metric in ["roc_auc", "accuracy", "neg_log_loss"]:
    benchmark_data = pd.read_csv("xgb_overfit.csv")
    benchmark_data.set_index("n_estimators", inplace=True)
    benchmark_data.groupby("data")[metric].plot(legend=True, logx=True, title=metric)
    plt.legend(loc="lower left")
    plt.savefig(f"xgb_{metric}.svg")
    plt.clf()

References

  • xgboost, one of the best gradient boosting libraries available.

  • pmlb a Python library providing various benchmark datasets.

Keras memory leak

Keras memory usage keeps increasing

I was having fun, attempting to do some deep learning with a 2M lines dataset (nothing my computer can’t handle, xgboost was running with roughly 15% of my RAM) when suddenly, as I was adding neural networks in my fancy stacked models, the script kept failing, the memory usage went to the moon, etc, etc.

What did I do wrong? Did I introduce a memory leak in my model stacking / neural network factory code? I would be surprised: it worked fine with every other model. And a neural network is more or less a simple vector of floats (in my case, with only hundreds of parameters), so there is no reason for it to be that big.

The only thing I was attempting to do was to cross validate different neural networks, with different architectures.

So, after a quick search, I found this Stack Overflow question, as well as some people mentioning a weird behavior coming from model.predict(). Another GitHub issue is simply called “Memory leak”. There is even an article titled “Dealing with memory leak issue in Keras model training”, which even got mentioned on Twitter.

What I ended up suspecting is that there are actually many memory leaks from different methods in the code. So I gathered the list of workarounds I could find.

Workarounds

Beware: none of them actually works. Some just alleviate the pain but, most likely, the memory usage will keep increasing. Anyway, the good news is that, by combining many of the tricks I could read about, I managed to have my models run ;)

Garbage collecting

Generally, when you see these lines in a code base, it means that the person who wrote them was desperate to make it run, closely monitoring the memory usage of the script and combining tricks to make everything fit into memory. Usually, performing tasks in dedicated functions and trusting the garbage collector to do its job at the right time is enough. But sometimes you meet these del / garbage collector random invocations.

import gc
from tensorflow.keras import backend as K

del model
gc.collect()
K.clear_session()

I did put these lines after every model.fit() I found. They did not help at all in my case.

Force eager evaluation

This one kind of worked for me. It slows down the training (3 times slower in my case); the memory still keeps increasing for no reason, but much more slowly. Just add the following argument in the model.compile() method:

model.compile( [...]
              run_eagerly=True)

model(x) instead of model.predict(x)

Some people mentioned it. It did not change a thing for me, but I wrote it that way. Be careful though: model(x) will return a tensorflow tensor, while model.predict(x) will return a numpy array.
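A quick way to see the difference, with a throwaway model (a minimal sketch; the layer sizes are arbitrary):

from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
import numpy as np

model = Sequential([Dense(1, input_dim=3, activation="sigmoid")])
x = np.zeros((2, 3), dtype="float32")

print(type(model.predict(x)))  # <class 'numpy.ndarray'>
print(type(model(x)))          # a tensorflow EagerTensor
print(np.array(model(x)))      # cast back to numpy when needed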

Run it in a dedicated script

Yes, kind of ugly. It does not solve the issue, but if you run your cross validation in a Python script, itself called from the terminal, you can pass parameters using JSON and hope that each script won't hit your memory limit.

In my case, I wrote the following class:

from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
import tensorflow as tf
import gc
import numpy as np

class NNModel:

    def __init__(self, architecture, epochs, loss="binary_crossentropy", optimizer="adam"):
        self._epochs = epochs
        self._loss = loss
        self._optimizer = optimizer
        self._architecture = architecture
        self._model = None

    def fit(self, X, y):

        gc.collect()
        tf.keras.backend.clear_session()
    
        self._model = self._model_factory(X.shape[1])

        X_tf = tf.convert_to_tensor(X, dtype=tf.float32)
        y_tf = tf.convert_to_tensor(y, dtype=tf.float32)

        self._model.fit(X_tf, y_tf, epochs=self._epochs)
        return self

    def _model_factory(self, input_dim):

        model = Sequential()

        architecture = self._architecture.copy()
        first_layer = architecture.pop(0)

        model.add(Dense(first_layer[0], input_dim=input_dim, activation=first_layer[1]))
        for layer in architecture:
            model.add(Dense(layer[0], activation=layer[1]))

        model.compile(loss=self._loss, 
                      optimizer=self._optimizer, 
                      run_eagerly=True,
                      metrics=['accuracy'])

        return model

    def predict(self, X):
        raise NotImplementedError

    def predict_proba(self, X):
        X_tf = tf.convert_to_tensor(X, dtype=tf.float32)
        res =  self._model(X_tf)
        res = np.hstack((1-res, res))
        return res

Which I can configure using a JSON that will contain the arguments of the class constructor:

{
  "epochs": 8,
  "architecture": [[ 12, "relu" ], [ 8, "relu" ], [ 1, "sigmoid" ]]
}

And then I invoke them with:

find ../models/ -name \*.json | xargs --max-args=1 python run_nn.py

So that I can run my different models while being sure that the memory will be totally released between the executions of two scripts.
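For completeness, run_nn.py (the name comes from the command above) could look like the minimal sketch below, assuming the NNModel class above lives in a nn_model.py module and that the data loading and scoring are filled in:

import json
import sys

from nn_model import NNModel  # assumption: the class above lives there

if __name__ == "__main__":
    # xargs appends the JSON file path as the single argument
    with open(sys.argv[1]) as f:
        params = json.load(f)

    model = NNModel(**params)
    # ... load X and y, then model.fit(X, y) and persist the scores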

model.predict_on_batch

Quoting MProx from a GitHub issue:

I have managed to get around this error by using model.predict_on_batch() instead of model.predict(). This returns an object of type <class ‘tensorflow.python.framework.ops.EagerTensor’> - not a numpy array as claimed in the docs - but it can be cast by calling np.array(model.predict_on_batch(input_data)) to get the output I want.

Side note: I also noticed a similar memory leak problem with calling model.fit() in a loop, albeit with a slower memory accumulation, but this can be fixed in a similar way using model.train_on_batch().

I did not try this one, as segregating different models in different scripts and setting run_eagerly did the job.

Use tf-nightly

So, tf-nightly is built more or less every day, with the latest features and fewer tests. Many people claimed that the leak disappeared when using this library. But there are many versions, with potentially other bugs.

Reinstall the 1.14 version

This bug has been around for a while: some tickets mention it from October 2019, and it is still present in the 2.4 version.

Conclusion

I look forward to this issue being solved.

Finding the index of the largest element in a list in OCaml

As far as I know, there is no implementation of argmax and argmin in OCaml's standard library (or perhaps they could be called maxi, for more consistency with mapi). The following snippet solves it!

let argmax l =
  let rec aux max_index index max_value = function
    | [] -> max_index
    | h::t -> if h > max_value then aux index (index + 1) h t 
                         else aux max_index (index + 1) max_value t
  in 

  match l with 
  | [] -> 0
  | _ ->   aux 0 0 (List.hd l) l

Note that the default behavior with an empty list is to return 0, but it can be changed.

Python fast screenshots and locateOnScreen

Taking screenshots with Python is easy; however, performance often seems to be an issue, depending on the packages you started with (see for example this question).

In my previous article (/reinforcement-learning/nintendo/reinforcement-learning-nintendo-nes-tutorial/), I noted that I was limited by the method looking for an element (the Game Over screen) on the screen as often as possible.

Getting rid of bottlenecks is a fun thing to do as a developer. For an unexplained reason, I find it particularly satisfying. Python, with its plethora of libraries, introduces many of them. So it is time for a quick tour of the possible solutions.

Benchmarks

I do not want to mess up the code of my previous article, so it is a good idea, when possible, to do the benchmarks in separate files, with similar inputs.

Screenshots

In the previous article, I relied on pyautogui. I realized it was built on pyscreeze, so I also tried this library. After some browsing, I learned that PIL also proposed this feature.

I discovered it after writing this article but d3dshot claims to be the fastest way to perform screenshots in Python. I’ll keep that in mind if I face new bottlenecks in the future, but let’s stick with the first 3 packages for now.

from PIL import ImageGrab
import numpy as np
import pyautogui as pg
import pyscreeze
import time


REGION = (0, 0, 400, 400)


def timing(f):
    def wrap(*args, **kwargs):
        time1 = time.time()
        ret = f(*args, **kwargs)
        time2 = time.time()
        print('{:s} function took {:.3f} ms'.format(
            f.__name__, (time2-time1)*1000.0))

        return ret
    return wrap


@timing
def benchmark_pyautogui():
    return pg.screenshot(region=REGION)


@timing
def benchmark_pyscreeze():
    return pyscreeze.screenshot(region=REGION)


@timing
def benchmark_pil():
    return np.array(ImageGrab.grab(bbox=REGION))


if __name__ == "__main__":

    im_pyautogui = benchmark_pyautogui()
    im_pyscreeze = benchmark_pyscreeze()
    im_pil =       benchmark_pil()

As expected, pyscreeze is slightly faster than pyautogui, but PIL beats them by a factor of 10!

benchmark_pyautogui function took 157.669 ms
benchmark_pyscreeze function took 152.185 ms
benchmark_pil function took 13.198 ms

Locate an element on screen

import pyautogui as pg
import numpy as np
import cv2 as cv
from PIL import ImageGrab, Image
import time

REGION = (0, 0, 400, 400)
GAME_OVER_PICTURE_PIL = Image.open("./balloon_fight_game_over.png")
GAME_OVER_PICTURE_CV = cv.imread('./balloon_fight_game_over.png')


def timing(f):
    def wrap(*args, **kwargs):
        time1 = time.time()
        ret = f(*args, **kwargs)
        time2 = time.time()
        print('{:s} function took {:.3f} ms'.format(
            f.__name__, (time2-time1)*1000.0))

        return ret
    return wrap


@timing
def benchmark_pyautogui():
    res = pg.locateOnScreen(GAME_OVER_PICTURE_PIL,
                            grayscale=True,  # should provide a speed up
                            confidence=0.8,
                            region=REGION)
    return res is not None


@timing
def benchmark_opencv_pil(method):
    img = ImageGrab.grab(bbox=REGION)
    img_cv = cv.cvtColor(np.array(img), cv.COLOR_RGB2BGR)
    res = cv.matchTemplate(img_cv, GAME_OVER_PICTURE_CV, method)
    # print(res)
    return (res >= 0.8).any()


if __name__ == "__main__":

    im_pyautogui = benchmark_pyautogui()
    print(im_pyautogui)

    methods = ['cv.TM_CCOEFF', 'cv.TM_CCOEFF_NORMED', 'cv.TM_CCORR',
               'cv.TM_CCORR_NORMED', 'cv.TM_SQDIFF', 'cv.TM_SQDIFF_NORMED']


    # cv.TM_CCOEFF_NORMED actually seems to be the most relevant method
    for method in methods:
        print(method)
        im_opencv = benchmark_opencv_pil(eval(method))
        print(im_opencv)

And the results!

benchmark_pyautogui function took 175.712 ms
False
cv.TM_CCOEFF
benchmark_opencv_pil function took 21.283 ms
True
cv.TM_CCOEFF_NORMED
benchmark_opencv_pil function took 23.377 ms
False
cv.TM_CCORR
benchmark_opencv_pil function took 20.465 ms
True
cv.TM_CCORR_NORMED
benchmark_opencv_pil function took 25.347 ms
False
cv.TM_SQDIFF
benchmark_opencv_pil function took 23.799 ms
True
cv.TM_SQDIFF_NORMED
benchmark_opencv_pil function took 22.882 ms
True

pyautogui, once again, is super slow. However, the OpenCV based methods are an order of magnitude faster, though some of them see “Game Over” when it is not there. I made sure that TM_CCOEFF_NORMED also returned True when the element was in the region before updating the following class:

from PIL import ImageGrab
import cv2 as cv
import numpy as np
import os
import pyautogui as pg
import time


class BalloonTripEnvironment:

    def __init__(self):
        self._game_filepath = "../games/BalloonFight.zip"
        self._region = (10,10,300,300)
        self._game_over_picture = cv.imread("./balloon_fight_game_over.png")

    def _custom_press_key(self, key_to_press):
        pg.keyDown(key_to_press)
        pg.keyUp(key_to_press)

    def turn_nes_up(self):
        os.system(f"fceux {self._game_filepath} &")
        time.sleep(1)

    def start_trip(self):
        keys_to_press = ['s', 's', 'enter']
        for key_to_press in keys_to_press:
            self._custom_press_key(key_to_press)

    def observe_state(self):
        return pg.screenshot(region=self._region)

    def capture_state_as_png(self, filename):
        pg.screenshot(filename, region=self._region)

    def step(self, action):
        self._custom_press_key(action)
       
    def is_game_over(self):
        img = ImageGrab.grab(bbox=self._region)
        img_cv = cv.cvtColor(np.array(img), cv.COLOR_RGB2BGR)
        res = cv.matchTemplate(img_cv, self._game_over_picture, cv.TM_CCOEFF_NORMED)
        return (res >= 0.8).any()

    def rage_quit(self):
        os.system("pkill fceux")
        exit()


if __name__ == "__main__":

    env = BalloonTripEnvironment()

    env.turn_nes_up()
    time.sleep(10)
    env.step('enter')
    env.start_trip()
    print("Started")
    is_game_over = False
    i = 0

    while not is_game_over:
        i += 1
        is_game_over = env.is_game_over()
        env.step('f')

    print("Game over!")

    env.rage_quit()

Below, you can see the GIFs of the loop. On the left, the previous version, where each call to is_game_over() needed so much time that the “agent” could not press the button often enough. Now that the frequency is high enough, the “agent” just bounces on the top of the screen until it dies!

[Before / After GIFs]

Fig. 1: On the left, the previous version of is_game_over(); the new version is on the right (note that the beginning of the GIF is just the demo mode of the game).

Hope you liked it, stay tuned for the next articles if you like the project!

Reinforcement learning Nintendo NES Tutorial (Part 1)

Reinforcement learning is an amazingly fun application of machine learning to a variety of tasks! I have seen plenty of videos of reinforcement learning applied to video games, but very few tutorials. In this series (it may take a while, I have not finished the project as I write this article, and it may not even work!) we will apply it to a game I love: Balloon Fight, with explanations that will, hopefully, enable the reader to reuse them for different games!

In a nutshell, reinforcement learning consists in teaching a computer to act in an environment (here, a game) the best way it can, without having to describe the rules of the environment (the game) to the algorithm. The way it works is that the algorithm will try different behaviors thousands of times and improve every time, learning from its mistakes.

In a more precise way, the game is split into a discrete set of steps. At each step, the agent (or algorithm) will observe the environment, decide on an action to take (here, a key to press) and observe, again, the environment and the reward it got from taking the previous action. When the game is over, the agent starts playing again, but with the accumulated knowledge of its previous experiences.

Enabling Python to communicate with the game

What you will need

We will teach our algorithm to play Balloon Fight. More precisely, the Balloon Trip mode (as it will save us some effort, as we will see later).

Balloon Trip

Fig. 1: The Balloon Trip (me playing)

I have the following directory structure:

.
├── games
│   └── BalloonFight.zip
├── src
│   ├── balloon_trip_environment.py
│   ├── intro.py
│   └── balloon_fight_game_over.png
└── TODO.md

BalloonFight.zip is a ROM of the game.

intro.py and balloon_trip_environment.py will be explored below.

balloon_fight_game_over.png will be created and explained later.

Python packages

Regarding the packages, we will start with Python default packages and pyautogui, which enables interaction with other windows. This may have some dependencies, but I trust my reader to be able to fix all of them :)

External dependencies

We will use FCEUX as an emulator.

The starting point (intro.py)

Let’s see how Python can interact with any window on the screen, just like a human being! The script below will:

  • launch FCEUX with the game
  • press a sequence of buttons (or keys)
  • take a screenshot of a region of the screen

import pyautogui as pg
import os
import time

game_filepath = "../games/BalloonFight.zip"
os.system(f"fceux {game_filepath} &")

time.sleep(1)

keys_to_press = ['s', 's', 'enter']

for key_to_press in keys_to_press:
    pg.keyDown(key_to_press)
    pg.keyUp(key_to_press)

time.sleep(2)

im = pg.screenshot("./test.png", region=(0,0, 300, 400))
print(im)

The main things to notice are:

os.system(f"fceux {game_filepath} &")

Note the & at the end of the command. Without it, Python would be “stuck”, waiting for the return of os.system(). With it, Python keeps executing the following lines.
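As a side note, the same non-blocking launch can be written with subprocess, which also gives a process handle for a cleaner exit than pkill later on. A minimal sketch (the path is the one from this project):

import subprocess

game_filepath = "../games/BalloonFight.zip"

# Popen returns immediately, like the trailing "&" after os.system()
emulator = subprocess.Popen(["fceux", game_filepath])

# ... later, instead of os.system("pkill fceux"):
# emulator.terminate()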

pg.keyDown(key_to_press)
pg.keyUp(key_to_press)

There is a .press() method with pyautogui, but for some reason, it did not work with the emulator.

im = pg.screenshot("./test.png", region=(0,0, 300, 400))

This captures a part of the screen while the emulator is running, therefore allowing Python to “communicate” with the window.

Balloon Trip

Fig. 2: The output of pg.screenshot() (This will be improved later)

Reinforcement learning

Micro crash course (or the basics we need for now)

An important element in reinforcement learning is the following loop (env refers to the environment, or the state of the game at each instant, while agent will be able to press buttons and interact with the environment).

for episode in range(N_EPISODES):

    env.reset()
    episode_reward = 0

    done = False
    while not done:
        # the agent looks at the environment...
        current_state = env.observe_state()

        # ...picks an action accordingly...
        action = agent.get_action(current_state)

        # ...and observes the consequences of this action
        new_state, reward, done = env.step(action)
        episode_reward += reward

Basically, it shows a clear separation between the environment and the agent. For now, let’s just implement the environment.

The environment

For now, we only need to describe the environment in a convenient way. It needs to expose observe_state(), step(action) and is_game_over() methods, making sure the agent can keep acting.

import pyautogui as pg
import os
import time
from PIL import Image


class BalloonTripEnvironment:

    def __init__(self):
        self._game_filepath = "../games/BalloonFight.zip"
        self._region = (10,10,300,300)
        self._game_over_picture = Image.open("./balloon_fight_game_over.png")

    def _custom_press_key(self, key_to_press):
        pg.keyDown(key_to_press)
        pg.keyUp(key_to_press)

    def turn_nes_up(self):
        os.system(f"fceux {self._game_filepath} &")
        time.sleep(1)

    def start_trip(self):
        keys_to_press = ['s', 's', 'enter']
        for key_to_press in keys_to_press:
            self._custom_press_key(key_to_press)

    def observe_state(self):
        return pg.screenshot(region=self._region)

    def capture_state_as_png(self, filename):
        pg.screenshot(filename, region=self._region)

    def step(self, action):
        self._custom_press_key(action)
       
    def is_game_over(self):
        res = pg.locateOnScreen(self._game_over_picture, 
                                grayscale=True, # should provide a speed up
                                confidence=0.8,
                                region=self._region)
        return res is not None

    def rage_quit(self):
        os.system("pkill fceux")
        exit()

Some details

The class above is an adaptation of the introduction script, in a more object oriented way. The main detail is the following:

def is_game_over(self):
    res = pg.locateOnScreen(self._game_over_picture, 
                            grayscale=True, # should provide a speed up
                            confidence=0.8,
                            region=self._region)
    return res is not None

It looks for the image below, in order to detect the “Game Over” screen.

Balloon Trip

Fig. 3: The pattern we will look for to detect the game over.

Watch it in action

Let’s test it! f is just the A button of the NES: it will simply make the balloon guy go up. In the loop, we test whether the game is over, and then the player presses the A button.

if __name__ == "__main__":

    env = BalloonTripEnvironment()

    env.turn_nes_up()
    env.start_trip()
    print("Started")
    is_game_over = False
    i = 0

    while not is_game_over:
        i += 1
        is_game_over = env.is_game_over()
        env.step('f')
        

    print("Game over!")

    env.rage_quit()

And tada!

Balloon Trip

Fig. 4: An agent, pumping on a regular basis

Next steps

We have the environment. Now, we will need to turn its state into a matrix that will be used as “features” for reinforcement learning; this will be the topic of the next article. Once achieved, we will be able to jump to the deep learning part!

First issues

Programming without issues does not exist, at least in my world. Though the above works as expected, we notice that during the loop, the button was pressed only four times. After a more careful examination, I noticed that is_game_over() was the bottleneck.

If we want to be able to read the input on the screen and decide on the best action to take, we need to have much (much) more time between two screenshots. This will be the topic of an intermediate post, stay tuned if you liked it :)

Learning more

The following resources (sponsored URLs): Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow and Reinforcement Learning: Industrial Applications of Intelligent Agents will provide a good introduction to the topic, in Python, with the libraries I am going to use.