Benchmark Fossil Demand Forecasting Challenge

Introduction

Zindi is hosting the Fossil Demand Forecasting Challenge, where competitors have to predict the number of units sold for various products.

Note that the rules state that the metric to optimize is not the usual squared error but the absolute error:

The evaluation metric for this challenge is Mean Absolute Error.
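For n predictions and the corresponding observed values, this is the average of the absolute differences between the two:

\[\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|\]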

All models relying on the minimization of least squares (usual regressions, random forests with default parameters) are likely to perform poorly, since they return the mean over subsamples, whereas minimizing the absolute error is achieved by the median of the sample.

In mathematical terms:

\[\arg\min_x \sum_{j=1}^n (x_j - x)^2 = \bar{x},\] \[\arg\min_x \sum_{j=1}^n |x_j - x| = \mathrm{med}(x_1, \dots, x_n)\]
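As a quick sanity check, here is a minimal sketch (assuming only NumPy and a made-up, skewed sample) that scans a grid of constant predictions: the squared error is minimized near the mean, the absolute error near the median.

import numpy as np

# Made-up sample with an outlier, so the mean and the median differ a lot.
x = np.array([0, 0, 1, 1, 2, 50])

# Candidate constant predictions on a fine grid.
grid = np.linspace(-10, 60, 7001)

squared = [((x - c) ** 2).sum() for c in grid]
absolute = [np.abs(x - c).sum() for c in grid]

print(grid[np.argmin(squared)], x.mean())        # both close to 9.0, the mean
print(grid[np.argmin(absolute)], np.median(x))   # both close to 1.0, the median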

A simple benchmark

With that knowledge, the benchmark below simply returns, for each product, the median of units sold over the year 2021. The score should be around 192xxx.

import numpy as np
import pandas as pd
import random
random.seed(0)
np.random.seed(0)


# Load the training data and build a date column to keep only 2021 months.
train = pd.read_csv("../raw_data/Train.csv")
sku_names = train["sku_name"].unique()
train["year_month"] = train["year"].astype(str) + "/" + train["month"].astype(str)
train["date"] = pd.to_datetime(train["year_month"])
train_recent = train[train["date"] >= "2021/01"]


# Per-SKU median of units sold ("sellin") over 2021.
medians = train_recent.groupby("sku_name")["sellin"].median().to_dict()

test = pd.read_csv("../raw_data/Test.csv")
sku_names_test = test["sku_name"].unique()

# Every test SKU defaults to 0; after the first replace below, only SKUs
# without a 2021 median are still strings, so only they receive this fallback.
missing = {}
for sku_name_test in sku_names_test:
    missing[sku_name_test] = 0


test["Target"] = test["sku_name"].replace(medians).replace(missing).astype(int)

test["Item_ID"] = test["sku_name"] + "_" + \
    test["month"].astype(str) + "_" + test["year"].astype(str)
test[["Item_ID", "Target"]].to_csv("./submission_.csv", index=False)
