Benchmark Fossil Demand Forecasting Challenge

Reading time ~1 minute


Zindi is hosting the Fossil Demand Forecasting Challenge, where competitors have to predict the amount of units sold for various products.

Note that the rules state that the metric to optimize is not is usual squared error, but instead, the absolute error:

The evaluation metric for this challenge is Mean Absolute Error.

All the models relying on the minimization of least squares (usual regressions, random forests with default parameters) are likely to perform poorly since they will return the mean over subsambles, while minimizing the absolute error returns the mean of the sample.

In a mathematical language:

A simple benchmark

With that knowledge, the benchmark below simply returns, for each product, the median of units sold over the year 2021. The score should be around 192xxx

import numpy as np
import pandas as pd
import random

train = pd.read_csv("../raw_data/Train.csv")
sku_names = train["sku_name"].unique()
train["year_month"] = train["year"].astype(
    str) + "/" + train["month"].astype(str)
train["date"] = pd.to_datetime(train["year_month"])
train_recent = train[train["date"] >= "2021/01"]

medians = train_recent.groupby("sku_name")["sellin"].median().to_dict()

test = pd.read_csv("../raw_data/Test.csv")
sku_names_test = test["sku_name"].unique()

missing = {}
for sku_name_test in sku_names_test:
    missing[sku_name_test] = 0

test["Target"] = test["sku_name"].replace(medians).replace(missing).astype(int)

test["Item_ID"] = test["sku_name"] + "_" + \
    test["month"].astype(str) + "_" + test["year"].astype(str)
test[["Item_ID", "Target"]].to_csv("./submission_.csv", index=False)

Best books on Fermat's last theorem

I haven't published many articles these days, the main reason being that I got attracted into the history of Fermat's last theorem. This ...… Continue reading

Random number generation in Cython

Published on February 25, 2022

Vim for datascience

Published on January 09, 2022