Acronyms of deep learning

Reading time ~4 minutes

When learning about deep learning, it can be quite confusing to discover the boatloads of acronyms that readers are expected to know. In my quest to discover the field, I gathered plenty of them, and I list them here as a reference. Hope this will help :)


  1. AI - Artificial Intelligence: The simulation of human intelligence processes by machines, typically involving tasks such as learning, reasoning, and problem-solving.

  2. ML - Machine Learning: A subset of AI that provides systems with the ability to automatically learn and improve from experience without being explicitly programmed.

  3. DL - Deep Learning: A subset of machine learning that utilizes neural networks with multiple layers (hence “deep”) to extract higher-level features from raw data.

  4. NN - Neural Network: A computational model inspired by the structure and function of the human brain, consisting of interconnected nodes (neurons) organized in layers.

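  5. CNN - Convolutional Neural Network: A type of neural network designed for processing grid-like data, such as images (2-D grids of pixels) or audio. Its convolutional layers slide small learned filters across the input, so the same pattern detector is reused at every position.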

  6. RNN - Recurrent Neural Network: A type of neural network that processes sequential data by maintaining an internal state, allowing it to exhibit temporal dynamics.

  7. LSTM - Long Short-Term Memory: A type of recurrent neural network architecture that uses gates and a separate cell state to mitigate the vanishing gradient problem, enabling better learning of long-term dependencies in sequential data.

  8. GRU - Gated Recurrent Unit: Another type of recurrent neural network architecture, similar to LSTM but simpler: it has only two gates (update and reset) and no separate cell state, so it has fewer parameters and is usually cheaper to compute.

  9. GAN - Generative Adversarial Network: A type of neural network architecture consisting of two networks, a generator and a discriminator, trained adversarially to generate realistic synthetic data.

  10. DNN - Deep Neural Network: A neural network with multiple layers between the input and output layers, allowing it to learn complex representations of data.

  11. RL - Reinforcement Learning: A type of machine learning where an agent learns to make decisions by interacting with an environment to maximize some notion of cumulative reward.

  12. ReLU - Rectified Linear Unit: A commonly used activation function in deep learning that introduces non-linearity by outputting the input if it is positive and zero otherwise, i.e. f(x) = max(0, x).
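
The definitions of ReLU, DNN and RNN above can be illustrated with a few lines of plain Python. This is a minimal sketch under simple assumptions: vectors are Python lists, the helper names (`relu`, `dense`, `mlp_forward`, `rnn_final_state`) are hypothetical and not the API of any framework, and the weights are hand-picked rather than learned.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # Matrix[i][j]: weight from input j to output i


def relu(x: Vector) -> Vector:
    """ReLU: pass positive values through, replace the others with zero."""
    return [max(0.0, v) for v in x]


def dense(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Fully connected layer: y[i] = sum_j w[i][j] * x[j] + b[i]."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def mlp_forward(x: Vector, layers: List[Tuple[Matrix, Vector]]) -> Vector:
    """Forward pass of a small DNN: ReLU after every layer except the last."""
    for i, (w, b) in enumerate(layers):
        x = dense(x, w, b)
        if i < len(layers) - 1:
            x = relu(x)
    return x


def rnn_final_state(inputs: List[Vector], w_x: Matrix, w_h: Matrix, b: Vector) -> Vector:
    """Vanilla RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b), starting from h_0 = 0.
    The hidden state h carries information from earlier steps to later ones."""
    h = [0.0] * len(b)
    no_bias = [0.0] * len(b)
    for x_t in inputs:
        from_input = dense(x_t, w_x, b)       # W_x x_t + b
        from_state = dense(h, w_h, no_bias)   # W_h h_{t-1}
        h = [math.tanh(a + c) for a, c in zip(from_input, from_state)]
    return h


print(relu([-2.0, 0.0, 3.5]))  # [0.0, 0.0, 3.5]
layers = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),  # hidden layer (identity weights)
          ([[2.0, 3.0]], [0.5])]                    # linear output layer
print(mlp_forward([1.0, -1.0], layers))  # [2.5]: the -1.0 is zeroed by ReLU
# The second input is 0.0, yet the final state is non-zero: it remembers step 1.
print(rnn_final_state([[1.0], [0.0]], [[1.0]], [[0.5]], [0.0]))
```

Real frameworks perform the same arithmetic on tensors, in batches and usually on a GPU, and learn the weights by gradient descent.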


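Optimizers

  1. SGD - Stochastic Gradient Descent: An optimization algorithm commonly used to train neural networks. It iteratively updates the model parameters in the direction of the negative gradient of the loss function, where the gradient is estimated on a small randomly sampled mini-batch (or a single example) rather than on the whole dataset.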

  2. Adam (Adaptive Moment Estimation): Adam is an optimization algorithm used for training deep learning models. It combines ideas from two other popular techniques, momentum and RMSprop. Adam keeps exponential moving averages of the gradients (first moment) and of the squared gradients (second moment), corrects them for their bias towards zero at the start of training, and uses them to compute an adaptive learning rate for each parameter. In practice, this often makes Adam converge faster and need less learning-rate tuning than plain stochastic gradient descent (SGD).

  3. RMSprop (Root Mean Square Propagation): RMSprop is an optimization algorithm designed to address a limitation of traditional stochastic gradient descent (SGD): a single global learning rate works poorly when gradient magnitudes differ widely across parameters or over time. RMSprop adaptively scales the learning rate for each parameter by dividing its update by the root mean square of recent gradients, tracked as a moving average of the squared gradients. This normalizes the size of the updates, which helps when gradients are very small or very large, and improves the stability and convergence of the optimization process.
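
To make the three update rules above concrete, here is a minimal pure-Python sketch that applies SGD, RMSprop and Adam to a one-parameter toy loss f(theta) = (theta - 3)^2, whose minimum is at theta = 3. Assumptions: the toy loss, the function names and most hyperparameters (learning rates, rho=0.9, number of steps) are my own illustrative choices; beta1=0.9, beta2=0.999 and eps=1e-8 are the defaults suggested in the Adam paper. Because the toy gradient is exact, the `sgd` function here is really plain gradient descent; in real training the gradient comes from a random mini-batch, which is where the "stochastic" comes from.

```python
import math


def grad(theta: float) -> float:
    """Gradient of the toy loss f(theta) = (theta - 3)**2, minimized at theta = 3."""
    return 2.0 * (theta - 3.0)


def sgd(theta: float, lr: float = 0.1, steps: int = 2000) -> float:
    """Plain gradient step: theta <- theta - lr * g."""
    for _ in range(steps):
        theta -= lr * grad(theta)
    return theta


def rmsprop(theta: float, lr: float = 0.01, rho: float = 0.9,
            eps: float = 1e-8, steps: int = 2000) -> float:
    """Divide each step by the root mean square of recent gradients."""
    v = 0.0  # moving average of squared gradients
    for _ in range(steps):
        g = grad(theta)
        v = rho * v + (1 - rho) * g * g
        theta -= lr * g / (math.sqrt(v) + eps)
    return theta


def adam(theta: float, lr: float = 0.01, beta1: float = 0.9, beta2: float = 0.999,
         eps: float = 1e-8, steps: int = 2000) -> float:
    """Momentum (first moment) plus RMSprop-style scaling (second moment)."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g      # moving average of gradients
        v = beta2 * v + (1 - beta2) * g * g  # moving average of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction: m and v start at 0
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


for name, optimizer in [("SGD", sgd), ("RMSprop", rmsprop), ("Adam", adam)]:
    print(name, round(optimizer(0.0), 3))
```

Note how RMSprop and Adam take steps of roughly `lr` regardless of how steep the loss is, because the gradient is divided by its own running magnitude, whereas the SGD step is proportional to the gradient itself.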

Famous networks

  1. ResNet (Residual Neural Network): A type of deep convolutional neural network architecture that introduced the concept of residual learning, where shortcut connections (skip connections) are added to the network to ease the training of very deep neural networks by mitigating the vanishing gradient problem.

  2. EfficientNet: A family of convolutional neural network architectures that, when introduced in 2019, achieved state-of-the-art accuracy on ImageNet image classification with significantly fewer parameters and FLOPs (floating-point operations) than previous models. EfficientNet achieves this efficiency by scaling the network width, depth, and input resolution together in a balanced way, controlled by a single compound coefficient (compound scaling).

  3. VGG (Visual Geometry Group): A convolutional neural network architecture proposed by the Visual Geometry Group at the University of Oxford. VGG is characterized by its simplicity and uniform architecture: stacks of small 3x3 convolutional layers, each stack followed by a max-pooling layer, with fully connected layers at the end.

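  4. ImageNet: Not a network but a large-scale dataset, often mentioned alongside these architectures. It contains over 14 million images annotated with labels organized according to the WordNet hierarchy, and is designed for training and evaluating image classification models. Its 1,000-class subset, used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), has been instrumental in advancing computer vision and is the standard benchmark on which the networks in this list were evaluated.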

  5. DenseNet (Densely Connected Convolutional Network): A type of convolutional neural network architecture in which, within each dense block, every layer is connected to every other layer in a feed-forward fashion: each layer receives the concatenated feature maps of all preceding layers as input and passes its own feature maps on to all subsequent layers. This dense connectivity encourages feature reuse and facilitates gradient flow. Because each layer only needs to add a small number of new feature maps, DenseNet reaches accuracy comparable to ResNets with substantially fewer parameters.
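
The key ideas behind ResNet, DenseNet and EfficientNet can be sketched in plain Python on toy feature vectors, with one float standing in for each feature map. This is an illustration under stated assumptions, not how any library implements these models: the function names are hypothetical, `toy_dense_layer` stands in for a real BN-ReLU-Conv layer, and the EfficientNet constants alpha=1.2, beta=1.1, gamma=1.15 are the values reported in the EfficientNet paper for its baseline, used here only as defaults.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def residual_block(x: Vector, f: Layer) -> Vector:
    """ResNet idea: output = x + F(x). The shortcut adds x back unchanged,
    so F(x) must keep x's size (real ResNets use a projection when it doesn't)."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("residual branch must preserve the feature size")
    return [xi + fi for xi, fi in zip(x, fx)]


def dense_block(x: Vector, layers: List[Layer]) -> Vector:
    """DenseNet idea: every layer receives the concatenation of the block input
    and all earlier layer outputs, and its own output is appended in turn."""
    features = list(x)
    for layer in layers:
        features = features + layer(features)  # concatenation, not addition
    return features


def toy_dense_layer(growth_rate: int) -> Layer:
    """Hypothetical stand-in for a BN-ReLU-Conv layer: emits `growth_rate`
    new features (here simply the mean of its inputs, repeated)."""
    def layer(inputs: Vector) -> Vector:
        mean = sum(inputs) / len(inputs)
        return [mean] * growth_rate
    return layer


def compound_scaling(phi: float, alpha: float = 1.2, beta: float = 1.1,
                     gamma: float = 1.15) -> Tuple[float, float, float, float]:
    """EfficientNet-style compound scaling: depth, width and input resolution are
    multiplied by alpha**phi, beta**phi and gamma**phi. The FLOPs of a conv net
    grow roughly with depth * width**2 * resolution**2."""
    depth, width, resolution = alpha ** phi, beta ** phi, gamma ** phi
    flops_factor = depth * width ** 2 * resolution ** 2
    return depth, width, resolution, flops_factor


print(residual_block([1.0, 2.0], lambda v: [0.5 * vi for vi in v]))  # [1.5, 3.0]
out = dense_block([1.0, 2.0, 3.0], [toy_dense_layer(2) for _ in range(4)])
print(len(out))  # 3 + 4 * 2 = 11
print(compound_scaling(1))  # last value (FLOPs factor) is about 1.92
```

The contrast to notice: the residual block keeps the size of its input and adds to it, while the dense block grows by `growth_rate` features per layer because outputs are concatenated. With these constants, each unit increase of phi roughly doubles the FLOPs.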


Hardware

  1. GPU - Graphics Processing Unit: A specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images for display on electronic devices. In deep learning, GPUs are commonly used for their parallel processing capabilities, which significantly speed up neural network training.

  2. TPU - Tensor Processing Unit: An application-specific integrated circuit (ASIC) developed by Google specifically for neural network workloads. TPUs were originally built around TensorFlow and can now also be programmed from JAX and PyTorch through the XLA compiler; they are designed to accelerate both training and inference.

Learning more

Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville is an excellent introduction to the topic. The algorithms and mathematics are presented without any code, so the book will not become outdated as soon as a breaking change lands in the main packages ;) Note that this is a sponsored link.
