What is Deep Learning?

Thu Jan 09 2025 · Deep Learning, Definitions · 10

Deep learning is a branch of machine learning within artificial intelligence that loosely models how the human brain works. Its neural networks consist of multiple layers, which allows them to learn more complex patterns in data, and they mimic the way neurons in the human brain transmit information to one another.

Structure of a Deep Learning Network

A deep learning network is composed of multiple layers that process data in a hierarchical way. The network’s architecture is generally divided into three main layer types:

  1. Input layer: Raw data is fed into the network through the input layer.
  2. Hidden layers: The data is transformed as it passes through each hidden layer.
  3. Output layer: The final processed data reaches the output layer, which produces the desired predictions or classifications.
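The three layer types above can be sketched as a single forward pass. This is a minimal illustration using NumPy with arbitrary random weights, not a trained model; the layer sizes are chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random(4)              # input layer: 4 raw features
W1 = rng.random((3, 4))        # weights into a hidden layer of 3 nodes
b1 = rng.random(3)
W2 = rng.random((2, 3))        # weights into an output layer of 2 nodes
b2 = rng.random(2)

hidden = np.maximum(0, W1 @ x + b1)   # hidden layer: transform + ReLU
output = W2 @ hidden + b2             # output layer: final predictions

print(output.shape)  # (2,)
```

The data flows in one direction: from the input vector, through the hidden transformation, to the output predictions.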

Input Layer

The input layer is the first layer of the network and serves as the entry point for the data. Each node (a conceptual neuron) in the input layer represents a feature or a dimension of the data.

The input layer receives the raw input data and passes it to the next layer without performing any computations.

Examples:

  • In image recognition, each pixel value of an image might be fed into the input layer nodes.
  • In natural language processing, word embeddings or text tokens are input into the network, with each token being passed to a single node.

The input layer does no processing but simply distributes the raw data to the following layers within the network.

Hidden Layers

The hidden layers are the computational layers within the network. They are called “hidden” because they are not directly exposed to the input or the output - they sit between these two layers and perform transformations on the data.

Each hidden layer processes the input from the previous layer, applies weights and biases, and passes the result through an activation function.

A network can have one or many hidden layers, and each layer can contain many nodes.
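The per-layer computation described above - apply weights and a bias, then an activation function - can be stacked for any number of hidden layers. A minimal sketch, using randomly initialised (untrained) weights and arbitrary layer sizes:

```python
import numpy as np

def forward(x, layers):
    # Each layer applies its weights and bias to the previous layer's
    # output, then passes the result through a ReLU activation.
    for W, b in layers:
        x = np.maximum(0, W @ x + b)
    return x

rng = np.random.default_rng(1)
sizes = [4, 8, 8, 3]   # 4 inputs, two hidden layers of 8 nodes, 3 outputs
layers = [(rng.standard_normal((m, n)), np.zeros(m))
          for n, m in zip(sizes, sizes[1:])]

out = forward(rng.standard_normal(4), layers)
print(out.shape)  # (3,)
```

Adding another hidden layer is just another `(W, b)` pair in the list - the loop body does not change.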

Types of hidden layers:

  • Fully Connected Layers (Dense Layers): Every node is connected to every node in the preceding and succeeding layers.
  • Convolutional Layers: Used in CNNs to detect spatial hierarchies in data, such as edges and textures in images.
  • Pooling Layers: Reduce the spatial dimensions of the data to decrease the computational load and control overfitting.
  • Recurrent Layers: Used in RNNs to process sequential data by maintaining a state that captures information about previous inputs.
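As one concrete example from the list above, a pooling layer's dimension reduction is simple enough to show directly. This sketch implements 2×2 max pooling on a small "image", halving each spatial dimension by keeping only the largest value in each 2×2 block:

```python
import numpy as np

def max_pool_2x2(img):
    # Split the image into 2x2 blocks and take the max of each block.
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.array([[1, 2, 5, 6],
                [3, 4, 7, 8],
                [9, 1, 2, 3],
                [5, 6, 4, 0]])
print(max_pool_2x2(img))
# [[4 8]
#  [9 4]]
```

The 4×4 input becomes a 2×2 output - a quarter of the data - while the strongest responses in each region survive.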

Activation Functions

An activation function introduces non-linearity into the network, allowing it to learn complex patterns. Without one, any stack of layers would collapse into a single linear transformation.
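Two of the most common activation functions, sketched in NumPy:

```python
import numpy as np

def relu(x):
    # Rectified Linear Unit: zero for negative inputs, identity otherwise.
    return np.maximum(0, x)

def sigmoid(x):
    # Squashes any real number into the range (0, 1).
    return 1 / (1 + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))       # [0. 0. 2.]
print(sigmoid(0.0))  # 0.5
```

Both bend the output in a way no weighted sum can, which is precisely the non-linearity the network needs.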

Output Layer

The output layer is the final layer that provides the desired prediction or classification.
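For classification tasks, the output layer commonly applies a softmax, which turns the raw scores of the final layer into probabilities that sum to 1. A minimal sketch with made-up scores:

```python
import numpy as np

def softmax(logits):
    # Subtracting the max is a standard trick for numerical stability.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs.argmax())  # 0 - the class with the highest raw score
```

The network's prediction is then simply the class with the highest probability.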

Key Characteristics of Deep Learning

  • Artificial Neural Networks: Deep learning models are built on neural networks with many layers. Each layer extracts higher-level features from the raw input data, allowing the model to learn intricate patterns.
  • Feature Hierarchy: Unlike traditional machine learning, which requires an element of manual feature extraction, deep learning models automatically discover the representations needed for classification or detection.
  • Large Data Requirements: Deep learning requires large datasets, often of unstructured data. The more data available to the model, the more effective it can be at learning and processing new inputs.
  • High Computational Power: Training deep learning models requires significantly more computational resources, often utilising GPUs (Graphics Processing Units) or TPUs (Tensor Processing Units) to handle the complex mathematical calculations that form the underlying algorithms.

Common Architectures of Deep Learning

  • Convolutional Neural Networks (CNNs): Primarily used for image and video recognition, CNNs can automatically learn spatial hierarchies of features through backpropagation.
  • Recurrent Neural Networks (RNNs): Designed for sequence data, such as time series or natural language, RNNs can maintain contextual information through their recurrent connections.
  • Long Short-Term Memory Networks (LSTMs): A type of RNN that can learn long-term dependencies, making them effective for tasks like language modelling and translation.
  • Transformers: A more recent architecture that relies on self-attention mechanisms, transformers have become the foundation for many advanced language models.
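The recurrent connections mentioned above can be shown in a few lines: the hidden state carries information about previous inputs through the sequence. This is a sketch with untrained random weights, not a full RNN implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
W_x = rng.standard_normal((3, 2)) * 0.1   # input -> hidden weights
W_h = rng.standard_normal((3, 3)) * 0.1   # hidden -> hidden (the recurrence)

h = np.zeros(3)                            # initial hidden state
for x_t in rng.standard_normal((5, 2)):    # a sequence of 5 inputs
    h = np.tanh(W_x @ x_t + W_h @ h)       # state update at each step

print(h.shape)  # (3,)
```

Because `h` feeds back into its own update, the final state depends on the whole sequence, not just the last input - this is the contextual memory that RNNs (and, with extra gating, LSTMs) exploit.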

Why does deep learning matter?

Deep learning has revolutionised the field of artificial intelligence by achieving unprecedented levels of accuracy in tasks that were previously too challenging for machines. Its ability to understand unstructured data has reduced costs and increased the speed at which these more intelligent models have come online.

Challenges with Deep Learning

  • Huge amounts of data are required to make deep learning models effective, which raises ethical and legal considerations.
  • Deep learning models are often considered “black boxes”, as it is difficult to understand the decisions the model has taken to arrive at an output.
  • A model can become too tailored to its training data, making it susceptible to overfitting.