Handling Sequential Data: Recurrent Neural Networks
- Steven Willers
- Feb 8, 2024
- 6 min read
Updated: Feb 13, 2024
In this chapter, we will go over the basics of recurrent neural networks (RNNs) and their fundamentals, and introduce their building blocks using standard libraries like TensorFlow and PyTorch. Before we start, let's have a quick recap of what we already know.
Perceptron
An artificial neural network, often shortened to "neural network" or "neural net," is a powerful tool in the field of artificial intelligence, inspired by the structure and function of the human brain. A single neuron (or perceptron) is the basic building block of a neural network. It consists of:
Input layer: Neural networks receive input data, which is passed through the first layer of neurons.
Weighted sum: Each neuron in this layer receives the input values, and these values are multiplied by corresponding weights, reflecting the importance of each input to that neuron.
Bias: An additional value called bias is added to the weighted sum, allowing the neuron to learn offsets in the data.
Activation function: The resulting sum is then fed into the activation function, which applies a non-linear transformation to determine the neuron's output.
Output layer: These neuron outputs are either passed to the next layer or become the final network output, depending on the network's architecture.
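The steps above can be sketched in a few lines of Python. The weights, bias, and choice of sigmoid activation here are illustrative, not values from any trained network:

```python
import math

def perceptron(inputs, weights, bias):
    # Weighted sum: each input is scaled by its weight, then the bias is added
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Sigmoid activation squashes the sum into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# A neuron with two inputs; a zero weighted sum gives sigmoid(0) = 0.5
print(perceptron([1.0, -1.0], [0.5, 0.5], 0.0))  # 0.5
```

In a full network, many such neurons run in parallel to form a layer, and their outputs feed the next layer.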

What are RNNs?
An RNN consists of recurrent units, which are cells that maintain a hidden state. Each unit takes an input at a particular time step and produces an output and a hidden state. The hidden state from the previous time step is also fed into the current time step, allowing the network to capture temporal dependencies.
Initial State: At the beginning of sequence processing, the RNN receives the first input along with an initial hidden state. This initial hidden state is typically initialized to zeros or learned as a parameter of the network.
Input Processing: Similar to a traditional neural network, the RNN takes the input and applies a linear transformation. This involves multiplying the input by a weight matrix and adding a bias term.
Hidden State Update: The key difference in an RNN occurs in the hidden state update step. In addition to processing the current input, the RNN also considers the previous hidden state, combining the two using another set of weights. We can express it as:

h_t = f(W_x · x_t + W_h · h_{t-1} + b)

where x_t is the input at time step t, h_{t-1} is the previous hidden state, W_x and W_h are weight matrices, b is the bias term, and f is an activation function.
Non-linearity: Like traditional neural networks, an activation function is applied to the output of the hidden state update to introduce non-linearity. Common activation functions include the sigmoid, tanh, or ReLU functions.
Output: Depending on the task, the RNN may produce an output at each time step or only at the final time step. The output can be obtained by applying another linear transformation to the hidden state followed by an activation function if necessary.
Recurrent Connection: After producing the output and updating the hidden state, the RNN moves on to the next time step in the sequence. The updated hidden state from the current time step becomes the input to the next time step, creating a recurrent connection that allows the network to capture temporal dependencies.
Sequence Processing: This process repeats for each time step in the sequence until all inputs have been processed. The final hidden state may be used for tasks such as sequence classification or passed through additional layers for further processing.
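Putting the steps above together, a minimal scalar RNN (one-dimensional input and hidden state, with illustrative weight values rather than learned ones) might look like this:

```python
import math

def rnn_forward(sequence, W_x=0.5, W_h=0.8, b=0.0):
    h = 0.0          # initial hidden state, here initialized to zero
    states = []
    for x_t in sequence:
        # Combine the current input with the previous hidden state,
        # then apply the tanh non-linearity
        h = math.tanh(W_x * x_t + W_h * h + b)
        states.append(h)  # the recurrent connection: h feeds the next step
    return states

states = rnn_forward([1.0, 0.0, 1.0])
```

Each entry of `states` could be passed through a further linear transformation to produce a per-step output, or only the final entry could be used for sequence classification.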
Benefits of RNNs
Recurrent Connection: The key feature of an RNN is its recurrent connection, which allows information to persist over time. At each time step, the current input and the previous hidden state are combined to produce the current hidden state. This process is repeated for each time step in the sequence.
Like other neural networks, an RNN is trained using backpropagation and gradient descent. However, because of the recurrent connections, the gradient calculation involves backpropagating through time (BPTT), which can lead to the vanishing or exploding gradient problem.
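As a toy illustration of why BPTT is fragile (this deliberately ignores the activation-function derivatives, so it is a simplification): the gradient flowing back through many time steps picks up one factor of the recurrent weight per step.

```python
def gradient_magnitude(w_h, steps):
    # Backpropagating through `steps` time steps multiplies the gradient
    # by the recurrent weight once per step (activation derivatives omitted)
    grad = 1.0
    for _ in range(steps):
        grad *= w_h
    return grad

print(gradient_magnitude(0.5, 50))  # shrinks toward zero: vanishing gradient
print(gradient_magnitude(1.5, 50))  # grows without bound: exploding gradient
```

This is exactly the instability that LSTMs and GRUs, discussed below, were designed to mitigate.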
There are several variations of RNNs, including vanilla RNNs, Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRUs). These variations address the shortcomings of basic RNNs, such as difficulty in capturing long-range dependencies and gradient instability.
RNNs are used in various tasks such as sequence generation (text generation, music generation), sequence classification (sentiment analysis, named entity recognition), and sequence-to-sequence learning (machine translation, speech recognition).
Before moving further, we will cover the basics of backpropagation, Gated Recurrent Units (GRUs), and LSTMs.
*Backpropagation: In the context of RNNs, backpropagation involves the process of propagating error gradients backwards through time, allowing the network to learn from past interactions and adjust its parameters accordingly. Unlike traditional feedforward neural networks, where the input and output are fixed, RNNs have connections that form directed cycles, allowing them to exhibit temporal dynamics and capture dependencies over time.

*Gated Recurrent Units: GRUs (or Gated Recurrent Units ) have two gates: an update gate and a reset gate. These gates determine how much of the past information to forget and how much of the new information to incorporate at each time step. GRUs are generally simpler than LSTMs and have fewer parameters, making them computationally less expensive to train and deploy. They have been shown to perform well on a wide range of sequential tasks and are often preferred in scenarios where computational resources are limited.

*LSTMs: LSTMs have three gates: an input gate, a forget gate, and an output gate. Additionally, LSTMs have a cell state that allows them to maintain long-term information over multiple time steps. This added complexity enables LSTMs to model more intricate dependencies in the data and to capture longer-range correlations. LSTMs are particularly effective in tasks that require modeling long-term dependencies, such as machine translation and speech recognition.
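As a concrete sketch of the gating idea, here is a scalar GRU cell (the simpler of the two). The one-dimensional state and the default weight values are purely illustrative; real implementations use weight matrices learned during training:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x, h_prev, Wz=0.5, Uz=0.5, Wr=0.5, Ur=0.5, Wh=0.5, Uh=0.5):
    z = sigmoid(Wz * x + Uz * h_prev)  # update gate: how much new info to take in
    r = sigmoid(Wr * x + Ur * h_prev)  # reset gate: how much of the past to forget
    # Candidate state is computed from the *reset* previous state
    h_tilde = math.tanh(Wh * x + Uh * (r * h_prev))
    # Blend the old state and the candidate according to the update gate
    return (1.0 - z) * h_prev + z * h_tilde
```

An LSTM cell follows the same pattern but adds a third gate and a separate cell state that the gates read from and write to.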

Voice Translation Scenario
Say we have to create a model that translates English speech to French without affecting the pitch and depth of the voice. To simplify, we have supervised training data that tells us what text each audio clip contains (note: the labels are text, not voice), and we have to translate that text into French. In language translation we run into a problem: the same phrase can translate to different meanings depending on where it appears in the sentence. To create a model that can deal with such sequential data, let's look at how RNNs can help us:
Sequential Processing : Text data is inherently sequential, with each word depending on the previous words for context. Similarly, audio data is also sequential, with each audio sample depending on the previous samples. RNNs are well-suited for processing sequential data because they can capture dependencies over time.
[*The RNN can take the input text one word at a time, encoding each word into a fixed-size vector representation. This is done by processing each word sequentially through the RNN, with the hidden state capturing information about the previous words in the sequence. [¹] In practice this is far easier to build in a high-level language such as Python, with its deep learning libraries, than from scratch in a lower-level language like C++.]
Context Understanding: As the RNN processes the input text, it learns to understand the context of the text by capturing dependencies between words. For example, in a translation task, the RNN can learn that the translation of a word may depend on the words that precede it in the input text.
Audio Generation: Once the input text has been encoded into a sequence of hidden states, the RNN can then decode these hidden states into the corresponding audio waveform. This is done by feeding the hidden states into another part of the RNN or a separate neural network that generates the audio.
Training: During training, the RNN learns to translate text into audio by minimizing a loss function that measures the difference between the generated audio and the target audio. This is done using techniques like backpropagation through time (BPTT), where the gradients are computed recursively over the sequence.
Testing: During testing or inference, the trained RNN can be used to translate arbitrary input text into audio. The RNN processes the input text sequentially, generates the corresponding audio waveform, and outputs it for listening.
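The encode-then-decode flow described above can be sketched with the same scalar-RNN idea. The weights and the numeric "word" inputs here are purely illustrative; a real system would use learned embeddings, vector-valued states, and a vocoder to produce audio:

```python
import math

def encode(sequence, W_x=0.6, W_h=0.9):
    # Encoder: fold the whole input sequence into a final hidden state
    h = 0.0
    for x in sequence:
        h = math.tanh(W_x * x + W_h * h)
    return h

def decode(h, steps, W_h=0.9, W_out=1.2):
    # Decoder: unroll from the encoder's state, emitting one output per step
    outputs = []
    for _ in range(steps):
        h = math.tanh(W_h * h)
        outputs.append(W_out * h)  # linear readout at each time step
    return outputs

# Encode a three-"word" input, then generate four output samples
audio_like = decode(encode([0.3, 0.7, 0.1]), steps=4)
```

Production translation systems extend this encoder-decoder pattern with attention so the decoder can look back at every encoder state, not just the last one.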
To see the code, you can visit my GitHub page.
I also pinned it right here.
Until Next Sunday (Triduum):
- Thanks for your extreme patience and attention 😄