From 01a38c28c5035e27a6b375aa6a5235bf823f9982 Mon Sep 17 00:00:00 2001 From: "Christine P. Chai" Date: Thu, 9 Oct 2025 10:42:27 -0700 Subject: [PATCH 1/5] DOC: traini -> train --- ...utorial-deep-reinforcement-learning-with-pong-from-pixels.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/tutorial-deep-reinforcement-learning-with-pong-from-pixels.md b/content/tutorial-deep-reinforcement-learning-with-pong-from-pixels.md index 1598e572..12a789ef 100644 --- a/content/tutorial-deep-reinforcement-learning-with-pong-from-pixels.md +++ b/content/tutorial-deep-reinforcement-learning-with-pong-from-pixels.md @@ -552,7 +552,7 @@ while episode_number < max_episodes: A few notes: -- If you have previously run an experiment and want to repeat it, your `Monitor` instance may still be running, which may throw an error the next time you try to traini the agent. Therefore, you should first shut down `Monitor` by calling `env.close()` by uncommenting and running the cell below: +- If you have previously run an experiment and want to repeat it, your `Monitor` instance may still be running, which may throw an error the next time you try to train the agent. Therefore, you should first shut down `Monitor` by calling `env.close()` by uncommenting and running the cell below: ```python # env.close() From 2e90370c34cfc225df0a32a5a1d5b2fe043b0072 Mon Sep 17 00:00:00 2001 From: "Christine P. Chai" Date: Thu, 9 Oct 2025 10:45:42 -0700 Subject: [PATCH 2/5] DOC: Correct typos in mooreslaw-tutorial.md --- content/mooreslaw-tutorial.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/content/mooreslaw-tutorial.md b/content/mooreslaw-tutorial.md index 42c0de15..8dc7417b 100644 --- a/content/mooreslaw-tutorial.md +++ b/content/mooreslaw-tutorial.md @@ -106,7 +106,7 @@ $B_M=-675.4$ Since the function represents Moore's law, define it as a Python function using -[`lambda`](https://docs.python.org/3/library/ast.html?highlight=lambda#ast.Lambda) +[`lambda`](https://docs.python.org/3/library/ast.html?highlight=lambda#ast.Lambda): ```{code-cell} A_M = np.log(2) / 2 @@ -156,7 +156,7 @@ The extra options below will put the data in the desired format: * `delimiter = ','`: specify delimeter as a comma ',' (this is the default behavior) * `usecols = [1,2]`: import the second and third columns from the csv -* `skiprows = 1`: do not use the first row, because its a header row +* `skiprows = 1`: do not use the first row, because it's a header row ```{code-cell} data = np.loadtxt("transistor_data.csv", delimiter=",", usecols=[1, 2], skiprows=1) @@ -282,7 +282,7 @@ In the next plot, use the [`fivethirtyeight`](https://matplotlib.org/3.1.1/gallery/style_sheets/fivethirtyeight.html) style sheet. The style sheet replicates -https://fivethirtyeight.com elements. Change the matplotlib style with +[https://fivethirtyeight.com](https://fivethirtyeight.com) elements. Change the matplotlib style with [`plt.style.use`](https://matplotlib.org/3.3.2/api/style_api.html#matplotlib.style.use). ```{code-cell} @@ -334,7 +334,7 @@ option, to increase the transparency of the data. The more opaque the points appear, the more reported values lie on that measurement. The green $+$ is the average reported transistor count for 2017. Plot your predictions -for $\pm\frac{1}{2}~years. +for $\pm\frac{1}{2}$ years. 
```{code-cell} transistor_count2017 = transistor_count[year == 2017] @@ -386,7 +386,7 @@ array using `np.loadtxt`, to save your model use two approaches ### Zipping the arrays into a file Using `np.savez`, you can save thousands of arrays and give them names. The function `np.load` will load the arrays back into the workspace as a -dictionary. You'll save a five arrays so the next user will have the year, +dictionary. You'll save five arrays so the next user will have the year, transistor count, predicted transistor count, Gordon Moore's predicted count, and fitting constants. Add one more variable that other users can use to understand the model, `notes`. From 07acc59dc0134a60aabdfe160c24ec3e68463934 Mon Sep 17 00:00:00 2001 From: "Christine P. Chai" Date: Thu, 9 Oct 2025 10:53:20 -0700 Subject: [PATCH 3/5] DOC: Correct punctuation usage in Sentiment Analysis tutorial --- content/tutorial-nlp-from-scratch.md | 38 +++++++++++++++------------- 1 file changed, 20 insertions(+), 18 deletions(-) diff --git a/content/tutorial-nlp-from-scratch.md b/content/tutorial-nlp-from-scratch.md index 67b3dce5..49a84ed5 100644 --- a/content/tutorial-nlp-from-scratch.md +++ b/content/tutorial-nlp-from-scratch.md @@ -104,7 +104,7 @@ We made sure to include different demographics in our data and included a range ## 2. Preprocess the datasets >Preprocessing data is an extremely crucial step before building any Deep learning model, however in an attempt to keep the tutorial focused on building the model, we will not dive deep into the code for preprocessing. Given below is a brief overview of all the steps we undertake to clean our data and convert it to its numeric representation. -1. **Text Denoising** : Before converting your text into vectors, it is important to clean it and remove all unhelpful parts a.k.a the noise from your data by converting all characters to lowercase, removing html tags, brackets and stop words (words that don't add much meaning to a sentence). Without this step the dataset is often a cluster of words that the computer doesn't understand. +1. **Text Denoising** : Before converting your text into vectors, it is important to clean it and remove all unhelpful parts a.k.a. the noise from your data by converting all characters to lowercase, removing html tags, brackets and stop words (words that don't add much meaning to a sentence). Without this step the dataset is often a cluster of words that the computer doesn't understand. 2. **Converting words to vectors** : A word embedding is a learned representation for text where words that have the same meaning have a similar representation. Individual words are represented as real-valued vectors in a predefined vector space. GloVe is an unsupervised algorithm developed by Stanford for generating word embeddings by generating global word-word co-occurence matrix from a corpus. You can download the zipped files containing the embeddings from [the GloVe official website](https://nlp.stanford.edu/projects/glove/). Here you can choose any of the four options for different sizes or training datasets. We have chosen the least memory consuming embedding file. 
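To give a concrete picture of what an embedding file like this contains, below is a minimal, hypothetical sketch that parses a GloVe text file into a dictionary of word vectors. It is not the loader used later in this tutorial (that is `TextPreprocess.loadGloveModel`), and the file name is only an example:

```python
import numpy as np

def load_glove_embeddings(path):
    """Parse a GloVe text file: each line holds a word followed by its vector."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.split()
            embeddings[word] = np.asarray(values, dtype=np.float32)
    return embeddings

# Illustrative usage (assumed file name):
# vectors = load_glove_embeddings("glove.6B.50d.txt")
# vectors["movie"].shape  ->  (50,)
```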
@@ -390,7 +390,7 @@ imdb_train = data.fetch('imdb_train.txt') imdb_test = data.fetch('imdb_test.txt') ``` -Instantiate the` TextPreprocess` class to perform various operations on our datasets: +Instantiate the `TextPreprocess` class to perform various operations on our datasets: ```python textproc = TextPreprocess() @@ -421,7 +421,7 @@ y_test = test_df['sentiment'].to_numpy()[0:1000] ``` The same process is applicable on the collected speeches: -> Since we will be performing paragraph wise sentiment analysis on each speech further ahead in the tutorial, we'll need the punctuation marks to split the text into paragraphs, hence we refrain from removing their punctuation marks at this stage +> Since we will be performing paragraph wise sentiment analysis on each speech further ahead in the tutorial, we'll need the punctuation marks to split the text into paragraphs, hence we refrain from removing their punctuation marks at this stage. ```python speech_data_path = 'tutorial-nlp-from-scratch/speeches.csv' @@ -444,13 +444,13 @@ emb_matrix = textproc.loadGloveModel(emb_path) ## 3. Build the Deep Learning Model It is time to start implementing our LSTM! You will have to first familiarize yourself with some high-level concepts of the basic building blocks of a deep learning model. You can refer to the [Deep learning on MNIST from scratch tutorial](https://numpy.org/numpy-tutorials/content/tutorial-deep-learning-on-mnist.html) for the same. -You will then learn how a Recurrent Neural Network differs from a plain Neural Network and what makes it so suitable for processing sequential data. Afterwards, you will construct the building blocks of a simple deep learning model in Python and NumPy and train it to learn to classify the sentiment of a piece of text as positive or negative with a certain level of accuracy +You will then learn how a Recurrent Neural Network differs from a plain Neural Network and what makes it so suitable for processing sequential data. Afterwards, you will construct the building blocks of a simple deep learning model in Python and NumPy and train it to learn to classify the sentiment of a piece of text as positive or negative with a certain level of accuracy. ### Introduction to a Long Short Term Memory Network In a [Multilayer perceptron](https://en.wikipedia.org/wiki/Multilayer_perceptron) (MLP), the information only moves in one direction — from the input layer, through the hidden layers, to the output layer. The information moves straight through the network and never takes the previous nodes into account at a later stage. Because it only considers the current input, the features learned are not shared across different positions of the sequence. Moreover, it cannot process sequences with varying lengths. -Unlike an MLP, the RNN was designed to work with sequence prediction problems.RNNs introduce state variables to store past information, together with the current inputs, to determine the current outputs. Since an RNN shares the learned features with all the data points in a sequence regardless of its length, it is capable of processing sequences with varying lengths. +Unlike an MLP, the RNN was designed to work with sequence prediction problems. RNNs introduce state variables to store past information, together with the current inputs, to determine the current outputs. Since an RNN shares the learned features with all the data points in a sequence regardless of its length, it is capable of processing sequences with varying lengths. 
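To make the idea of a shared recurrent state concrete, here is a small illustrative sketch, not part of the tutorial's model, of a single recurrent update applied across a toy sequence (the dimensions and random weights are assumptions chosen only for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 4      # toy sizes for illustration only

Wxh = rng.standard_normal((hidden_dim, input_dim)) * 0.01   # input -> hidden weights
Whh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.01  # hidden -> hidden weights
bh = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    # The new state mixes the current input with the previous state, so earlier
    # time steps can influence later outputs.
    return np.tanh(Wxh @ x_t + Whh @ h_prev + bh)

h = np.zeros(hidden_dim)
toy_sequence = rng.standard_normal((5, input_dim))  # e.g. five word vectors
for x_t in toy_sequence:
    h = rnn_step(x_t, h)  # the same weights are reused at every step
```

Because the weights do not depend on the position in the sequence, the same loop handles a ten-word review or a two-hundred-word one.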
The problem with an RNN however, is that it cannot retain long-term memory because the influence of a given input on the hidden layer, and therefore on the network output, either decays or blows up exponentially as it cycles around the network’s recurrent connections. This shortcoming is referred to as the vanishing gradient problem. Long Short-Term Memory (LSTM) is an RNN architecture specifically designed to address the [vanishing gradient problem](https://en.wikipedia.org/wiki/Vanishing_gradient_problem). @@ -462,7 +462,7 @@ The problem with an RNN however, is that it cannot retain long-term memory becau In the above gif, the rectangles labeled $A$ are called `Cells` and they are the **Memory Blocks** of our LSTM network. They are responsible for choosing what to remember in a sequence and pass on that information to the next cell via two states called the `hidden state` $H_{t}$ and the `cell state` $C_{t}$ where $t$ indicates the time-step. Each `Cell` has dedicated gates which are responsible for storing, writing or reading the information passed to an LSTM. You will now look closely at the architecture of the network by implementing each mechanism happening inside of it. -Lets start with writing a function to randomly initialize the parameters which will be learned while our model trains +Lets start with writing a function to randomly initialize the parameters which will be learned while our model trains: ```python def initialise_params(hidden_dim, input_dim): @@ -641,7 +641,7 @@ def forward_prop(X_vec, parameters, input_dim): After each forward pass through the network, you will implement the `backpropagation through time` algorithm to accumulate gradients of each parameter over the time steps. Backpropagation through a LSTM is not as straightforward as through other common Deep Learning architectures, due to the special way its underlying layers interact. Nonetheless, the approach is largely the same; identifying dependencies and applying the chain rule. -Lets start with defining a function to initialize gradients of each parameter as arrays made up of zeros with same dimensions as the corresponding parameter +Lets start with defining a function to initialize gradients of each parameter as arrays made up of zeros with same dimensions as the corresponding parameter. ```python # Initialise the gradients @@ -777,10 +777,10 @@ def backprop(y, caches, hidden_dim, input_dim, time_steps, parameters): ### Updating the Parameters -We update the parameters through an optimization algorithm called [Adam](https://optimization.cbe.cornell.edu/index.php?title=Adam) which is an extension to stochastic gradient descent that has recently seen broader adoption for deep learning applications in computer vision and natural language processing. Specifically, the algorithm calculates an exponential moving average of the gradient and the squared gradient, and the parameters `beta1` and `beta2` control the decay rates of these moving averages. Adam has shown increased convergence and robustness over other gradient descent algorithms and is often recommended as the default optimizer for training. +We update the parameters through an optimization algorithm called [Adam](https://optimization.cbe.cornell.edu/index.php?title=Adam), which is an extension to stochastic gradient descent that has recently seen broader adoption for deep learning applications in computer vision and natural language processing. 
Specifically, the algorithm calculates an exponential moving average of the gradient and the squared gradient, and the parameters `beta1` and `beta2` control the decay rates of these moving averages. Adam has shown increased convergence and robustness over other gradient descent algorithms, and is often recommended as the default optimizer for training. -Define a function to initialise the moving averages for each parameter +Define a function to initialise the moving averages for each parameter: ```python # initialise the moving averages @@ -795,7 +795,7 @@ def initialise_mav(hidden_dim, input_dim, params): return v, s ``` -Define a function to update the parameters +Define a function to update the parameters: ```python # Update the parameters using Adam optimization @@ -820,7 +820,7 @@ def update_parameters(parameters, gradients, v, s, ### Training the Network -You will start by initializing all the parameters and hyperparameters being used in your network +You will start by initializing all the parameters and hyperparameters being used in your network: ```python hidden_dim = 64 @@ -834,8 +834,10 @@ v, s = initialise_mav(hidden_dim, parameters) ``` -To optimize your deep learning network, you need to calculate a loss based on how well the model is doing on the training data. Loss value implies how poorly or well a model behaves after each iteration of optimization.
-Define a function to calculate the loss using [negative log likelihood](http://d2l.ai/chapter_linear-networks/softmax-regression.html?highlight=negative%20log%20likelihood#log-likelihood) +To optimize your deep learning network, you need to calculate a loss based on how well the model is doing on the training data. Loss value implies how poorly or well a model behaves after each iteration of optimization. + + +Define a function to calculate the loss using [negative log likelihood](http://d2l.ai/chapter_linear-networks/softmax-regression.html?highlight=negative%20log%20likelihood#log-likelihood): ```python def loss_f(A, Y): @@ -849,7 +851,7 @@ def loss_f(A, Y): ``` Set up the neural network's learning experiment with a training loop and start the training process. You will also evaluate the model's performance on the training dataset to see how well the model is *learning* and the testing dataset to see how well it is *generalizing*. ->Skip running this cell if you already have the trained parameters stored in a `npy` file +>Skip running this cell if you already have the trained parameters stored in a `npy` file. ```python # To store training losses @@ -952,7 +954,7 @@ plt.show() ### Sentiment Analysis on the Speech Data -Once your model is trained, you can use the updated parameters to start making our predictions. You can break each speech into paragraphs of uniform size before passing them to the Deep Learning model and predicting the sentiment of each paragraph +Once your model is trained, you can use the updated parameters to start making our predictions. You can break each speech into paragraphs of uniform size before passing them to the Deep Learning model and predicting the sentiment of each paragraph. ```python # To store predicted sentiments @@ -1028,7 +1030,7 @@ In the plot above, you're shown what percentages of each speech are expected to It's crucial to understand that accurately identifying a text's sentiment is not easy primarily because of the complex ways in which humans express sentiment, using irony, sarcasm, humor, or, in social media, abbreviation. Moreover neatly placing text into two categories: 'positive' and 'negative' can be problematic because it is being done without any context. Words or abbreviations can convey very different sentiments depending on age and location, none of which we took into account while building our model. -Along with data, there are also growing concerns that data processing algorithms are influencing policy and daily lives in ways that are not transparent and introduce biases. Certain biases such as the [Inductive Bias](https://bit.ly/2WtTKIe) are essential to help a Machine Learning model generalize better, for example the LSTM we built earlier is biased towards preserving contextual information over long sequences which makes it so suitable for processing sequential data. The problem arises when [societal biases](https://hbr.org/2019/10/what-do-we-do-about-the-biases-in-ai) creep into algorithmic predictions. Optimizing Machine algorithms via methods like [hyperparameter tuning](https://en.wikipedia.org/wiki/Hyperparameter_optimization) can then further amplify these biases by learning every bit of information in the data. +Along with data, there are also growing concerns that data processing algorithms are influencing policy and daily lives in ways that are not transparent and introduce biases. 
Certain biases such as the [Inductive Bias](https://bit.ly/2WtTKIe) are essential to help a Machine Learning model generalize better, for example the LSTM we built earlier is biased towards preserving contextual information over long sequences which makes it so suitable for processing sequential data. The problem arises when [societal biases](https://hbr.org/2019/10/what-do-we-do-about-the-biases-in-ai) creep into algorithmic predictions. Optimizing Machine Learning algorithms via methods like [hyperparameter tuning](https://en.wikipedia.org/wiki/Hyperparameter_optimization) can then further amplify these biases by learning every bit of information in the data. There are also cases where bias is only in the output and not the inputs (data, algorithm). For example, in sentiment analysis [accuracy tends to be higher on female-authored texts than on male-authored ones]( https://doi.org/10.3390/electronics9020374). End users of sentiment analysis should be aware that its small gender biases can affect the conclusions drawn from it and apply correction factors when necessary. Hence, it is important that demands for algorithmic accountability should include the ability to test the outputs of a system, including the ability to drill down into different user groups by gender, ethnicity and other characteristics, to identify, and hopefully suggest corrections for, system output biases. @@ -1039,7 +1041,7 @@ There are also cases where bias is only in the output and not the inputs (data, You have learned how to build and train a simple Long Short Term Memory network from scratch using just NumPy to perform sentiment analysis. -To further enhance and optimize your neural network model, you can consider one of a mixture of the following: +To further enhance and optimize your neural network model, you can consider one or a mixture of the following: - Alter the architecture by introducing multiple LSTM layers to make the network deeper. - Use a higher epoch size to train longer and add more regularization techniques, such as early stopping, to prevent overfitting. @@ -1053,7 +1055,7 @@ Nowadays, LSTMs have been replaced by the [Transformer](https://jalammar.github. Building a neural network from scratch with NumPy is a great way to learn more about NumPy and about deep learning. However, for real-world applications you should use specialized frameworks — such as PyTorch, JAX or TensorFlow — that provide NumPy-like APIs, have built-in automatic differentiation and GPU support, and are designed for high-performance numerical computing and machine learning. -Finally, to know more about how ethics come into play when developing a machine learning model, you can refer to the following resources : +Finally, to know more about how ethics come into play when developing a machine learning model, you can refer to the following resources: - [Data ethics resources](https://www.turing.ac.uk/research/data-ethics) by the Turing Institute - Considering how artificial intelligence shifts power, an [article](https://www.nature.com/articles/d41586-020-02003-2) and [talk](https://slideslive.com/38923453/the-values-of-machine-learning) by Pratyusha Kalluri - More ethics resources on [this blog post](https://www.fast.ai/2018/09/24/ai-ethics-resources/) by Rachel Thomas and the [Radical AI podcast](https://www.radicalai.org/) From 885822b312db4fb88b6b3c07e235ad07a8059581 Mon Sep 17 00:00:00 2001 From: "Christine P. 
Chai" Date: Thu, 9 Oct 2025 10:55:23 -0700 Subject: [PATCH 4/5] DOC: weights values -> weights --- content/tutorial-deep-learning-on-mnist.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/tutorial-deep-learning-on-mnist.md b/content/tutorial-deep-learning-on-mnist.md index e1085b91..d58f4f7b 100644 --- a/content/tutorial-deep-learning-on-mnist.md +++ b/content/tutorial-deep-learning-on-mnist.md @@ -384,7 +384,7 @@ layer.)](_static/tutorial-deep-learning-on-mnist.png) In the beginning of model training, your network randomly initializes the weights and feeds the input data forward through the hidden and output layers. This process is the forward pass or forward propagation. - Then, the network propagates the "signal" from the loss function back through the hidden layer and adjusts the weights values with the help of the learning rate parameter (more on that later). + Then, the network propagates the "signal" from the loss function back through the hidden layer and adjusts the weights with the help of the learning rate parameter (more on that later). > **Note:** In more technical terms, you: > From 78cb7f462c5f88090fd500aee713090dfc6f657a Mon Sep 17 00:00:00 2001 From: "Christine P. Chai" Date: Thu, 9 Oct 2025 11:14:16 -0700 Subject: [PATCH 5/5] Update content/mooreslaw-tutorial.md Co-authored-by: Ross Barnowski --- content/mooreslaw-tutorial.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/mooreslaw-tutorial.md b/content/mooreslaw-tutorial.md index 8dc7417b..cedc32d9 100644 --- a/content/mooreslaw-tutorial.md +++ b/content/mooreslaw-tutorial.md @@ -282,7 +282,7 @@ In the next plot, use the [`fivethirtyeight`](https://matplotlib.org/3.1.1/gallery/style_sheets/fivethirtyeight.html) style sheet. The style sheet replicates -[https://fivethirtyeight.com](https://fivethirtyeight.com) elements. Change the matplotlib style with + elements. Change the matplotlib style with [`plt.style.use`](https://matplotlib.org/3.3.2/api/style_api.html#matplotlib.style.use). ```{code-cell}
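# The remainder of this hunk is not shown above; as a hedged, illustrative sketch
# (not necessarily the cell's actual contents), switching to the style sheet
# described in the preceding paragraph only requires:
import matplotlib.pyplot as plt

plt.style.use("fivethirtyeight")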