Andrew Trask, author of Grokking Deep Learning, walks us through developing a neural network that predicts whether movie reviews are positive or negative.

## Framing the Problem

• Neural nets know nothing inherently.
• We have data, what we know, and we frame the problem by deciding what we want to know about that data.
• What is the prediction our model will make from a set of inputs?
• For our data and exercise, what we know is a collection of movie reviews. What we want to know is: 'Is this a positive or negative review?'

## Develop a Theory

• Before building a neural net to make a prediction, see if you can figure it out as a human.
• This can help you see patterns that might help in constructing a neural net. It could also uncover a naive solution, saving the work of building the neural net at all.
• My theory for a naive solution: create an array of positive words and an array of negative words. Parse each review, counting occurrences of words from both lists. Whichever count is greater reveals whether the review is positive or negative.
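The naive theory above can be sketched in a few lines. This is a minimal illustration, not code from the lesson; the word lists and the tie-breaking rule (ties count as positive) are assumptions made for the example.

```python
# Hand-picked word lists -- illustrative only, not from the lesson.
POSITIVE = {"great", "superb", "excellent", "wonderful"}
NEGATIVE = {"terrible", "atrocious", "boring", "awful"}

def naive_sentiment(review: str) -> str:
    """Count positive and negative words; the larger count wins."""
    words = review.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return "positive" if pos >= neg else "negative"

print(naive_sentiment("a superb and wonderful film"))  # positive
print(naive_sentiment("an atrocious boring mess"))     # negative
```

Even this toy version hints at the weakness explored next: most words in a real review fall in neither list, so the decision rests on a handful of counts.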

## Explore the Data

• Following this theory is problematic: you quickly see that the most common words don't reveal sentiment ('the', 'and', 'is', etc.).
• We can highlight the words that differ between pos and neg reviews and then manipulate the data into a shape that is more useful.
• Trask's solution ends with assigning a positive score to positive words and a negative score to negative words.
• Once we start listing words and their scores, we see a pattern, or signal, that inspires confidence we'll be able to use word occurrence to predict whether a review is positive or negative.

### Signal vs Noise

• Signal refers to a meaningful pattern in data
• The opposite of signal is noise.
• Consider the initial counts of word occurrences. Not very meaningful by itself.
• Once word occurrences in positive reviews are juxtaposed against occurrences in negative reviews, a more meaningful pattern is revealed.
• From that pattern we can make important statements about the data.
• For example, a review with the word 'superb' in it is likely to be positive, while 'atrocious' likely indicates a negative review.
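One common way to juxtapose the two sets of counts, sketched here as an assumption rather than a transcript of Trask's code, is a smoothed log ratio: words with a strongly positive score lean positive, strongly negative scores lean negative, and words near zero (like 'the') carry no signal. The tiny corpora are invented for illustration.

```python
from collections import Counter
import math

# Made-up mini-corpora for illustration only.
pos_reviews = ["a superb film truly superb", "great acting great story"]
neg_reviews = ["an atrocious film", "boring and atrocious writing"]

pos_counts = Counter(w for r in pos_reviews for w in r.split())
neg_counts = Counter(w for r in neg_reviews for w in r.split())

ratios = {}
for word in set(pos_counts) | set(neg_counts):
    # +1 smoothing avoids dividing by zero for words seen on only one side.
    ratios[word] = math.log((pos_counts[word] + 1) / (neg_counts[word] + 1))

print(ratios["superb"] > 0)    # appears only in positive reviews
print(ratios["atrocious"] < 0) # appears only in negative reviews
print(ratios["film"] == 0.0)   # appears equally on both sides: pure noise
```

The ratio turns raw counts (noise-heavy) into a score where magnitude roughly tracks how much signal a word carries.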

## Designing the Model

• How we design the model from the beginning will bias it towards success or failure.
• For example, this model’s output should be binary. We only want to predict a review as pos or neg, so the output should be 1 or 0
• If we instead applied a sigmoid activation function to produce a scaled output from 0.0 to 1.0, there would be more room for error.
• The input should be a list of numbers counting the occurrences of any word in the review.
• The size of the data quickly becomes unwieldy to work with. Employ some strategies to make it easier:
  • Instead of building and testing against the whole dataset, use one example from the set at a time. For us, that would be the first review.
  • Since memory allocation is expensive, initialize lists and matrices as early as possible (using np.zeros, for example), then update values within them in place.

## Initializing Weights

• So far we’ve just chosen random values to populate our weights. There are other strategies, though.
• One is to choose starting weights between $-y$ and $y$, where $y = 1 / \sqrt{n}$ and $n$ is the number of input nodes.
• Depending on the model, it might make sense to use the number of hidden nodes for the value of $n$.
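As a quick sketch of that rule (layer sizes invented for the example): draw each weight uniformly from $[-1/\sqrt{n},\, 1/\sqrt{n}]$ with $n$ the number of input nodes.

```python
import numpy as np

n_input, n_hidden = 10, 4

# Bound from the rule above: y = 1 / sqrt(n), with n = number of input nodes.
bound = 1.0 / np.sqrt(n_input)
weights = np.random.uniform(-bound, bound, size=(n_input, n_hidden))

print(weights.shape)                                        # (10, 4)
print(weights.min() >= -bound and weights.max() <= bound)   # True
```

Swapping `n_input` for `n_hidden` in the bound gives the variant mentioned in the last bullet.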

## The Digging for Gold Analogy

The data is a river, the meaningful patterns in the data are the gold, and the neural net is the pan that helps you sift out the gold. If you're not finding gold, there's a good chance you're panning in the wrong part of the river. Or, to break the analogy: reshape the data to highlight the signal and reduce the noise.