# Sentiment Analysis

Andrew Trask, author of Grokking Deep Learning, walks us through developing a neural network that predicts whether movie reviews are positive or negative.

## Framing the Problem

- Neural nets know nothing inherently.
- We have data (what we know), and we frame the problem by deciding what we want to know about that data.
- What is the prediction our model will make from a set of inputs?

- For our data and exercise, what we know is a bunch of movie reviews. What we want to know is: 'Is this a positive or negative review?'

## Develop a Theory

- Before building a neural net to make a prediction, see if you can figure it out as a human.
- This can help you see patterns that might help in constructing a neural net. It could also uncover a naive solution, saving the work of building the neural net at all.
- My theory for a naive solution: create arrays of positive words and negative words, parse the reviews counting occurrences of each, and whichever count is greater reveals whether the review is positive or negative.
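The naive theory above can be sketched in a few lines. The word lists here are tiny, made-up examples standing in for real sentiment lexicons:

```python
# A minimal sketch of the naive theory: count positive vs. negative
# words in a review. These word sets are illustrative, not exhaustive.
POSITIVE_WORDS = {"great", "superb", "wonderful", "excellent"}
NEGATIVE_WORDS = {"terrible", "atrocious", "boring", "awful"}

def naive_sentiment(review: str) -> str:
    """Classify a review by comparing positive vs. negative word counts."""
    words = review.lower().split()
    pos = sum(1 for w in words if w in POSITIVE_WORDS)
    neg = sum(1 for w in words if w in NEGATIVE_WORDS)
    return "positive" if pos >= neg else "negative"

print(naive_sentiment("a superb and wonderful film"))    # positive
print(naive_sentiment("an atrocious and boring movie"))  # negative
```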

## Explore the Data

- Following this theory is problematic, because you quickly see that the most common words don't reveal sentiment ('the', 'and', 'is', etc.).
- We can highlight the words that differ between pos and neg reviews and then manipulate the data into a shape that is more useful.
- Trask's solution ends with assigning a positive score to positive words and a negative score to negative words.
- Once we start listing words and their scores, we start to see a pattern, or signal, that inspires confidence we'll be able to use word occurrence to predict whether a review is pos or neg.

### Signal vs Noise

- Signal refers to a meaningful pattern in data
- The opposite of signal is noise.

- Consider the initial counts of word occurrences. Not very meaningful by itself.
- Once word occurrences in pos reviews are juxtaposed against their occurrences in neg reviews, a more meaningful pattern is revealed.
- From that pattern we can make important statements about the data. For example, a review containing 'superb' is likely positive, while 'atrocious' likely indicates a negative review.
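One way to sketch this juxtaposition (the toy corpora and the log-ratio scoring here are my own illustration, not exactly Trask's code): score each word by the log of its positive-to-negative occurrence ratio, so scores above zero lean positive, below zero lean negative, and near zero is noise.

```python
from collections import Counter
import math

# Toy corpora standing in for the real pos/neg review sets.
pos_reviews = ["superb acting superb plot", "a wonderful film"]
neg_reviews = ["atrocious acting", "boring and atrocious plot"]

pos_counts = Counter(w for r in pos_reviews for w in r.split())
neg_counts = Counter(w for r in neg_reviews for w in r.split())

def signal(word: str) -> float:
    # Log of the positive/negative occurrence ratio. The +1 smoothing
    # avoids division by zero for words seen on only one side.
    return math.log((pos_counts[word] + 1) / (neg_counts[word] + 1))

print(signal("superb"))     # > 0: leans positive
print(signal("atrocious"))  # < 0: leans negative
print(signal("acting"))     # ~ 0: appears equally in both, so it's noise
```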

## Designing the Model

- How we design the model from the beginning will bias it towards success or failure.
- For example, this model's output should be binary. We only want to predict a review as pos or neg, so the output should be 1 or 0.
- If we instead applied a sigmoid activation function to produce a scaled output from 0.0 to 1.0, the model would have more room for error.

- The input should be a list of numbers counting the occurrences of any word in the review.
- The size of the data quickly becomes unwieldy to work with. Employ some strategies to make it easier:
- Instead of building and testing for the whole dataset, use one example from the set at a time. For us, that would be the first review.
- Since memory allocation is expensive, initialize lists and matrices as early as possible (probably using `np.zeros`) and then update the values within them in place.
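Those two strategies can be sketched together: pre-allocate the input layer once with `np.zeros`, then reset and refill it in place for each review. The vocabulary here is a hypothetical stand-in for one built from the full dataset:

```python
import numpy as np

# Hypothetical vocabulary built from the dataset; values are column indices.
vocab = {"the": 0, "superb": 1, "atrocious": 2, "plot": 3}

# Pre-allocate the input layer once, rather than rebuilding it per review.
layer_0 = np.zeros((1, len(vocab)))

def update_input_layer(review: str) -> None:
    layer_0[:] = 0  # reset in place, without reallocating
    for word in review.split():
        if word in vocab:
            layer_0[0, vocab[word]] += 1  # count each word occurrence

update_input_layer("the superb superb plot")
print(layer_0)  # [[1. 2. 0. 1.]]
```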

## Initializing Weights

- So far we’ve just chosen random values to populate our weights. There are other strategies, though.
- One is to choose starting weights between $-y$ and $y$, where $y = 1 / \sqrt{n}$ and $n$ is the number of input nodes.
- Depending on the model, it might make sense to use the number of hidden nodes for the value of $n$.

## The Digging for Gold Analogy

The data is a river, the meaningful patterns in the data are the gold, and the neural net is the pan that helps you sift out the gold. If you're not finding gold, there's a good chance you're panning in the wrong part of the river. Or, to break the analogy: reshape the data to highlight the signal and reduce the noise.