Project 1: Bike Sharing
 Predict bike sharing rides
 Context:
 You own a bike sharing company
 How many bikes do you need?
 Too many: Waste money on unused bikes
 Too few: Lose money by turning away customers
 Use historical data to predict # of bikes required
Rubric
Data example

We’re using a real data set. It takes a lot of factors into account. Here’s one row:
instant  dteday      season  yr  mnth  hr  holiday  weekday  workingday  weathersit  temp  atemp   hum   windspeed  casual  registered  cnt
1        2011-01-01  1       0   1     0   0        6        0           1           0.24  0.2879  0.81  0.0        3       13          16

 Traditionally, we might try to suss out the individual importance of factors like windspeed to the number of riders on a given day.
 Our model will (hopefully) be able to predict holistically how many riders will use the service on any given day.
Prepare data
 Create dummy variables to handle multiple classes.
 For example, hr is a field that ranges from 0 to 23. Hour 23 is not 23 times 'heavier' than hour 1, so to prevent misleading results, we'll create 24 columns with a 0 or 1 to indicate the hour for each row. pandas handles this with the get_dummies method.
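A minimal sketch of the dummy-variable step, using a toy frame in place of the real ride data (the column names here mirror the data set, but the values are made up):

```python
import pandas as pd

# Toy stand-in for the real ride data (the actual set has one row per hour)
rides = pd.DataFrame({'hr': [0, 1, 23], 'cnt': [16, 40, 90]})

# One 0/1 column per distinct hour value; prefix keeps the new columns readable
dummies = pd.get_dummies(rides['hr'], prefix='hr')

# Attach the dummy columns and drop the original hr field
data = pd.concat([rides, dummies], axis=1).drop('hr', axis=1)

print(list(data.columns))  # ['cnt', 'hr_0', 'hr_1', 'hr_23']
```

On the full data set, every categorical field (season, weathersit, mnth, hr, weekday) would get the same treatment.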
 Drop the fields we don’t care about including the fields we made dummy values from.
 Scale target fields so they're consistent and easier to work with
shift and scale the variables such that they have a mean of 0 and standard deviation of 1
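The standardization step might look like the sketch below. The toy values and the `scalings` dict are my own illustration; keeping each field's mean and standard deviation around lets you un-scale the model's predictions later:

```python
import pandas as pd

data = pd.DataFrame({'cnt': [16.0, 40.0, 32.0, 13.0]})  # toy target column

# Remember each field's mean and std so predictions can be un-scaled later
scalings = {}
for field in ['cnt']:
    mean, std = data[field].mean(), data[field].std()
    scalings[field] = (mean, std)
    data[field] = (data[field] - mean) / std

# data['cnt'] now has mean ~0 and standard deviation ~1
```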
 Split data into appropriate groups
 Test set = approximately last 21 days
 Remove test set from data set
 Get targets and features from test set as well
 Target set = fields indicating # of riders on a given day
 Feature set = fields that are not targets
 Break targets and features into training and validation sets
 Training = everything except the last 60 days of the remaining data
 Validation = the last 60 days of the remaining data
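Since the data has one row per hour, "last 21 days" translates to the last 21 × 24 rows. A sketch of the whole split, using a toy frame in place of the real one (`cnt` as the target field is from the data set; the sizes are made up):

```python
import numpy as np
import pandas as pd

# Toy frame with one row per hour for 100 "days"
n_hours = 100 * 24
data = pd.DataFrame({'cnt': np.arange(n_hours, dtype=float),
                     'temp': np.zeros(n_hours)})

# Hold out roughly the last 21 days as the test set
test_data = data[-21 * 24:]
data = data[:-21 * 24]

# Targets = fields indicating # of riders; features = everything else
target_fields = ['cnt']
features, targets = data.drop(target_fields, axis=1), data[target_fields]
test_features = test_data.drop(target_fields, axis=1)
test_targets = test_data[target_fields]

# The last 60 days of what's left become the validation set
train_features, val_features = features[:-60 * 24], features[-60 * 24:]
train_targets, val_targets = targets[:-60 * 24], targets[-60 * 24:]
```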
Run model and tune hyperparameters
 This model has 3 hyperparameters to play with
 Epochs: How many times we train the model
 More gives the model more chances to learn the pattern in the data.
 More is also more computationally taxing.
 Ideally, the network will learn more on each epoch without overfitting the data (training and validation loss should both go down on each epoch.)
 Learning rate: The constant we adjust the error by.
 Larger means the model can find the pattern faster
 Larger also can break the model because gradient descent will keep overshooting the optimal weights.
 If the network is struggling, lower the learning rate.
 Hidden nodes: How many nodes we push the input through.
 Too few and the network will never learn the pattern.
 Too many and the network will overfit the data.
 Rule of thumb: Use a number in between the number of inputs and the number of outputs.
 After running the model, we have two numbers to measure its performance:
 Training loss: Error on training data
 Should never go up
 Validation loss: Error on validation data
 If this starts going up, it means the model has overfit the data.
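The whole loop above can be sketched with a tiny one-hidden-layer network. Everything here is a toy stand-in, not the project's actual network: the data is synthetic, and the hyperparameter values are chosen only so the example converges quickly. It shows the three hyperparameters and the two losses being tracked per epoch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data standing in for the scaled ride counts
X = rng.normal(size=(200, 3))
y = (X @ np.array([0.5, -0.3, 0.8])).reshape(-1, 1)
X_train, y_train = X[:160], y[:160]
X_val, y_val = X[160:], y[160:]

# The three hyperparameters from the notes (toy values)
epochs = 300
learning_rate = 0.5
hidden_nodes = 8

W1 = rng.normal(0, X.shape[1] ** -0.5, (X.shape[1], hidden_nodes))
W2 = rng.normal(0, hidden_nodes ** -0.5, (hidden_nodes, 1))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(X):
    hidden = sigmoid(X @ W1)    # hidden layer activation
    return hidden, hidden @ W2  # linear output for regression

def mse(X, y):
    _, out = forward(X)
    return np.mean((out - y) ** 2)

losses = {'train': [], 'validation': []}
for epoch in range(epochs):
    hidden, out = forward(X_train)
    error = out - y_train                                  # output error
    grad_W2 = hidden.T @ error / len(X_train)
    hidden_error = (error @ W2.T) * hidden * (1 - hidden)  # backprop through sigmoid
    grad_W1 = X_train.T @ hidden_error / len(X_train)
    W1 -= learning_rate * grad_W1                          # gradient descent step,
    W2 -= learning_rate * grad_W2                          # scaled by learning_rate
    losses['train'].append(mse(X_train, y_train))
    losses['validation'].append(mse(X_val, y_val))
```

Plotting `losses['train']` against `losses['validation']` is how you spot overfitting: training loss keeps falling while validation loss turns upward.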
 The real world test is running the original data through the trained model and comparing its predictions against the actual results.
My results
After a lot of trial and error, I found the most success with:
iterations = 2000
learning_rate = 1
hidden_nodes = 11
output_nodes = 1
The training loss was around 0.07 and the validation loss was around 0.15.