• Predict bike sharing rides
  • Context:
    • You own a bikeshare company
    • How many bikes do you need?
      • Too many: Waste money on unused bikes
      • Too few: Waste money on customer loss
    • Use historical data to predict # of bikes required

Rubric

Data example

  • We’re using a real data set. It takes a lot of factors into account. Here’s one row:

       instant  dteday      season  yr  mnth  hr  holiday  weekday  workingday  weathersit  temp  atemp   hum   windspeed  casual  registered  cnt
    0  1        2011-01-01  1       0   1     0   0        6        0           1           0.24  0.2879  0.81  0.0        3       13          16
  • Traditionally, we might try to suss out the individual importance of factors like windspeed to the number of riders on a given day.
  • Our model will be able to (hopefully) holistically predict how many riders will use the service on any given day.

Prepare data

  • Create dummy variables to handle multiple classes.
    • For example, hr is a field that ranges from 0 to 23. Hour 23 is not 23 times ‘heavier’ than hour 1, so to prevent misleading results, we create 24 columns, each holding a 0 or 1, to indicate the hour for a row.
    • pandas handles this with the get_dummies function.
  • Drop the fields we don’t care about, including the fields we made dummy variables from.
  • Scale target fields so they’re consistent and easier to work with

    shift and scale the variables such that they have a mean of 0 and standard deviation of 1

  • Split data into appropriate groups
    • Test set = approximately last 21 days
      • Remove test set from data set
      • Get targets and features from test set as well
    • Target set = fields indicating # of riders on a given day
    • Feature set = fields that are not targets
    • Break targets and features into training and validation sets
      • Training = remaining rows more than 60 days old
      • Validation = the most recent 60 days of the remaining rows
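
The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the DataFrame is synthetic (only the column names come from the sample row), and the choice of which columns to treat as categorical, which to drop, and which to scale is an assumption made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the hourly ride data: 200 days x 24 hours.
# Column names follow the sample row; the values are made up.
rng = np.random.default_rng(0)
n_days = 200
n_rows = n_days * 24
rides = pd.DataFrame({
    "instant": np.arange(1, n_rows + 1),
    "hr": np.tile(np.arange(24), n_days),
    "season": rng.integers(1, 5, n_rows),
    "temp": rng.random(n_rows),
    "cnt": rng.integers(0, 500, n_rows),
})

# 1. One 0/1 column per category value, e.g. hr_0 ... hr_23.
categorical = ["hr", "season"]  # assumed choice of categorical fields
dummies = pd.get_dummies(rides[categorical], columns=categorical, dtype=int)
data = pd.concat([rides, dummies], axis=1)

# 2. Drop the fields the dummies were made from, plus unused fields.
data = data.drop(columns=categorical + ["instant"])

# 3. Shift and scale to mean 0 and standard deviation 1.
#    Keep the mean and std so predictions can be converted back later.
scaled_features = {}
for col in ["temp", "cnt"]:  # assumed choice of fields to scale
    mean, std = data[col].mean(), data[col].std()
    scaled_features[col] = (mean, std)
    data[col] = (data[col] - mean) / std

# 4. Hold out the last 21 days (24 rows per day) as the test set.
test_data = data[-21 * 24:]
data = data[:-21 * 24]

# 5. Separate targets from features, for both the test set and the rest.
target_fields = ["cnt"]
features, targets = data.drop(columns=target_fields), data[target_fields]
test_features = test_data.drop(columns=target_fields)
test_targets = test_data[target_fields]

# 6. The most recent 60 days of what remains become the validation set.
train_features, train_targets = features[:-60 * 24], targets[:-60 * 24]
val_features, val_targets = features[-60 * 24:], targets[-60 * 24:]
```

Splitting by position rather than at random keeps the test and validation sets strictly later in time than the training set, which matches how the model would be used to predict future demand.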

Run model and tune hyperparameters

  • This model has 3 hyperparameters to play with
    • Epochs: How many times we train the model
      • More gives the model more chances to learn the pattern in the data.
      • More also is more computationally taxing.
      • Ideally, the network learns more on each epoch without overfitting the data. (Training and validation loss should both go down on each epoch.)
    • Learning rate: The constant that scales each weight update.
      • Larger means the model can find the pattern faster.
      • Larger can also break the model because gradient descent keeps overshooting the optimal weights.
      • If the network is struggling, lower the learning rate.
    • Hidden nodes: How many nodes we push the input through.
      • Too few and the network will never learn the pattern.
      • Too many and the network will overfit the data.
      • Rule of thumb: Use a number between the number of inputs and the number of outputs.
  • After running the model, we have two numbers to measure its performance:
    • Training loss: Error on training data
      • Should trend downward; if it keeps climbing, the learning rate is probably too high.
    • Validation loss: Error on validation data
      • If this starts going up, it means the model has overfit the data.
  • The real-world test is running the held-out test data through the trained model and comparing its predictions against the actual results.
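
To make the three hyperparameters and the two loss numbers concrete, here is a small NumPy sketch. Everything in it is an assumption for illustration: the data is made up, the network is a single sigmoid hidden layer with a linear output and no bias terms, and each epoch is one full-batch gradient descent step. The notes do not specify the actual architecture or batching.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up data: 3 input features, 1 target with a nonlinear relationship.
X = rng.normal(size=(300, 3))
y = np.sin(X[:, :1]) + 0.5 * X[:, 1:2]
X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:], y[200:]

# The three hyperparameters from the notes (values chosen for this toy data).
epochs = 500
learning_rate = 0.1
hidden_nodes = 2  # between the 3 inputs and the 1 output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(scale=0.5, size=(3, hidden_nodes))  # input -> hidden
W2 = rng.normal(scale=0.5, size=(hidden_nodes, 1))  # hidden -> output

def predict(inputs):
    return sigmoid(inputs @ W1) @ W2

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

train_losses, val_losses = [], []
for epoch in range(epochs):
    # Forward pass on the training data.
    hidden = sigmoid(X_train @ W1)
    output = hidden @ W2

    # Backward pass: gradients of half the mean squared error.
    error = output - y_train
    grad_W2 = hidden.T @ error / len(X_train)
    hidden_error = (error @ W2.T) * hidden * (1 - hidden)
    grad_W1 = X_train.T @ hidden_error / len(X_train)

    # The learning rate scales each weight update.
    W2 -= learning_rate * grad_W2
    W1 -= learning_rate * grad_W1

    # Track both losses after every epoch.
    train_losses.append(mse(predict(X_train), y_train))
    val_losses.append(mse(predict(X_val), y_val))

# Epoch with the lowest validation loss; a validation loss that rises
# after this point while training loss keeps falling suggests overfitting.
best_epoch = int(np.argmin(val_losses))
```

Plotting `train_losses` and `val_losses` against the epoch number is a quick way to see whether more epochs, a different learning rate, or a different number of hidden nodes is helping.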

My results

After a lot of trial and error, I found the most success with these settings:

iterations = 2000
learning_rate = 1
hidden_nodes = 11
output_nodes = 1

The training loss was around 0.07 and the validation loss was around 0.15.