• Sebastian Thrun’s Stanford students built an ML model to identify cancer in images of skin.
  • Survival rates improve drastically with early detection.
  • Hard for humans to do.
    • Much harder than dog breed identification.
    • Dermatologists are paid handsomely for their ability to do it.


  • First, build classification tree.
    • Has over 2,000 disease classes
    • 3 branches from the root (benign, malignant, non-neoplastic)
  • Gather training data.
    • Correctly labeled images of skin disease.
  • Clean data by correcting for
    • resolution
    • lighting
    • presence of foreign objects
    • duplicates
  • Build net
    • Used a stock CNN (Inception v3 in the published study)
    • Using a net pretrained to identify animals actually improved skin cancer detection.
      • Makes sense to me. The two tasks share some low-level characteristics: outlines, shape, contrast.
  • Train net to classify 757 outcomes
  • Validate results
    • Compare network success vs actual dermatologist success on same data set.
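
The notes mention both a classification tree with three root branches and a net trained on fine-grained outcomes. One plausible way to relate the two is to sum the fine-grained probabilities under each root branch. The sketch below assumes that approach; the class names, the `LEAF_TO_ROOT` mapping, and the `roll_up` helper are hypothetical and only illustrate the idea on a toy tree.

```python
# Hypothetical toy taxonomy: each fine-grained class maps to one of the three
# root branches named in the notes. The class list itself is made up.
LEAF_TO_ROOT = {
    "melanoma": "malignant",
    "basal_cell_carcinoma": "malignant",
    "nevus": "benign",
    "seborrheic_keratosis": "benign",
    "eczema": "non-neoplastic",
}


def roll_up(leaf_probs):
    """Sum per-leaf probabilities into one probability per root branch."""
    root_probs = {"benign": 0.0, "malignant": 0.0, "non-neoplastic": 0.0}
    for leaf, p in leaf_probs.items():
        root_probs[LEAF_TO_ROOT[leaf]] += p
    return root_probs


# Example network output over the toy leaves (values sum to 1).
leaf_probs = {
    "melanoma": 0.30,
    "basal_cell_carcinoma": 0.10,
    "nevus": 0.40,
    "seborrheic_keratosis": 0.15,
    "eczema": 0.05,
}
root_probs = roll_up(leaf_probs)
print(root_probs)
```

Summing over the children keeps the coarse probabilities consistent with the fine-grained ones, so the same net can answer both "which disease?" and "benign or malignant?".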

Sensitivity/Specificity & Precision/Recall

  • They’re all metrics of success
    • Sensitivity: $\frac{TP}{TP + FN}$
      • Of all the positive cases (here, the images that contained skin cancer), how many were labeled as positive?
    • Specificity: $\frac{TN}{TN + FP}$
      • Of all the negative cases (the images that did not contain skin cancer), how many were labeled as negative?
    • Recall: $\frac{TP}{TP + FN}$
      • Synonymous with sensitivity. How many positive cases were correctly labeled by the network?
    • Precision: $\frac{TP}{TP + FP}$
      • Out of all the cases we labeled as positive, how many were actually positive?
  • Where:
    • True positives (TP): cancer images labeled as cancer
    • True negatives (TN): non-cancer images labeled as non-cancer
    • False positives (FP): non-cancer images labeled as cancer
    • False negatives (FN): cancer images labeled as non-cancer
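
As a minimal sketch, the four metrics can be computed directly from the TP/TN/FP/FN counts. The labels below are invented for illustration (1 = cancer, 0 = no cancer), and the helper names `confusion_counts` and `metrics` are hypothetical. The code assumes every denominator is non-zero.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Standard definitions; assumes no denominator is zero."""
    sensitivity = tp / (tp + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "recall": sensitivity,  # recall is the same quantity as sensitivity
        "precision": tp / (tp + fp),
    }


# Made-up example: 4 cancer images, 6 non-cancer images.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
scores = metrics(tp, tn, fp, fn)
print(tp, tn, fp, fn)
print(scores)
```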

Threshold for labeling cancer

  • Set a low threshold for determining cancer
    • It’s better to send a healthy person to the doctor for more tests than it is to tell a sick person they’re fine.
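
To make the trade-off concrete, here is a small sketch with invented probabilities and two illustrative thresholds (0.5 and 0.15, neither taken from the lecture). Lowering the threshold removes the missed cancer case at the cost of one extra referral.

```python
def label_cancer(probabilities, threshold):
    """Label an image as cancer (1) when its predicted probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def count_errors(y_true, y_pred):
    """Return (false_negatives, false_positives)."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn, fp


# Made-up model outputs and ground truth (1 = cancer).
probs = [0.05, 0.20, 0.35, 0.60, 0.90]
y_true = [0, 0, 1, 1, 1]

default_errors = count_errors(y_true, label_cancer(probs, 0.5))
cautious_errors = count_errors(y_true, label_cancer(probs, 0.15))

print("threshold 0.50 -> (FN, FP) =", default_errors)
print("threshold 0.15 -> (FN, FP) =", cautious_errors)
```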

ROC curves

  • Receiver operating characteristic
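  • Metric to determine the accuracy of a ‘split’ (a threshold separating positive from negative labels)
    • perfect (area under the curve of 1)
    • good
    • bad (or random; area around 0.5)
  • Requires finding, for each candidate split:
    • True positive rate: $\frac{TP}{TP + FN}$
    • False positive rate: $\frac{FP}{FP + TN}$
    • Both are computed from the known positive and negative labels
  • Make coordinates $(\text{FPR}, \text{TPR})$ out of each split
    • Plot them on a graph from $(0,0)$ to $(1,1)$
    • Area under the curve is the final score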
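
The ROC construction can be sketched in a few lines: sweep the threshold over the observed scores, record a (false positive rate, true positive rate) point at each one, and integrate with the trapezoid rule. The data and the helper names `roc_points` and `auc` are made up for illustration, and the code assumes at least one positive and one negative label.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct threshold, starting at (0, 0)."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    # Highest threshold first, so the curve runs from (0, 0) to (1, 1).
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve via the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# A perfect split: every positive scores higher than every negative.
perfect = auc(roc_points([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))

# An imperfect split: one negative outscores one positive.
imperfect_points = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
imperfect = auc(imperfect_points)

print(perfect, imperfect)
```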

How does the model know?

  • We can’t say for sure, but these are visualizations of the feature maps that the network thinks are significant:
  • [Figure: Skin Cancer Feature Maps]

Confusion Matrices

  • A grid (one row per true class, one column per predicted class) that says, ‘If I am class B, what are the chances of the NN classifying me as A or C?’
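
A small sketch of that idea, using made-up labels for three classes A, B and C. Each row of the matrix is a true class, each column a predicted class, and normalizing a row turns the counts into the "chances" described above. The helper names are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes, cells are counts."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def row_normalize(matrix):
    """Turn each row of counts into fractions that sum to 1 (empty rows stay 0)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


classes = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "B", "B", "C", "C"]
y_pred = ["A", "B", "A", "B", "B", "C", "C", "C"]

matrix = confusion_matrix(y_true, y_pred, classes)
rates = row_normalize(matrix)

# Row for true class B: chance of being called A, B, or C.
print(dict(zip(classes, rates[classes.index("B")])))
```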

Broader Application

  • Does this replace doctors?
    • No. They do much more than classifying pictures.
    • It will augment a doctor. They will spend less time looking at pictures.
  • DL won’t replace all jobs. It might free up resources to focus on more complicated work or increase one person’s impact.