This will be the first of a series of short posts relating to subject matter discussed in the text, “An Introduction to Statistical Learning”. This is an interesting read, but it often skips over statement proofs — that’s where this series of posts comes in! Here, I consider the content of Section 5.1.2: This gives a lightning-quick “short cut” method for evaluating a regression’s leave-one-out cross-validation error. The method is applicable to any least-squares linear fit.

# Theory

## The mean shift clustering algorithm

### Mean shift clustering

Mean shift clustering is a general non-parametric cluster finding procedure — introduced by Fukunaga and Hostetler [1], and popular within the computer vision field. Nicely, and in contrast to the more-well-known K-means clustering algorithm, the output of mean shift does not depend on any explicit assumptions on the shape of the point distribution, the number of clusters, or any form of random initialization.

(more…)

## Machine Learning Methods: Decision trees and forests

This post contains our crib notes on the basics of decision trees and forests. We first discuss the construction of individual trees, and then introduce random and boosted forests. We also discuss efficient implementations of greedy tree construction algorithms, showing that a single tree can be constructed in $O(k \times n \log n)$ time, given $n$ training examples having $k$ features each. We provide exercises on interesting related points and an appendix containing relevant python/sk-learn function calls.

(more…)