Two microphones are placed in a room where two conversations are taking place simultaneously. Given these two recordings, can one “remix” them in some prescribed way to isolate the individual conversations? Yes! In this post, we review one simple approach to solving this type of problem, Independent Component Analysis (ICA). We share an ipython document implementing ICA and link to a youtube video illustrating its application to audio de-mixing.
We review the two essentials of principal component analysis (“PCA”): 1) The principal components of a set of data points are the eigenvectors of the correlation matrix of these points in feature space. 2) Projecting the data onto the subspace spanned by the first $k$ of these — listed in descending eigenvalue order — provides the best possible $k$-dimensional approximation to the data, in the sense of captured variance.
To whet your appetite for support vector machines, here’s a quote from machine learning researcher Andrew Ng:
“SVMs are among the best (and many believe are indeed the best) ‘off-the-shelf’ supervised learning algorithms.”
Professor Ng covers SVMs in his excellent Machine Learning MOOC, a gateway for many into the realm of data science, but leaves out some details, motivating us to put together some notes here to answer the question:
“What are the support vectors in support vector machines?”
This will be the first of a series of short posts relating to subject matter discussed in the text, “An Introduction to Statistical Learning”. This is an interesting read, but it often skips over statement proofs — that’s where this series of posts comes in! Here, I consider the content of Section 5.1.2: This gives a lightning-quick “short cut” method for evaluating a regression’s leave-one-out cross-validation error. The method is applicable to any least-squares linear fit.
Mean shift clustering
Mean shift clustering is a general non-parametric cluster finding procedure — introduced by Fukunaga and Hostetler , and popular within the computer vision field. Nicely, and in contrast to the more-well-known K-means clustering algorithm, the output of mean shift does not depend on any explicit assumptions on the shape of the point distribution, the number of clusters, or any form of random initialization.
This post contains our crib notes on the basics of decision trees and forests. We first discuss the construction of individual trees, and then introduce random and boosted forests. We also discuss efficient implementations of greedy tree construction algorithms, showing that a single tree can be constructed in $O(k \times n \log n)$ time, given $n$ training examples having $k$ features each. We provide exercises on interesting related points and an appendix containing relevant python/sk-learn function calls.