## Logistic Regression

We review binary logistic regression. In particular, we derive a) the equations needed to fit the algorithm via gradient descent, b) the maximum likelihood fit’s asymptotic coefficient covariance matrix, and c) expressions for model test point class membership probability confidence intervals. We also provide python code implementing a minimal “LogisticRegressionWithError” class whose “predict_proba” method returns prediction confidence intervals alongside its point estimates.

Our python code can be downloaded from our github page, here. Its use requires the jupyter, numpy, sklearn, and matplotlib packages.

## Normal Distributions

I review — and provide derivations for — some basic properties of Normal distributions. Topics currently covered: (i) Their normalization, (ii) Samples from a univariate Normal, (iii) Multivariate Normal distributions, (iv) Central limit theorem.

## Model AUC depends on test set difficulty

The AUC score is a popular summary statistic that is often used to communicate the performance of a classifier. However, we illustrate here that this score depends not only on the quality of the model in question, but also on the difficulty of the test set considered: If samples are added to a test set that are easily classified, the AUC will go up — even if the model studied has not improved. In general, this behavior implies that isolated, single AUC scores cannot be used to meaningfully qualify a model’s performance. Instead, the AUC should be considered a score that is primarily useful for comparing and ranking multiple models — each at a common test set difficulty.

## Deep reinforcement learning, battleship

Here, we provide a brief introduction to reinforcement learning (RL) — a general technique for training programs to play games efficiently. Our aim is to explain its practical implementation: We cover some basic theory and then walk through a minimal python program that trains a neural network to play the game battleship.

## Bayesian Statistics: MCMC

We review the Metropolis algorithm — a simple Markov Chain Monte Carlo (MCMC) sampling method — and its application to estimating posteriors in Bayesian statistics. A simple python example is provided.

## Interpreting the results of linear regression

Our last post showed how to obtain the least-squares solution for linear regression and discussed the idea of sampling variability in the best estimates for the coefficients. In this post, we continue the discussion about uncertainty in linear regression — both in the estimates of individual linear regression coefficients and the quality of the overall fit.

Specifically, we’ll discuss how to calculate the 95% confidence intervals and p-values from hypothesis tests that are output by many statistical packages like python’s statsmodels or R. An example with code is provided at the end.

## Linear Regression

We review classical linear regression using vector-matrix notation. In particular, we derive a) the least-squares solution, b) the fit’s coefficient covariance matrix — showing that the coefficient estimates are most precise along directions that have been sampled over a large range of values (the high variance directions, a la PCA), and c) an unbiased estimate for the underlying sample variance (assuming normal sample variance in this last case). We then review how these last two results can be used to provide confidence intervals / hypothesis tests for the coefficient estimates. Finally, we show that similar results follow from a Bayesian approach.

Last edited July 23, 2016.

## Independent component analysis

Two microphones are placed in a room where two conversations are taking place simultaneously. Given these two recordings, can one “remix” them in some prescribed way to isolate the individual conversations? Yes! In this post, we review one simple approach to solving this type of problem, Independent Component Analysis (ICA). We share an ipython document implementing ICA and link to a youtube video illustrating its application to audio de-mixing.

## Principal component analysis

We review the two essentials of principal component analysis (“PCA”): 1) The principal components of a set of data points are the eigenvectors of the correlation matrix of these points in feature space. 2) Projecting the data onto the subspace spanned by the first $k$ of these — listed in descending eigenvalue order — provides the best possible $k$-dimensional approximation to the data, in the sense of captured variance.

## Support Vector Machines for classification

To whet your appetite for support vector machines, here’s a quote from machine learning researcher Andrew Ng:

“SVMs are among the best (and many believe are indeed the best) ‘off-the-shelf’ supervised learning algorithms.”

Professor Ng covers SVMs in his excellent Machine Learning MOOC, a gateway for many into the realm of data science, but leaves out some details, motivating us to put together some notes here to answer the question:

“What are the support vectors in support vector machines?”