Skip to main content

Martingales

Here, I give a quick review of the concept of a Martingale. A Martingale is a sequence of random variables satisfying a specific expectation conservation law. If one can identify a Martingale relating to some other sequence of random variables, its use can sometimes make quick work of certain expectation value evaluations.

This note is adapted from Chapter 2 of Stochastic Calculus and Financial Applications, by Steele.

(more…)

Read More

log

Logistic Regression

We review binary logistic regression. In particular, we derive a) the equations needed to fit the algorithm via gradient descent, b) the maximum likelihood fit’s asymptotic coefficient covariance matrix, and c) expressions for model test point class membership probability confidence intervals. We also provide python code implementing a minimal “LogisticRegressionWithError” class whose “predict_proba” method returns prediction confidence intervals alongside its point estimates.

Our python code can be downloaded from our github page, here. Its use requires the jupyter, numpy, sklearn, and matplotlib packages.

(more…)

Read More

spectrum

Model AUC depends on test set difficulty

The AUC score is a popular summary statistic that is often used to communicate the performance of a classifier. However, we illustrate here that this score depends not only on the quality of the model in question, but also on the difficulty of the test set considered: If samples are added to a test set that are easily classified, the AUC will go up — even if the model studied has not improved. In general, this behavior implies that isolated, single AUC scores cannot be used to meaningfully qualify a model’s performance. Instead, the AUC should be considered a score that is primarily useful for comparing and ranking multiple models — each at a common test set difficulty.

(more…)

Read More

title

Hyperparameter sample-size dependence

Here, we briefly review a subtlety associated with machine-learning model selection: the fact that the optimal hyperparameters for a model can vary with training set size, $N.$ To illustrate this point, we derive expressions for the optimal strength for both $L_1$ and $L_2$ regularization in single-variable models. We find that the optimal $L_2$ approaches a finite constant as $N$ increases, but that the optimal $L_1$ decays exponentially fast with $N.$ Sensitive dependence on $N$ such as this should be carefully extrapolated out when optimizing mission-critical models.

(more…)

Read More

line

Linear Regression

We review classical linear regression using vector-matrix notation. In particular, we derive a) the least-squares solution, b) the fit’s coefficient covariance matrix — showing that the coefficient estimates are most precise along directions that have been sampled over a large range of values (the high variance directions, a la PCA), and c) an unbiased estimate for the underlying sample variance (assuming normal sample variance in this last case). We then review how these last two results can be used to provide confidence intervals / hypothesis tests for the coefficient estimates. Finally, we show that similar results follow from a Bayesian approach.

Last edited July 23, 2016.

(more…)

Read More

line

Average queue wait times with random arrivals

Queries ping a certain computer server at random times, on average $\lambda$ arriving per second. The server can respond to one per second and those that can’t be serviced immediately are queued up. What is the average wait time per query? Clearly if $\lambda \ll 1$, the average wait time is zero. But if $\lambda > 1$, the queue grows indefinitely and the answer is infinity! Here, we give a simple derivation of the general result — (9) below.

(more…)

Read More