
Gaussian Processes

We review the math and code needed to fit a Gaussian Process (GP) regressor to data. We conclude with a demo of a popular application: fast function minimization through GP-guided search. The gif below illustrates this approach in action: the red points are samples from the hidden red curve. Using these samples, we use GPs to find the curve’s minimum as quickly as possible.


Appendices contain quick reviews of (i) the GP regressor posterior derivation, (ii) scikit-learn’s GP implementation, and (iii) GP classifiers.
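
As a rough illustration of GP-guided minimization, here is a minimal NumPy sketch. It is not the post's own code: the RBF kernel with fixed hyperparameters, the toy `hidden_curve`, and the lower-confidence-bound rule for picking the next sample are all assumptions made for this example.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D point arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    # Cholesky factorization avoids forming an explicit matrix inverse.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha
    v = np.linalg.solve(chol, k_ts)
    # Prior variance at each test point minus the variance explained by the data.
    var = np.diag(rbf_kernel(x_test, x_test)) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def hidden_curve(x):
    """Hypothetical stand-in for the hidden function being minimized."""
    return np.sin(3 * x) + 0.5 * x


# A handful of samples from the hidden curve (the "red points").
x_train = np.array([-2.0, -0.5, 1.0, 2.5])
y_train = hidden_curve(x_train)

# Evaluate the posterior on a grid and pick the next point to sample
# by minimizing a lower confidence bound (mean minus two standard deviations).
x_grid = np.linspace(-3, 3, 200)
mean, std = gp_posterior(x_train, y_train, x_grid)
x_next = x_grid[np.argmin(mean - 2.0 * std)]
```

Sampling `hidden_curve(x_next)`, appending it to the training set, and repeating gives a simple search loop. The factor of two in the bound trades off exploring uncertain regions against exploiting low predicted values.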




Dotfiles for peace of mind

Reinstalling software and configuring settings on a new computer is a pain. After my latest hard drive failure set the stage for yet another round of download-extract-install and configuration file twiddling, it was time to overhaul my approach. "Enough is enough!"

This post walks through

  1. how to back up and automate the installation and configuration process
  2. how to set up a minimal framework for data science

We’ll use a dotfiles repository on GitHub to illustrate both points in parallel.
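
To make the first point concrete, here is a minimal sketch of one common way to automate configuration: symlinking files from a dotfiles repository into the home directory. The function name `link_dotfiles`, the flat repository layout, and the `.bak` backup convention are assumptions for illustration, not the scripts used in the post.

```python
from pathlib import Path


def link_dotfiles(repo_dir: Path, home_dir: Path) -> list[Path]:
    """Symlink every regular file in repo_dir into home_dir.

    An existing regular file at the target is renamed with a .bak suffix;
    an existing symlink is replaced. Subdirectories (such as .git) are skipped.
    """
    created = []
    for source in sorted(repo_dir.iterdir()):
        if not source.is_file():
            continue
        target = home_dir / source.name
        if target.is_symlink():
            target.unlink()
        elif target.exists():
            target.rename(target.with_name(target.name + ".bak"))
        target.symlink_to(source.resolve())
        created.append(target)
    return created
```

A call such as `link_dotfiles(Path.home() / "dotfiles", Path.home())` would link a repository cloned to `~/dotfiles`. Because the home directory only holds links, edits to the configuration files land in the repository and can be committed directly.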




Getting started with Pandas

We have made use of Python’s Pandas package in a variety of posts on the site. These have showcased some of Pandas’ abilities including the following:

  • DataFrames for data manipulation with built-in indexing
  • Handling of missing data
  • Data alignment
  • Melting/stacking and Pivoting/unstacking data sets
  • Groupby feature allowing split -> apply -> combine operations on data sets
  • Data merging and joining

Pandas is also a high-performance library, with much of its code written in Cython or C. Unfortunately, Pandas can have a bit of a steep learning curve. In this post, I’ll cover some introductory tips and tricks to help you get started with this excellent package.
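
As a quick taste of several features from the list above, here is a small sketch on made-up data. The `sales` and `regions` tables and their column names are hypothetical and do not come from the notebook.

```python
import pandas as pd

# Hypothetical toy data: quarterly revenue per store, with one missing value.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [10.0, None, 7.0, 9.0],
})
regions = pd.DataFrame({"store": ["A", "B"], "region": ["east", "west"]})

# Missing data: replace the missing revenue with zero.
filled = sales.fillna({"revenue": 0.0})

# Groupby: split by store, apply a sum, combine into one Series.
totals = filled.groupby("store")["revenue"].sum()

# Pivot: one row per store, one column per quarter.
wide = filled.pivot(index="store", columns="quarter", values="revenue")

# Melt: back to long format, one row per (store, quarter) pair.
long = wide.reset_index().melt(
    id_vars="store", var_name="quarter", value_name="revenue"
)

# Merge: attach each store's region with a left join.
merged = filled.merge(regions, on="store", how="left")
```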


  • This post was partially inspired by Tom Augspurger’s Pandas tutorial, which has a YouTube video that can be viewed alongside it. We also suggest some other excellent resource materials, where relevant, below.
  • The notebook we use below can be downloaded from our GitHub page. Feel free to grab it and follow along.





Build a web scraper for a literature search – from soup to nuts

Code, references, and examples of this project are on GitHub.

In this post, I’ll describe the soup-to-nuts process of automating a literature search in PubMed Central using R.

It feels deeply satisfying to sit back and let the code do the dirty work.

Is it as satisfying as a bowl of red-braised beef noodle soup with melt-in-your-mouth tendons from Taipei’s Yong Kang Restaurant (featured image)?

If you have to do a lit search like this more than once, then I have to say the answer is yes — unequivocally, yes.


Reshaping Data in R

A guest post, contributed by Cathy Yeh.

Can you format some data in Excel for me?

[Image: sad tapir. Caption: Tapirs love formatting Excel spreadsheets]

If you’re as excited as this tapir about the prospect of formatting data in Excel, read on!

Today, we’ll talk about reshaping data in R. At the same time, we’ll see how for-loops can be avoided by using R functionals (functions that take other functions as arguments). Functionals can be faster than for-loops and make code easier to read by clearly laying out the intent of a loop.
