A photo by Patryk Grądys. unsplash.com/photos/4pPzKfd6BEg

GPU-accelerated Theano & Keras with Windows 10

There are many tutorials explaining how to use your Nvidia graphics card for GPU-accelerated Theano and Keras on Linux, but there is only limited information out there if you want to set everything up on Windows with the current CUDA toolkit. This is a shame, because many computers with very capable video cards run only Windows, and it is not always practical to use a virtual machine or to dual-boot. So in today’s post we go over how to get everything running on Windows 10, sparing you all the trial and error I went through. (All of these steps should also work on earlier versions of Windows.)
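On Windows, Theano reads its configuration from a `.theanorc` or `.theanorc.txt` file in the user's home directory, and Keras picks its backend from `~/.keras/keras.json`. This excerpt does not show the exact settings used in the full post, so the sketch below only assumes the commonly used `device = gpu` and `floatX = float32` flags of Theano's older CUDA backend. Newer Theano releases built on libgpuarray use `device = cuda` instead. The helper names `render_theanorc` and `render_keras_json` are hypothetical.

```python
import configparser
import io
import json

def render_theanorc(device="gpu", floatx="float32"):
    """Return the text of a minimal Theano config file (hypothetical helper)."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep "floatX" capitalized as Theano documents it
    cfg["global"] = {"device": device, "floatX": floatx}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()

def render_keras_json(backend="theano", floatx="float32"):
    """Return the text of a minimal ~/.keras/keras.json (hypothetical helper)."""
    return json.dumps({"backend": backend, "floatx": floatx}, indent=4)

print(render_theanorc())
print(render_keras_json())
```

To use these, save the first string as `%USERPROFILE%\.theanorc.txt` and the second as `%USERPROFILE%\.keras\keras.json`. After `import theano`, inspecting `theano.config.device` is a quick way to confirm the setting was picked up.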


Getting started with Pandas

We have made use of Python’s Pandas package in a variety of posts on the site. These have showcased some of Pandas’ abilities including the following:

  • DataFrames for data manipulation with built-in indexing
  • Handling of missing data
  • Data alignment
  • Melting/stacking and Pivoting/unstacking data sets
  • GroupBy, allowing split-apply-combine operations on data sets
  • Data merging and joining
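As a quick refresher, the snippet below touches each feature in the list above using a made-up table of store sales. The column names, values, and city names are hypothetical. This is only an illustration and is not code from the notebook mentioned in the post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: daily unit sales for two stores, with one missing value
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "day": ["Mon", "Tue", "Mon", "Tue"],
    "units": [10, np.nan, 7, 5],
})

# Handling of missing data: treat the NaN as zero sales
sales["units"] = sales["units"].fillna(0)

# GroupBy: split by store -> apply sum -> combine into one Series
totals = sales.groupby("store")["units"].sum()

# Data alignment: arithmetic matches index labels; unmatched labels give NaN
bonus = pd.Series({"B": 1.0, "C": 2.0})
aligned = totals + bonus

# Pivoting (unstacking): one row per store, one column per day
wide = sales.pivot(index="store", columns="day", values="units")

# Melting (stacking): back to one row per (store, day) pair
long = wide.reset_index().melt(id_vars="store", var_name="day", value_name="units")

# Merging: attach a hypothetical store-location table
locations = pd.DataFrame({"store": ["A", "B"], "city": ["Oakland", "Reno"]})
merged = sales.merge(locations, on="store", how="left")
```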

Pandas is also a high-performance library, with much of its code written in Cython or C. Unfortunately, Pandas can have a somewhat steep learning curve. In this post, I’ll cover some introductory tips and tricks to help you get started with this excellent package.

Notes:

  • This post was partially inspired by Tom Augspurger’s Pandas tutorial, which comes with a YouTube video that can be viewed alongside it. Where relevant, we also suggest some other excellent resources below.
  • The notebook we use below can be downloaded from our GitHub page. Feel free to grab it and follow along.


Follow us on twitter for new submission alerts!


Machine learning to predict San Francisco crime

In today’s post, we document our submission to the recent Kaggle competition aimed at predicting the category of San Francisco crimes, given only their time and location of occurrence. As a reminder, Kaggle is a site where one can compete with other data scientists on various data challenges. We took this competition as an opportunity to explore the Naive Bayes algorithm. With the few steps discussed below, we quickly moved from the middle of the pack to the top 33% of the competition leaderboard, all while sticking with this simple model!
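This excerpt does not say how time and location were encoded. The sketch below is a from-scratch categorical Naive Bayes that assumes each crime has already been reduced to a tuple of discrete features. The features used here (a coarse time-of-day bucket and a police district) and the four training rows are hypothetical. Each class-conditional probability is estimated by counting. Laplace smoothing (`alpha`) keeps an unseen value from zeroing out a class. The per-class scores are then normalized into a probability for every category.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model on tuples of discrete features."""
    class_counts = Counter(labels)
    # feature_counts[class][feature_index][value] -> count
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each feature index
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[y][i][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values, alpha

def predict_log_proba(model, row):
    """Return {class: log P(class | row)} under the naive independence assumption."""
    class_counts, feature_counts, values, alpha = model
    n = sum(class_counts.values())
    scores = {}
    for c, nc in class_counts.items():
        s = math.log(nc / n)  # log prior
        for i, v in enumerate(row):
            # Laplace-smoothed estimate of P(feature_i = v | class c)
            s += math.log((feature_counts[c][i][v] + alpha) / (nc + alpha * len(values[i])))
        scores[c] = s
    # Normalize with log-sum-exp so the probabilities sum to one
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {c: s - log_z for c, s in scores.items()}

# Hypothetical training data: (time bucket, district) -> crime category
rows = [("night", "MISSION"), ("night", "MISSION"), ("day", "RICHMOND"), ("day", "MISSION")]
labels = ["ASSAULT", "ASSAULT", "LARCENY/THEFT", "LARCENY/THEFT"]
model = train_nb(rows, labels)
probs = {c: math.exp(lp) for c, lp in predict_log_proba(model, ("night", "MISSION")).items()}
print(probs)
```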


The mean shift clustering algorithm

Mean shift clustering

Mean shift clustering is a general non-parametric cluster-finding procedure, introduced by Fukunaga and Hostetler [1] and popular within the computer vision field. Nicely, and in contrast to the better-known K-means clustering algorithm, the output of mean shift does not depend on any explicit assumptions about the shape of the point distribution, the number of clusters, or any form of random initialization.
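The sketch below shows the core idea in NumPy, assuming a Gaussian kernel. Every point is repeatedly moved to the kernel-weighted mean of the data around it until it settles on a density peak (a mode). Points that reach the same mode share a cluster. The one parameter that does need choosing is the kernel `bandwidth`. Merging modes closer than `bandwidth / 2` is a heuristic assumption made for this sketch, not part of the original post.

```python
import numpy as np

def mean_shift(X, bandwidth, tol=1e-5, max_iter=300):
    """Shift every point uphill on a Gaussian kernel density estimate of X."""
    X = np.asarray(X, dtype=float)
    modes = X.copy()
    for _ in range(max_iter):
        # Squared distances from each current mode to every data point: shape (n, n)
        d2 = ((modes[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))  # Gaussian kernel weights
        new_modes = w @ X / w.sum(axis=1, keepdims=True)  # kernel-weighted means
        shift = np.abs(new_modes - modes).max()
        modes = new_modes
        if shift < tol:
            break
    # Group points whose modes landed close together into the same cluster
    labels = -np.ones(len(X), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < bandwidth / 2:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)

# Two well-separated toy blobs; no cluster count is supplied
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]])
labels, centers = mean_shift(X, bandwidth=1.0)
print(labels, centers)
```

The number of clusters falls out of the data and the bandwidth rather than being passed in, which is the contrast with K-means drawn above.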

Forecasting Bike Sharing Demand

In today’s post, we document our efforts at applying a gradient boosted trees model to forecast bike sharing demand, a problem posed in a recent Kaggle competition. For those not familiar, Kaggle is a site where one can compete with other data scientists on various data challenges. Top scorers often win prize money, but more generally the site is a great place to grab interesting datasets to explore and play with. With the simple optimization steps discussed below, we quickly moved from the bottom 10% of the competition (our first-pass attempt’s score) to the top 10%: no sweat!
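To show what "gradient boosted trees" means mechanically, here is a minimal from-scratch sketch for squared-error loss. It uses depth-1 trees (stumps) on a single feature. The hourly demand curve is made up, and the helper names are hypothetical. The post's actual model, features, and tuning are not shown in this excerpt. Each round fits a stump to the current residuals, which are the negative gradient of the squared loss, and adds a shrunken copy of it to the ensemble.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split regression stump on 1-D feature x for residuals r."""
    best = None
    for t in np.unique(x)[:-1]:  # candidate thresholds keep both sides non-empty
        left = x <= t
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)

def gradient_boost(x, y, n_rounds=100, learning_rate=0.1):
    """Squared-loss gradient boosting: each new stump fits the current residuals."""
    base = y.mean()
    stumps = []
    pred = np.full_like(y, base, dtype=float)
    for _ in range(n_rounds):
        residual = y - pred  # negative gradient of 0.5 * (y - pred) ** 2
        stump = fit_stump(x, residual)
        stumps.append(stump)
        pred = pred + learning_rate * stump(x)
    return lambda z: base + learning_rate * sum(s(z) for s in stumps)

# Hypothetical hourly rentals: busy during the day, quiet at night
hours = np.arange(24, dtype=float)
demand = np.where((hours >= 7) & (hours <= 19), 200.0, 20.0)
model = gradient_boost(hours, demand, n_rounds=200)
print(model(np.array([3.0, 12.0])))
```

The `learning_rate` and `n_rounds` pair is the usual trade-off. Smaller steps need more rounds but tend to generalize better.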


Measles vaccination rate by USA state and relation to mean outbreak size

In this post, we provide a quick overview of the data and science of measles spread. Using Python (code provided), we extract the 2012 youth vaccination rate for each U.S. state from a CDC data set (see figure below). To aid in interpreting these data, we also review and describe the results of a generalized “SIR” model for disease spread.
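The post's generalized SIR model is not shown in this excerpt. As a simpler stand-in, here is the classic SIR model: dS/dt = -βSI, dI/dt = βSI - γI, dR/dt = γI, with R0 = β/γ. It is integrated with a plain Euler step. The assumption here is that vaccinated people start out immune, so they begin in R. The helper name and step settings are hypothetical. The usage takes R0 = 15, which lies within the 12 to 18 range commonly cited for measles.

```python
def sir_outbreak_size(r0, vaccinated, gamma=1.0, dt=0.01, t_max=200.0, i0=1e-4):
    """Euler-integrate the classic SIR equations; return the fraction ever infected."""
    beta = r0 * gamma
    s = 1.0 - vaccinated - i0  # susceptible fraction
    i = i0                     # small initial infected seed
    r = vaccinated             # vaccinated individuals start out immune
    for _ in range(int(t_max / dt)):
        new_inf = beta * s * i * dt
        new_rec = gamma * i * dt
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
    return (r - vaccinated) + i  # recovered by infection plus anyone still infected

for v in (0.50, 0.90, 0.95):
    print(v, round(sir_outbreak_size(15.0, v), 4))
```

Outbreaks stay small once the vaccinated fraction exceeds the herd-immunity threshold 1 - 1/R0, which is about 93% for R0 = 15. Below that threshold, a large share of the remaining susceptibles is eventually infected.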


Machine learning for facial recognition

A guest post, contributed by Damien Ramunno-Johnson (LinkedIn, bio-sketch)



Introduction

The ability to identify faces is a skill that people develop very early in life and can apply almost effortlessly. One reason for this is that our brains are very well adapted for pattern recognition. In contrast, facial recognition can be a difficult problem for computers. Today, computer facial recognition software works well when given a full frontal image of a face. However, problems can arise with large camera angles, poor lighting, or exaggerated facial expressions: computers have a ways to go before they catch up with us in this arena.
