Reshaping Data in R

A guest post, contributed by Cathy Yeh.

Can you format some data in Excel for me?

Image: a sad tapir, captioned “Tapirs love formatting Excel spreadsheets”

If you’re as excited as this tapir about the prospect of formatting data in Excel, read on!

Today, we’ll talk about reshaping data in R. Along the way, we’ll see how for-loops can be avoided by using R functionals (functions that take other functions as arguments). Functionals can be faster than for-loops, and they make code easier to read by clearly laying out the intent of a loop.
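
The post itself works in R, and its code is not reproduced here. As a rough, language-swapped illustration of the two ideas it names, the Python sketch below reshapes a hypothetical wide table (one column per year) into long format twice: once with explicit for-loops, and once by passing a per-row function to the functional `map` (playing a role similar to R's `lapply`). The table and the names `wide_to_long_loop`, `melt_row` and `wide_to_long_functional` are illustrative assumptions.

```python
from itertools import chain

# Hypothetical wide-format table: one row per city, one column per year.
wide = [
    {"city": "A", "2013": 10, "2014": 12},
    {"city": "B", "2013": 7, "2014": 9},
]


def wide_to_long_loop(rows, id_col):
    """Wide -> long reshape written with explicit nested for-loops."""
    out = []
    for row in rows:
        for key, value in row.items():
            if key != id_col:
                out.append({id_col: row[id_col], "year": key, "value": value})
    return out


def melt_row(row, id_col):
    """Turn one wide row into a list of long rows (one per non-id column)."""
    return [
        {id_col: row[id_col], "year": key, "value": value}
        for key, value in row.items()
        if key != id_col
    ]


def wide_to_long_functional(rows, id_col):
    """Same reshape: map() applies melt_row to every row, chain flattens the lists."""
    per_row = map(lambda row: melt_row(row, id_col), rows)
    return list(chain.from_iterable(per_row))


long_rows = wide_to_long_functional(wide, "city")
# long_rows[0] == {"city": "A", "year": "2013", "value": 10}
```

Both versions return the same four long-format rows. The functional version separates *what* happens to one row (`melt_row`) from *how* the rows are iterated (`map`), which is the readability argument made above. In Python this is mainly a clarity choice rather than a speed guarantee.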


Processing and Processing.js tips and tricks in WordPress

We recently developed our NBA dashboard in the programming language Processing. In addition, we have Processing apps in our post on classification without negative examples as well as our weekly NBA predictions. Here, we will briefly describe (and recommend) Processing and discuss some tips and tricks we have discovered in developing and deploying our above-mentioned apps to our WordPress blog.


Daily traffic evolution and the Super Bowl

With an eye towards predicting traffic evolution, we begin by examining the time dependence of the contributions from the first few principal components on different days of the week. Traffic throughout the day, $\vert x(t) \rangle$, can be represented in the basis of principal components as $\vert x(t) \rangle = \sum_{i} c_i(t) \vert \phi_i \rangle$,$^1$ where $\vert \phi_i \rangle$ is the $i$th principal component. The coefficients $c_i(t)$, sometimes called the “scores” of $\vert x(t) \rangle$ in the basis of principal components, carry all of the dynamics.
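
A minimal sketch of this expansion, assuming the principal components $\vert \phi_i \rangle$ are already available as an orthonormal set and that $\vert x(t) \rangle$ is mean-centered: each score is the inner product $c_i(t) = \langle \phi_i \vert x(t) \rangle$, and summing $c_i(t) \vert \phi_i \rangle$ over all components recovers the snapshot. The 2-dimensional basis and snapshot below are hypothetical stand-ins, not real traffic components. Repeating the `scores` call for each time $t$ would give the score time series.

```python
import math


def dot(u, v):
    """Inner product <u | v> of two real vectors."""
    return sum(a * b for a, b in zip(u, v))


def scores(x, basis):
    """Scores c_i = <phi_i | x> of snapshot x in an orthonormal basis."""
    return [dot(phi, x) for phi in basis]


def reconstruct(coeffs, basis):
    """Rebuild sum_i c_i |phi_i> from the scores and the basis vectors."""
    dim = len(basis[0])
    return [sum(c * phi[j] for c, phi in zip(coeffs, basis)) for j in range(dim)]


# Hypothetical orthonormal basis standing in for the principal components:
# the standard axes rotated by 45 degrees.
s = 1 / math.sqrt(2)
basis = [[s, s], [-s, s]]

x_t = [3.0, 1.0]                 # one hypothetical (mean-centered) snapshot x(t)
c = scores(x_t, basis)           # c[0] = 4/sqrt(2), c[1] = -2/sqrt(2)
x_back = reconstruct(c, basis)   # recovers [3.0, 1.0] up to rounding

# Keeping only the first score gives the rank-1 approximation c_1 |phi_1>.
x_approx = reconstruct(c[:1], basis[:1])
```

Because the basis is orthonormal, the scores fully determine the snapshot. Dropping trailing scores, as in `x_approx`, is what turns the expansion into a compressed approximation.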

The largest deviations in the traffic patterns (and in the scores) occur during weekday rush hours (around 8 am and 5 pm); see the plot of the scores for several modes throughout Jan. 15.


NBA learner: 2013-14 warmup

We’ve spent the last couple of evenings training some preliminary algorithms on the NBA 2013-14 regular-season data, which we grabbed from […]. Each of the 30 NBA teams plays 82 games a season; since every game involves two teams, that sums to 30 × 82 / 2 = 1230 games in total, a sizable number that we can comfortably attempt to model. Here, we cover our first pass at the prediction problem, what we’ve learned so far, and the challenges we’re looking forward to tackling soon.


Announcing: NBA learner v0.1

Tonight is the opening night of the 2014-15 NBA season. This year, we will be running a machine learning algorithm aimed at estimating underlying features that characterize each team. With these features, we hope to identify interesting match-ups (including potential upsets), categories of teams with similar playing styles, and win-loss probabilities for future games. For now, the only source data we intend to feed our system are the win-loss results of completed games. As the season progresses, our algorithm will therefore have more and more data informing it, and it will be interesting to see whether it can provide accurate predictions by the end of the season. Stay tuned for periodic updates on this experiment!


Data reduction by PCA


Here, we characterize the data-compression benefits of projecting onto a subset of the eigenvectors of our traffic system’s covariance matrix. We address this compression from two perspectives: first, we consider partial traces of the covariance matrix, and second, we present visual comparisons of actual vs. projected traffic plots.
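
The partial-trace measure can be sketched as follows. The trace of the covariance matrix $C$ equals the sum of its eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots$ (the total variance), so the fraction of variance kept by projecting onto the top $k$ eigenvectors is $\sum_{i \le k} \lambda_i / \operatorname{tr} C$. For a self-contained check, the toy data below are hypothetical 2-dimensional readings, small enough that the $2 \times 2$ covariance matrix has closed-form eigenvalues. Real traffic data would have many more dimensions and would need a numerical eigensolver. The helper names are illustrative only.

```python
import math


def covariance_2d(points):
    """Sample covariance entries (a, b, d) of 2-D points, i.e. C = [[a, b], [b, d]]."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    d = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return a, b, d


def eigenvalues_2x2_symmetric(a, b, d):
    """Closed-form eigenvalues of [[a, b], [b, d]], largest first."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return [mean + radius, mean - radius]


def captured_fraction(eigenvalues, k):
    """Partial trace over the top-k eigenvalues divided by the full trace."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


# Hypothetical, strongly correlated readings from two sensors.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
a, b, d = covariance_2d(points)
lams = eigenvalues_2x2_symmetric(a, b, d)   # their sum equals tr C = a + d
frac = captured_fraction(lams, 1)           # close to 1: one component suffices
```

For these correlated toy readings, the leading eigenvector alone retains more than 99% of the total variance. That is the kind of partial-trace evidence that justifies keeping only a few components.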
