  As we mentioned in the last post, there are currently over 2000 active speed loop detectors within the Bay Area highway system.  The information provided by these loops is often highly redundant, because speeds at neighboring sites typically differ little from one another.  This observation suggests that a higher-level, “macro” picture of traffic conditions could provide more insight:  rather than stating the speed at each detector, we might instead offer info like “101S is rather slow right now”.  More generally, we aim to characterize traffic conditions as efficiently as possible.  To move towards this goal, we have carried out a principal component analysis (PCA)$^1$ of the full 2014 (year to date) PEMS data set.
  Statistical physics of PCA:    One way of thinking about PCA as applied here is to imagine that the traffic system is harmonic.  That is, we suppose that the traffic dynamics observed can be characterized by an energy cost function that is quadratic in the speeds of the different loops, measured relative to their average values, $E = \frac{\beta^{-1}}{2} \delta \textbf{v}^{T} \cdot H \cdot \delta \textbf{v}.$   Here, $\delta v_i = v_i - \langle v_i \rangle$ and $H$ is a matrix Hamiltonian.  Under some effective thermal driving, each configuration is weighted by the Boltzmann factor $e^{-\beta E}$, and the pair correlation for two sites is given by $\langle \delta v_a \delta v_b \rangle \equiv \frac{1}{Z} \int_{\{\delta v_i\}} e^{- \frac{1}{2} \delta \textbf{v}^{T} \cdot H \cdot \delta \textbf{v}} \, \delta v_a \delta v_b = H^{-1}_{ab}$, where $Z$ is the same integral without the factor $\delta v_a \delta v_b$.  It is this pair correlation function that is measured when one carries out a PCA, and the matrix $H^{-1}$ is called the covariance matrix.  Its eigenvectors are the modes of the system: the independent traffic patterns that we discuss above.  The low-lying modes are those with the larger eigenvalues of $H^{-1}$ (equivalently, the smaller eigenvalues of $H$).  These have low energy, are consequently often highly excited, and generally dominate the traffic conditions that we observe.
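
  To make the harmonic picture concrete, here is a minimal sketch in Python with NumPy. It does not use the PEMS data: it draws synthetic "speeds" for three hypothetical detectors from a Gaussian whose precision matrix is an assumed toy Hamiltonian `H`, and then checks that the measured covariance is close to $H^{-1}$ and that its eigenvectors, sorted by eigenvalue, give the modes. The function name `pca_modes`, the matrix `H`, and the mean speeds are all illustrative assumptions, not part of our actual analysis.

```python
import numpy as np


def pca_modes(speeds):
    """PCA via the covariance matrix.

    speeds: array of shape (n_samples, n_detectors).
    Returns eigenvalues (descending), the matching eigenvectors
    (as columns), and the sample covariance matrix.
    """
    # Fluctuations about the mean speed of each detector.
    dv = speeds - speeds.mean(axis=0)
    # Sample estimate of <dv_a dv_b>.
    cov = dv.T @ dv / (speeds.shape[0] - 1)
    # eigh is appropriate because cov is symmetric.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Largest covariance eigenvalue first: the lowest-energy mode.
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order], cov


# Toy Hamiltonian (an assumption): three detectors, nearest neighbors coupled.
H = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
cov_true = np.linalg.inv(H)

# Synthetic data: Gaussian fluctuations with covariance H^{-1}
# around made-up mean speeds (mph).
rng = np.random.default_rng(0)
mean_speed = np.array([60.0, 55.0, 65.0])
speeds = rng.multivariate_normal(mean_speed, cov_true, size=200_000)

eigvals, modes, cov = pca_modes(speeds)

print("measured covariance:\n", np.round(cov, 2))
print("H^{-1}:\n", np.round(cov_true, 2))
print("covariance eigenvalues (descending):", np.round(eigvals, 3))
print("leading mode:", np.round(modes[:, 0], 3))
```

  In this toy chain the leading mode has all three components of the same sign, i.e. all three detectors speed up or slow down together, which is the kind of "macro" pattern described above. The reciprocals of the covariance eigenvalues recover the eigenvalues of `H`, so the mode with the largest variance is the one that costs the least energy.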