The Dow Chemical Company
Proposed Project for the
MSU Industrial Math Students
Variable Selection in Time Series Modeling Projects with Large Numbers of Leading Economic Indicators
We have found that certain public indices are good predictors of future economic
activity, measured in certain ways. These indices are usually aggregates of many
different economic time series, such as unemployment, economic growth, etc.
The number of these time series can run into the hundreds or even thousands, and
they can be reported on many different time scales (weeks, months, etc.).
We would like to analyze these underlying economic time series to see which of these
time series are the most relevant for predicting future economic activity. This
is a Time Series "Data Mining" problem for which the open literature offers
relatively few methodologies. Two particular approaches
are of interest, one unsupervised and one supervised. The unsupervised approach
to be examined is Variable "Reduction," which involves such methods as Similarity
(Leonard, Lee (2008)) and potentially traditional PCA and Cluster Analysis (VARCLUS,
SAS Institute (2008)). The supervised approach to be examined is Variable "Selection,"
which involves such methods as Similarity or traditional variable selection methods
via non-time series data mining best practices (SAS EM, SAS Institute (2008)). The
traditional Data Mining approach is very time consuming because the "poor man's" approach
to modeling time series data has to be adopted. That is, first differences are taken
on all X's, then a multitude of lags are taken on the X's, and then the traditional
Data Mining variable selection approaches are applied. Finding the most appropriate
approaches for Time Series variable reduction and then variable selection is the
key deliverable requested herein. Various large time series data sets will be provided
as test cases.
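
The "poor man's" approach described above (difference every X, generate many lags of each, then apply a conventional variable selection step) can be sketched as follows. This is only an illustrative sketch, not the SAS workflow referred to in the text: the function names, the use of plain Python lists, and the choice of absolute Pearson correlation as the screening score are all assumptions made for the example. It also assumes that the target Y and every X have the same length and reporting frequency.

```python
from statistics import mean


def first_difference(series):
    """Return the first differences x[t] - x[t-1] of a series."""
    return [b - a for a, b in zip(series, series[1:])]


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def rank_lagged_candidates(y, xs, max_lag):
    """Score every (indicator, lag) pair after differencing.

    y       : list of target values (hypothetical name)
    xs      : dict mapping indicator name -> list of values, same length as y
    max_lag : largest lag (in periods) to try for each indicator

    Returns (abs_correlation, name, lag) tuples, strongest first.
    Correlation screening is an assumed stand-in for whatever selection
    method is finally chosen.
    """
    dy = first_difference(y)
    scores = []
    for name, x in xs.items():
        dx = first_difference(x)
        for lag in range(1, max_lag + 1):
            # Pair the change in X at time t - lag with the change in Y at time t.
            r = pearson(dx[:-lag], dy[lag:])
            scores.append((abs(r), name, lag))
    return sorted(scores, reverse=True)
```

Calling `rank_lagged_candidates(y, xs, max_lag)` produces one candidate per indicator per lag, so a thousand indicators with twelve lags each already gives twelve thousand candidate variables to screen. This growth is what makes the traditional approach so time consuming.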