Abstract
Data normalization is one of the most common processing steps applied to raw data before it is used in data mining algorithms, classification, or clustering methods. Many procedures, particularly those that involve statistical analysis, require the data to be normalized in one way or another. For time series, a standard way of processing raw data is to z-normalize each time series instance in the data set. For multivariate (multidimensional) time series, each dimension (variable) is z-normalized individually. Although normalization brings many advantages, it is easy to find data sets in which normalization destroys information contained in the raw data. In this paper we demonstrate that for multivariate time series (MTS), both the raw and the normalized components carry information about the data, and that the best way to mine them is to combine the two. We focus on multidimensional time series and their classification using the nearest neighbor method with the dynamic time warping (DTW) distance measure. We construct a parametric distance measure that combines DTW on raw and on z-normalized time series data. The combined distance measure turns out to carry more information about the data than either component distance alone. By determining an individual parameter for each data set, it is possible to obtain a lower classification error than with either component distance measure. We perform experiments on real data sets from many fields of science and technology. The advantage of the combined approach is confirmed by graphical and statistical comparisons.
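To make the idea concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the exact form of the parametric combination, so this sketch assumes a convex combination controlled by a single parameter `alpha` in [0, 1], where `alpha = 0` gives DTW on raw data only and `alpha = 1` gives DTW on z-normalized data only. It also assumes the "dependent" multivariate DTW variant (one warping path shared by all dimensions) and selects `alpha` per data set by leave-one-out 1-NN error over a grid. All function names (`z_normalize`, `dtw`, `combined_distance`, `loocv_error`, `choose_alpha`) are hypothetical.

```python
import math
from typing import List, Sequence

# A multivariate time series: a list of time steps, each a list of per-dimension values.
Series = List[List[float]]


def z_normalize(series: Series) -> Series:
    """Z-normalize each dimension (variable) of a multivariate series independently."""
    n_dims = len(series[0])
    out = [row[:] for row in series]
    for d in range(n_dims):
        col = [row[d] for row in series]
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        for t in range(len(series)):
            # Constant dimensions are mapped to zeros to avoid division by zero.
            out[t][d] = (series[t][d] - mean) / std if std > 0 else 0.0
    return out


def dtw(a: Series, b: Series) -> float:
    """Dependent multivariate DTW: the local cost sums squared differences over all dimensions."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def combined_distance(a: Series, b: Series, alpha: float) -> float:
    """Assumed convex combination of raw DTW and z-normalized DTW."""
    return (1 - alpha) * dtw(a, b) + alpha * dtw(z_normalize(a), z_normalize(b))


def loocv_error(data: List[Series], labels: List[str], alpha: float) -> float:
    """Leave-one-out error rate of the 1-NN classifier under the combined distance."""
    errors = 0
    for i, query in enumerate(data):
        nearest = min(
            (j for j in range(len(data)) if j != i),
            key=lambda j: combined_distance(query, data[j], alpha),
        )
        errors += labels[nearest] != labels[i]
    return errors / len(data)


def choose_alpha(data: List[Series], labels: List[str], grid: Sequence[float]) -> float:
    """Pick the grid value with the lowest leave-one-out error (ties go to the earliest)."""
    return min(grid, key=lambda a: loocv_error(data, labels, a))


if __name__ == "__main__":
    # Toy data: class depends on shape (rising vs falling), not on level.
    data = [[[0.0], [1.0], [2.0]], [[100.0], [101.0], [102.0]],
            [[2.0], [1.0], [0.0]], [[102.0], [101.0], [100.0]]]
    labels = ["up", "up", "down", "down"]
    for alpha in (0.0, 1.0):
        print(alpha, loocv_error(data, labels, alpha))
    print("chosen alpha:", choose_alpha(data, labels, [0.0, 0.25, 0.5, 0.75, 1.0]))
```

On this toy example raw DTW misclassifies every series (the nearest raw neighbor has the opposite shape), while z-normalized DTW classifies all of them correctly; on data where the level itself is informative the preference would reverse, which is the motivation for tuning the parameter per data set. Note that raw and normalized DTW values can differ greatly in scale, so in practice the component distances may need rescaling before they are combined; that step is omitted here.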
