Sage Journals: Discover world-class research

Abstract

Data normalization is one of the most common processing steps applied to raw data before they are used in data mining algorithms, such as classification or clustering methods. Many procedures, particularly those that rely on statistical analysis, require the data to be normalized in one way or another. For time series, a standard way of processing raw data is to z-normalize each time series instance in the data set; for multivariate (multidimensional) time series, each dimension (variable) is z-normalized individually. Although normalization brings many advantages, it is easy to find data sets where it destroys information contained in the raw data. In this paper we demonstrate that, for multivariate time series (MTS), both the raw and the normalized components carry information about the data, and that the best way to mine the data is to combine them. We focus on multidimensional time series and their classification using the nearest neighbor method with the dynamic time warping (DTW) distance measure. We construct a parametric distance measure that combines DTW on raw and on z-normalized time series data. The combined distance measure turns out to carry more information about the data than either distance component separately: by determining an individual parameter for each data set, it is possible to obtain a lower classification error than that of either component distance measure. We perform experiments on real data sets from many fields of science and technology. The advantage of the combined approach is confirmed by graphical and statistical comparisons.
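The pipeline the abstract describes (per-dimension z-normalization, DTW on raw and on normalized data, a parametric combination of the two, and a 1-NN classifier with a per-data-set parameter) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the combined distance is assumed to be the convex combination (1 - alpha) * DTW(raw) + alpha * DTW(z-normalized) with alpha in [0, 1] (the paper's exact parametrization may differ); multivariate DTW is assumed to use the squared Euclidean distance over all dimensions at each aligned pair of points; and alpha is assumed to be chosen by the leave-one-out 1-NN error on the training set over a hypothetical grid. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: an MTS instance is a list of dimensions (variables),
# each a list of floats; all dimensions of one instance share a length.
MTS = List[List[float]]
Matrix = List[List[float]]


def z_normalize(series: MTS) -> MTS:
    """Z-normalize every dimension of one MTS instance on its own."""
    out = []
    for dim in series:
        mean = sum(dim) / len(dim)
        std = math.sqrt(sum((v - mean) ** 2 for v in dim) / len(dim))
        # A constant dimension has std 0; map it to zeros instead of dividing by 0.
        out.append([(v - mean) / std if std > 0 else 0.0 for v in dim])
    return out


def dtw(a: MTS, b: MTS) -> float:
    """Multivariate DTW; local cost = squared Euclidean distance over all dimensions."""
    n, m = len(a[0]), len(b[0])
    # acc[i][j] = minimal accumulated cost of aligning the first i points of a
    # with the first j points of b
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = sum((da[i - 1] - db[j - 1]) ** 2 for da, db in zip(a, b))
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def combined_distance(a: MTS, b: MTS, alpha: float) -> float:
    """Hypothetical parametrization: (1 - alpha) * DTW(raw) + alpha * DTW(z-normalized)."""
    return (1 - alpha) * dtw(a, b) + alpha * dtw(z_normalize(a), z_normalize(b))


def distance_matrices(train: Sequence[MTS]) -> Tuple[Matrix, Matrix]:
    """Pairwise raw and z-normalized DTW, computed once and reused for every alpha."""
    norm = [z_normalize(s) for s in train]
    n = len(train)
    d_raw = [[dtw(train[i], train[k]) for k in range(n)] for i in range(n)]
    d_norm = [[dtw(norm[i], norm[k]) for k in range(n)] for i in range(n)]
    return d_raw, d_norm


def loo_error(d_raw: Matrix, d_norm: Matrix, labels: Sequence[str], alpha: float) -> float:
    """Leave-one-out error of the 1-NN classifier under the combined distance."""
    n = len(labels)
    errors = 0
    for i in range(n):
        # Nearest neighbour of instance i among all other training instances.
        nearest = min((k for k in range(n) if k != i),
                      key=lambda k: (1 - alpha) * d_raw[i][k] + alpha * d_norm[i][k])
        errors += labels[nearest] != labels[i]
    return errors / n


def choose_alpha(train: Sequence[MTS], labels: Sequence[str], grid: Sequence[float]) -> float:
    """Pick the grid value with the lowest leave-one-out error (first one on ties)."""
    d_raw, d_norm = distance_matrices(train)
    return min(grid, key=lambda alpha: loo_error(d_raw, d_norm, labels, alpha))


def classify(query: MTS, train: Sequence[MTS], labels: Sequence[str], alpha: float) -> str:
    """1-NN label of a new instance under the combined distance."""
    best = min(range(len(train)), key=lambda j: combined_distance(query, train[j], alpha))
    return labels[best]


if __name__ == "__main__":
    # Toy data: each class has a peak and a valley; classes differ only in level.
    train = [[[0.0, 1.0, 0.0]], [[1.0, 0.0, 1.0]], [[10.0, 11.0, 10.0]], [[11.0, 10.0, 11.0]]]
    labels = ["low", "low", "high", "high"]
    alpha = choose_alpha(train, labels, [0.0, 0.25, 0.5, 0.75, 1.0])
    print(alpha, classify([[0.5, 1.5, 0.5]], train, labels, alpha))
```

Because both pairwise DTW matrices are computed once, trying another alpha only costs a weighted sum of two stored numbers per pair. In the toy data, z-normalization erases the level that separates the classes, so the leave-one-out error of this sketch is 1.0 at alpha = 1 and 0.0 at alpha = 0, a small illustration of normalization destroying information carried by the raw data.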

Keywords

Multivariate time series classification; dynamic time warping; parametric distance measure; combining raw and normalized data

Combining raw and normalized data in multivariate time series classification with dynamic time warping