Wavelet transform in similarity paradigm
[INS-R9802] Searching for similarity in time series is finding ever broader application in data mining. However, because the data involved span such a broad spectrum, no single notion of similarity can serve all applications. We present a powerful framework based on wavelet decomposition, which allows a variety of criteria for evaluating similarity between time series to be designed and implemented. As an example, two main classes of similarity measures are considered. The first is a global, statistical similarity, which uses the Hurst exponent derived from the wavelet transform to classify time series according to their global scaling properties. The second estimates similarity locally, using the scale-position bifurcation representation derived from the wavelet transform modulus maxima representation of the time series. A variety of generic or custom-designed matching criteria can be incorporated into the detail similarity measure. We demonstrate the ability of the technique to deal with scaling, translation and polynomial bias, and we also test its sensitivity to added random noise. Other criteria can be designed, and this flexibility can be built into the data mining system to accommodate specific user requirements.
[INS-R9815] For the majority of data mining applications, there are no models of the data that would facilitate comparing records of time series, leaving 'noise' as the only description. We propose a generic approach to comparing noise time series using the largest deviations from consistent statistical behaviour. For this purpose we use a powerful framework based on wavelet decomposition, which allows polynomial bias to be filtered out while capturing the essential singular behaviour. In particular, we are able to reveal a scale-wise ranking of singular events, including their scale-free characteristic: the Hölder exponent. We use such characteristics to design a compact representation of the time series suitable for direct comparison, e.g. evaluation of the correlation product. We demonstrate that the distance between such representations closely corresponds to the subjective feeling of similarity between the time series. In order to test the validity of subjective criteria, we test the records of currency exchanges, finding convincing levels of (local) correlation.
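As an illustration of the global measure in INS-R9802, the sketch below estimates a Hurst-type scaling exponent from how wavelet detail energy grows with scale. It is not the reports' method: it swaps in a plain orthonormal Haar decomposition and a log-variance regression in place of the wavelet transform modulus maxima approach. The function names (haar_details, hurst_from_haar) and the parameters levels and skip are hypothetical, and the formula H = (slope - 1) / 2 assumes the series behaves like fractional Brownian motion.

```python
import numpy as np


def haar_details(x, levels):
    """Return Haar detail coefficients for levels 1..levels (finest first)."""
    approx = np.asarray(x, dtype=float)
    if len(approx) < 2 ** levels:
        raise ValueError("series too short for the requested number of levels")
    details = []
    for _ in range(levels):
        n = len(approx) // 2 * 2          # drop a trailing odd sample
        even, odd = approx[0:n:2], approx[1:n:2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return details


def hurst_from_haar(x, levels=7, skip=1):
    """Estimate H from the slope of log2(detail energy) against level.

    Assumes an fBm-like series, for which the detail variance at level j
    grows roughly as 2**(j * (2H + 1)). The `skip` finest levels are
    left out of the fit because discretisation biases them.
    """
    details = haar_details(x, levels)[skip:]
    j = np.arange(skip + 1, levels + 1)
    log_energy = np.log2([np.mean(d ** 2) for d in details])
    slope = np.polyfit(j, log_energy, 1)[0]
    return (slope - 1.0) / 2.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(2 ** 14)
    walk = np.cumsum(noise)               # ordinary random walk
    print("random walk:", hurst_from_haar(walk))
    print("white noise:", hurst_from_haar(noise))
```

Under these assumptions, two series could be placed in the same global class when their estimated exponents are close. An ordinary random walk should come out near H = 0.5, while uncorrelated noise, which has no growth of detail energy with scale, comes out well below that.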
|MODELS AND PRINCIPLES (acm H.1), PATTERN RECOGNITION (acm I.5), MISCELLANEOUS (acm J.m), PHYSICAL SCIENCES AND ENGINEERING (acm J.2), DATA STORAGE REPRESENTATIONS (acm E.2)|
|Fractals (msc 28A80), Probabilistic methods, simulation and stochastic differential equations (msc 65Cxx), Stochastic differential and integral equations (msc 65C30), Computational Markov chains (msc 65C40), Other computational problems in probability (msc 65C50), Computational problems in statistics (msc 65C60), Pattern recognition, speech recognition (msc 68T10), Searching and sorting (msc 68P10)|
|Information (theme 2)|
|Information Systems [INS]|
Struzik, Z.R., & Siebes, A.P.J.M. (1998). Wavelet transform in similarity paradigm. Information Systems [INS]. CWI.