Topological data analysis finance

12 Pages Posted: 14 Jul 2021

Date Written: July 1, 2021

Traditional dimensionality reduction methods, such as principal component analysis [PCA] and multi-dimensional scaling [MDS], can lose valuable information in the data. The topological data analysis [TDA] employed in this paper, however, can handle multi-dimensional data without such losses and is strongly robust to noise. We discuss the volatility characteristics of the daily returns of major stock indexes in the United States during the 2008 global financial crisis and in China during the 2008 and 2015 crashes. We use a sliding window of 50 trading days and calculate the L1-norm of the "persistence landscape" with the TDA method to predict index collapses. We show that the L1-norm of the relevant index increases significantly before the financial crises of 2008 and 2015, and that the maximum value of the L1-norm emerges more than one year before the market collapses. The method thus provides an effective early warning indicator for financial crashes.

Keywords: Topological Data Analysis, Financial Crashes, Persistent Homology, Persistent Landscape, Warning Indicator
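The L1-norm mentioned in the abstract can be illustrated with a small sketch. It assumes the norm is summed over all landscape layers, which the abstract does not specify, and it takes a persistence diagram as a plain list of (birth, death) pairs. The function names are hypothetical and are not taken from the paper or from any library. Each diagram point contributes a "tent" function, the k-th landscape layer is the k-th largest tent value at each t, and the total area reduces to the sum of (death - birth)^2 / 4.

```python
from typing import List, Sequence, Tuple

Diagram = Sequence[Tuple[float, float]]  # (birth, death) pairs


def landscape_values(diagram: Diagram, t: float) -> List[float]:
    """Return the landscape layers lambda_1(t) >= lambda_2(t) >= ... at t."""
    tents = [max(0.0, min(t - b, d - t)) for b, d in diagram]
    return sorted(tents, reverse=True)


def landscape_l1_norm(diagram: Diagram, n_grid: int = 2001) -> float:
    """Approximate the sum over layers of the integral of lambda_k (trapezoidal rule)."""
    if not diagram:
        return 0.0
    lo = min(b for b, _ in diagram)
    hi = max(d for _, d in diagram)
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        weight = 0.5 if i in (0, n_grid - 1) else 1.0  # trapezoid end weights
        total += weight * sum(landscape_values(diagram, lo + i * step))
    return total * step


def landscape_l1_closed_form(diagram: Diagram) -> float:
    """Each tent is a triangle with base (d - b) and height (d - b) / 2."""
    return sum((d - b) ** 2 / 4.0 for b, d in diagram)
```

For the toy diagram [(0.0, 2.0), (1.0, 4.0)], the closed form gives 1.0 + 2.25 = 3.25, and the numerical integral agrees closely. In a sliding-window setting, this number would be recomputed for each window's diagram and tracked over time.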

Swiss researchers at the École polytechnique fédérale de Lausanne [EPFL] used topological data analysis [TDA] to predict market crashes. Postdoc Guillaume Tauzin and fellow researchers introduced giotto-tda, a Python library that integrates topological data analysis with machine learning. Applied to the run-ups to the stock-market crashes of 2000 and 2008, the model warned of the danger to come.

“Conventional forecasting models contain so much noise and give so many signals that something is about to go awry, that you don’t really know which signals to follow,” describes Chief Scientist Matteo Caorsi. “TDA is a more robust method for making sense of volatile movements,” claims the team.

Topology and Machine Learning

Wikipedia explains, “a topology tells how elements of a set relate spatially to each other.” According to Tauzin et al., “Topological data analysis [TDA] uses tools from algebraic and combinatorial topology to extract features that capture the shape of data.” Applying shape to markets is not new. Mathematician Benoit Mandelbrot recognized markets have fractal properties in the 1960s. However, Mandelbrot did not apply machine learning to his model.

According to the team, giotto-tda “can be used to model just about any kind of data set, and the data contained in these sets feed the model’s machine-learning algorithm, improving the accuracy of its predictions and providing warning signs.” They continue, “Another benefit of TDA is that it’s resilient to noise, meaning the signals don’t get distorted by irrelevant information.”

Tauzin et al. write, “TDA has remained outside the toolbox of most Machine Learning [ML] practitioners, largely because current implementations are developed for research purposes and not in high-level languages.” Their library is written in Python and released under an open-source license on GitHub. The hope is that this leads to wider adoption of TDA alongside machine learning.

Versus Mandelbrot’s MMAR

According to Mandelbrot, “If prices take a big leap up or down now, there is a measurably greater likelihood that they will move just as violently the next day.” Likewise, small movements tend to be followed by more small movements. However, like a rogue wave, wild chaos sometimes breaks the rhythm, a phenomenon Mandelbrot describes as “roughness.”

“Real markets are wild. Their price fluctuations can be hair-raising - far greater and more damaging than the mild variations of orthodox finance,” describes Mandelbrot. In response, he created the Multifractal Model of Asset Returns [MMAR], which compounds Brownian motion with a multifractal “trading time.”

Mandelbrot explains, “In this framework, the price of a financial asset is viewed as a multiscaling process with long memory and long tails.” In contrast, Tauzin’s team utilized “a novel approach based on the fact that when a system reaches a critical state, such as when water is about to solidify into ice, the data points representing the system begin to form shapes that change its overall structure.” Rather than simulating the market as MMAR does, the TDA model serves as a warning for crashes, something MMAR lacks.

Roughness

IBM, one of Mandelbrot’s employers, reveals, “there were rules and parameters to this roughness, but it was a form of geometry previously unidentified by the scientific community.” The difficulty of modeling volatility comes from mathematics’ preference for smooth forms like circles. Zoom in far enough on a circle, and each piece looks like a straight line. Not so with rough shapes, where each piece is a copy of the larger whole, a property known as self-similarity.

Self-similarity and long-range dependence both figure in Mandelbrot’s model. In Mandelbrot’s words, markets show “the influence of long-range dependence in an otherwise random process - or, put another way, a long-term memory through which the past continues to influence the random fluctuations of the present.” This finding echoes the warning signals found by Tauzin et al. just before a market crash, where the sudden change in shape correlated with past anomalies.

TDA Internals

Frédéric Chazal et al., in “An Introduction to Topological Data Analysis”, explain, “TDA aims at providing well-founded mathematical, statistical and algorithmic methods to infer, analyze and exploit the complex topological and geometric structures underlying data that are often represented as point clouds.” In other words, TDA treats data as geometric shapes for analysis. For a time series, each window of time becomes a point cloud, and these point clouds are the core inputs of the model.

While Tauzin et al. do not explain how the market crash model works, Chazal et al. and the giotto-tda documentation give clues. Chazal et al. describe four steps in the typical TDA pipeline:

Step 1:

The first step creates equal-size windows of the time series. This technique is called time-delay embedding, and each window represents a point cloud. The only requirement is that the number of windows be less than or equal to the number of timestamps. For example, the animation below maps a 1-dimensional signal to coordinates in a 2-dimensional embedding space.

Animation by Sean M. Law

Sean M. Law, research scientist and creator of the STUMPY Python package, explains that the quantity (d-1)τ is known as the “window size,” and the difference between t_{i+1} and t_i is called the “stride,” where d is the embedding dimension and τ is the time delay.
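A minimal sketch of time-delay embedding in plain Python follows. It is independent of giotto-tda, and the function and parameter names (`time_delay_embedding`, `dimension`, `delay`, `stride`) are illustrative assumptions rather than any library's API. Each output point collects `dimension` samples spaced `delay` steps apart, so it spans the window size (d-1)τ described above.

```python
from typing import List, Sequence, Tuple


def time_delay_embedding(
    series: Sequence[float], dimension: int, delay: int, stride: int = 1
) -> List[Tuple[float, ...]]:
    """Map a 1-D series to points (x[t], x[t + delay], ..., x[t + (dimension - 1) * delay])."""
    if dimension < 1 or delay < 1 or stride < 1:
        raise ValueError("dimension, delay and stride must be positive integers")
    window_size = (dimension - 1) * delay  # the quantity (d - 1) * tau
    last_start = len(series) - window_size  # exclusive upper bound on start indices
    return [
        tuple(series[start + k * delay] for k in range(dimension))
        for start in range(0, last_start, stride)
    ]


series = [0, 1, 2, 3, 4, 5]
cloud = time_delay_embedding(series, dimension=3, delay=2)
print(cloud)  # [(0, 2, 4), (1, 3, 5)]
```

With `dimension=2` and `delay=1`, the same series yields five consecutive pairs, which corresponds to the 1-dimensional to 2-dimensional mapping described in Step 1.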

Step 2:

A “continuous” shape built on top of the data highlights the underlying topology and reflects the data at different scales. Law explains, “The next step is to calculate the persistence diagrams associated with each point cloud,” which giotto-tda does with the Vietoris-Rips construction.
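To give a feel for what a persistence diagram records, here is a hedged sketch in plain Python. It is not the giotto-tda call and covers only the simplest case: connected components (homology dimension 0) of a Vietoris-Rips filtration, computed with a union-find structure. It assumes an edge enters the filtration at a scale equal to the distance between its endpoints. Loops and higher-dimensional features, which matter for crash detection, need a dedicated library. The name `h0_persistence` is hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Sequence[float]


def h0_persistence(points: Sequence[Point]) -> List[Tuple[float, float]]:
    """Return (birth, death) pairs of connected components.

    Every point is born at scale 0. A component dies at the length of the
    edge that first merges it into another component. One component never
    dies and is reported with death = inf.
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Process all pairwise edges from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge merges two components
            parent[root_i] = root_j
            pairs.append((0.0, length))
    if points:
        pairs.append((0.0, math.inf))
    return pairs


print(h0_persistence([(0, 0), (1, 0), (5, 0)]))
# [(0.0, 1.0), (0.0, 4.0), (0.0, inf)]
```

Two nearby points merge early (death 1.0), while the distant third point survives until scale 4.0. Long-lived pairs are the "persistent" features, and short-lived ones are usually treated as noise.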

Step 3:

The topological information is then extracted using false nearest neighbors, an ordinal partition network, or some other method that learns from the shape properties. For example, Audun Myers et al., in “Persistent Homology of Complex Networks for Dynamic State Detection”, describe how “a graph embedding of a periodic time series is long connected network loops, while a chaotic time series has many short loops.” Thus the shape of the loops indicates whether the series is periodic or chaotic.

Law explains, “The false nearest neighbours algorithm is based on the assumption that ‘unfolding’ or embedding a deterministic system into successively higher dimensions is smooth. In other words, points which are close in one embedding dimension should be close in a higher one.” The idea is to tell actual neighbors from false ones by increasing the dimension: as the dimension grows, false neighbors stop being neighbors.
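The idea can be sketched in a few lines of plain Python. This is a simplified version of the distance-ratio test, not the exact procedure used by giotto-tda or by Law. The function name and the default threshold `ratio_tol=10.0` are assumptions chosen for illustration. A neighbor counts as false when adding one more delayed coordinate moves it away by more than `ratio_tol` times the current distance.

```python
import math
from typing import Sequence


def false_neighbor_fraction(
    series: Sequence[float], dimension: int, delay: int = 1, ratio_tol: float = 10.0
) -> float:
    """Fraction of nearest neighbours in `dimension` that separate in `dimension + 1`."""
    n_points = len(series) - dimension * delay  # leaves room for the extra coordinate
    if n_points < 2:
        raise ValueError("series too short for this dimension and delay")
    embed = [
        [series[i + k * delay] for k in range(dimension)] for i in range(n_points)
    ]
    false_count = 0
    for i in range(n_points):
        # Brute-force nearest neighbour of point i in the current embedding.
        dist, j = min(
            (math.dist(embed[i], embed[m]), m) for m in range(n_points) if m != i
        )
        # Separation contributed by the (dimension + 1)-th coordinate alone.
        extra = abs(series[i + dimension * delay] - series[j + dimension * delay])
        if extra > ratio_tol * dist:
            false_count += 1
    return false_count / n_points


wave = [math.sin(0.3 * i) for i in range(300)]
for d in (1, 2):
    print(d, false_neighbor_fraction(wave, dimension=d, delay=5))
```

For a sine wave, a 1-dimensional embedding confuses rising and falling points that share the same value, so many neighbors are false. In two dimensions the points trace a closed curve and the fraction drops sharply, which suggests that two dimensions suffice for this signal.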

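An ordinal partition network resembles a Markov chain, with nodes and edges. Each node is an ordinal pattern, the rank order of the values in a window of the series, and each edge is a transition between the patterns of consecutive windows. The shape of the resulting graph and its other properties can track changes in the dynamical behavior of the modeled system.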
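As a rough illustration, the sketch below builds the weighted edge list of an ordinal partition network in plain Python. It assumes nodes are rank-order patterns of delayed windows and edges count transitions between consecutive windows, self-loops included. Published variants differ on details such as self-loops and tie handling. The names `ordinal_pattern` and `ordinal_partition_network` are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Pattern = Tuple[int, ...]


def ordinal_pattern(window: Sequence[float]) -> Pattern:
    """Indices of the window sorted by value (ties broken by position)."""
    return tuple(sorted(range(len(window)), key=lambda k: window[k]))


def ordinal_partition_network(
    series: Sequence[float], dimension: int = 3, delay: int = 1
) -> Dict[Tuple[Pattern, Pattern], int]:
    """Count transitions between the ordinal patterns of consecutive windows."""
    span = (dimension - 1) * delay
    patterns = [
        ordinal_pattern([series[i + k * delay] for k in range(dimension)])
        for i in range(len(series) - span)
    ]
    return dict(Counter(zip(patterns, patterns[1:])))


print(ordinal_partition_network([0, 1, 0, 1, 0, 1], dimension=2))
# {((0, 1), (1, 0)): 2, ((1, 0), (0, 1)): 2}
```

The alternating series produces two nodes joined in a single loop, the signature of periodic behavior. A steadily rising series collapses to one node with a self-loop.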

Step 4:

This step further analyzes the topological information from Step 3, using machine learning or other methods to gain new insight. giotto-tda follows the scikit-learn API, so its topological features can feed directly into scikit-learn models.

Conclusion

TDA, like fractal analysis, is a general methodology for geometric investigation. Mandelbrot discovered the shape of markets in the 1960s, and TDA was applied to financial markets in 2021. Mandelbrot’s MMAR is for market simulation, and TDA is for anomaly detection. Unlike MMAR, however, TDA with giotto-tda has a machine learning capability.

The most potent model may be a combination of Mandelbrot’s MMAR with TDA. Specifically, how does Mandelbrot’s trading time correspond with the anomaly detection of Tauzin et al.? TDA is currently an active research field with many algorithms and efficient data structures available, including ones with machine learning capabilities. Combining TDA with the well-established MMAR model could make volatility forecasting a reality.
