
Detecting invalid data in financial time-series

A financial client of ours stores over 2,000,000 time series representing market data indices. New points had been imported into these series daily over a number of years. The vast majority of the data was valid, but occasionally an invalid data point appeared. For several years these points had been corrected manually as users discovered the anomalies. This process was time-consuming and labour-intensive. It wasted hours not only in detecting the bad points but also in investigating the pricing variations they caused.

We were tasked with investigating how Machine Learning and Data Science could be used to identify the anomalies automatically. This wasn't a straightforward anomaly detection problem, because the natural behaviour of the indices varied enormously. Patterns that clearly indicated an outlier in some indices, such as steep overnight changes in gradient, were normal behaviour in others.

Developing Random Forest, Isolation Forest and clustering models allowed us to build an ensemble that detected anomalous points with over 90% accuracy. Running the ensemble against the client's data identified the anomalous data points and dramatically reduced the effort required to investigate overnight variations in the PnL of trading desks.
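The case study does not describe the features, parameters or combination rule that were used, so the following is only a minimal sketch of how such an ensemble might be wired together with scikit-learn. Everything in it is an assumption for illustration. The per-point features (day-on-day change and change in gradient), the scaling of each series by its own median absolute deviation (one way to handle indices with very different natural variation), the `IsolationForest`, `DBSCAN` and `RandomForestClassifier` settings, the 2-of-3 majority vote, and the synthetic random-walk data are all assumed. The Random Forest also assumes labelled history exists, for example points users had previously corrected by hand. The helper names `make_features`, `ensemble_flags` and `synthetic_series` are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest, RandomForestClassifier


def make_features(series):
    """Hypothetical features: day-on-day change and change in gradient.

    Both are divided by the series' own median absolute deviation (MAD) of
    daily changes (an assumption), so a jump that is routine for a volatile
    index scores low while the same jump in a quiet index stands out.
    """
    diffs = np.diff(series, prepend=series[0])          # day-on-day change
    curvature = np.diff(diffs, prepend=diffs[0])        # change in gradient
    scale = np.median(np.abs(diffs - np.median(diffs))) or 1.0  # avoid /0
    return np.column_stack([diffs / scale, curvature / scale])


def ensemble_flags(features, rf_model, min_votes=2):
    """Flag a point when at least `min_votes` of three detectors agree."""
    iso = IsolationForest(contamination=0.01, random_state=0).fit(features)
    iso_vote = iso.predict(features) == -1              # -1 marks an outlier
    db = DBSCAN(eps=3.0, min_samples=5).fit(features)
    db_vote = db.labels_ == -1                          # noise = no cluster
    rf_vote = rf_model.predict(features) == 1           # 1 = labelled bad point
    votes = iso_vote.astype(int) + db_vote.astype(int) + rf_vote.astype(int)
    return votes >= min_votes


def synthetic_series(rng, n, bad_idx):
    """Hypothetical random-walk 'index' with single bad prints injected."""
    series = 100 + np.cumsum(rng.normal(0, 1, n))
    labels = np.zeros(n, dtype=int)
    for i in bad_idx:
        series[i] += rng.choice([-1, 1]) * 40           # isolated bad point
        labels[i] = 1
    return series, labels


rng = np.random.default_rng(42)

# Supervised part: train on a history whose bad points are already labelled.
train_series, train_labels = synthetic_series(rng, 2000, range(50, 2000, 100))
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(make_features(train_series), train_labels)

# Score a new series containing one injected bad point at index 250.
test_series, _ = synthetic_series(rng, 500, [250])
flags = ensemble_flags(make_features(test_series), rf)
print(np.flatnonzero(flags))
```

Majority voting is just one simple way to combine detectors; weighted votes or a stacked model are common alternatives. Because a single bad point also distorts the change and gradient features of the following day or two, points adjacent to a genuine anomaly may be flagged as well and would need to be resolved in a post-processing step.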

Call or contact us now for a free, no-obligation consultation.

We have a team of skilled and experienced Data Scientists and Machine Learning experts ready to assist you with the consulting and development of your ML projects. Whether you need us to develop your MVPs or to handle all of your Data Science requirements, we can help.