How can AI help in the fight against financial crime?
Financial institutions face a huge challenge in fighting financial crime, as criminal methods become ever more sophisticated. Until now, institutions have relied on tailor-made, hand-crafted rules to flag suspicious transactions, actors, or activities. Given the vast amount of complex financial data, AI can be a crucial tool: it can learn automatically from such enormous datasets how to detect anomalous patterns that help identify parties acting illegally.
EURECOM professor Pietro Michiardi, an expert in machine learning (ML) and computational statistics, joins forces with Oracle Corporation to develop AI methods for anomaly detection in the context of fighting financial crime.
Q. What is the context and motivation of this project?
PM. The general context of the problem is anomaly detection, a very old topic that we have been working on for a long time. We can apply our tools to various domains, and one of those is financial crime. When Oracle approached us, the question was whether there is a different way to tackle financial crime. The current method is for expert financial crime investigators to define rules manually. We are working on AI and ML tools that can learn from the data and possibly get rid of manually written rules, which would help scale to very large anomaly detection tasks. This is very important since the amount of data produced by these transactions grows exponentially. In this context, we obtained a research grant from Oracle to come up with AI methods to fight financial crime, which they would evaluate on real-life, confidential data.
Financial activities are modelled as graphs, where nodes are individuals or companies and edges represent the transactions between them. In a dataset that mostly contains normal transactions, unusual activity gives rise to distinct patterns that can be classified as anomalies. For this application, the data are very complex, structured and, most importantly, confidential. Researchers therefore have to test their hypotheses on equivalent synthetic data, which adds an extra challenge to their mission. Following suggestions from Oracle, EURECOM’s researchers focused on anomaly detection methods for graph nodes, that is, on identifying fraudulent companies or individuals.
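To make the graph view concrete, here is a minimal Python sketch, not the project's actual pipeline. It assumes a hypothetical list of synthetic transactions in the form (sender, receiver, amount) and builds an undirected graph whose nodes are parties and whose edges carry the total amount exchanged. The names `build_graph` and `node_features`, and all the data, are invented for illustration.

```python
from collections import defaultdict

# Hypothetical synthetic transactions: (sender, receiver, amount).
transactions = [
    ("alice", "bob", 120.0),
    ("bob", "carol", 75.5),
    ("alice", "carol", 40.0),
    ("shell_co", "alice", 9000.0),
    ("shell_co", "bob", 8500.0),
    ("shell_co", "carol", 9100.0),
]


def build_graph(transactions):
    """Return an undirected adjacency map: node -> {neighbor: total amount}."""
    graph = defaultdict(lambda: defaultdict(float))
    for sender, receiver, amount in transactions:
        # Accumulate the amount on both endpoints so the edge is undirected.
        graph[sender][receiver] += amount
        graph[receiver][sender] += amount
    return graph


def node_features(graph):
    """Simple per-node features: number of neighbors and total volume."""
    return {
        node: {"degree": len(neighbors), "volume": sum(neighbors.values())}
        for node, neighbors in graph.items()
    }


graph = build_graph(transactions)
features = node_features(graph)

for node, feats in sorted(features.items()):
    print(node, feats)
```

In this toy example every party has the same number of neighbors, but `shell_co` stands out by transaction volume. Node-level anomaly detection works on per-node signals of this kind, although a learned model would use far richer representations than these two hand-picked features.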
Q. That is a very interesting approach. Could you elaborate more on your method?
PM. In more detail, we developed a new methodology for anomaly detection. The input to our statistical model is very complicated data of a kind that is atypical for ML. Interest in graph neural networks is growing rapidly, and we use them to transform complicated data into latent representations. We extract the factors that determine the shape of the data. So, the first step is to project these data into a different space, which is easier to work in. Using this embedding space, we can model the density of the data in a completely new way. If you need to detect anomalies, you have a one-class classification problem: either a data point is an anomaly or it is not. Basically, you put all your normal data in a sphere and all anomalies fall outside this sphere. But things are not so simple in practice, especially when we have complicated data like graph data; it is hard to put everything inside a single sphere.
The solution is to model the latent space with a more flexible distribution, which allows us to follow the density of the data more closely. The principle is the following: in this weird multidimensional space, high-density regions correspond to normal data and low-density regions correspond to anomalous data.
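The contrast between the single-sphere view and the density view can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the model developed in the project. The "embeddings" are hypothetical 2-D points drawn from two clusters, the sphere is centered at the data mean with a radius covering 95% of the normal points, and a plain Gaussian kernel density estimate stands in for the more flexible distribution. The interview does not specify that distribution, and all names and thresholds here are illustrative.

```python
import math
import random


def sphere_score(point, center):
    """Single-sphere view: the anomaly score is the distance to the center."""
    return math.dist(point, center)


def kde_log_density(point, data, bandwidth=0.5):
    """Log of a Gaussian kernel density estimate evaluated at `point`."""
    dim = len(point)
    log_norm = -0.5 * dim * math.log(2 * math.pi * bandwidth ** 2)
    exponents = [
        -(math.dist(point, x) ** 2) / (2 * bandwidth ** 2) for x in data
    ]
    # Log-sum-exp trick for numerical stability.
    m = max(exponents)
    log_sum = m + math.log(sum(math.exp(e - m) for e in exponents))
    return log_norm + log_sum - math.log(len(data))


rng = random.Random(0)
# Hypothetical "normal" embeddings: two well-separated clusters.
cluster_a = [(rng.gauss(-4, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
cluster_b = [(rng.gauss(4, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
normal = cluster_a + cluster_b

# Sphere: centered at the mean, radius covering about 95% of normal points.
center = tuple(sum(coords) / len(normal) for coords in zip(*normal))
distances = sorted(sphere_score(x, center) for x in normal)
radius = distances[int(0.95 * len(distances))]

# Density: flag points whose log-density falls below the 5% quantile
# of the log-densities of the normal data.
log_densities = sorted(kde_log_density(x, normal) for x in normal)
threshold = log_densities[int(0.05 * len(log_densities))]

# A probe halfway between the clusters, where no normal data lives.
probe = (0.0, 0.0)
inside_sphere = sphere_score(probe, center) <= radius
low_density = kde_log_density(probe, normal) < threshold

print("inside the sphere:", inside_sphere)
print("in a low-density region:", low_density)
```

The probe sits in the empty gap between the two clusters. The sphere accepts it as normal because it lies close to the center, while the density estimate flags it because no normal data lives there. This is the kind of situation where a single sphere is too rigid.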
Q. What are the challenges you had to overcome in this project?
PM. One issue is that real-life data could not be shared for confidentiality reasons, so we have to work with synthetic data or find problems in completely different domains that look very similar. For example, we have worked with datasets from the e-commerce field. In particular, we used data from Amazon, which contain reviews of items sold on their website. Our task is to detect fake reviews (anomalies) or rogue users (spammers). The setup is very similar to financial transactions, so we can build graphs and find anomalies on those graphs. Working with such proxy data is a limitation, since we cannot be 100% sure that the method we develop is equally effective on real financial data.
Also, the literature is somewhat limited and has produced some biased results. For example, we discovered flaws in the experimental protocols of some papers, where anomalies and normal data would be “too easy” to separate. In our work, we seek patterns that emerge from the data, and these patterns can involve the structure of the graph. For example, some nodes have many connections, and those nodes could be the anomalies. In a nutshell, our work is about data-driven anomaly detection.
Q. What is the message you would like to share?
PM. From the scientific point of view, this work is a nice testimony to critical thinking. In particular, even if hundreds of papers in the literature point in one direction, it doesn’t mean that this direction is the only way to solve a problem, especially in ML. It is yet another proof that we have to be careful, not only about the more theoretical aspects but also about the experimental protocols that are used. By starting with the scientific approach of trying to replicate existing work, we came to understand that there were some gray areas in experimental protocols. This allowed us to discover important problems that needed to be solved. It is clear that the scientific mentality takes you much further.
This work is also an important testimony to the usefulness of AI in practical environments. People at Oracle were very satisfied with the results and explanations we provided, and they plan to use this software in their own products. The success of an approach rests on solid scientific foundations, and the possibility of practical use of our work is very important to us.
by Dora Matzakou for EURECOM