Between 2009 and 2020, regulators in the US and Europe are projected to impose more than $400 billion in fines on banks and other financial institutions. This enormous figure underscores the importance of regulatory compliance in today's financial industry. Current (and upcoming) regulations require ever-increasing amounts of data to accommodate the complexity of the modern banking system. While this data is not necessarily "new" (it has always been there), applying "big data" approaches to capture and analyze it is now an essential function to ensure regulations are adhered to, assets are protected against fraud, and customers can be served in ways that were not possible before.
Solutions based on Hadoop (such as Hortonworks) are often employed to capture and analyze this financial and regulatory information. However, when data velocity is very high (tens of billions of records per day, for example), this approach can result in server sprawl well beyond what was initially planned, dramatically increasing power, cooling, rack space, licensing, and management costs.
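To put that velocity in perspective, here is a quick back-of-the-envelope calculation. The record volume and average record size are assumed figures, purely for illustration, but they show what "tens of billions of records per day" means as a sustained rate:

    # Back-of-envelope ingest math (illustrative figures, not measurements):
    # what "tens of billions of records per day" means in sustained throughput.

    RECORDS_PER_DAY = 20_000_000_000   # assumed: 20 billion records/day
    AVG_RECORD_BYTES = 500             # assumed: average record size

    seconds_per_day = 24 * 60 * 60
    records_per_sec = RECORDS_PER_DAY / seconds_per_day
    bytes_per_sec = records_per_sec * AVG_RECORD_BYTES

    print(f"Sustained rate: {records_per_sec:,.0f} records/sec")
    print(f"Ingest bandwidth: {bytes_per_sec / 1e9:.2f} GB/sec")
    # ~231,000 records/sec and ~0.12 GB/sec of raw payload -- before
    # replication, indexing, and analytic read traffic multiply it.

The raw stream is only the starting point: every byte ingested is replicated, indexed, and re-read by analytics jobs, so the cluster's effective workload is several times larger.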
Initially, the advantage of Hadoop was its ability to use low-cost servers, breaking data apart and spreading it across an almost unlimited number of nodes for redundancy and parallel processing. While this approach works well for data with moderate ingestion rates, when a “river” of data flows into the data lake, relying on inexpensive, low-speed, high-capacity nodes breaks the economic advantage Hadoop was supposed to offer in the first place; a rough sizing sketch below shows why. This can make the cost of meeting regulations burdensome for even the largest banks, to say nothing of the impact on smaller institutions.
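The numbers in this sketch are illustrative assumptions, not measurements from any particular deployment, but they show how a regulatory retention window combined with HDFS's default three-way replication inflates node counts:

    # Rough sizing sketch: why replication and retention drive node count.
    # All figures are illustrative assumptions, not vendor measurements.

    DAILY_INGEST_TB = 10        # assumed: ~20B records/day at ~500 bytes each
    RETENTION_DAYS = 365        # assumed regulatory retention window
    HDFS_REPLICATION = 3        # HDFS default replication factor
    NODE_CAPACITY_TB = 48       # assumed usable disk per low-cost data node

    raw_stored_tb = DAILY_INGEST_TB * RETENTION_DAYS * HDFS_REPLICATION
    nodes_for_capacity = raw_stored_tb / NODE_CAPACITY_TB

    print(f"Stored (with replicas): {raw_stored_tb / 1000:.1f} PB")
    print(f"Data nodes for capacity alone: {nodes_for_capacity:.0f}")
    # ~11 PB and ~230 nodes before accounting for CPU headroom,
    # shuffle bandwidth, or growth -- this is the sprawl described above.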
Big data analytics is no doubt more fiscally sound in the long run than the large fees and fines that follow errors and oversights that run afoul of regulators. The key is to base the Hadoop solution on a platform that breaks free from the restrictions encumbering traditional Hadoop deployments, drastically reducing the number of nodes required. The FabricXpress™ (FX) server platform from Axellio is a high-density, all-NVMe-SSD appliance designed to break beyond the performance limitations of traditional server architectures. With a high-speed internal data fabric, FX can support internal data rates of up to 60 GB/sec, with up to 88 CPU cores and 2 TB of system RAM. When configured with Hadoop, it becomes a platform for ingesting high-velocity data with a fraction of the servers typically required for an equally performing Hadoop installation.
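Using the 60 GB/sec fabric figure quoted above, a hypothetical consolidation estimate suggests why the node count can shrink so dramatically. The per-node throughput assumed here for a conventional spinning-disk data node is an illustration, not a benchmark:

    # Hypothetical consolidation estimate: how many conventional data nodes
    # a single high-bandwidth appliance could displace on scan/ingest
    # throughput alone. Figures other than the 60 GB/sec fabric rate
    # quoted above are assumptions.

    APPLIANCE_GBPS = 60          # internal fabric rate cited for FX
    NODE_EFFECTIVE_GBPS = 1.5    # assumed per-node disk throughput
                                 # (e.g., 12 spinning disks at ~125 MB/s)

    equivalent_nodes = APPLIANCE_GBPS / NODE_EFFECTIVE_GBPS
    print(f"One appliance ≈ {equivalent_nodes:.0f} conventional nodes "
          "of scan/ingest bandwidth")
    # Storage capacity, CPU, and network still have to be balanced, so
    # treat this as an upper bound on consolidation, not a guarantee.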
Axellio is working with a major financial institution to test this configuration in our Proof of Concept Lab in Colorado Springs, CO, and the results have been phenomenal! While we cannot comment on the specific tests designed by this institution, we were able to run several generic Hadoop benchmarks included with the Hortonworks distribution of Hadoop. When compared with a typical reference deployment from our integration partner, ViON, the Axellio-based Hadoop solution achieved the following results:
From these results, it's clear that a Hadoop solution based on Axellio can achieve performance levels that are orders of magnitude better than a solution that uses the traditional server approach. For this particular Hadoop use case, the ability to ingest high-velocity data into the Hadoop data lake and analyze it there is the key to drastically reducing the number of servers needed to satisfy regulatory requirements.
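For readers curious what such a benchmark run looks like, the sketch below drives the stock TeraGen/TeraSort/TeraValidate suite that ships with Hadoop distributions. The jar path and data size here are illustrative and will vary by installation:

    # A minimal sketch of the kind of stock benchmark run referenced above,
    # driven from Python. The examples jar ships with Hadoop distributions;
    # the path and data size are illustrative and vary by install.
    import subprocess

    EXAMPLES_JAR = ("/usr/hdp/current/hadoop-mapreduce-client/"
                    "hadoop-mapreduce-examples.jar")
    ROWS = 10_000_000_000  # 100-byte rows => ~1 TB of benchmark data

    def run(*args):
        """Run a Hadoop command and fail loudly if it errors."""
        subprocess.run(["hadoop", "jar", EXAMPLES_JAR, *args], check=True)

    run("teragen", str(ROWS), "/benchmarks/terasort-input")   # generate data
    run("terasort", "/benchmarks/terasort-input",             # sort it
        "/benchmarks/terasort-output")
    run("teravalidate", "/benchmarks/terasort-output",        # verify order
        "/benchmarks/terasort-validate")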
Meeting regulatory compliance in our modern age of data will only become more complex and expensive with traditional server approaches. Initially, Hadoop solutions were deployed to search large quantities of data for compliance reports, perform regulatory stress tests, and detect fraud. However, as regulations have evolved, simply searching through relatively static data is no longer enough. Modern financial institutions have rivers of data that must flow into the data lake while supporting predictive analytics, not only for compliance but also for cash management, trade analytics, and a 360-degree view of the customer. Hadoop can be a powerful answer to the big-and-fast data problem, but it can be stifled by traditional server architectures. With FabricXpress as the foundation of the solution, organizations can dramatically decrease the cost and equipment needed to perform these essential operations.