Accelerating Financial Market Server Through Hybrid List Design (abstract Only)
Haohuan Fu,Conghui He,Huabin Ruan,Itay Greenspon,Wayne Luk,Yongkang Zheng,Junfeng Liao,Qing Zhang,Guangwen Yang
DOI: https://doi.org/10.1145/3020078.3021775
2017-01-01
Abstract:The financial market server in exchanges aims to maintain the order books and provide real time market data feeds to traders. Low-latency processing is in a great demand in financial trading. Although software solutions provide the flexibility to express algorithms in high-level programming models and to recompile quickly, it is becoming increasingly uncompetitive due to the long and unpredictable response time. Nowadays, Field Programmable Gate Arrays (FPGAs) have been proved to be an established technology for achieving a low and constant latency for processing streaming packets in a hardware accelerated way. However, maintaining order books on FPGAs involves organizing packets into GBs of structural data information as well as complicated routines (sort, insertion, deletion, etc.), which is extremely challenging to FPGA designs in both design methodology and memory volume. Thus existing FPGA designs often leave the post-processing part to the CPUs. However, it largely cancels the latency gain of the network packet processing part. This paper proposes a CPU-FPGA hybrid list design to accelerate financial market servers that achieve microsecond-level latencies. This paper mainly includes four contributions. First, we design a CPU-FPGA hybrid list with two levels, a small cache list on the FPGA and a large master list at the CPU host. Both lists are sorted with different sorting schemes, where the bitonic sort is applied to the cache list while a balanced tree is used to maintain the master list. Second, in order to effectively update the hybrid sorted list, we derive a complete set of low-latency routines, including insertion, deletion, selection, sorting, etc., providing a low latency at the scale of a few cycles. Third, we propose a non-blocking on-demand synchronization strategy for the cache list and the master list to communicate with each other. Lastly, we integrate the hybrid list as well as other components, such as packets splitting, parsing, processing, etc. to form an industry-level financial market server. Our design is applied in the environment of the China Financial Futures Exchange (CFFEX), demonstrating its functionality and stability by running 600+ hours with hundreds of millions packets per day. Compared with the existing CPU-based solution in CFFEX, our system is able to support identical functionalities while significantly reducing the latency from 100+ microseconds to 2 microseconds, gaining a speedup of 50x.