Study on Metabonomic Data Parallel Processing Based on LC/MS

Hai-tao SUN, Zhi-qiang YANG, Bao-hong LI, De-zhan CHEN
DOI: https://doi.org/10.7538/zpxb.2015.36.06.0535
2015-01-01
Abstract: Metabonomics is a new research field in the life sciences, following genomics and proteomics. It explores the relationship between an organism's metabolites and pathological changes. LC/MS is an important analytical technology for the determination of metabolites and has been widely used in disease diagnosis, pharmaceutical analysis, and other aspects of metabonomics. With the wide application of this technology, large amounts of raw data are generated quickly. Currently, MZmine is one of the leading software environments that provides a full analysis pipeline for these data. However, processing takes a long time because traditional serial computation is running up against physical performance limits. A method that processes the data faster and extracts useful information from these massive data sets in a timely manner is therefore significant. In this paper, a new parallel pre-processing method based on data parallelism is proposed, which increases the speed of data processing. Raw data are grouped and processed in parallel on different computing nodes, each with MZmine installed. Because the complexity of data processing is closely related to the data grouping mode, and experiments show that the simple time grouping mode is unstable, a new parallel peak pre-identification method, named the peak grouping mode, is proposed to quickly identify peaks and group data. The results show speedup rates of 2.87 for the time grouping mode and 4.55 for the peak grouping mode when processing the raw serum samples of 27 mice on 5 nodes. Tests with more data and more computing nodes indicate that the new parallel method is faster than the serial method and that the speedup rate of the peak grouping mode tends toward linear. In addition, the peak grouping mode is more efficient and stable than the time grouping mode in terms of parallel load balancing.
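The load-balancing difference between the two grouping modes can be illustrated with a minimal Python sketch. The abstract does not say how either mode forms its groups, so everything here is an assumption made for illustration: the function names are hypothetical, the per-scan peak counts are synthetic, time grouping is modeled as cutting the run into windows with equal numbers of scans, and peak grouping is modeled as cutting it into contiguous windows with similar numbers of pre-identified peaks. It is not the authors' algorithm and does not call MZmine. Only the speedup rates (2.87 and 4.55 on 5 nodes) come from the abstract.

```python
from typing import List


def efficiency(speedup_rate: float, nodes: int) -> float:
    """Parallel efficiency: speedup divided by the number of nodes."""
    return speedup_rate / nodes


def time_grouping(peak_counts: List[int], nodes: int) -> List[List[int]]:
    """Cut the run into contiguous windows holding equal numbers of scans."""
    n = len(peak_counts)
    bounds = [i * n // nodes for i in range(nodes + 1)]
    return [peak_counts[bounds[i]:bounds[i + 1]] for i in range(nodes)]


def peak_grouping(peak_counts: List[int], nodes: int) -> List[List[int]]:
    """Cut the run into contiguous windows holding similar numbers of peaks.

    Assumed stand-in for the paper's peak grouping mode: each scan goes to
    the node whose share of the cumulative peak count contains the scan's
    midpoint, so windows stay contiguous in retention time.
    """
    total = sum(peak_counts)
    if total == 0:
        # No peaks pre-identified: fall back to equal-sized windows.
        return time_grouping(peak_counts, nodes)
    groups: List[List[int]] = [[] for _ in range(nodes)]
    running = 0
    for count in peak_counts:
        midpoint = running + count / 2
        node = min(nodes - 1, int(midpoint * nodes / total))
        groups[node].append(count)
        running += count
    return groups


def imbalance(groups: List[List[int]]) -> float:
    """Heaviest node load over mean load (1.0 is perfectly balanced).

    Assumes the groups contain at least one peak in total.
    """
    loads = [sum(g) for g in groups]
    return max(loads) / (sum(loads) / len(loads))


# Synthetic peak counts per scan: a dense region in the middle of the run.
peak_counts = [1, 1, 1, 1, 20, 20, 20, 20, 1, 1, 1, 1]

print("time grouping imbalance:", imbalance(time_grouping(peak_counts, 3)))
print("peak grouping imbalance:", imbalance(peak_grouping(peak_counts, 3)))
print("efficiency on 5 nodes:", efficiency(2.87, 5), efficiency(4.55, 5))
```

With the reported speedup rates, parallel efficiency on 5 nodes is about 0.57 for the time grouping mode and 0.91 for the peak grouping mode, which is consistent with the claim that the peak grouping mode approaches linear speedup.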