Support Vector Data Description in Rainstorm Prediction of the Northwest China
Yan Dongwei, Sun Tianwen, Yang Yan, Fang Jiangang, Liu Zhijing
DOI: https://doi.org/10.3969/j.issn.1001-7313.2007.05.012
2007-01-01
Abstract: Expert systems (ES) have been widely studied and applied in meteorology. An ES depends on knowledge engineers to enter the knowledge the computer uses for inference, which is toilsome and error-prone work. Machine learning, another branch of artificial intelligence (AI), aims to acquire knowledge automatically and so offers a way to remedy this shortcoming of ES. However, machine learning still does not work well unless it is tailored to the characteristics of weather forecasting, among which class imbalance is an important problem deserving study. Although the machine learning research community usually assumes implicitly that classes are well balanced, there are many domains in which one class is represented by a large number of examples while the other is represented by only a few, and many applications require the correct classification of important but rare positive (minority) examples. Predicting disastrous weather such as hail and rainstorms is a typical case of learning from an imbalanced training set. Although such weather events have low probability, they cause serious destruction, so meteorologists pay much more attention to predicting disastrous weather than to predicting normal weather. Normally, examples of normal weather far outnumber examples of disastrous weather. Because traditional machine learning algorithms aim to maximize accuracy, an imbalanced class distribution drives them toward a trivial classifier that labels every example as the majority class, which still attains high accuracy. Class imbalance is therefore a stumbling block for practical attempts to apply machine learning to realistic problems. To find algorithms that are resistant to imbalanced class distributions, the threat score (TS) is used as the criterion for evaluating classifiers. The support vector machine (SVM), a kernel method, also fails to deal with the class imbalance problem, even though it is based on statistical learning theory and works well in many applications. SVM inclines toward the majority class (corresponding to normal weather) and misses the very important disastrous weather. Support vector data description (SVDD) is another important kernel method that originated from SVM. Because a one-class method is trained on examples of the target set only, it is suited to the class imbalance problem. As a one-class method, SVDD tries to capture the characteristics of the target class and is resistant to class imbalance. A comparative study of SVDD and SVM is conducted on rainstorm prediction in Tongchuan City, Shaanxi Province. The experiment shows that SVM is evidently biased toward the majority class and produces many false negatives. When the normal weather class is selected as the target, the TS of SVDD is superior to that of SVM. This result agrees with the theoretical analysis of SVDD and SVM. The results show that SVDD is a better choice than traditional methods such as SVM when dealing with the class imbalance problem, and that better performance can be obtained if the class with more examples is chosen as the target class.
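The contrast between accuracy and the threat score can be illustrated with a minimal sketch. It assumes the common definition TS = hits / (hits + misses + false alarms), which the abstract does not spell out, and it uses made-up labels (5 rainstorm days out of 100), not the Tongchuan data. The function names `accuracy` and `threat_score` are hypothetical helpers written for this illustration.

```python
# Sketch: accuracy vs. threat score (TS) on an imbalanced label set.
# Assumption: TS = hits / (hits + misses + false_alarms).
# The labels below are synthetic and are not taken from the paper.

def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def threat_score(y_true, y_pred, positive=1):
    """TS for the rare (positive) class; returns 0.0 if the denominator is zero."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    misses = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    false_alarms = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    denom = hits + misses + false_alarms
    return hits / denom if denom else 0.0

# 5 rainstorm days (label 1) followed by 95 normal days (label 0).
y_true = [1] * 5 + [0] * 95

# Trivial classifier: every day is predicted as normal weather.
trivial_pred = [0] * 100

# Hypothetical classifier: catches 4 of 5 rainstorms, raises 6 false alarms.
useful_pred = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89

for name, pred in [("trivial", trivial_pred), ("useful", useful_pred)]:
    print(name, round(accuracy(y_true, pred), 3), round(threat_score(y_true, pred), 3))
```

In this toy setting the trivial majority-class predictor has the higher accuracy but a TS of zero, while the classifier that actually detects rainstorms has slightly lower accuracy and a clearly higher TS. This is why a criterion such as TS is more informative than accuracy when the class of interest is rare.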