A PCA-based frame selection method for applying CNN and LSTM to classify postural behaviour in sows
Meiqing Wang, Maciek Oczak, Mona Larsen, Florian Bayer, Kristina Maschat, Johannes Baumgartner, Jean-Loup Rault, Tomas Norton
DOI: https://doi.org/10.1016/j.compag.2021.106351
IF: 8.3
2021-10-01
Computers and Electronics in Agriculture
Abstract: Posture and the rate of postural changes of farrowing and lactating sows are considered reliable indicators of environmental comfort and health status, and are risk factors for piglet crushing. The objective of this study was to develop an approach combining deep learning and principal component analysis (PCA) to classify different postural behaviours of sows in videos. In contrast to previous deep-learning studies of sow postural behaviour classification, this study selects the sequences of frames that distinguish different postural behaviours rather than using all frames for the classification. Videos were collected from 13 sows, with recording starting 5 days before the expected date of farrowing and continuing until weaning. From the recordings, 3100 videos without piglets and 1680 videos including piglets were manually selected. These videos were then augmented by vertical mirroring and by adding Gaussian noise, which resulted in 7200 and 4600 videos without and including piglets, respectively. Each video lasted 5 s and showed 1 of 5 behavioural postures (sternal lying, lateral lying, sitting, standing, walking), labelled by one trained expert with extensive experience in sow behaviour classification. Of the total of 11,800 videos, 75% were randomly allocated to the training set and the remaining 25% to the validation set. To select motion-related frames, each video was first converted into a multidimensional matrix. PCA was then performed on the matrix, and a number of components were selected to represent each frame. Next, Euclidean distances between frames were computed from these components, and the frames whose distance exceeded a threshold were selected to generate new videos. Since the number of components and the distance threshold both affect the number of selected frames, a range of component numbers (1, 2, 3, 5, 10, 20, 50) and distance thresholds were tested to find the optimal parameters.
The best balance between accuracy and performance of the classification was obtained when using 10 components (explaining 87.98% of the total variation). The best results were obtained when the threshold was set to one fourth of the largest distance between two successive frames. To classify the different behaviours, a convolutional neural network (CNN) and a long short-term memory (LSTM) model were trained and validated on the videos composed of the selected frames. Using the proposed method, postural behaviours could be classified with accuracies of 95.33% on videos without piglets and 92.67% on all data (with and without piglets). Furthermore, 500 new videos were selected from the experiment and used as a test set. The final model returned an accuracy of 90.60% on this test set, which indicates that the proposed method can generalize to new data.
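The frame selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_motion_frames`, the use of NumPy's SVD to compute the PCA projection, and the choice to measure distances between successive frames only are assumptions made for illustration. The defaults (10 components, threshold at one fourth of the largest successive-frame distance) follow the values reported above.

```python
import numpy as np


def select_motion_frames(video, n_components=10, threshold_ratio=0.25):
    """Return indices of frames whose PCA-space distance to the previous
    frame exceeds threshold_ratio * (largest successive-frame distance).

    video: array of shape (T, H, W) or (T, H, W, C).
    Hypothetical helper, assumed for illustration only.
    """
    frames = np.asarray(video, dtype=np.float64)
    n_frames = frames.shape[0]

    # Flatten each frame into one row: matrix of shape (T, D).
    X = frames.reshape(n_frames, -1)

    # Centre the data, then obtain principal axes from the SVD.
    X_centred = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_centred, full_matrices=False)

    # Cannot keep more components than the SVD provides.
    k = min(n_components, Vt.shape[0])

    # Project every frame onto the first k principal components.
    scores = X_centred @ Vt[:k].T

    # Euclidean distance between each frame and its predecessor.
    distances = np.linalg.norm(np.diff(scores, axis=0), axis=1)
    if distances.size == 0:
        return np.array([], dtype=int)

    threshold = threshold_ratio * distances.max()

    # distances[i] compares frame i and frame i + 1, so shift by one.
    return np.flatnonzero(distances > threshold) + 1
```

For a clip in which the scene changes only once, the function returns the index of the single frame where the change occurs; the selected frames would then be assembled into the shorter video passed to the CNN and LSTM models.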
Agriculture, Multidisciplinary; Computer Science, Interdisciplinary Applications