Spectral fit residuals as an indicator to increase model complexity

Anshuman Acharya, Vinay L. Kashyap
DOI: https://doi.org/10.3847/2515-5172/ad18b5
2024-01-12
Abstract: Spectral fitting of X-ray data usually involves minimizing statistics like the chi-square and the Cash statistic. Here we discuss their limitations and introduce two measures based on the cumulative sum (CuSum) of model residuals to evaluate whether model complexity could be increased: the percentage of bins exceeding a nominal threshold in a CuSum array (pct$_{CuSum}$), and the excess area under the CuSum compared to the nominal (p$_\textit{area}$). We demonstrate their use with an application to a $\textit{Chandra}$ ACIS spectral fit.
Instrumentation and Methods for Astrophysics, High Energy Astrophysical Phenomena
What problem does this paper attempt to address?
The paper addresses a limitation of X-ray spectral fitting: the traditional fit statistics (such as χ² and the Cash statistic) cannot fully identify structured deviations in the model residuals. The authors therefore propose two new metrics based on the cumulative sum (CuSum) of the residuals, pct CuSum and p area, to evaluate whether the model complexity should be increased.

### Specific background of the problem

1. **Limitations of traditional methods**:
   - X-ray spectral fitting is usually done by minimizing a statistic such as χ² or the Cash statistic.
   - These statistics evaluate the overall goodness of fit, but they are global metrics and cannot capture correlated residual deviations that extend over a contiguous range of the data space.
   - When the fit statistic or the F-test value is below the threshold defined by the null distribution, further increasing the model complexity (i.e., adding free parameters) is generally considered not to be statistically supported.
2. **Introduction of new methods**:
   - To make up for these deficiencies, the authors introduce two new metrics: pct CuSum and p area.
   - pct CuSum measures the percentage of bins in the CuSum array that exceed the nominal threshold.
   - p area measures the excess area of the CuSum array relative to the nominal expectation.

### The role of new methods

- **pct CuSum**: It is used to detect deviations in the model continuum. If more than 10% of the bins lie outside the 90% confidence interval, the current model is considered insufficient to explain the data and a more complex model is recommended; if far fewer than 10% do, the model may be over-fitting and a simpler model is recommended.
- **p area**: It is used to detect the presence or absence of narrow lines. If p area is significantly less than 0.05, there are large deviations in the CuSum and the model is inappropriate.

### Application examples

The authors demonstrate the metrics on a Chandra/ACIS-S observation of the stellar corona of HD 179949 and show the fitting results for models of different complexity:

- The simplest single-temperature APEC model (1m): pct CuSum = 40.6%, p-value = 0.0, indicating that the model is inappropriate.
- The medium-complexity two-temperature model (2m): pct CuSum = 34.2%, p-value = 0.0. Although Δcstat is acceptable, the CuSum metrics still indicate that the model is inappropriate.
- The most complex two-temperature model (2v), which accounts for abundance variations grouped by first ionization potential: pct CuSum = 13.5%, p-value = 0.09, and all metrics indicate that the model is appropriate.

### Summary

By introducing pct CuSum and p area, the authors provide a new way to evaluate the quality of an X-ray spectral fit and to decide whether the model complexity needs to be increased. The approach identifies not only global deviations but also local structure in the residuals, which helps to avoid both over-fitting and under-fitting.
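
### Illustrative sketch

The two CuSum metrics described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `cusum_metrics`, its parameters, the use of Poisson realizations of the best-fit model (without refitting each one), the equal-tailed 90% envelope, and the definition of "excess area" as the summed distance outside that envelope are all choices made here for illustration.

```python
import numpy as np


def cusum_metrics(data, model, n_sim=1000, level=0.90, seed=0):
    """Return (pct_cusum, p_area) for binned counts and model-predicted counts.

    Hypothetical helper: the simulation scheme and the area definition are
    assumptions of this sketch, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    model = np.asarray(model, dtype=float)

    # Cumulative sum of the residuals (data minus model) across the bins.
    obs = np.cumsum(data - model)

    # Null CuSum curves: Poisson draws from the model, same residual definition.
    fake = rng.poisson(model, size=(n_sim, model.size))
    sims = np.cumsum(fake - model, axis=1)

    # Per-bin equal-tailed envelope containing `level` of the null curves.
    tail = 100.0 * (1.0 - level) / 2.0
    lo, hi = np.percentile(sims, [tail, 100.0 - tail], axis=0)

    def excess(curve):
        # Distance by which a curve lies outside the envelope (0 when inside).
        return np.maximum(curve - hi, 0.0) + np.maximum(lo - curve, 0.0)

    # Percentage of bins where the observed CuSum leaves the envelope.
    pct_cusum = 100.0 * np.mean(excess(obs) > 0)

    # Fraction of null curves with at least as much excess area as observed.
    obs_area = excess(obs).sum()
    sim_area = excess(sims).sum(axis=1)
    p_area = float(np.mean(sim_area >= obs_area))
    return pct_cusum, p_area


# Toy example: a flat model of 20 counts per bin and data that sit 30% higher.
model = np.full(200, 20.0)
pct_bad, p_bad = cusum_metrics(model * 1.3, model)
pct_ok, p_ok = cusum_metrics(model, model)
print(pct_bad, p_bad, pct_ok, p_ok)
```

In the toy example, a systematic excess in the data makes the CuSum drift steadily away from zero, so most bins fall outside the envelope and the area-based p-value is small, while data that match the model exactly stay inside it. Following the thresholds quoted above, a pct CuSum well above 10% or a p area well below 0.05 would suggest trying a more complex model.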