SDHDF: A new file format for spectral-domain radio astronomy data

L. J. Toomey, G. Hobbs, D. C. Price, J. R. Dawson, T. Wenger, D. Lagoy, L. Staveley-Smith, J. A. Green, E. Carretti, A. Hafner, M. Huynh, J. Kaczmarek, S. Mader, V. McIntyre, J. Reynolds, T. Robishaw, J. Sarkissian, A. Thompson, C. Tremblay, A. Zic
DOI: https://doi.org/10.1016/j.ascom.2024.100804
2024-02-28
Abstract: Radio astronomy file formats are now required to store wide frequency bandwidths and multiple simultaneous receiver beams and must be able to account for versatile observing modes and numerous calibration strategies. The need to capture and archive high-time and high frequency-resolution data, along with the comprehensive metadata that fully describe the data, implies that a new data format and new processing software are required. This requirement is suited to a well-defined, hierarchically-structured and flexible file format. In this paper we present the Spectral-Domain Hierarchical Data Format ('SDHDF'), a new file format for radio astronomy data, in particular for single dish or beam-formed data streams. Since 2018, SDHDF has been the primary format for data products from the spectral-line and continuum observing modes at Murriyang, the CSIRO Parkes 64-m radio telescope, and we demonstrate that this data format can also be used to store observations of pulsars and fast radio bursts.
Instrumentation and Methods for Astrophysics
What problem does this paper attempt to address?
The paper addresses the inability of existing radio astronomy file formats to handle the wide-bandwidth, multi-beam data and complex observing modes of modern radio telescopes. As receiver technology has advanced (for example at the CSIRO Parkes 64-m radio telescope), output data volumes have grown significantly, instantaneous bandwidths have widened, and observing modes have diversified. Traditional file formats such as SDFITS were not designed for these requirements, which fall into four areas:

1. **High time and frequency resolution**: Modern radio telescopes need to capture and archive data with both high time resolution and high frequency resolution.
2. **Multi-beam and wide-band support**: Many instruments produce multi-beam, wide-band data simultaneously, which traditional formats struggle to manage and store.
3. **Comprehensive metadata description**: Detailed metadata describing observing conditions, calibration strategies, and related information are needed to keep the data complete and interpretable.
4. **Flexible observing modes**: Different observing modes (such as scanning, fixed-position observations, and pulsar observations) require different data structures and processing methods.

To address these problems, the authors propose a new file format, the **Spectral-Domain Hierarchical Data Format (SDHDF)**. SDHDF is built on the Hierarchical Data Format (HDF5) and aims to provide a flexible, efficient, and highly self-descriptive file format that meets the needs of modern radio astronomy.
### Main improvement points:

- **Flexibility and extensibility**: SDHDF handles multiple observing modes and complex calibration strategies, supports data from single or multiple beams, and can be extended through its hierarchical structure.
- **High time and frequency resolution**: It can store data with high time resolution and high frequency resolution, which suits a wide range of observing requirements.
- **Comprehensive metadata support**: All data and metadata are self-described through HDF attributes, ensuring the integrity and interpretability of the data.
- **Distributed computing support**: It is suited to processing large data products at the TB scale and can work efficiently in a distributed computing environment.
- **Long-term archiving**: Files compress well, carry clear metadata descriptions, support long-term archiving, and are assigned a Digital Object Identifier (DOI) to ensure accessibility throughout the data life cycle.

By introducing SDHDF, researchers can manage and analyze complex data sets from modern radio telescopes more effectively, supporting progress in radio astronomy research.
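
The idea of a hierarchical, self-described layout can be illustrated with a small sketch. The one below models a beam, sub-band, dataset hierarchy in plain Python, with a dictionary of attributes on every node standing in for HDF attributes. It is not the SDHDF definition: the node names (`beam_0`, `band_0`, `data`), the attribute keys, and the frequency values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A group or dataset in a toy hierarchy, with HDF-style attributes."""
    name: str
    attrs: dict = field(default_factory=dict)
    children: dict = field(default_factory=dict)
    data: Optional[list] = None  # only leaf "datasets" carry data

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it, so calls can be chained."""
        self.children[child.name] = child
        return child

    def get(self, path: str) -> "Node":
        """Look up a descendant by a slash-separated relative path."""
        node = self
        for part in path.strip("/").split("/"):
            node = node.children[part]
        return node

    def walk(self, prefix: str = ""):
        """Yield (path, node) for this node and every descendant."""
        path = f"{prefix}/{self.name}"
        yield path, self
        for child in self.children.values():
            yield from child.walk(path)


def build_example() -> Node:
    """Build a two-beam, two-sub-band toy file (all values are made up)."""
    root = Node("root", attrs={"format": "toy-hierarchy-sketch"})
    sub_bands_mhz = [(1000.0, 1100.0), (1100.0, 1200.0)]  # illustrative only
    for b in range(2):
        beam = root.add(Node(f"beam_{b}", attrs={"beam_index": b}))
        for s, (f_lo, f_hi) in enumerate(sub_bands_mhz):
            band = beam.add(Node(
                f"band_{s}",
                attrs={"freq_low_mhz": f_lo, "freq_high_mhz": f_hi},
            ))
            # A 3 x 4 (time x channel) block of zeros as placeholder data.
            band.add(Node(
                "data",
                attrs={"unit": "counts", "dims": ("time", "channel")},
                data=[[0.0] * 4 for _ in range(3)],
            ))
    return root


if __name__ == "__main__":
    for path, node in build_example().walk():
        print(path, node.attrs)
```

Because each node carries its own attributes, a reader can interpret any dataset by walking from the root to that node, without needing a separate header table. In an actual HDF5 file the same layout would map onto groups, datasets, and attributes; that mapping is not shown here.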