1. Introduction
Observations of pulsars spanning from the 1990s to the present day by Murriyang,Footnote a the CSIRO Parkes 64-metre radio-telescope, are archived for long-term storage in the CSIRO’s Data Access PortalFootnote b (DAP), located in Canberra, Australia, and managed by CSIRO’s Information Management and Technology (IM&T) service. The archive is a historical record of snapshots of the sky as observed by Murriyang at frequencies in the radio band ranging from 0.4 to 24 GHz, over a time span of 34 yr.
At the time of writing, over 4.5 Petabytes of data from 450 unique Parkes pulsar-specific observation proposals (known as project identifiers, hereafter ‘PIDs’) are publicly available for immediate download (Table 1). The volume of pulsar data ingested into DAP continues to grow at a steady rate (Figure 1), as Murriyang is regularly upgraded with cutting-edge receiver technology.
So why collect pulsar data and what makes Murriyang so important as an instrument? While pulsars are of considerable astrophysical interest in their own right, they are also important astrophysical tools – searched for and subsequently monitored – and observations have been used to understand many aspects of the known universe, for example stellar evolution (Cameron et al. 2023), solar system dynamics (Caballero et al. 2018), theories of gravity in strong-field regimes (Kramer et al. 2006) and detection of a stochastic background of gravitational waves from pulsar timing experiments (Reardon et al. 2023).
For decades, the role of Murriyang in the discovery of new pulsars and the continuous stream of scientific results originating from pulsar observations has been remarkable. Murriyang has discovered 1 235 of the 3 748 known pulsars in the ATNF Pulsar Catalogue v2.6.3.Footnote c Some of the highlights include the discovery of the only known double pulsar (Burgay et al. 2003) and the so-called ‘diamond planet’ (Bailes et al. 2011). Importantly for the purpose of this paper, archival data have also led to a number of unexpected results – for example, the discovery of ‘Rotating Radio Transients’ (RRATs, McLaughlin et al. 2006) was made during re-processing of the Parkes Multibeam Pulsar Survey (PMPS, P268, Manchester et al. 2001), and the first unusually strong short-lived radio burst (a new class of object later dubbed ‘Fast Radio Bursts’, FRBs) was discovered during re-processing of a survey of the Small Magellanic Cloud (Lorimer et al. 2007; Manchester et al. 2006).
Table 1. CSIRO’s Data Access Portal – an overview of the data in the archive available for download at the time of writing.


Figure 1. Murriyang pulsar data published in CSIRO’s Data Access Portal, by observing semester.
Observations have also been carried out to study transient objects in the radio sky, such as flare stars (Zic et al. 2023a), and long-period transients (LPTs, Hurley-Walker et al. 2023). Spectral line and continuum observations are also supported – data from these observations are archived in the Australia Telescope Online Archive (ATOAFootnote d ) and eventually migrated to the CSIRO ASKAP Science Data Archive (CASDAFootnote e ). A recent addition to the suite of supported observing modes is the phase-resolved spectra mode, using the periodic on-off of known pulsars to study the emission and absorption spectra along the line of sight (Liu et al. 2025). Murriyang is also part of the Long Baseline Array network, supporting Very Long Baseline Interferometry (VLBI) observations including measurements of pulsar distances by parallax, e.g. Dodson et al. (2003). Occasionally, Murriyang is also used for confirmation and follow-up of point sources of interest, for example the LPT reported by Wang et al. (2025), which was discovered recently in data from the ASKAP radio telescope (Hotan et al. 2021).
Archived data are from proposals ranging from targeted observations, for example of the globular cluster 47 Tucanae (PID P1006, Zhang et al. 2019) to long-term monitoring programs like the Parkes Pulsar Timing Array (P456, Manchester et al. 2013), and Young Pulsar Timing (P574, Weltevrede & Johnston 2008), to large sky surveys such as the PMPS and SUPERB – A SUrvey for Pulsars & Extragalactic Radio Bursts (P858, Keane et al. 2018). The DAP also contains datasets that relate to a particular publication, software package or data release.
Data in the DAP are embargoed for a period of 18 months before being released for public use in ‘collections’ grouped by observation semester (nominally with two semesters annually). Embargoed data are only accessible to Principal Investigators (PIs) and contributors to a proposal. All collections are labelled with a unique Digital Object IdentifierFootnote f (DOI) that persists for the life of the collection, thereby providing a mechanism to couple scientific research with good provenance.
The importance of such an archive cannot be overstated, and it continues to yield new results when the data are run through new algorithms. A few examples follow. Reprocessing of the Parkes Multibeam survey with a GPU-accelerated processing pipeline recently yielded 37 new pulsars (Sengar et al. 2023). Artificial Intelligence and machine learning tools are also playing a more significant role in candidate and/or anomaly detection (Yang et al. 2025). Recently, a search of archival data was conducted to question the repeating nature of some transient events (Zhang & Yang 2024), and in another example, the authors mined archival observations of Open Clusters for potential candidate pulsars (Zhang et al. 2025). Twelve years after the discovery of the first FRB, an additional one was found in the same data-set (Zhang et al. 2019).
This paper is intended as a follow-up to Hobbs et al. (2011), bringing users up to date with major developments in the archive since 2011, and describing how the archive plays an important role as we follow the data on a journey from the telescope to the end user, with steps in place to ensure that data quality remains at a high standard throughout.
In Section 2, we describe aspects of data acquisition, including the importance of pulsar data for the field of radio astronomy, observation types, data formats, and archive provision. Section 3 focuses on data preservation, including archive structure and scope of available data products. Aspects of data dissemination are described in Section 4, and in Section 5, we introduce the PFITS software package and briefly discuss data reduction methods and visualisation. In Section 6, we discuss the challenges and future requirements for pulsar data archiving in the era of accelerated data volume acquisition, and leveraging Cloud platforms for processing of DAP data.
2. Data acquisition
The pulsar data products from Murriyang can be thought of as a snapshot of the sky at a particular time and radio frequency and are generally termed ‘fold-mode’, ‘search-mode’ or ‘calibration-mode’, depending on the observation type. These data products are described in Paper I (Hobbs et al. 2011), and are introduced briefly here.
Fold-mode observations are from a pointing of a known pulsar, where the data are ‘folded’ or stacked at the known rotation period of the pulsar, to form an integrated pulse profile that is averaged over a time period longer than the pulsar’s spin period – in the DAP, files of this type have the extension ‘.rf’.
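As a rough illustration of what ‘folding’ means, the following Python sketch averages a uniformly sampled time series into pulse-phase bins at an assumed constant period. It is not the algorithm used by the Murriyang backends; the function name, the number of bins and the synthetic signal are all illustrative assumptions.

```python
def fold(samples, t_samp, period, n_bins=16):
    """Average a uniformly sampled time series into pulse-phase bins."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, value in enumerate(samples):
        # Fractional rotational phase of this sample, in [0, 1).
        phase = (i * t_samp / period) % 1.0
        b = min(int(phase * n_bins), n_bins - 1)
        sums[b] += value
        counts[b] += 1
    # Empty bins are reported as zero rather than dividing by zero.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# Synthetic signal: a pulse occupying the first 10% of every rotation.
t_samp, period = 0.001, 0.1  # seconds (illustrative values)
samples = [1.0 if (i * t_samp / period) % 1.0 < 0.1 else 0.0
           for i in range(5000)]
profile = fold(samples, t_samp, period, n_bins=10)
```

In this toy case the pulse power collects in the first phase bin of `profile`. Real fold-mode data are folded with a timing model that tracks changes in the apparent period, which this sketch ignores.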
Prior to 2018, these fold-mode files were also averaged over all frequency channels, all polarisations, and integrations, to create a separate sub-set of filesFootnote g – an accompanying preview image of the integrated pulse profile is also available allowing the user to make a judgement on observation quality.
Search-mode observations comprise a multi-channel stream of data with time (a ‘time series’) for the purpose of searching a particular sky location for radio signals, periodic or otherwise. Files of this type have the extension ‘.sf’. These observations make up 93% of the total archive volume.
Two types of calibration-mode observations are used for pulsar observations taken with the current receiver suite. The first type is of a waveform injected into the signal path (Hobbs et al. 2020), and the second is from observations of a reference radio source with a known stable flux density, notably Hydra A or, more recently, PKS B1934-638. The injected signal is used to calibrate the polarisation information of the astronomical data, and is generally recorded before (and sometimes after) a pulsar observation. The reference radio source provides a means of calibrating the flux density of an observation. The calibration-mode files have the extension ‘.cf’.
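The three file extensions described above lend themselves to a simple lookup when sorting downloaded files. The following is a minimal sketch; the function name and the example file names in the usage note are hypothetical.

```python
from pathlib import Path

# File extensions used in the DAP, as described in the text above.
MODE_BY_EXTENSION = {".rf": "fold", ".sf": "search", ".cf": "calibration"}


def observation_mode(filename):
    """Return the observing mode implied by a file extension, or None."""
    return MODE_BY_EXTENSION.get(Path(filename).suffix.lower())
```

For example, `observation_mode("example.sf")` returns `"search"`, while a file with an unrecognised extension returns `None`.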
2.1. Archival data products
The accepted data format for pulsar data in the DAP is ‘PSRFITS’ (Hotan, van Straten, & Manchester 2004), although archiving of other research data and/or software is also supported. PSRFITS is a flexible and extensible format based on the Flexible Image Transport System (FITS, Pence et al. 2010) specifically for pulsar data, adhering to the current version of the definition.Footnote h
At the completion of a particular pulsar observation, the data are transferred to a staging server where they are checked for integrity, converted to PSRFITS format if necessary, and sorted into collections by PID and observation semester. Finally, metadata and checksums are captured, and the collections are placed in the DAP upload queue. Once in the DAP staging area, checks to verify both integrity and metadata are applied, and if successful, the data then progress through to final publication where they become accessible via the DAP’s web-based portal.
PSRFITS format was not always supported by instrumentation at Murriyang. For example, the BPSR/HIPSR (Price et al. 2016) pulsar and spectral line data acquisition system (or ‘backend’) produced search-mode files but in the native filterbankFootnote i format. Collating and conversion of early archival data is an ongoing process – data from the digital backends such as the Analogue Filterbank (Manchester et al. 2001) and BPSR continue to be converted to PSRFITS format on a dedicated virtual compute host in CSIRO’s Bowen Research Cloud (BRC) prior to being published on DAP.
3. Data preservation
To ensure preservation of our pulsar data products and to encourage future reuse, every file is strictly checked for adherence to the PSRFITS definition, including check-summing and metadata completeness, prior to archiving.
3.1. Accepted file formats
A PSRFITS format file consists of a primary Header Data Unit (HDU) containing observation metadata, followed by a series of binary extension HDUs, storing metadata and history specific to an observation, and the associated data products.
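To make the header layout concrete, the sketch below parses the 80-character ‘cards’ of a FITS primary header into a dictionary, using only the card layout fixed by the FITS standard. It is deliberately simplified (for example, it does not handle doubled quotes inside strings), and the header built here is synthetic: the keyword values shown are illustrative and not taken from a real DAP file.

```python
CARD, BLOCK = 80, 2880  # card and block sizes fixed by the FITS standard


def read_primary_header(raw):
    """Parse the 80-character cards of a FITS primary header into a dict."""
    header = {}
    for start in range(0, len(raw), CARD):
        card = raw[start:start + CARD].decode("ascii")
        key = card[:8].strip()
        if key == "END":
            break
        if card[8:10] != "= ":
            continue  # COMMENT, HISTORY and blank cards carry no value
        value = card[10:]
        if value.lstrip().startswith("'"):
            # String value: the text between the first pair of single quotes.
            value = value.lstrip()[1:].split("'", 1)[0].rstrip()
        else:
            # Non-string value: everything before the optional comment.
            value = value.split("/", 1)[0].strip()
        header[key] = value
    return header


def make_card(key, value):
    """Build one synthetic 80-character header card (illustration only)."""
    return f"{key:<8}= {value}".ljust(CARD)


cards = [make_card("SIMPLE", "T".rjust(20)),
         make_card("FRONTEND", "'UWL     '"),
         make_card("OBS_MODE", "'SEARCH  ' / illustrative value"),
         "END".ljust(CARD)]
raw = "".join(cards).encode("ascii").ljust(BLOCK, b" ")
header = read_primary_header(raw)
```

In practice a maintained FITS library (for example `astropy.io.fits`) or the packages listed below would be the sensible choice; the sketch is only meant to show that the primary HDU metadata are plain keyword/value records.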
These files are readable by open-source software packages for pulsar data analysis such as PSRCHIVE (Hotan et al. 2004), PRESTO (Ransom 2011), the PFITS package (described in Section 5), and FITS file viewers such as NASA HEASARC Fv.Footnote j
The pulsar astronomy community generally sees the benefit of storing data in a format that allows for metadata updates and the ability to add entirely new HDUs if required. However, FITS cannot store data streams from receivers with multiple beams such as the recently commissioned Cryogenically cooled 72-beam Phased Array Feed (CryoPAF) on Murriyang, and is not suited for appending large data-sets. We are currently trialling Spectral-Domain Hierarchical Data Format (SDHDF, Toomey et al. 2024) as a replacement file format.
3.2. Data provenance
The history of the origin, processing, and methodologies associated with a particular data-set constitutes its provenance. For pulsar data products from Murriyang, the file metadata and DAP policies provide a high level of provenance in a number of ways:
• Comprehensive metadata capture in the PSRFITS headers, for example, key dates, astronomical source information, receiver, and digital acquisition system information.
• A system of Digital Object Identifiers (DOIs) and persistent links to collections, which assures the author of a particular publication that a DAP collection of associated data (or for software, a specific version) will be accessible for the life of the archive.
• Checksums of individual files stored in the metadata. The transfer of large volumes of data over a network can occasionally lead to data loss, which is problematic if a user wishes to reproduce a set of published results; the stored checksums assure the user that the data product is identical to when it was first archived.
• Citation text provided by DAP for each collection, which ensures that data are correctly cited in peer-reviewed publications.
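The checksum comparison mentioned in the list above can be sketched in a few lines of Python. The hash algorithm used by DAP is not stated here, so SHA-256 is an assumption made purely for illustration, and the function names are hypothetical.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so that large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_hex, algorithm="sha256"):
    """Compare a downloaded file against a checksum from the metadata."""
    return file_checksum(path, algorithm) == expected_hex.lower()
```

A user would pass the checksum recorded in the collection metadata as `expected_hex`; a mismatch indicates that the download should be repeated.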
Data provenance is crucial in order to reproduce scientific results. In 2017, we attempted to reproduce the reprocessing of fold-mode pulsar data on virtual machines across multiple operating system environments.Footnote k By using containerised operating systems and the DOIs provided by the DAP, we were able to fully reproduce the software and data environments and to reproduce the data perfectly. (The analysis results could only be partially reproduced, but this was due to a random seed built in to the processing software.)
3.3. DAP collection types
The DAP groups data products in ‘collections’. The types of collections are grouped broadly as ‘standard pulsar’ and ‘other pulsar-related’ – the latter can be research data and/or software.
3.3.1. ‘Standard’ pulsar collections
Each observing semester, astronomers can submit a proposal for observing time with ATNF’s telescopes through the ATNF OPALFootnote l system – these include Non A-priori Assignable (NAPA) proposals that may over-ride allocated observing, for example for rapid follow-up of a transient source. The proposals are judged on scientific merit and observing time is allocated accordingly. Data from these proposals form the ‘standard’ pulsar collections, which contain fold-, search-, or calibration-mode observations in PSRFITS format. These collections are bundled by semester, for example a P456 observation in April 2024 can be found in the 2024APR semester (2024APRS spans the 6 months from April 1st to September 30th 2024). A P456 observation in January 2025 will be bundled in the 2024OCT semester (2024OCTS spans from October 1st 2024 to March 31st 2025). The metadata for these ‘standard’ collections (proposal team details, descriptions, embargoes, and license) are generated automatically from the proposal in the OPAL system.
Observation time can be granted at short notice and without an OPAL proposal by applying for Director’s Discretionary Time or Target of Opportunity (ToO) time. An example of this might be a follow-up of a new source. In this case the Project ID assigned is prefixed with ‘PX’, and prior to 2025 such observations were collated in DAP collections with the title ‘PUNDEF’, for projects that are undefined in the OPAL system. There were three exceptions to this rule, however: PX500 and PX501 were assigned to projects that had purchased telescope time, and PX600 was assigned to observations from the Breakthrough ListenFootnote m initiative.
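The semester bundling rule described above can be written as a short Python function. This is a sketch based only on the semester boundaries quoted in the text; the function name is hypothetical, and it assumes the observation date is expressed in whatever time standard the archive uses for semester boundaries.

```python
from datetime import date


def observing_semester(obs_date):
    """Return the semester label (e.g. '2024APRS') for an observation date."""
    if 4 <= obs_date.month <= 9:
        return f"{obs_date.year}APRS"
    # October to December belong to that year's OCT semester;
    # January to March belong to the previous year's OCT semester.
    year = obs_date.year if obs_date.month >= 10 else obs_date.year - 1
    return f"{year}OCTS"
```

With this rule, an observation on 15 April 2024 maps to `2024APRS` and one on 10 January 2025 maps to `2024OCTS`, matching the two examples given in the text.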
3.3.2. ‘Other’ pulsar-related collections
The DAP also archives pulsar-related data-sets that are not necessarily observation data and do not fit into the ‘standard’ type classified above. These collections may be data or software products related to a specific publication or project. Some examples of these are:
• The Parkes Pulsar Timing Array (PPTA) published their first, second and third data releases for general use.
• Johnson & Kerr (2017) published their polarimetry dataset, referenced from their publication by the DOI.
• Software releases for the ATNF Pulsar Catalogue are published on a regular basis.
These and other pulsar-related collections, their persistent DOIs, and the publications that reference them are listed in Table 2.
Table 2. A selection of ‘Other’ pulsar-related collections grouped by subject matter, their collection DOI, and where they are referenced.

3.4. Embargo overview
Each PID has a PI and often multiple contributors. The embargoed files from a particular collection are accessible to the PI and contributors with approved credentials. Once the specified embargo period has lapsed, the files then become publicly accessible and available for download.
The proprietary period of a particular collection can be extended or removed if required. One example of this is the PULSE@Parkes project (P595, Hobbs et al. 2009) – an outreach program designed to involve high school students from around the world in real-time observations and pulsar data processing using Murriyang – data from which are made publicly available immediately after the observations. Projects with paid time on the telescope may choose to extend the embargo beyond the proprietary period.
3.5. Scope of available collections
In this subsection, we present an overview of the available published collections in the DAP at the time of writing, including the scope of data by receiver and digital backend, sources and sky coverage. We also present a list of collections containing discoveries that provided breakthroughs in our understanding of pulsar classification and astrophysics.
3.5.1. Scope of available receiver and data acquisition instrumentation
The ability of Murriyang to continue to provide cutting-edge science in the field of radio astronomy is in part due to regular updates and replacements of the receivers and the digital acquisition systems. Recent additions are the Ultra-wide Bandwidth Low (UWL) receiver providing simultaneous bandwidth from 0.7 to 4 GHz (Hobbs et al. 2020), and the CryoPAF, both developed in-house by the ATNF receiver group (paper in prep.).
Table 3. Murriyang’s receiver fleet since the early 1980s – used for both pulsar and non-pulsar observations. The ‘FRONTEND’ field refers to the value of the ‘FRONTEND’ parameter key in a PSRFITS file primary HDU (note, keys marked with $^!$ indicate that there are no PSRFITS files found in the DAP). The ‘Polarisation’ field indicates the number and type of polarisation of the feed, linear (LIN) or circular (CIRC). Acronyms are as follows: Australia Telescope (AT), National Radio Astronomy Observatory (NRAO), Search for Extraterrestrial Intelligence (SETI), Dominion Radio Astrophysical Observatory (DRAO), Global Magneto-Ionic Medium Survey (GMIMS), Max Planck Institute (MPI). The contents of this table were created from Parkes schedule archives and an online receiver database$^b$ (from 1998 onwards), and otherwise referenced in line where known.

Table 3 is a comprehensive list of Murriyang’s receiver fleet to date. This single historic record of the instrumentation since the early 1980s is included here to provide context for the reader – not all receivers observed pulsars, and for those that did, not all have data that are accounted for. We are always on the lookout to publish historic archival data in the DAP – by providing this comprehensive list, we hope that these missing data come to light. The AT Multi-Band receiver$^a$ (see https://www.atnf.csiro.au/observers/memos/d95b8a~1.pdf and the Fourth Annual Report of the Australia Telescope Project (CSIRO, Oct. 1987, p. 9, Appendix B)) was a five-feed receiver package – the S/X-band$^1$ was a special concentric dual-band feed allowing S and X bands to be observed simultaneously, mostly for astrometric VLBI. The Methanol$^2$ (also known as the ‘Old Meth’), SETI$^3$, K/KU-band$^4$, Galileo$^5$, 10-50$^6$ and 13 MM$^7$ (see http://hdl.handle.net/102.100.100/109880?index=1) receivers were all dual-feed packages. In 2000, the Methanol$^2$ ‘FRONTEND’ parameter key was changed from ‘METHANOL’ to ‘METH6’ and ‘METH12’ to reflect the two independent feeds of the receiver. The 50 cm frequency band of the 10-50 receiver was shifted upwards during its time on the telescope in order to avoid phone and Digital TV interference – from 2003 to July 2009, the range was 680 MHz +/− 32 MHz, then 685 +/− 32 MHz, before finally settling on 700 to 764 MHz. Confusion about the polarisation of the 13 MM dual-band receiver$^7$ is evident in the data – the ‘13MM’ parameter key was used for both feeds, causing uncertainty about whether the polarisation parameters were set correctly. From historical records, we have ascertained that the narrow-band receiver was used predominantly for VLBI and that all data are circularly polarised if 21 GHz < frequency < 23 GHz. Our records also show that data prior to 21 November 2013 were marked as linear, but after that date were marked as circular, regardless of which feed was actually used.
In some cases, data were recorded with parameter values that did not follow the traditional naming scheme – for completeness these additional parameters found in the headers of data from some receivers are listed in Table 4.
Table 4. Other ‘FRONTEND’ parameter values found in the PSRFITS primary HDU of some early pulsar data.

Table 5. Murriyang’s principal data acquisition systems used mainly for pulsar observations, believed to be complete since 1990 – referenced where possible, showing the year they were commissioned, number of years in service (in brackets), and development credit. ‘INSTRUMENT’ refers to the value of the ‘INSTRUMENT’ parameter key in a PSRFITS file. The S2* recorder was installed for VLBI observations but also used for pulsar observations from 1996 to 1999, and then returned to VLBI use only until 2002. BPSR** was also known as the HI Parkes Swinburne Recorder, HIPSR. Apollo$^{+}$ is a software instance running on the Boreas GPU backend, for UWL observations only. ‘INSTRUMENT’ marked ‘n/a’ (not applicable) indicates that data from these instruments predated the PSRFITS format and therefore will not be in DAP. Instruments for which no data were found are marked with $^{!}$. https://resolver.caltech.edu/CaltechETD:etd-09102008-091511$^{1}$.

The data acquisition instrumentation produces a binary data stream from the analogue sky signal that becomes the astronomy data products. Many collaborative efforts since the early 1990s have contributed to build, configure and update these systems on Murriyang – Table 5 lists these systems and the collaborations involved, and detailed specifications are presented in Table 6.
3.5.2. Scope of sky coverage
At the time of writing, 4.3 Petabytes of search-mode pulsar data are published in the DAP, encompassing observations from both legacy and recent surveys, and pulsar and FRB follow-up campaigns. A total of 1 235 pulsars have been discovered in surveys using Murriyang – the first survey using the 70 cm receiver (P050) yielded a total of 298 new discoveries, and the PMPS (P268) has been the most successful to this day, finding 833 new pulsars in the Galactic plane. Recent searches with new algorithms have continued to add value to these data-sets, taking the total number of pulsars discovered in the PMPS to 1 160, and Xia et al. (2025) recently found a new pulsar in P050 data. These examples demonstrate the importance of a comprehensive long-term data archive such as DAP. Table 7 lists a selection of the pulsar surveys, and Figure 2 shows the sky coverage by Project ID for the main pulsar surveys conducted over the last 34 yr.
3.5.3. Collections containing historically important discoveries
Within the daily stream of data from observations at Murriyang, the DAP contains archival data that led to several notable discoveries – a selection of those are shown in Table 8. The data file containing the discovery, the DOI of the collection the files are in, and the publication reference are all listed.
Access to the archive is continuing to grow, and this is reflected in the NASA ADSFootnote n publication statistics – 15 peer-reviewed papers acknowledged the use of DAP data in their work in 2024.
Table 6. Specifications of Murriyang’s principal data acquisition systems used for pulsar observations since 1990. ‘INSTRUMENT’ refers to the value of the ‘INSTRUMENT’ parameter key in a PSRFITS file. ‘Bandwidth’ refers to the maximum instantaneous bandwidth of the digital backend Analogue to Digital Converters, and ‘Sample time’ and ‘Resolution’ are the maximum time and frequency resolution respectively. Some instruments* have software-dependent frequency resolution. The AFB was available in various modes (all single polarisation): high-resolution, single beam (from 1997)$^{1}$; standard Multibeam Survey mode (from 1997)$^{2}$; 125 and 250 kHz modes for the 50 cm and 70 cm receivers (1997–2004)$^{3}$; wide-bandwidth, single-beam mode for the 10 cm receiver (from 2005)$^{4}$. The Swinburne systems were dual-polarisation$^{5}$.

Table 7. Known pulsar surveys and targeted searches conducted with Murriyang. ‘Date’ refers to the range of observation dates for data in DAP. PIDs where the data in DAP are incomplete are marked with an $^\mathrm{x}$ – the missing data are deemed likely lost or corrupt. PIDs where data are continuing to be added are marked with a $^{+}$. PIDs with no data found to date are marked with $^{!}$, and PIDs with no data on DAP are marked with *. PIDs marked ‘n/a’ (not applicable) are projects that predated the PID indexing scheme.

3.6. Additions to the archive
Some archival Murriyang pulsar datasets are gradually being added to DAP. These tend to include very large surveys, for example the ‘High Time Resolution Universe Pulsar Survey’ (P630, Keith et al. 2010). There is an ongoing effort to collate these ‘missing’ data and archive them as resources become available – currently, over 25 Terabytes of data from this survey are being added to the archive annually.
4. Data dissemination
Providing data accessibility through requests based on a search and filter methodology is a core part of the archive. The DAP provides both a simple and advanced search interface, with the latter allowing filtering on source name, position, observation frequency or receiver.
DAP is also Virtual ObservatoryFootnote o (VO) compatible. VO tools such as TOPCAT Footnote p or PyVO Footnote q can be used to query pulsar observations using the Table Access Protocol (TAP) and Simple Cone Search (SCS), and cross-match sources with other catalogues – for example, Figure 2 was generated by a TAP query in TOPCAT. A configured query can also be used to generate data download links.
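As an illustration of the kind of query involved, the sketch below builds an ADQL cone-search string of the form accepted by TAP services. The table name `observations` and the column names `ra` and `dec` are hypothetical placeholders: the actual table and column names must be taken from the DAP service’s own schema.

```python
def cone_search_adql(table, ra_deg, dec_deg, radius_deg,
                     ra_col="ra", dec_col="dec"):
    """Build an ADQL query selecting rows within a circle on the sky."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE 1 = CONTAINS(POINT('ICRS', {ra_col}, {dec_col}), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))"
    )


# 47 Tucanae lies near RA 6.02 deg, Dec -72.08 deg (J2000).
# 'observations' is a placeholder table name, not the real DAP schema.
query = cone_search_adql("observations", 6.02, -72.08, 0.5)
```

The resulting string could be pasted into TOPCAT’s TAP window or submitted with PyVO, for example via `pyvo.dal.TAPService(url).search(query)`, where `url` is the TAP endpoint published in the DAP documentation.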
After selecting the required data, a user can choose a download method. In 2023, DAP moved to object storage – this resulted in considerable improvement in the access of large Terabyte-scale collections. The object store supports transfer protocols such as the AWS Command-Line Interface,Footnote r RcloneFootnote s and Globus.Footnote t

Figure 2. A TOPCAT Hammer-Aitoff sky projection in Galactic coordinates of observations published in the DAP from the main pulsar surveys conducted with Murriyang over the last 30 yr.
Table 8. DAP collections containing important discoveries.

5. Data analysis and visualisation
There are a number of ways a user can interact with data from the DAP, and this will depend on the nature of the research or required scientific outcome.
Re-analysing search-mode data from all-sky surveys can lead to new pulsar discoveries, or allow studies of single-pulse phenomenology of known sources. Fold-mode data can be used for a variety of astrophysical studies, such as pulsar timing, profile evolution, polarimetry, and the properties of the Inter-Stellar Medium (ISM).
A PI or other astronomer may wish to use conventional command-line pulsar data analysis packages, for example, PSRCHIVE Footnote u (written in C++ with Python wrappers) for folding/calibration or PRESTO Footnote v (in C with Python wrappers for some routines) for pulsar searching – these packages all read files in the PSRFITS format. The user may also wish to pre-configure packages to suit their system, and implement their own algorithms.

Figure 3. The output of pfits_frb: the profile of the Lorimer Burst is shown in the top plot, de-dispersed in the centre and dispersed at the bottom. The profile shows significant clipping in this beam (the discovery beam) of the Multibeam receiver because the pulse saturated the available dynamic range.
5.1. Introducing the PFITS package
Many updates have been made to the conventional pulsar data analysis packages mentioned above since they were published, in order to work with wide-band data such as those from the UWL receiver, although they each work with data from specific observing modes (fold- or search-mode). Here we introduce the PFITS Footnote w package – written in C, it is an alternative to conventional tools and provides routines and utilities for working with PSRFITS format files from all the different observing modes. Some examples of commonly-used routines are:
• pfits_describe – prints the header information
• pfits_fv – interrogates the file metadata interactively
• pfits_plot – interactively plots the astronomy data
• pfits_zapProfile – interactively removes interference
• pfits_frb – interactively plots FRB candidates in a pulsar search-mode file
Figure 3 demonstrates output from the pfits_frb routine – a user can zoom in time to display a window around the dispersed pulse of an FRB, and the pulse is then de-dispersed on the fly.
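For readers unfamiliar with de-dispersion, the sketch below shows the standard cold-plasma dispersion delay and the per-channel sample shifts needed to align a dispersed pulse. It is written in Python for illustration and is not the PFITS implementation; the function names are hypothetical and only whole-sample (incoherent) shifts are considered.

```python
K_DM = 4.148808e3  # dispersion constant in s MHz^2 pc^-1 cm^3


def dispersion_delay(dm, freq_mhz, ref_freq_mhz):
    """Delay (s) of a signal at freq_mhz relative to ref_freq_mhz."""
    return K_DM * dm * (freq_mhz ** -2 - ref_freq_mhz ** -2)


def dedisperse_shifts(dm, channel_freqs_mhz, t_samp):
    """Whole-sample shifts aligning each channel with the highest frequency."""
    ref = max(channel_freqs_mhz)
    return [round(dispersion_delay(dm, f, ref) / t_samp)
            for f in channel_freqs_mhz]
```

Shifting each frequency channel earlier by the returned number of samples and summing over channels gives the de-dispersed pulse. The dispersion measure `dm` is in units of pc cm^-3 and `t_samp` is the sampling interval in seconds.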
6. Pulsar data archiving challenges and future requirements
Pulsar data archiving faces several challenges in the future, including handling increasingly large data volumes, the importance of provenance for reproducibility, the hunt for missing data, and Cloud-based archiving as a means to provide global access to data products.
6.1. Managing high data volumes
The Cryogenically cooled Phased Array Feed (CryoPAF) is the next-generation receiver for Murriyang. It consists of 72 beams, is designed for large-scale surveys, and can record an instantaneous bandwidth in two bands, 700–1 200 and 1 100–1 950 MHz, with expected data rates of up to 80 TB per hour depending on the observation mode and configuration. Archiving of these data will present considerable challenges, and development is underway to enable near real-time ingest of high-volume data into DAP.
6.2. Cloud-based pulsar data processing
Cloud-based storage and compute platforms such as Amazon Web Services (AWS) are well suited to handling large data volumes. These services could be used to provide a mirror of DAP data, allowing easily configurable global access to the archive, and scaleable compute infrastructure for data processing, encouraging a ‘User to the data’ model. We are currently trialling this model in CSIRO’s Earth Analytics Science and Innovation platform (EASIFootnote x ), which runs on AWS infrastructure, and is accessible to anyone who applies for and is granted an EASI account.
6.3. Archival data recovery
The DAP archive is by no means a complete set of collections, and data are recovered from tape as and when they are discovered. The data are checked, sorted and converted to PSRFITS format if required prior to being added to an existing DAP collection, or forming a new collection. We are currently recovering archival data and publishing these in DAP at a rate of approximately 30 TB per year.
A proportion of Parkes pulsar data are not available in the archive. Of the 400 project IDs to date, 100 are currently missing. These data are likely to be sitting in university cupboards, to have been lost, or to have been deemed junk data.
7. Conclusion
Murriyang remains an instrument at the cutting edge of the field of radio astronomy. This is due mainly to investment in its receiver and digital acquisition systems over the years, but also to the availability of its data products in long-term archives such as DAP.
In this paper we have provided an update on the archive status, and demonstrated the importance of storing large volumes of pulsar data for re-processing by modern algorithms resulting in new discoveries. We introduced the PFITS package for processing of PSRFITS format data, and touched on the future use of Cloud platforms for scaleable processing workflows without the need to move vast volumes of data.
Acknowledgements
The Parkes radio telescope is part of the Australia Telescope National Facility (grid.421683.a) which is funded by the Australian Government for operation as a National Facility managed by CSIRO. This paper includes archived data obtained through the Parkes Pulsar Data archive on the CSIRO Data Access Portal (http://data.csiro.au). We acknowledge the Wiradjuri people as the Traditional Owners of the Observatory site.



















