How do we design data sets for Machine Learning astronomy?

Renée Hložek

doi:10.1017/S1743921323001394

Published online by Cambridge University Press: 01 August 2025

Affiliations: Dunlap Institute for Astronomy & Astrophysics, 50 St. George Street, Toronto ON M5S 3H4; David A. Dunlap Department for Astronomy & Astrophysics, 50 St. George Street, Toronto ON M5S 3H4
Email: hlozek@dunlap.utoronto.ca

Abstract

Many problems in astronomy and physics lend themselves to machine learning solutions, both for detecting and classifying astronomical signals and for inferring models from those signals. Because machine learning methods have historically been presented as 'black boxes', some in the physics and astronomy communities have pushed back on how useful they are for truly uncovering the physical laws that govern our world. Skepticism about the applicability of new computational methods to scientific inference is not new; we highlight connections between the machine learning context and previous computational paradigm shifts in astronomy. Moreover, several methodological advances challenge the assumption that, on long-standing physics questions, machine learning 'gives us answers that we can use but do not understand'. We summarize several machine learning data challenges used in astronomy and describe how challenges on different scales can test different parts and use cases of our analysis methods.

Keywords

machine learning, surveys, observations, statistics

Information

Type: Contributed Paper
Information: Proceedings of the International Astronomical Union, Volume 19, Symposium S368: Machine Learning in Astronomy: Possibilities and Pitfalls, August 2023, pp. 11–27

DOI: https://doi.org/10.1017/S1743921323001394

Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2025. Published by Cambridge University Press on behalf of International Astronomical Union
