From Bytes to Bites; Advancing Data Collection Methodologies for Enhanced Branded Food Insights

Published online by Cambridge University Press:  16 December 2024

L.B. Kirwan
Affiliation:
Nutritics Ltd, 22C Town centre mall, Main Street, Swords, Dublin, K67 FY88
E. O’Sullivan
Affiliation:
Nutritics Ltd, 22C Town centre mall, Main Street, Swords, Dublin, K67 FY88
S. Hogan
Affiliation:
Nutritics Ltd, 22C Town centre mall, Main Street, Swords, Dublin, K67 FY88
F. Douglas
Affiliation:
Nutritics Ltd, 22C Town centre mall, Main Street, Swords, Dublin, K67 FY88
D’O Kelly
Affiliation:
Nutritics Ltd, 22C Town centre mall, Main Street, Swords, Dublin, K67 FY88

Abstract

Nutrition research relies on food databases, which are used extensively in dietary surveys, clinical practice, research, and policy development (1). Online data volume is expected to increase up to 180 zettabytes by 2025, due to a proliferation of internet-connected devices, the growth of social media platforms, and the digital transformation of industries (2). Web scraping, a method for extracting data from websites, has previously been used in Ireland to evaluate online retailer information as a potential source for monitoring food reformulation efforts in the Irish retail market (3). This study aims to outline a process for, and evaluate the use of, web scraping on online supermarket websites to increase data availability to researchers.

An online supermarket website was selected to trial the new process, and Octoparse software (version 8) was downloaded. Twelve data fields of interest were identified: cost, lifestyle, net weight, directions for use, storage instructions, nutrition information, front-of-pack information, legal name, brand name, manufacturer, ingredients, and allergy advice. A web scraping process was defined in four main steps: 1) collection of category-level URLs, 2) collection of product-level URLs, 3) collection of data at product level within the defined fields, and 4) data cleaning and restructuring. A workflow was created in Octoparse for steps 1 to 3, and step 4 was completed using Excel version 16.69.1.
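
The link-collection and cleaning steps can be illustrated with a minimal Python sketch. This is not the Octoparse workflow or the Excel procedure used in the study; it only shows the general shape of collecting product-level links from a category page and restructuring a scraped record. The HTML snippet, the `/product/` URL prefix, the snake_case field keys, and the helper names `collect_links` and `clean_record` are all hypothetical.

```python
from html.parser import HTMLParser

# The 12 data fields named in the text; the snake_case keys are an assumption.
FIELDS = [
    "cost", "lifestyle", "net_weight", "directions_for_use",
    "storage_instructions", "nutrition_information",
    "front_of_pack_information", "legal_name", "brand_name",
    "manufacturer", "ingredients", "allergy_advice",
]


class LinkCollector(HTMLParser):
    """Collect unique href values of <a> tags that start with a given prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.startswith(self.prefix) and href not in self.links:
            self.links.append(href)


def collect_links(html, prefix):
    """Steps 1 and 2: pull links of one kind (category or product) from a page."""
    parser = LinkCollector(prefix)
    parser.feed(html)
    return parser.links


def clean_record(raw):
    """Step 4: collapse whitespace and map every field to a value or None."""
    cleaned = {}
    for field in FIELDS:
        value = raw.get(field)
        value = " ".join(value.split()) if isinstance(value, str) else None
        cleaned[field] = value or None
    return cleaned


# Hypothetical category page; a real run would fetch this from the retailer.
category_html = """
<a href="/product/101">Oat biscuits</a>
<a href="/product/102">Tomato soup</a>
<a href="/help">Help</a>
<a href="/product/101">Oat biscuits (duplicate)</a>
"""

product_urls = collect_links(category_html, "/product/")
record = clean_record({"brand_name": "  Example   Brand ", "cost": ""})
print(product_urls)
print(record["brand_name"], record["cost"])
```

Mapping every record onto the same fixed set of fields, with empty values stored as `None`, makes it straightforward to see which products lack data in a given field.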

Eighty-three category-level page links were generated and entered into Octoparse. Web scraping was completed on 3,095 product-level URLs. Data on 1,450 products (47%) were successfully scraped, as these products had data within the 12 defined fields. A new dataset was created for the 1,450 products, with data fields including nutrition information (energy, fat, of which saturates, carbohydrate, of which sugars, fibre, protein, and salt), cost per serving and per kg, lifestyle factors (e.g. whether a product was vegetarian or vegan), ingredient lists, and allergy advice. Of these, 637 products (44%) carried vegetarian or vegan claims. Micronutrient-level data were limited.
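
The reported percentages follow directly from the counts above. A short Python check, using only the figures stated in the text (the helper name `share` is hypothetical):

```python
def share(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


product_urls = 3095  # product-level URLs scraped
with_data = 1450     # products with data in the defined fields
veg_claims = 637     # products with vegetarian/vegan claims

scrape_yield = share(with_data, product_urls)
veg_share = share(veg_claims, with_data)

print(f"Scrape yield: {scrape_yield}%")
print(f"Vegetarian/vegan claims: {veg_share}%")
```

Note that the 44% figure is expressed relative to the 1,450 successfully scraped products, not the 3,095 URLs visited.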

The increased availability of online data presents an opportunity to develop new and more systematically updated datasets, and may increase the availability of information on branded products. Web scraping enables researchers to create new databases, and to update datasets systematically, with fewer resources. This study enhances the availability of data and may enable researchers to explore new avenues for understanding food environments. Future research should test the process on additional websites to increase coverage of the Irish retail market and of other regions, identify sources with more in-depth nutritional data, and evaluate use cases in mobile applications. Web scraping offers a promising tool for advancing research in food science and nutrition, providing access to diverse datasets for research and innovation that change with the times.

Type
Abstract
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of The Nutrition Society

References

1. Yeung, Andy Wai Kan (2023) Nutrients 15, no. 16: 3548. https://doi.org/10.3390/nu15163548
2. Taylor et al. (2023) Statista. Amount of data created, consumed, and stored 2010-2020, with forecasts to 2025. https://www.statista.com/statistics/871513/worldwide-data-created/
3. O’Neill, M et al. (2022) Proceedings of the Nutrition Society. 82 (OCE4), E241.