This article implements a critical method for assessing bias in large historical datasets that we term the “Environmental Scan.” The Environmental Scan sheds new light on newspaper collections by linking newly available “reference metadata,” gathered from historical sources, to existing full-text and catalogue metadata. The rise of computational methods in history and the social sciences, in tandem with newly “datafied” source materials, challenges researchers to adapt their existing critical practices to the increasing scale and complexity of computational research. To help address this challenge, the Environmental Scan situates big historical datasets in a much fuller context, including by estimating what materials are missing, thereby revealing the ways digital collections can be “oligoptic” in nature. Using the British Newspaper Archive (BNA) as a case study, we diagnose the biases and imbalances in the digitised Victorian press. We determine which voices are under- or over-represented in terms of both the political composition of the collection and its content, and we trace the origins of these biases in the digitisation process. This article informs future interdisciplinary discussions about data bias and offers a conceptual model adaptable to diverse historical datasets. The Environmental Scan thereby gives researchers a more nuanced and accurate understanding of how newspaper data reflects past societies.