Although ‘in-the-wild’ technology testing provides an important opportunity to collect evidence about the performance of new technologies in real-world deployment environments, such tests may themselves cause harm and wrongfully interfere with the rights of others. This paper critically examines real-world AI testing, focusing on live facial recognition technology (FRT) trials conducted by European law enforcement agencies (in London, Wales, Berlin, and Nice) between 2016 and 2020, which serve as a set of comparative case studies. We argue that there is an urgent need for a clear framework of principles to govern real-world AI testing, an activity that currently remains a largely ungoverned ‘wild west’ without adequate safeguards or oversight. We propose a principled framework to ensure that such tests are undertaken in an epistemically, ethically, and legally responsible manner, thereby generating sound, reliable evidence while safeguarding the human rights and other vital interests of those affected. Although the FRT trials we examine predate the passage of the EU’s AI Act, we suggest that these three kinds of responsibility should provide the foundational anchor points informing the design and conduct of real-world testing of high-risk AI systems pursuant to Article 60 of the AI Act.