The penetration strategy of hypersonic vehicles in hostile environments is a critical factor in determining their effectiveness in completing reconnaissance or strike missions. Reinforcement learning (RL), as an end-to-end method, offers inherent advantages in addressing complex decision-making problems. However, existing research indicates that RL-based strategies require further advances to reduce training costs and improve generalisation capability. This paper introduces an RL-based cooperative guidance law for multiple hypersonic vehicles, incorporating the estimated remaining time-of-flight and the absolute value of the bank angle obtained through a predictor-corrector method. The observation space and reward function are specifically designed to reduce the complex decision-making problem to a single-value decision problem, thereby lowering computational complexity and training cost. Within the RL framework, the proposed guidance law integrates the observation space, reward function and action space to control the flight trajectory, flight time and penetration of no-fly zones, ensuring compliance with multiple constraints. Model training and simulation tests conducted under multiple constraints demonstrate that the proposed approach reduces the number of training iterations required by the RL agent and improves decision-making efficiency. Furthermore, simulations under different no-fly-zone distributions confirm the high generalisation ability of the proposed guidance approach.
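To make the "single-value decision" formulation concrete, the sketch below shows one plausible gym-style encoding of the setup the abstract describes: the observation packs a predictor-corrector estimate of remaining time-of-flight together with the current bank-angle magnitude, and the action is a single scalar (the commanded bank-angle magnitude). This is not the authors' implementation; the environment name, dynamics, constants, and reward weights are illustrative assumptions, and the planar kinematics stand in for the real entry dynamics.

```python
# Minimal sketch (assumed names and dynamics, not the paper's code) of an
# RL environment whose action space is a single scalar, mirroring the
# abstract's reduction of the guidance problem to a single-value decision.
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class PenetrationGuidanceEnv(gym.Env):
    """Toy planar task: reach a target, avoid one circular no-fly zone,
    and arrive close to a desired (cooperative) flight time."""

    def __init__(self, desired_tof=100.0, dt=1.0):
        self.desired_tof = desired_tof  # cooperative arrival-time target [s]
        self.dt = dt
        # Observation: [normalised time-of-flight error, |bank angle|,
        #               normalised range-to-go, bearing to no-fly zone]
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        # Single-value decision: commanded bank-angle magnitude in [0, 1]
        self.action_space = spaces.Box(0.0, 1.0, shape=(1,), dtype=np.float32)

    def _estimate_tof(self):
        # Placeholder for the predictor-corrector time-to-go estimate:
        # here simply range / speed; a real predictor would integrate the
        # entry dynamics forward and correct against the target state.
        return np.linalg.norm(self.target - self.pos) / self.speed

    def _obs(self):
        rel_nfz = self.nfz_center - self.pos
        return np.array([
            np.clip(self._estimate_tof() / self.desired_tof - 1.0, -1.0, 1.0),
            self.bank_mag,                                  # already in [0, 1]
            np.clip(np.linalg.norm(self.target - self.pos) / 5000.0, 0.0, 1.0),
            np.clip(np.arctan2(rel_nfz[1], rel_nfz[0]) / np.pi, -1.0, 1.0),
        ], dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = np.array([0.0, 0.0])
        self.target = np.array([4000.0, 0.0])
        self.nfz_center, self.nfz_radius = np.array([2000.0, 100.0]), 400.0
        self.speed, self.heading, self.bank_mag, self.t = 50.0, 0.0, 0.0, 0.0
        return self._obs(), {}

    def step(self, action):
        self.bank_mag = float(np.clip(action[0], 0.0, 1.0))
        # Crude kinematics: the bank magnitude sets the turn rate, and the
        # turn direction is resolved geometrically (away from the zone),
        # so the policy only has to choose one scalar per step.
        side = float(np.sign(self.pos[1] - self.nfz_center[1])) or 1.0
        self.heading += side * self.bank_mag * 0.05 * self.dt
        self.pos += self.speed * self.dt * np.array([np.cos(self.heading),
                                                     np.sin(self.heading)])
        self.t += self.dt

        in_nfz = np.linalg.norm(self.pos - self.nfz_center) < self.nfz_radius
        at_target = np.linalg.norm(self.pos - self.target) < 100.0
        # Reward: penalise no-fly-zone entry and arrival-time error,
        # with a small per-step cost to discourage loitering.
        reward = -10.0 if in_nfz else -0.01
        if at_target:
            reward += 10.0 - 0.1 * abs(self.t - self.desired_tof)
        terminated = bool(at_target or in_nfz)
        truncated = self.t > 2.0 * self.desired_tof
        return self._obs(), reward, terminated, truncated, {}
```

Any standard continuous-control algorithm (e.g. PPO or SAC) could be trained against such an environment; the point of the sketch is only the shape of the observation, action, and reward terms, not the dynamics, and it illustrates why a one-dimensional action space can cut training cost relative to jointly learning magnitude and direction.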