This study investigates the integration of advanced heating, ventilation, and air conditioning (HVAC) systems with reinforcement learning (RL) control to improve energy efficiency in low-energy buildings under Tehran's extreme seasonal temperatures. We ran comprehensive simulations on the EnergyPlus and HoneybeeGym platforms to evaluate two RL models: traditional Q-learning (Model A) and deep reinforcement learning (DRL) with neural networks (Model B). Model B used a deep convolutional network with 256 neurons in each hidden layer, rectified linear unit (ReLU) activations, and the Adam optimizer with a learning rate of 0.001. The RL-managed systems achieved a statistically significant reduction in energy-use intensity of 25 percent (p < 0.001), decreasing from 250 to 200 kWh/m² annually in comparison to the baseline scenario. Thermal comfort also improved: the predicted mean vote (PMV) reached 0.25, which falls within the ASHRAE Standard 55 comfort range, and the predicted percentage of dissatisfied (PPD) fell to 10 percent. Model B (DRL) achieved approximately 50 percent lower prediction error than Model A, with a mean absolute error (MAE) of 0.579366 compared to 1.140008 and a root mean square error (RMSE) of 0.689770 versus 1.408069. This indicates better adaptation to both consistent daily trends and irregular periodicities, such as weather patterns. The proposed RL method achieved energy savings of 10–15 percent compared with both rule-based and model predictive control (approximately 10 percent relative to rule-based control), while using fewer building features than existing state-of-the-art control systems.
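The prediction-accuracy comparison between the two models can be checked directly from the reported error metrics. The following is a minimal sketch, assuming that "improvement" means the relative reduction in error with Model A as the reference; the helper name `relative_reduction` is hypothetical and is not taken from the study's code.

```python
def relative_reduction(reference: float, improved: float) -> float:
    """Return the percentage by which `improved` is lower than `reference`."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return 100.0 * (reference - improved) / reference


# Error metrics as reported above (Model A = Q-learning, Model B = DRL).
MAE_A, MAE_B = 1.140008, 0.579366
RMSE_A, RMSE_B = 1.408069, 0.689770

# Assumption: improvement is measured relative to Model A's error.
mae_reduction = relative_reduction(MAE_A, MAE_B)
rmse_reduction = relative_reduction(RMSE_A, RMSE_B)

print(f"MAE reduction:  {mae_reduction:.1f}%")
print(f"RMSE reduction: {rmse_reduction:.1f}%")
```

Under this definition, both reductions come out close to one half, which is consistent with the stated improvement of roughly 50 percent.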