This study examined the capacity of ChatGPT-4 to provide accurate, specific, and relevant assessments of L2 writing. Drawing on 35 argumentative essays written by upper-intermediate L2 writers in higher education, we evaluated ChatGPT-4’s assessments across four L2 writing dimensions: (1) Task Response, (2) Coherence and Cohesion, (3) Lexical Resource, and (4) Grammatical Range and Accuracy. The main findings were that (a) ChatGPT-4 was exceptionally accurate in identifying issues across all four dimensions; (b) ChatGPT-4’s feedback varied in specificity across dimensions, with more specific feedback on Grammatical Range and Accuracy and Lexical Resource but more general feedback on Task Response and Coherence and Cohesion; and (c) ChatGPT-4’s feedback was highly relevant to the criteria for Task Response and Coherence and Cohesion, but it occasionally misclassified errors in Grammatical Range and Accuracy and Lexical Resource. Our findings contribute to a better understanding of ChatGPT-4 as an assessment tool, informing future research and practical applications in L2 writing assessment.