To study the potential of generative AI for producing high-quality input texts for reading comprehension tasks at specific CEFR levels in German, we investigated the comparability of reading texts from a high-stakes German exam, which served as benchmarks for this study, with texts generated by ChatGPT (versions 3.5 and 4). These three types of texts were analyzed with respect to a range of linguistic features and evaluated by three assessment experts. Our findings indicate that AI-generated texts provide a valuable starting point for producing test materials, but they require adjustments to align with the benchmark texts. Computational analysis and expert evaluations identified key discrepancies that necessitate careful control of certain textual features. Specifically, modifications are needed to address the frequency of nominalizations, lexical density, the use of technical vocabulary, and non-idiomatic expressions that are direct translations from English. To enhance comparability with the benchmark texts, it is also essential to incorporate into the AI-generated content features such as examples illustrating the phenomena discussed and passive constructions. We discuss the implications of using ChatGPT for input text generation and point out important aspects to consider when using generated texts as input materials in assessment tasks.