Policy Significance Statement
Our analysis reveals key policy trends in the implementation of the UN Convention on the Rights of Persons with Disabilities (CRPD) across all available English-language State Reports (N = 170). We provide policymakers with a systematic assessment of global disability rights implementation, which shows a strong focus on service access, legal protection, and accessibility improvements, alongside a growing emphasis on social justice and economic inclusion. We identify critical policy gaps in international engagement mechanisms. These findings offer actionable insights for policymakers seeking to strengthen disability rights implementation and inclusive development. This study also introduces an innovative policy evaluation methodology that combines text mining, natural language processing (NLP), and generative AI (GenAI) tools. Despite ongoing challenges in using GenAI tools, our methodology demonstrates significant potential for governments and civil society organizations to monitor multilateral treaties, offering them a tool for evidence-based policy evaluation and accountability, particularly in human rights treaty monitoring.
1. Introduction
Over 16% of the global population (1.3 billion people) live with disabilities (WHO, 2024). This 2024 estimate represents a 30% increase over the figure reported in the landmark 2011 World Report on Disability (WHO, 2011). Furthermore, 2023 data suggest that more than 80% of persons with disabilities live in the Global South (UNDRR, 2023). The United Nations (UN) Sustainable Development Goals (SDGs) and leading socioeconomic development strategies treat the inclusion of persons with disabilities as critical to reducing the “significant health inequalities and reduced life expectancy for persons with disabilities compared to those without” (UNDRR, 2023). The UN Convention on the Rights of Persons with Disabilities (CRPD) was designed to address these concerns.
Adopted on 13 December 2006 and in force since 3 May 2008, the CRPD is both a human rights treaty and an international development instrument. It was intended to shift the approach to persons with disabilities away from the “medical model,” which treats disability as a “disease to be cured” and leads to a charity-based approach, toward a more modern “social justice and rights-based approach” (Rimmerman, 2013).
The CRPD has been ratified or acceded to by 191 countries, making them States Parties, and 104 of these have also ratified its Optional Protocol. The Optional Protocol allows individuals in those countries to report violations of the CRPD to the independent CRPD Committee at the UN Office of the High Commissioner for Human Rights (OHCHR) in Geneva. All States Parties to the CRPD must submit an initial report to the CRPD Committee within 2 years of ratification and subsequent reports every 4 years (OHCHR, 2020). They must also present each report in person to the Committee. These reports provide insights into global, regional, and national CRPD implementation and policy priorities. While critically important, the CRPD is but one of more than 500 multilateral treaties that have been signed by more than 193 countries around the world. These treaties address such critical and diverse global issues as human rights, environmental protection, and trade. Monitoring and evaluating progress toward implementing these treaties is critical to governments, researchers, and civil society advocates around the world.
2. Purpose
Nearly 18 years after the CRPD’s adoption, assessing its implementation is vital. This study aimed to: (1) establish a replicable process for identifying CRPD implementation priorities at national, regional, and global levels; (2) evaluate the focus on social justice and economic inclusion for persons with disabilities; and (3) assess our hybrid approach, which combines text mining and Natural Language Processing (NLP) with generative AI (GenAI) tools based on large language models (LLMs).
3. Literature review
Our literature review examines six areas. The first provides a brief overview of relevant literature on national, regional, and global monitoring of CRPD implementation. The second reviews methodological approaches, comparing traditional text mining and NLP with LLMs for large-scale text analysis. Third, we focus on knowledge translation articles that situate our research within its broader policy and social implications. The fourth area examines the shift from the medical to the rights-based model prompted by the CRPD. The fifth examines the structural and institutional factors that affect CRPD implementation around the world. Finally, the sixth explores regional variation in the quality and availability of disability rights and inclusion data for monitoring CRPD implementation progress.
3.1. National CRPD implementation
Many CRPD States Parties, including Australia, India, Ecuador, and South Africa, have developed national strategies and action plans for CRPD implementation (UNDESA, 2024). Many of these plans include legislation and programmatic strategies focusing on service delivery, mental health, intellectual disabilities, accessibility, gender equity, and the intersections of disability with other forms of disadvantage (UN, 2006). In previous work, we applied human rights frameworks to assess the alignment of Canadian policy documents with the CRPD, including for children with disabilities (Cogburn et al., 2020). Steen and Lord (2006) categorize the 50 CRPD articles into five levels:
(1) Introductory: Articles 1 and 2 define the treaty’s purpose and key terms.
(2) General applications: Articles 3–9 guide the interpretation and application of all subsequent articles.
(3) Substantive: Articles 10–30 outline the core rights and obligations to ensure full human rights enjoyment by persons with disabilities.
(4) Implementation and monitoring: Articles 31–40 describe the procedural frameworks, including the roles of the CRPD Committee.
(5) Operational rules: Articles 41–50 cover the treaty’s operational aspects such as entry into force and language authenticity.
3.2. Regional implementation
Broader regional strategies have also shaped CRPD implementation. The Inter-American Commission and Court of Human Rights have incorporated the CRPD, which expanded upon the 2001 Inter-American Convention on the Elimination of All Forms of Discrimination against Persons with Disabilities (CIADDIS) by providing improved rights definitions and a socially conscious approach for persons with disabilities in the Americas (Chen and McDonough, 2022). All 10 member countries of the Association of Southeast Asian Nations (ASEAN) have ratified the CRPD and are supported by the ASEAN Intergovernmental Commission on Human Rights (Hidayahtulloh, 2019). The United Nations Economic and Social Commission for Asia and the Pacific (UNESCAP)’s Incheon Strategy to “Make the Right Real” focuses on disability rights and inclusion across the region (UNESCAP, 2012). The ASEAN Enabling Master Plan 2025 builds upon CRPD-defined rights (Ryuhei, 2020). However, diverse national contexts challenge uniform implementation across ASEAN (Cogburn and Kempin-Reuter, 2017). In Africa, regional efforts include the African Union’s Disability Protocol (AU, 2018). All EU member states have ratified the CRPD; the EU monitors implementation through its own framework (EUAFR, 2024) and reflected CRPD objectives in its European Disability Strategy 2010–2020 (Hosking, 2013). These differing strategies are compatible with CRPD goals while reflecting regional priorities and each region’s specific structural and institutional needs for disability inclusion. In contrast, international frameworks tend to be more unified, with a heavier emphasis on monitoring and data collection.
3.3. Global implementation of the convention
International agencies such as the World Health Organization (WHO) and the UN have led numerous initiatives related to global implementation of the CRPD. For example, the WHO Global Disability Action Plan 2014–2021 prioritized accessibility, assistive technology, rehabilitation, and data collection to enhance the quality of life of persons with disabilities (WHO, 2014). In 2019, the UN established its Disability Inclusion Strategy, which measures system-wide progress toward improved disability rights. The strategy has prompted further initiatives, such as linking disability rights to the SDGs as a way to advance implementation and consideration of persons with disabilities across multiple policy strategies (Mitra and Dominik, 2023). Many organizations and scholars studying CRPD implementation continue to point to challenges in collecting data and monitoring implementation efforts (IDA, 2010). NLP tools can help unlock textual sources, such as State Reports, as data.
3.4. Text analysis with NLP and LLMs
NLP techniques range from statistical approaches to advanced modeling and are commonly implemented in open-source languages such as R and Python (Silge and Robinson, 2017; Bengfort et al., 2018). In global policy analysis, Smith et al. (2021) used document embeddings to assess similarities among UN SDG reports, while Kim and LaFleur (2020) analyzed UN resolutions to track changing policy priorities.
Recent research integrates LLMs with traditional text analysis. Riyadi et al. (2024) developed IndoGovBERT, a domain-specific model for processing SDG documents. Törnberg (2023) proposes using LLMs for various analytical tasks, while others examine their use in research coding (Tai et al., 2024) and text clustering (Petukhova et al., 2024). Although “prompt engineering” has become important (Zhu et al., 2023; Liu et al., 2024), generative LLMs have limitations in depth and clarity and can perpetuate bias and hallucinate information (Dulka, 2023; Du et al., 2024).
Devlin et al. (2019) suggested that “feature-based” and “fine-tuning” approaches may improve contextual understanding, while Angin et al. (2022) advocated using LLMs to process UN sustainability reports, noting the need for automation given the volume of data.
Given the varying formats of CRPD State Reports, our hybrid approach combines the explainability of traditional NLP with the flexibility of LLMs, using graphical user interfaces (GUIs) and carefully crafted prompts to increase accessibility for non-programmers.
3.5. Knowledge translation and evaluation
Effective knowledge translation (KT) calls for strategies that combine efforts from academia, community, and policy to inform societal advances. Information, especially when used to inform policy or drive change, needs to be accessible and actionable for a broader audience—requiring rethinking how data are presented to remove barriers between content and comprehension.
However, effective KT remains a formidable challenge due to a lack of infrastructure and skilled personnel required to translate data into action (Edwards et al., 2019; Evans et al., 2021; Kalbarczyk et al., 2021). Traditional KT tools, such as workshops and printed materials, can become obsolete in fast-changing policy landscapes. Thus, the challenge becomes twofold: how can we make research findings widely accessible while ensuring they stay relevant? And how can researchers equip communities to engage in high-level policy processes?
Edwards et al. (2019) recommended fostering collaboration between researchers and policymakers through KT strategies that include capacity-building initiatives. Digitized resources are also recommended to improve accessibility and provide real-time updates in ever-changing policy contexts (Evans et al., 2021; Green et al., 2021).
Our study’s capacity-building use of NLP and AI creates digital tools that: (1) can help streamline complex data into actionable insights and (2) are accessible to nonspecialist users, thus expanding possibilities for inclusive, multistakeholder participatory policymaking—especially in settings where governments or organizations may lack the resources to process voluminous data.
3.6. From medical to rights-based models of disability
The CRPD marked a shift from the medical model of disability toward a human rights model, reframing disability as an issue of dignity, equality, and legal personhood. This model requires states to eliminate discriminatory systems and promote autonomy, unlike the medical model, which treats disability as an individual deficit (Rimmerman, 2013).
Kayess and French (2008) noted that this change transforms disabled persons from welfare recipients into active rights-holders. The CRPD incorporates elements of the social model, which identifies how societal structures create exclusion. Lawson and Beckett (2020) argued that this perspective works best alongside the rights-based model to translate identified injustices into enforceable obligations.
Despite progress, implementation remains uneven. Medicalized frameworks persist in some regions that prioritize protection over rights. Oyaro (2015) criticized African disability law’s historical charity focus, while Chaney (2021) found that CRPD ratification often fails to produce reform in post-Soviet countries due to paternalistic attitudes. Even in the EU, Šubic and Ferri (2022) observed that inclusive language in national strategies frequently lacks meaningful implementation. Effective CRPD evaluation requires tools that distinguish between genuine rights-based engagement and performative alignment without enforceable protections.
3.7. Structural and institutional factors driving CRPD implementation
CRPD implementation varies globally based on existing institutional frameworks. Following CRPD Article 33, 118 member states have established National Human Rights Institutions (NHRIs) to monitor disability inclusion (GANHRI, 2024). Countries with NHRIs, including Australia, Germany, and the Philippines, demonstrate enhanced capacity for CRPD implementation (GANHRI, 2025). Regional frameworks such as the African Disability Protocol and the 2001 CIADDIS complement the CRPD (Oyaro, 2015; Chen and McDonough, 2022). Nations with pre-existing disability inclusion institutions generally show better implementation progress.
Structural challenges persist in many countries. In Iran, “inadequate awareness,” economic neglect, and cultural barriers hinder health services for disabled persons (Najafi et al., 2021). South Africa struggles with CRPD Article 24 implementation in rural areas due to infrastructure deficits and “poor policy implementation” despite existing legislation (Chirowamhangu, 2024). The UN Global Disability Fund (2023) identified five preconditions for successful CRPD implementation: equality, service delivery, accessibility, compliant budgeting, and data accountability.
3.8. Regional variation in data quality and availability
Disability inclusion data quality varies globally, complicating CRPD implementation monitoring. The Disability Data Initiative’s DS-QR Database, containing over 3000 censuses and surveys from 199 countries, shows increasing adoption of the questions recommended by the UN Washington Group, with Sub-Saharan Africa leading this trend ahead of Europe and Central Asia (Carpenter et al., 2024).
The EU’s 2021–2030 Disability Rights Strategy faces challenges due to incompatible national data systems and varying statistical methods (ECA, 2023). Sweden, the Netherlands, Spain, and Romania exemplify these discrepancies with unique approaches to defining disability status, limiting Eurostat’s effectiveness (ECA, 2023).
In Latin America, data gaps exist in educational inclusion and gender-specific health services due to insufficient disaggregation (IDA, 2015). When countries in the Asia-Pacific region were asked to provide data to assess progress on UNESCAP’s Incheon Strategy to “Make the Right Real” for persons with disabilities, fewer than half of the respondents from 31 countries provided sufficient data (UNESCAP, 2022). While Thailand provides supporting data, smaller nations such as Kiribati struggle due to limited budgets and insufficient infrastructure (UNESCAP, 2022). These regional variations highlight the need for accessible approaches to monitoring CRPD implementation.
4. Conceptual framework
CRPD implementation is driven by a combination of state and nonstate actors, including regional and subregional organizations. Country State Reports signal national priorities. Through a detailed analysis of State Reports, we can identify CRPD progress and areas requiring more attention as illustrated by Figure 1.

Figure 1. Conceptualizing CRPD implementation.
5. Research questions
We ask four broad research questions; the first three are operationalized through a series of inductive (exploratory) and deductive (confirmatory) subquestions:
RQ1. What national priorities for CRPD implementation are reflected in the global corpus of State Reports?
RQ1.1 What are the most frequently occurring keywords and phrases in the corpus?
RQ1.2 What are the most important keywords and phrases in the corpus?
RQ1.3 What topics are present in the corpus?
RQ2. What named entities are found in the corpus?
RQ2.1 What types of named entities appear most frequently in the corpus?
RQ2.2 Are prominent organizations of persons with disabilities mentioned in higher/lower frequency than governmental entities?
RQ3. To what extent is the breadth of the CRPD covered in the corpus?
RQ3.1 What CRPD articles are most represented in the corpus?
RQ3.2 What CRPD article paragraphs are most represented in the corpus?
RQ3.3 What categories of the CRPD are most and least represented in the corpus?
RQ3.4 How does the distribution of CRPD articles differ by region?
RQ3.5 How does the distribution of CRPD articles differ by subregion?
RQ3.6 How does the distribution of CRPD articles differ by economic income group?
RQ3.7 What are the differences in the corpus between states that have and have not ratified the Optional Protocol?
RQ3.8 To what extent is the transition from the medical model to the social justice and rights-based model represented in the corpus?
RQ4. To what extent can GenAI tools also answer the preceding research questions?
6. Methodology
Our hybrid methodology applies inductive and deductive text mining and NLP techniques to the complete corpus of CRPD State Reports (n = 179, of which the 170 English-language reports were analyzed) and uses GenAI tools to further analyze samples of the corpus (n = 20 and, for NotebookLM, n = 50). We use RStudio (http://posit.co/) as our integrated development environment (IDE), with the R and Python programming languages. We leverage a suite of R and Python packages, including tm, tidytext, quanteda, Gensim, spaCy, and TextBlob. For the GenAI methods, we used Google AI Studio with the Gemini 1.5 Flash LLM (http://aistudio.google.com), Google’s NotebookLM (http://notebooklm.google.com) with the PaLM 2 LLM, and OpenAI’s ChatGPT with the GPT-4o LLM (http://chatgpt.com).
6.1. Data collection and preprocessing
The CRPD State Reports are available to the public via the OHCHR Treaty Body Database (https://tbinternet.ohchr.org/_layouts/15/TreatyBodyExternal/Home.aspx). We scraped the site using a custom Selenium script and downloaded the State Reports as PDF documents. The total number of extant State Reports was 179.
The data preparation consisted of six parts:
(1) Imported the PDFs into RStudio and converted them to a dataframe with five variables: doc_id (the document ID); text (the text of the report); country (the report’s originating country); session (the report edition, for example, 1st or 2nd); and year (the year the report was submitted).
(2) Compiled a list of UN regions by country (UNSD, n.d.) and merged it into the dataframe to obtain the variables Region and SubRegion.
(3) Compiled a list of income groups by country (The World Bank, n.d.) and merged it into the dataframe to obtain the variable IncomeGroup.
(4) Created a variable called OptionalProtocol to identify States Parties to the Optional Protocol.
(5) Created a variable called cld3_lang to store each report’s language, then filtered out non-English reports (n = 9). To identify a report’s language, we used Google’s Compact Language Detector v3 (CLD3), an open-source “neural network model for language identification” (Google, 2020). We used the R package “cld3” (Ooms, 2023) to apply the CLD3 model.
(6) Finally, we performed both basic and more advanced text cleaning. We applied a stopword dictionary to remove common English stopwords (e.g., “and,” “the,” “so”). We also built a custom stopword dictionary by selecting words whose frequency count was at or above the 99.5th percentile. This produced 248 custom stopwords, such as “disability,” “person,” and “article.” In addition, we removed numbers, domain names, and other unneeded patterns.
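The custom stopword step in item (6) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors’ R pipeline: the base stopword list, the domain-name pattern, and the nearest-rank percentile rule are all illustrative choices.

```python
import math
import re
from collections import Counter

# Assumed, deliberately tiny base stopword list; the study's list is not specified.
BASE_STOPWORDS = {"and", "the", "so", "of", "to", "in", "a", "for", "on", "with", "is", "by"}

def clean_tokens(text):
    """Lowercase, strip domain names and numbers, tokenize, drop base stopwords."""
    text = text.lower()
    # Illustrative domain-name pattern (e.g. "www.un.org"); an assumption.
    text = re.sub(r"\b[\w.-]+\.(?:org|com|gov|int|net)\b", " ", text)
    text = re.sub(r"\d+", " ", text)  # remove numbers
    return [t for t in re.findall(r"[a-z]+", text) if t not in BASE_STOPWORDS]

def percentile_stopwords(docs, pct=99.5):
    """Return words whose corpus frequency is at or above the pct-th percentile."""
    counts = Counter(tok for doc in docs for tok in clean_tokens(doc))
    freqs = sorted(counts.values())
    # Nearest-rank percentile over the distribution of word frequencies.
    rank = max(1, math.ceil(pct / 100 * len(freqs)))
    threshold = freqs[rank - 1]
    return {word for word, c in counts.items() if c >= threshold}
```

On a corpus-sized vocabulary, a 99.5th-percentile cutoff keeps only a small tail of very frequent words, which is the kind of list that would contain terms like “disability” and “article.”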
The resulting dataset was called “CRPD_StateReports,” consisting of 170 reports and 8 variables. For nonregional analysis, all 170 reports were used. For the regional analysis, reports containing missing values for region (n = 7) were removed resulting in 163 reports for analysis.
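Steps (2) to (4) and the regional subset amount to lookups keyed on country. The sketch below uses hypothetical country names and lookup values (not UNSD or World Bank data) to show the join and the exclusion of reports with a missing region.

```python
# Hypothetical lookup tables standing in for the UNSD region list and the
# World Bank income groups; names and values are illustrative only.
REGIONS = {"Country A": ("Region X", "Subregion X1"),
           "Country B": ("Region Y", "Subregion Y1")}
INCOME = {"Country A": "High income", "Country B": "Low income"}
OPTIONAL_PROTOCOL_PARTIES = {"Country A"}

def add_metadata(reports):
    """Attach Region, SubRegion, IncomeGroup, and OptionalProtocol to each report."""
    rows = []
    for report in reports:
        # Countries missing from the lookup get None, like NA after a left join.
        region, subregion = REGIONS.get(report["country"], (None, None))
        rows.append({
            **report,
            "Region": region,
            "SubRegion": subregion,
            "IncomeGroup": INCOME.get(report["country"]),
            "OptionalProtocol": report["country"] in OPTIONAL_PROTOCOL_PARTIES,
        })
    return rows

def regional_subset(rows):
    """Drop reports whose region is missing, as done for the regional analysis."""
    return [row for row in rows if row["Region"] is not None]
```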
6.2. Term/phrase frequencies, TF*IDF, and topic modeling
To answer RQ1.1 and RQ1.2, we used term and phrase frequencies (counts of n-grams, that is, sequences of one or more words) and TF*IDF (term frequency–inverse document frequency). N-gram analysis counts how often words or phrases occur in order to find those occurring most frequently. TF*IDF is an alternative to raw term frequency and a common heuristic for identifying “important” terms, based on the premise that words and phrases appearing frequently in a document are important, but less so if they also appear in many other documents in the corpus (Sparck Jones, 1972; Manning et al., 2008; Cogburn, 2019).
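A minimal Python sketch of both measures follows, assuming documents are already tokenized. It uses raw counts for term frequency and idf = ln(N/df); libraries differ in how they normalize these quantities, so this is one common variant rather than the exact weighting used in the study.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-word phrases in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_frequencies(docs_tokens, n=2):
    """Corpus-wide n-gram counts (n=2 gives bigrams)."""
    return Counter(g for toks in docs_tokens for g in ngrams(toks, n))

def tf_idf(docs_tokens, n=1):
    """Per-document TF*IDF scores: raw count * ln(N / document frequency)."""
    doc_counts = [Counter(ngrams(toks, n)) for toks in docs_tokens]
    n_docs = len(doc_counts)
    df = Counter(term for counts in doc_counts for term in counts)
    return [{term: tf * math.log(n_docs / df[term]) for term, tf in counts.items()}
            for counts in doc_counts]
```

In a toy corpus where two reports share “assistive technology” but each names a different (hypothetical) country, the shared terms score zero and each country name scores highest, illustrating how country names can dominate TF*IDF rankings.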
RQ1.3 required topic modeling, specifically an unsupervised machine learning approach called Latent Dirichlet Allocation (LDA). LDA assumes that each topic in a corpus is a probability distribution over terms and that each document is a mixture of multiple topics. The goal is to fit a model that estimates both distributions simultaneously.
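To make these assumptions concrete, the sketch below fits LDA with collapsed Gibbs sampling in plain Python. Note that this swaps in a different inference method from the one behind the study’s results: Gensim’s LdaModel uses online variational Bayes. The hyperparameters (alpha, beta, iteration count, seed) are illustrative assumptions.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi): document-topic and topic-word probability tables.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]              # n(d, t)
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # n(t, w)
    topic_total = [0] * n_topics                             # n(t)
    assignments = []
    # Randomly assign each token to a topic to initialize the counts.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Each document is a mixture of topics (theta); each topic a distribution over words (phi).
    theta = [[(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
             for d, doc in enumerate(docs)]
    phi = [[(topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta)
            for w in range(vocab_size)]
           for t in topics]
    return theta, phi
```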
6.3. Named entity recognition (NER)
To answer RQ2.1 and RQ2.2, we first parsed the text to identify parts of speech and sentence structure, which enables named entity recognition. For NER, we used the spaCy library in Python with its small English language model, “en_core_web_sm.”
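Once spaCy has tagged each report, answering RQ2.1 and RQ2.2 reduces to counting entity labels and matching organization names. In spaCy, the (text, label) pairs can be collected with `[(ent.text, ent.label_) for ent in nlp(text).ents]` after `nlp = spacy.load("en_core_web_sm")`; the sketch below starts from such pairs so that it runs without spaCy installed. The example entities and the organization-name lists are hypothetical.

```python
from collections import Counter

def entity_label_counts(entities):
    """Count entity types from (text, label) pairs such as spaCy's (ent.text, ent.label_)."""
    return Counter(label for _, label in entities)

def compare_org_mentions(entities, opd_names, gov_names):
    """Count ORG mentions containing any OPD name versus any government name.

    opd_names and gov_names are hypothetical, user-supplied name fragments;
    matching is a case-insensitive substring test.
    """
    orgs = [text.lower() for text, label in entities if label == "ORG"]
    opd_keys = [name.lower() for name in opd_names]
    gov_keys = [name.lower() for name in gov_names]
    return {
        "OPD": sum(any(key in org for key in opd_keys) for org in orgs),
        "Government": sum(any(key in org for key in gov_keys) for org in orgs),
    }
```

Calling `entity_label_counts(pairs).most_common(6)` would give a ranked list of entity types of the kind summarized in a table of common entity types.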
6.4. Lexicon/dictionary analysis
To answer RQ3, we employed a deductive approach, using a lexicon/dictionary-based analysis (also called a categorization model). We developed two custom lexicons representing concepts of interest in our corpus (Cogburn, 2019).
For our first lexicon, we used the text of the CRPD articles (UNDESA, 2024) as our source. Each article contains one or more paragraphs. Our lexicon is structured as follows:
1. Category: CRPD article;
2. Subcategory: article paragraph;
3. Entry: 1–3-word phrases excluding common stopwords.
We imported all articles and their corresponding paragraphs (242 unique) into Excel. To validate our lexicon, we focused on a high signal-to-noise ratio. For internal validation, we made each entry unique to its paragraph when possible, or at minimum unique to its article. The initial dictionary contained 697 entries covering all 50 articles and 223 paragraphs (92%).
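Applying such a hierarchical lexicon to a report can be sketched as below. The lexicon excerpt is hypothetical (loosely modeled on the themes of Articles 9 and 24) and is not the study’s dictionary; matching with case-insensitive, whole-phrase regular expressions is an assumption about how entries are matched.

```python
import re
from collections import Counter

# Hypothetical excerpt: article -> paragraph -> 1-3-word entries.
LEXICON = {
    "Article 9": {"9.1": ["accessibility", "physical environment"],
                  "9.2": ["sign language interpreters"]},
    "Article 24": {"24.1": ["inclusive education"]},
}

def lexicon_counts(text, lexicon):
    """Count whole-phrase matches per article (category) and paragraph (subcategory)."""
    text = text.lower()
    article_counts, paragraph_counts = Counter(), Counter()
    for article, paragraphs in lexicon.items():
        for paragraph, phrases in paragraphs.items():
            for phrase in phrases:
                pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
                hits = len(re.findall(pattern, text))
                paragraph_counts[paragraph] += hits
                article_counts[article] += hits
    return article_counts, paragraph_counts
```

Coverage figures such as those above are simple ratios of paragraphs with at least one entry to all paragraphs, for example 223 / 242 ≈ 0.92.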
For external validation, we conducted a four-step robustness check:
1. Identified the most frequent 1–3-word phrases in the CRPD_StateReports dataset.
2. Selected phrases at or above the 99th percentile of frequency and combined them into a single list.
3. Performed an inner join with our initial dictionary, identifying 88 potentially problematic phrases.
4. Used keyword-in-context analysis to review these phrases and other entries for noise.
The final CRPD dictionary contained 628 entries covering all 50 articles and 217 paragraphs (~90%).
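Steps 1 to 3 of the robustness check can be sketched as a set intersection between the dictionary and the corpus’s most frequent phrases. The nearest-rank percentile rule and the toy tokens in any example are assumptions for illustration.

```python
import math
from collections import Counter

def phrase_counts(docs_tokens, max_n=3):
    """Count all 1- to max_n-word phrases across tokenized documents."""
    counts = Counter()
    for tokens in docs_tokens:
        for n in range(1, max_n + 1):
            counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts

def frequent_phrases(docs_tokens, pct=99.0, max_n=3):
    """Phrases whose frequency is at or above the pct-th percentile (nearest rank)."""
    counts = phrase_counts(docs_tokens, max_n)
    freqs = sorted(counts.values())
    rank = max(1, math.ceil(pct / 100 * len(freqs)))
    threshold = freqs[rank - 1]
    return {phrase for phrase, c in counts.items() if c >= threshold}

def flag_problem_entries(dictionary_entries, docs_tokens, pct=99.0):
    """Inner join: dictionary entries that are also among the most frequent phrases."""
    return set(dictionary_entries) & frequent_phrases(docs_tokens, pct)
```

Flagged entries are candidates for keyword-in-context review rather than automatic removal, since a frequent phrase can still be a valid signal for its article.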
We used a similar process for our second dictionary, which measures the shift from the “medical model” to the “social justice and rights-based model” of disability. Based on the literature review and subject-matter expertise, we identified unique phrases for each category. After robustness checking, the final dictionary contained 46 entries.
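A two-category dictionary of this kind can be applied by counting matches per category and reporting each category’s share of all matches. The entries below are hypothetical examples, not the study’s 46-entry dictionary.

```python
import re

# Hypothetical entries for illustration only.
MODEL_LEXICON = {
    "medical": ["cure", "treatment", "patients", "defect"],
    "rights_based": ["human rights", "reasonable accommodation", "equal recognition", "inclusion"],
}

def model_shares(text, lexicon=MODEL_LEXICON):
    """Return per-category match counts and each category's share of all matches."""
    text = text.lower()
    counts = {
        category: sum(len(re.findall(r"\b" + re.escape(p) + r"\b", text)) for p in phrases)
        for category, phrases in lexicon.items()
    }
    total = sum(counts.values())
    shares = {category: (c / total if total else 0.0) for category, c in counts.items()}
    return counts, shares
```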
6.5. Generative AI and LLM analysis
To answer RQ4, we used GenAI tools, which allow researchers to harness the capabilities of various LLMs. Using an LLM requires the user to develop a prompt engineering strategy, which may include supplying the LLM with examples. These examples, called “shots,” may be zero-shot (no examples), one-shot (one example), or few-shot (two or more examples). A related approach is retrieval-augmented generation (RAG), which we refer to as RAG-AI. With RAG-AI, the user supports the analysis by supplying specific documents that serve as “ground truth.” The user then submits the prompt(s) to the LLM through either a GUI or an application programming interface (API). Most current LLMs use the transformer architecture, including generative models such as GPT (generative pretrained transformer) and encoder models such as Google’s BERT (Bidirectional Encoder Representations from Transformers). The transformer is a deep learning architecture first introduced by researchers at Google (Vaswani et al., 2017). Once a prompt is submitted, a generative transformer splits it into tokens, processes them through stacked self-attention layers, and produces a response one token at a time, each time predicting the next token from a probability distribution. Figure 2 illustrates this architecture.

Figure 2. Transformer architecture, adapted from Vaswani et al. (2017).
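For reference, the core operation inside each transformer layer is scaled dot-product attention, as defined by Vaswani et al. (2017), where Q, K, and V are the query, key, and value matrices and d_k is the dimension of the keys:

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
$$

Multi-head attention runs several such operations in parallel on learned projections of Q, K, and V and concatenates the results.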
To address RQ4, we developed a multishot prompt and RAG-AI strategy encompassing all questions from our traditional NLP analysis. Using the Gemini 1.5 Flash LLM, we supplied a sample of 20 State Reports as the ground truth for our RAG-AI approach. We set the temperature to 1 and left other settings unchanged. We used a similar RAG-AI approach with Google’s NotebookLM, which uses PaLM (Pathways Language Model), and a sample of 50 State Reports. NotebookLM has a more user-friendly interface and a growing set of features but offers fewer customization options. We also used OpenAI’s ChatGPT, selecting the GPT-4 LLM. Our initial prompt for each of these GenAI tools was as follows:
“The twenty documents I provided are reports from a sample of countries that have signed and ratified the United Nations Convention on the Rights of Persons with Disabilities (CRPD), making them a state party to the convention. CRPD States Parties are required to produce these state reports. Please review these documents and respond with the following information, organized by the section numbers below: 1. What are the most important ideas contained in these documents, as represented by keywords and phrases? Also, please identify the top ten ‘topics’ represented in the documents. 2. What ‘named entities’ are found in these documents, please organize the response by ‘Organizations’, ‘Organizations of Persons with Disabilities’, ‘People’, ‘Government Entities’, and ‘Legal Documents’. 3. Which of the 50 articles in the CRPD are most represented in these documents?”
Through iterative prompt development, we asked the GenAI tools research questions that mirrored those we asked using the traditional text mining and NLP techniques.
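The study submitted prompts through graphical interfaces, but the underlying ideas of shots and retrieval can be sketched in a few lines of Python. This is an illustrative assumption, not how AI Studio, NotebookLM, or ChatGPT implement retrieval: here, chunks are ranked by simple word overlap with the question, whereas production RAG systems typically rank by embedding similarity.

```python
import re

def tokenize(text):
    """Lowercase word set used for the overlap score."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, chunks, k=2):
    """Rank document chunks by word overlap with the query and keep the top k.

    A simple stand-in for embedding-based retrieval; ties keep original order.
    """
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]

def build_prompt(question, chunks, shots=(), k=2):
    """Assemble worked examples ("shots"), retrieved context, and the question.

    shots=() gives a zero-shot prompt; one pair is one-shot; more is few-shot.
    """
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in shots]
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieve(question, chunks, k)))
    parts.append(f"Context:\n{context}")
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)
```

Grounding the answer in retrieved passages is what lets supplied State Reports act as “ground truth,” although it does not by itself prevent hallucination.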
7. Findings
We now present our findings, organized by research question and subquestion. Our first main question, RQ1, asks: What national priorities for CRPD implementation are reflected in the global corpus of State Reports? The goal of this question is to develop a basic understanding of the areas on which States Parties report focusing while implementing the Convention. To answer RQ1.1 and RQ1.2 about frequent and important keywords and phrases, we used term frequency only. TF*IDF, normally a useful heuristic, ranked country names highest, because each country’s name appears often in its own report but rarely in others; we therefore do not report it. The most frequent word is “regional,” followed by “required” and “professional,” suggesting the prevalence of regional approaches. Other frequent words, such as “independent” and “comprehensive,” reflect the structure of the reports. “Visual” and “impairment” highlight the community of persons with visual impairments. Words related to well-being, such as “abuse” and “risk,” are also present. Figure 3 illustrates frequently occurring words.

Figure 3. Frequently occurring words.
Figure 4 shows the 25 most frequent bigrams, echoing themes from Figure 3. The blind and low-vision community is represented by “visually impaired.” Technology features in “assistive devices,” “assistive technology,” and “universal design.” Well-being is represented by “cruel inhuman,” “sexual abuse,” “recreation leisure,” and “fundamental freedoms.”

Figure 4. Frequently occurring phrases (bigrams).
RQ1.3 asks: What topics are present in the corpus? To answer this subquestion, we used LDA topic modeling. We fit a 10-topic LDA model to the data and identified the 30 most salient terms associated with those topics. We used the pyLDAvis library to produce an interactive visualization of the Gensim topic model and its salient terms. A static version is presented in Figure 5.
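pyLDAvis ranks terms by saliency, the measure proposed by Chuang and colleagues for the Termite topic-model visualization, which weights a term’s overall probability by how distinctive it is across topics. A minimal sketch of that formula, assuming a topic-word matrix phi and overall topic proportions are already available from a fitted model:

```python
import math

def saliency(phi, topic_weights):
    """saliency(w) = P(w) * sum_T P(T|w) * log(P(T|w) / P(T)).

    phi: topic-word probabilities, phi[t][w] = P(w | topic t).
    topic_weights: overall topic proportions P(T), summing to 1.
    """
    scores = []
    for w in range(len(phi[0])):
        # Marginal probability of the word across all topics.
        p_w = sum(topic_weights[t] * phi[t][w] for t in range(len(phi)))
        if p_w == 0:
            scores.append(0.0)
            continue
        # Distinctiveness: KL divergence of P(T|w) from P(T).
        distinct = 0.0
        for t in range(len(phi)):
            p_t_given_w = topic_weights[t] * phi[t][w] / p_w
            if p_t_given_w > 0:
                distinct += p_t_given_w * math.log(p_t_given_w / topic_weights[t])
        scores.append(p_w * distinct)
    return scores
```

A word spread evenly across topics scores zero, while a frequent word concentrated in one topic scores highly; sorting by this score gives a “most salient terms” list.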

Figure 5. Static representation of the most salient terms in the topic model.
Table 1 summarizes the topics and identifies the terms most salient within each topic. Note that the terms “disability” and “person” are ubiquitous across all topics (as expected, given the dataset) and were removed to aid topic interpretation.
Table 1. LDA salient terms and topics

Note: Three other topics captured multilingual documents, which should have been excluded from the corpus.
Our LDA topic model yields a coherence score of 0.461, which is considered moderate. This score ranges from 0 to 1, with higher scores reflecting better topic quality and semantic relatedness among the top words in each topic. In this case, some topics are cohesive, and others lack definition or contain unrelated words.
To better analyze the variety and kinds of people, places, and organizations referenced in the State Reports, RQ2 asks: What named entities are found in the corpus? We ask two subquestions: RQ2.1, What types of named entities appear most frequently in the corpus? and RQ2.2, Are prominent organizations of persons with disabilities mentioned in higher/lower frequency than governmental entities? We used spaCy’s en_core_web_sm model to identify the most common entity types, shown in Table 2.
Table 2. Most common spaCy entity types

spaCy's NER pipelines use deep learning models that create contextualized token representations capturing semantic and syntactic patterns. These representations are built on either integrated pre-trained transformers or spaCy's own word vectors (Honnibal et al., Reference Honnibal, Montani, Van Landeghem and Boyd2020). Figure 6 illustrates the frequently occurring named entities in our corpus.
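The counting step behind Table 2 and Figure 6 can be sketched as follows. The authors used a pretrained spaCy pipeline whose exact name is not stated in this section. In practice you would load one, for example `spacy.load("en_core_web_sm")` after downloading it. To keep the sketch runnable without a model download, a rule-based `entity_ruler` with a few hypothetical patterns stands in for statistical NER. Only the counting logic is meant to carry over.

```python
from collections import Counter

import spacy


def entity_type_counts(nlp, texts):
    """Count how often each entity label (ORG, GPE, LAW, ...) occurs across texts."""
    counts = Counter()
    for doc in nlp.pipe(texts):  # stream documents through the pipeline
        counts.update(ent.label_ for ent in doc.ents)
    return counts


# Stand-in pipeline: a blank English tokenizer plus hypothetical phrase patterns.
# With a pretrained model, `nlp = spacy.load(...)` would replace these lines.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Ministry of Health"},
    {"label": "ORG", "pattern": "European Union"},
    {"label": "LAW", "pattern": "Convention"},
])

texts = [
    "The Ministry of Health funds rehabilitation under the Convention.",
    "The European Union and the Ministry of Health cooperate.",
]
print(entity_type_counts(nlp, texts).most_common())
```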

Figure 6. Frequently occurring entities.
The most frequently occurring named entities in the CRPD State Reports are CARDINAL numbers, organizations, dates, legal references, geopolitical entities, and persons. These categories suggest a strong emphasis on quantitative reporting, institutional actors, and references to legal frameworks and jurisdictions throughout the corpus. The high frequency of CARDINAL numbers suggests that State Reports often include quantitative information, such as statistics, counts of services, or budget figures, to demonstrate compliance or progress. The prominence of organizations (ORG) highlights the central role of institutional actors, including government bodies, international agencies, and civil society organizations, in implementing the CRPD. Frequent mentions of dates indicate temporal framing, such as references to when legislation was enacted, when programs started, or which reporting period is covered. The presence of legal references (LAW) reflects a strong reliance on rights-based language and formal frameworks, while the frequent occurrence of geopolitical entities (GPEs) shows that national and regional contexts are a key dimension of how States frame disability inclusion. Finally, persons (PERSON), although less frequent, may reflect individual case examples, testimonies, or references to leaders used to humanize or exemplify progress. Overall, the distribution of named entities shows reporting grounded in quantifiable evidence, institutional structures, and legal commitments, with some narrative elements woven in through references to individuals. These findings are also consistent with our expert close-reading check.
Next, we explored the top 10 named entities within selected high-frequency entity types, including ORG, PERSON, GPE, NORP, and LAW. Figure 7 displays the most frequently mentioned organizational entities (ORG) in the CRPD State Reports.
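Selecting the top 10 entities within each type is a grouped count. The sketch assumes entities arrive as `(text, label)` pairs, such as those spaCy yields via `ent.text` and `ent.label_`. The normalization step is a hypothetical addition that strips a leading "the". It would merge variants such as "the Ministry of Health" and "Ministry of Health", or "the State" and "State". It is not described in the paper.

```python
from collections import Counter, defaultdict


def normalize_entity(text):
    """Hypothetical light normalization: trim whitespace and drop a leading 'the '."""
    text = text.strip()
    return text[4:] if text.lower().startswith("the ") else text


def top_entities_by_type(entities, labels=("ORG", "PERSON", "GPE", "NORP", "LAW"), n=10):
    """Return the n most frequent entity strings for each requested label.

    `entities` is an iterable of (text, label) pairs, e.g. built from spaCy as
    [(ent.text, ent.label_) for doc in docs for ent in doc.ents].
    """
    by_label = defaultdict(Counter)
    for text, label in entities:
        if label in labels:  # ignore types outside the selection (e.g. DATE)
            by_label[label][normalize_entity(text)] += 1
    return {label: by_label[label].most_common(n) for label in labels}


entities = [
    ("the Ministry of Health", "ORG"),
    ("Ministry of Health", "ORG"),
    ("EU", "ORG"),
    ("Kenya", "GPE"),
    ("2019", "DATE"),
]
print(top_entities_by_type(entities, n=1))
```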

Figure 7. Frequently occurring ORG entities.
The most common ORG entity is "State," which likely reflects frequent references to the "State Party" or "the State" in the context of reporting obligations under the CRPD. This aligns with the treaty's structure, which centers State responsibility for implementation. Other commonly referenced organizational entities include "Convention" and "Government," underscoring the role of formal governance structures and international legal instruments. Mentions of "EU" and "UN" point to the influence of regional and international organizations in shaping and monitoring disability policy. Terms such as "PWD," "CRPD," and "Reply" likely stem from standardized reporting language, including responses to the Committee on the Rights of Persons with Disabilities. The appearance of "the Ministry of Health" indicates the involvement of national ministries, particularly in areas like health service provision and disability assessment. This finding may indicate that "medical model" approaches persist. These results suggest that organizational references in the reports are heavily institutional and formal, centered on governance, accountability, and international oversight. They also reflect the administrative structure through which States articulate their implementation of the CRPD: primarily national ministries and interactions with global bodies.
To better understand what specific portions of the CRPD are receiving the most attention in State Party reporting, RQ3 asks: To what extent is the breadth of the CRPD covered in the corpus? We ask eight operationalized subquestions—to answer RQ3.1 to RQ3.7, we applied the CRPD_dictionaryFinal; to answer RQ3.8, we applied the MedvsRights_dictionaryFinal.
RQ3.1 asks: What CRPD articles are most represented in the corpus? Of the 25 most represented articles shown in Figure 8, Article 8 (Awareness-raising) is the most represented. It deals with measures to raise awareness and foster respect for persons with disabilities (UN, 2006). The second is Article 23 (Respect for home and the family), which deals with measures to ensure marital, childbearing, familial, and parental rights for persons with disabilities (UN, 2006).
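Dictionary-based article counting of the kind used for RQ3.1 can be sketched as follows. Each article is represented by a list of terms, and matches are tallied across the corpus. The entries of CRPD_dictionaryFinal and its format are not reproduced in this section. `ARTICLE_TERMS` below is a hypothetical three-article stand-in.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary; the real CRPD_dictionaryFinal entries differ.
ARTICLE_TERMS = {
    "Art. 8 Awareness-raising": ["awareness", "stereotypes", "prejudices"],
    "Art. 13 Access to justice": ["access to justice", "court", "judicial"],
    "Art. 23 Respect for home and the family": ["family", "marriage", "parenthood"],
}


def article_counts(documents, dictionary=ARTICLE_TERMS):
    """Tally whole-word (or whole-phrase) term matches per article across documents."""
    counts = Counter()
    for doc in documents:
        text = doc.lower()
        for article, terms in dictionary.items():
            # \b...\b keeps "court" from matching inside "courts" or "courtesy"
            counts[article] += sum(
                len(re.findall(rf"\b{re.escape(t)}\b", text)) for t in terms
            )
    return counts


docs = [
    "Campaigns raise awareness and combat stereotypes.",
    "Family support and marriage rights; family courts.",
]
print(article_counts(docs).most_common())
```

Exact word boundaries are a deliberate, conservative choice here. They miss inflected forms such as "courts", so a real dictionary would need either stemming or explicit variants.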

Figure 8. Most represented CRPD articles.
RQ3.2 asks: What CRPD article paragraphs are most represented in the corpus? The 25 most represented paragraphs are shown in Figure 9. The most represented is Article 1 (Purpose), Paragraph 2. It describes who persons with disabilities are and how their impairments, in interaction with various barriers, may "hinder their full and effective participation in society on an equal basis with others" (UN, 2006). Article 8 (Awareness-raising), Paragraph 1(b), is the second most represented. It specifies that States Parties should adopt effective measures to "combat stereotypes, prejudices and harmful practices relating to persons with disabilities, including those based on sex and age, in all areas of life" (UN, 2006).

Figure 9. Most represented CRPD paragraphs.
RQ3.3 asks: What categories of the CRPD are most and least represented in the corpus? The dictionary captures the following article categories, given as the number of articles and the share of the category's articles: introductory (2, 100%), substantive (13, 61%), general applications (5, 71%), implementation and monitoring (4, 40%), and operational rules (1, 10%). The most frequently occurring article paragraphs come from the following article categories: introductory 2 (100%), substantive 9 (43%), general applications 3 (43%), and implementation and monitoring 3 (30%). Figure 10 illustrates the least represented CRPD articles. They fall into the following article categories: 0 introductory, 8 substantive (39%), 2 general applications (29%), 6 implementation and monitoring (60%), and 9 operational rules (90%).
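Category shares like these can be computed from a list of article numbers. This sketch assumes the CRPD's 50 articles split into contiguous ranges whose sizes match the denominators implied by the percentages above (2, 7, 21, 10, and 10 articles). The exact boundaries in `CATEGORY_RANGES` are an assumption, not taken from the authors' dictionary. The input list is illustrative.

```python
# Assumed category boundaries (article numbers), sized 2, 7, 21, 10, and 10.
CATEGORY_RANGES = {
    "introductory": range(1, 3),
    "general applications": range(3, 10),
    "substantive": range(10, 31),
    "implementation and monitoring": range(31, 41),
    "operational rules": range(41, 51),
}


def category_coverage(articles):
    """For each category: (articles from the list in it, % of the category's articles)."""
    listed = set(articles)
    result = {}
    for name, rng in CATEGORY_RANGES.items():
        hits = len(listed.intersection(rng))
        result[name] = (hits, round(100 * hits / len(rng)))  # nearest whole percent
    return result


print(category_coverage([1, 2, 8, 9, 23, 27, 33]))
```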

Figure 10. Least represented CRPD articles.
RQ3.4 asks: How does the distribution of CRPD articles differ by region? The top 10 most represented articles by region are shown in Figure 11. All five regions share two substantive articles. The first is Article 13 (Access to justice), which deals with ensuring effective access to justice for persons with disabilities. It does so by facilitating their role as participants in legal proceedings (witnesses, defendants, and so on) and by providing training for those working in the administration of justice, such as police and prison staff (UN, 2006). The second is Article 23 (Respect for home and the family). Europe was the only region whose top 10 included the substantive Article 26 (Habilitation and Rehabilitation). Article 26 deals with measures to enable persons with disabilities to attain and maintain maximum independence and full physical, mental, social, and vocational ability by organizing, strengthening, and extending comprehensive habilitation and rehabilitation services (UN, 2006).
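Comparisons such as "shared by all five regions" and "the only region that had" amount to set intersection and set difference over each group's top-10 list. This sketch assumes Articles 10 to 30 are the substantive articles. The region lists are hypothetical, not the values in Figure 11. The same function applies unchanged to subregions, income groups, or Optional Protocol status.

```python
SUBSTANTIVE = range(10, 31)  # assumed: Articles 10-30 are the substantive articles


def shared_and_unique(top_articles, keep=SUBSTANTIVE):
    """Compare groups' top-article lists.

    `top_articles` maps a group (region, subregion, income group, ...) to the
    article numbers in its top-10 list. Returns the articles every group shares
    and, for each group, the articles that no other group has.
    """
    groups = {g: {a for a in arts if a in keep} for g, arts in top_articles.items()}
    shared = set.intersection(*groups.values())
    unique = {}
    for g, arts in groups.items():
        others = set().union(*(a for h, a in groups.items() if h != g))
        unique[g] = arts - others
    return shared, unique


# Hypothetical top lists (not the paper's figures).
top_lists = {
    "Europe": [8, 12, 13, 23, 26],
    "Africa": [8, 13, 23, 24],
    "Asia": [8, 13, 23, 27],
}
print(shared_and_unique(top_lists))
```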

Figure 11. Most represented CRPD articles by region.
The top 10 least represented articles by region are shown in Figure 12. All five regions share only one substantive article, Article 10 (Right to life), which recognizes the inherent right to life of every human being and deals with measures to ensure that persons with disabilities effectively enjoy that right (UN, 2006). All regions except Oceania share the substantive Article 20 (Personal mobility), which deals with measures "to ensure personal mobility with the greatest independence" (UN, 2006). Europe and Oceania share the substantive Article 17 (Protecting the integrity of the person), which recognizes the right to respect for physical and mental integrity (UN, 2006).

Figure 12. Least represented CRPD articles by region.
RQ3.5 asks: How does the distribution of CRPD articles differ by subregion? The top 10 most represented articles by subregion are shown in Figures 13 and 14. Some subregions show 11 articles instead of 10 because the 10th- and 11th-ranked articles have tied percentages. Analysis was done for all subregions, but to save space, only the analyses for the European and Asian subregions are shown.
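The tie rule that produces 11 articles can be sketched as a top-n selection that keeps every item equal to the n-th value. Shares are rounded before comparison so that values displayed as equal count as tied. The one-decimal precision and the example shares are assumptions.

```python
def top_n_with_ties(shares, n=10, decimals=1):
    """Return the n highest-share items, plus any items tied with the n-th.

    `shares` maps an article label to its share (%) of dictionary matches.
    Shares are rounded first so that values displayed as equal count as tied.
    """
    rounded = {k: round(v, decimals) for k, v in shares.items()}
    ranked = sorted(rounded.items(), key=lambda kv: kv[1], reverse=True)  # stable sort
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]  # share of the n-th ranked item
    return [(k, v) for k, v in ranked if v >= cutoff]


# Hypothetical shares: Art. 12 and Art. 24 both round to 7.5 and tie at rank 3.
shares = {"Art. 23": 9.04, "Art. 27": 8.0, "Art. 12": 7.52, "Art. 24": 7.48, "Art. 8": 3.0}
print(top_n_with_ties(shares, n=3))
```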

Figure 13. Most represented CRPD Articles by Asia.

Figure 14. Most represented CRPD articles by Europe.
The five Asian subregions, shown in Figure 13, share three substantive articles. The first is Article 12 (Equal recognition before the law), which deals with measures to ensure that persons with disabilities can enjoy and exercise their legal capacity, have equal rights to own or inherit property, and have equal access to financial systems (UN, 2006). The second is Article 23 (Respect for home and the family). The third is Article 27 (Work and employment), which deals with the rights of persons with disabilities to: (1) work on an equal basis with others; (2) gain a living by work freely chosen or accepted in a labor market; and (3) work in an environment that is open, inclusive, and accessible (UN, 2006).
South-Eastern, Southern, and Western Asia each have no unique substantive article. Central Asia has two unique substantive articles: Article 26 (Habilitation and Rehabilitation) and Article 30 (Participation in cultural life, recreation, leisure and sport) which deals with recognition of persons with disabilities’ specific cultural and linguistic identity (sign languages, deaf culture, and so on), and the rights of persons with disabilities to participate on an equal basis in cultural life (television shows, museums, and so on), recreational, leisure, and sporting activities (UN, 2006). The only unique substantive article for Eastern Asia is Article 16 (Freedom from exploitation, violence, and abuse) that deals with all legislative, administrative, social, and educational measures to prevent (and assist in recovery from) all forms of exploitation, violence, and abuse (UN, 2006).
The four European subregions, shown in Figure 14, share three substantive articles: Article 12 (Equal recognition before the law), Article 23 (Respect for home and the family), and Article 24 (Education). Article 24 deals with ensuring that persons with disabilities can participate fully and effectively in all aspects of education and educational systems (UN, 2006).
Northern Europe has no unique substantive article. Eastern Europe has two unique substantive articles. The first is Article 11 (Situations of risk and humanitarian emergencies), which deals with measures to protect persons with disabilities in situations of risk such as armed conflict, humanitarian emergencies, and natural disasters (UN, 2006). The second is Article 30 (Participation in cultural life, recreation, leisure, and sport). Southern Europe's only unique substantive article is Article 16 (Freedom from exploitation, violence, and abuse). Western Europe's only unique substantive article is Article 25 (Health). Article 25 recognizes the right of persons with disabilities to enjoy the highest attainable standard of health without discrimination on the basis of disability and deals with measures to ensure that right (UN, 2006).
RQ3.6 asks: How does the distribution of CRPD articles differ by economic income group? The top 10 most represented articles by income group are shown in Figure 15 below. All four income groups share two substantive articles: Article 23 (Respect for home and the family) and Article 27 (Work and employment). Two substantive articles are shared by all the income groups except the high-income group: Article 11 (Situations of risk and humanitarian emergencies) and Article 24 (Education). The low-income group is the only income group that does not share the substantive Article 12 (Equal recognition before the law). In terms of substantive articles unique to only one income group: Article 16 (Freedom from exploitation, violence and abuse) is unique to the upper-middle-income group and Article 26 (Habilitation and rehabilitation) is unique to the high-income group.

Figure 15. Most represented CRPD articles by income group.
The top 10 least represented articles by income group are shown in Figure 16. All four income groups share two substantive articles: Article 10 (Right to life) and Article 20 (Personal mobility). The high- and upper-middle-income groups are the only ones that share the substantive Article 17 (Protecting the integrity of the person).

Figure 16. Least represented CRPD articles by income group.
RQ3.7 asks: What are the differences in the corpus between States that have and have not ratified the Optional Protocol? The top 10 most represented articles for each group are shown in Figure 17. Article 13 (Access to justice) appeared only among countries that had ratified the Optional Protocol. It was the only substantive article not shared between the two groups.

Figure 17. Most represented CRPD articles by Optional Protocol ratification.
RQ3.8 asks: To what extent is the transition from the medical model to the rights-based model represented in the corpus? Overall, the corpus is consistent with a shift from the medical model to the rights-based model. The rights-based model is the more represented of the two globally, at 64%. Figure 18 shows that this pattern holds across all five regions.
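A two-category lexicon comparison like this one can be sketched as counting matches for each model's terms and reporting each model's share of all matches. The term lists below are hypothetical. The entries of MedvsRights_dictionaryFinal are not reproduced in this section. To get per-region results as in Figure 18, you would run the function separately on each region's reports.

```python
import re

# Hypothetical term lists; the real MedvsRights_dictionaryFinal entries differ.
MODEL_TERMS = {
    "medical": ["treatment", "patient", "impairment", "cure", "diagnosis"],
    "rights": ["rights", "inclusion", "equality", "participation", "non-discrimination"],
}


def model_shares(documents, terms=MODEL_TERMS):
    """Return each model's percentage share of all lexicon matches in the documents."""
    counts = {model: 0 for model in terms}
    for doc in documents:
        text = doc.lower()
        for model, words in terms.items():
            counts[model] += sum(
                len(re.findall(rf"\b{re.escape(w)}\b", text)) for w in words
            )
    total = sum(counts.values())
    return {m: (100 * c / total if total else 0.0) for m, c in counts.items()}


docs = ["Equality and inclusion are rights.", "Treatment of the patient."]
print(model_shares(docs))
```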

Figure 18. Most represented disability model by region.
To understand the potential of GenAI tools for this type of analysis and for treaty compliance monitoring, RQ4 asks: To what extent can GenAI tools answer the preceding research questions? Space constraints limit our reporting of this analysis. In response to our "mega-prompt," both models (Gemini 1.5 and GPT-4o) returned surprisingly coherent responses. Summarized examples of their responses follow.
Gemini 1.5 output: Top three “important ideas”: Shifting paradigms: these documents repeatedly emphasize the move away from the “medical model” of disability, which views disability as a personal defect, to the “social model,” which recognizes that societal barriers create disadvantages for people with disabilities. Focus on inclusion: the reports highlight the importance of social inclusion and the need to ensure that people with disabilities have full and equal participation in all aspects of society, including education, employment, and civic life. Accessibility as a fundamental right: the reports stress that accessibility is not just a matter of convenience, but a human right. They address physical, informational, and communication accessibility to ensure equal access.
The top ten topics (using topic modeling): (1) equality and nondiscrimination, (2) awareness-raising and information dissemination, (3) accessibility, (4) education, (5) employment, (6) legal capacity and decision-making, (7) health and medical treatment, (8) rehabilitation and habilitation, (9) participation in public and political life, and (10) social protection.
NER structured by the requested entities (e.g., top five Organizations):
• National Council for Persons with Disabilities (CONAIPD) (El Salvador, Venezuela)
• Ministry of Education (multiple countries)
• Australian Human Rights Commission (Australia)
• Supreme Council for Family Affairs (Qatar)
• Productivity Commission (Australia)
GPT-4o output: Top three “important ideas”: Inclusion and accessibility: the documents emphasize the importance of making education, employment, and public facilities accessible to persons with disabilities. This includes physical access, educational materials, and technology adaptations. Legal and policy frameworks: the reports highlight the necessity of robust legal and policy frameworks to protect the rights of persons with disabilities, aligning national laws with the CRPD. Awareness and education: awareness raising campaigns and educational programs are crucial for changing societal attitudes toward persons with disabilities and promoting their rights.
The top ten topics (using topic modeling): (1) education access, (2) employment and labor rights, (3) healthcare and rehabilitation, (4) legislative frameworks, (5) accessibility in public spaces, (6) awareness and advocacy, (7) intersectionality and vulnerable groups, (8) independent living and community support, (9) technology and innovation, and (10) international collaboration.
NER structured by the requested entities (e.g., top five Organizations):
• National Council for Persons with Disabilities (CONAIPD) (El Salvador, Venezuela)
• Ministry of Education (multiple countries)
• Australian Human Rights Commission (Australia)
• Supreme Council for Family Affairs (Qatar)
• Productivity Commission (Australia)
In subsequent iterative prompts, the LLMs identified content related to all the research questions. Space constraints prevent us from presenting these responses.
8. Discussion and recommendations
This study highlights varying global CRPD priorities and compares computational text mining with LLM-based analyses. Word frequency analyses reveal a shift from medical to rights-based models. Our custom lexicon confirms this shift, showing the predominance of the rights-based model globally and in every region. However, limited emphasis on stigma and barriers indicates reporting gaps. The LDA topic model's moderate coherence (0.461) suggests that future research should refine model parameters and text processing.
Regional differences and their relationship with major regional disability networks designed to support CRPD implementation may contribute to the variation found in our empirical data. In the findings section, we observed variation in CRPD articles by region, which may reflect differing regional priorities. Europe uniquely emphasizes Article 26 (Habilitation and rehabilitation), highlighting a regional focus on maximizing the independence of persons with disabilities. This prioritization could reflect the independence-focused influence of major disability networks such as the European Network on Independent Living and the European Disability Forum (EDF, 2021; ENIL, 2022).
Income group disparities reveal that foundational rights such as Article 23 (Respect for home and the family) and Article 27 (Work and employment) span all groups. Article 24 (Education) is prominent in low- and middle-income countries, and Article 26 in high-income countries. Of concern is the absence of Article 12 (Equal recognition before the law) from the low-income group's most represented articles. Countries that ratified the Optional Protocol uniquely emphasize Article 13, which may suggest a deeper commitment to justice through external accountability.
Our hybrid methodology combines GenAI and traditional NLP, with the traditional NLP analysis serving as a verification mechanism for GenAI results. Gemini 1.5 captures the shift from the medical to the rights-based model. However, neither Gemini nor GPT-4o identifies the nuances in disability paradigms, well-being, and interventions that our term-frequency analysis reveals. Both LLMs capture aspects of the most represented articles except Article 23, likely because they over-index on early report sections. We also found that periodic reports often respond to issues raised in previous reports rather than providing comprehensive coverage.
Comparing topic models, the traditional NLP model foregrounds children (related to Article 23). In contrast, the LLMs' topics reflect their over-indexing on Articles 1 to 5, although GPT-4o does so less. Traditional and LLM-based NER identify similar organization entities, with the LLMs adding country context.
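Using traditional NER output to check LLM-reported entities can be made concrete by normalizing names and measuring their overlap. The LLM list below comes from the outputs reported earlier. The NLP-side list is hypothetical and stands in for spaCy's ORG results. Jaccard overlap and the normalization rules, which drop parenthetical country notes and a leading "the", are assumptions rather than the authors' procedure.

```python
import re


def normalize(name):
    """Lowercase, drop parenthetical notes (e.g. country context), and a leading 'the'."""
    name = re.sub(r"\([^)]*\)", " ", name)          # remove "(...)" segments
    name = re.sub(r"\s+", " ", name).strip().lower()  # collapse leftover spaces
    return name[4:] if name.startswith("the ") else name


def entity_agreement(llm_entities, nlp_entities):
    """Return Jaccard overlap, shared names, LLM-only names, and NLP-only names."""
    llm = {normalize(e) for e in llm_entities}
    nlp = {normalize(e) for e in nlp_entities}
    union = llm | nlp
    jaccard = len(llm & nlp) / len(union) if union else 1.0
    return jaccard, sorted(llm & nlp), sorted(llm - nlp), sorted(nlp - llm)


llm_orgs = [
    "National Council for Persons with Disabilities (CONAIPD) (El Salvador, Venezuela)",
    "Ministry of Education (multiple countries)",
    "Productivity Commission (Australia)",
]
nlp_orgs = ["the Ministry of Education", "EU", "Productivity Commission"]  # hypothetical
print(entity_agreement(llm_orgs, nlp_orgs))
```

Names that appear only in the LLM list are candidates for manual checking against the reports, since they may be hallucinated or merely phrased differently.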
Our approach demonstrates traditional NLP's current advantages over GenAI. However, GenAI is improving and is likely to support whole-document analysis that mitigates over-indexing. This hybrid framework provides valuable insights into CRPD implementation while making text analytics accessible to nontechnical researchers. Broadening these methods will help civil society monitor implementation gaps, facilitate parallel reporting, enhance accountability, and improve implementation of complex policies such as the CRPD and the SDGs.
9. Limitations
Our analysis, while comprehensive, has limitations. We rely on self-reported data, which may not fully reflect actual policy priorities or impact. Our custom dictionary might miss nuances in CRPD articles, and we did not fine-tune our NER model. Due to token count limits, the LLM analysis covers only a subset of reports, unlike the full-corpus NLP analysis.
10. Conclusions and future research
The study indicates widespread adoption of the CRPD's rights-based approach in State reporting and its role in promoting social justice and economic inclusion. The region with the most complete transition to the rights-based model is the Americas. The paper contributes an understanding of the global implementation of the CRPD and a methodology for ongoing monitoring at national, regional, and global levels. We also see it making a practical contribution to national public policy debates, especially regarding national priorities and gaps in CRPD implementation. Future research will add further data, including alternative reports, State action plans, and CRPD Committee reports, and will apply the LLMs to the full corpus.
Abbreviations
- CRPD: Convention on the Rights of Persons with Disabilities
- LLM: large language model
- NLP: natural language processing
Data availability statement
CRPD State Reports, Cogburn, Ochieng, Shikako, Woods, Aydin, 2025, “Replication data and code for: Uncovering Policy Priorities for Disability Inclusion: NLP and LLM Approaches to Analyzing CRPD State Reports,” https://github.com/derrickcogburn/CRPD_State_Reports, https://doi.org/10.5281/zenodo.15367519.
Acknowledgments
The authors are grateful for the administrative support provided by the American University Institute on Disability and Public Policy, the AU Internet Governance Lab, and the UNESCO Associate Chair in Transnational Challenges and Governance. The authors also thank the Center for Interdisciplinary Research in Rehabilitation of the Greater Montréal. In addition, we thank the Hawaii International Conference on System Sciences and its Minitrack on Culture, Identity, and Inclusion for facilitating the excellent double-blind peer review of the original conference paper on which this manuscript is based.
Author contribution
Conceptualization: D.L.C; T.A.O; K.S. Data Curation: D.L.C; T.A.O. Formal analysis: D.L.C; T.A.O; K.S. Funding acquisition: D.L.C. Investigation: D.L.C; T.A.O; K.S. Methodology: D.L.C; T.A.O. Project Administration: D.L.C; T.A.O. Resources: D.L.C. Software: D.L.C; T.A.O. Supervision: D.L.C. Validation: D.L.C; K.S. Visualization: D.L.C; T.A.O. Writing original draft: D.L.C; T.A.O; J.W; M.A. Writing review and editing: D.L.C; T.A.O; K.S; J.W; M.A. All authors approved the final submitted draft.
Funding statement
This research was supported in part by financial resources provided to Dr. Derrick Cogburn by American University, especially the Kogod School of Business, Department of Information Technology & Analytics and the School of International Service, Department of Environment, Development & Health. In addition, the Canada Research Chairs Program supports the Research Chair in Childhood Disabilities: Participation and Knowledge Translation of Dr. Keiko Shikako at McGill University. Neither American University nor McGill University had a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests
There are no financial, professional, contractual, or personal relationships or situations that could be perceived to impact the presentation of this work.