Introduction
Currently, a significant proportion of primary liver cancer cases is caused by HBV. If large-scale intervention measures can be implemented early, liver cancer may become the second cancer, after cervical cancer, to be effectively controlled globally [Reference Shi, Cao and Wang1]. Early assessment of the risk of primary liver cancer in HBV patients is therefore crucial in clinical practice, as it can guide patients towards validated interventions such as vaccination, reduce the incidence of liver cancer, and improve prognosis. Many studies have analysed the influencing factors of primary liver cancer, but systematic liver cancer risk-prediction tools still need further exploration [Reference Cao, Li and Sun2–Reference Zhao, Bai and Qi4]. Several risk scores have already been developed to predict liver cancer. PAGE-B is used to predict liver cancer risk in white patients receiving antiviral therapy, whereas REACH-B, GAG-HCC, and CU-HCC apply to untreated Asian populations [Reference Chun, Papatheodoridis and Lee5]. These scores rely on factors such as age, gender, HBV DNA viral load, and liver cirrhosis, but they cannot account for the effect of comorbidities and other factors such as diabetes, hypertension, and obesity on liver cancer. In addition, their complexity makes them inconvenient to use in clinical practice. This study analyses the influencing factors of primary liver cancer in HBV patients and incorporates some of these additional factors into a nomogram prediction model. Used alongside existing risk scores, the nomogram may further improve the prediction of liver cancer risk.
Subjects and methods
Study subjects
This retrospective study included 153 patients with HBV complicated by primary liver cancer and 153 patients with HBV without primary liver cancer who were treated in our hospital between January 2022 and August 2024. The patients were randomly assigned to a training set and a validation set in a 7:3 ratio. This ratio reserves a relatively large validation set, which makes the validation results more representative and allows a better assessment of the model's generalization performance. The training set included 107 patients with HBV complicated by primary liver cancer (complicated group) and 107 patients with HBV without primary liver cancer (non-complicated group). The validation set included 92 patients (46 in each of the complicated and non-complicated groups). The external validation group comprised 446 patients with hepatitis B from other hospitals, 15 of whom had primary liver cancer.
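The 7:3 allocation can be sketched as below. The text does not say how randomization was carried out. This sketch assumes that cases and controls were shuffled separately, which reproduces the reported 107/46 split within each group. The seed, patient IDs, and the function name `split_7_3` are hypothetical.

```python
import random

def split_7_3(case_ids, control_ids, train_frac=0.7, seed=2024):
    """Hypothetical stratified split: shuffle cases and controls separately, keep 70% of each for training."""
    rng = random.Random(seed)  # fixed (hypothetical) seed so the split is reproducible

    def split(ids):
        ids = list(ids)
        rng.shuffle(ids)
        n_train = round(len(ids) * train_frac)  # 153 * 0.7 = 107.1 -> 107
        return ids[:n_train], ids[n_train:]

    case_train, case_val = split(case_ids)
    ctrl_train, ctrl_val = split(control_ids)
    return case_train + ctrl_train, case_val + ctrl_val

cases = [f"HCC{i}" for i in range(153)]      # illustrative IDs for the complicated group
controls = [f"CTRL{i}" for i in range(153)]  # illustrative IDs for the non-complicated group
train, val = split_7_3(cases, controls)
print(len(train), len(val))  # 214 92
```

Stratifying by outcome keeps the 1:1 case–control ratio identical in both sets. A simple unstratified shuffle of all 306 patients would not guarantee exactly 107 and 46 cases per set.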
Inclusion criteria: (i) meeting the diagnostic criteria for HBV infection and, in the complicated group, for primary liver cancer [Reference Cong, Bu and Chen6, Reference Terrault, Lok and McMahon7]; (ii) mentally stable and able to communicate normally; (iii) the study was approved by the Medical Ethics Committee; and (iv) local permanent residents who had lived locally for more than 3 years. Exclusion criteria: (i) drug-induced liver disease, Wilson's disease, or autoimmune liver disease; (ii) primary tumours of other systems; (iii) missing clinical test data; and (iv) hepatitis caused by hepatitis A or C virus. The flowchart of case collection is shown in Figure 1.

Figure 1. Case collection flowchart.

Figure 2. Nomogram prediction model for HBV patients complicated with primary liver cancer.
Collection of clinical data
The clinical data collected from patients included: gender, age, body mass index (BMI), smoking, drinking, hypertension, hyperlipidaemia, diabetes, monthly income, education level, family history of liver cancer, alcoholic fatty liver disease, liver cirrhosis, dietary habits, HBV DNA viral load, and antiviral treatment.
Statistical analysis
Statistical analysis was performed using IBM SPSS Statistics 25.0 and R (version 4.3.3). Measurement data were expressed as mean ± standard deviation (x̄ ± s) and compared using the independent-samples t-test. Categorical data were expressed as number of cases (percentage) [n (%)] and compared using the χ² test. Variables that differed significantly in the univariate analysis were entered into a multivariate logistic regression to identify independent risk factors for primary liver cancer in HBV patients. Variables with P < 0.05 in the multivariate analysis were included as predictors, and a nomogram prediction model was constructed using the rms package in R. ROC curves were used to evaluate the nomogram's discrimination. Calibration curves and the Hosmer–Lemeshow test were used to evaluate its calibration. Decision curve analysis was performed to estimate the net clinical benefit of the nomogram. A two-sided P < 0.05 was considered statistically significant.
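For readers reproducing the univariate comparisons outside SPSS, the two test statistics named above can be computed directly. This is a minimal sketch. It assumes the pooled-variance (Student) t statistic and the uncorrected Pearson χ² for a 2×2 table; the text does not state which variants were used. P-values would then be read from the t distribution (n₁ + n₂ − 2 degrees of freedom) and the χ² distribution (1 degree of freedom). The function names are illustrative.

```python
import math
from statistics import mean, variance

def student_t(x, y):
    """Independent-samples t statistic with pooled variance (equal variances assumed)."""
    n1, n2 = len(x), len(y)
    # pooled sample variance from the two groups' sample variances
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2  # statistic and its degrees of freedom

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no continuity correction (df = 1)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
```

For example, a binary factor could be tabulated as rows = complicated / non-complicated and columns = present / absent and passed to `pearson_chi2_2x2`. The counts would come from Table 2, which is not reproduced here.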
Results
Comparison of clinical data between the training set and validation set
There were no statistically significant differences (P > 0.05) in gender, age, BMI, smoking, drinking, hypertension, hyperlipidaemia, diabetes, monthly income, education level, family history of liver cancer, alcoholic fatty liver disease, liver cirrhosis, dietary habits, HBV DNA viral load, and antiviral treatment between the training set and the validation set. See Table 1.
Table 1. Comparison of clinical data between the training set and validation set [n (%) / x̄ ± s]

Univariate analysis of HBV patients complicated with primary liver cancer in the training set
In the training set, there were no statistically significant differences (P > 0.05) between the non-complicated and complicated groups in age, smoking, hypertension, hyperlipidaemia, monthly income, education level, alcoholic fatty liver disease, dietary habits, or antiviral treatment. Compared with the non-complicated group, the complicated group had higher proportions of the following (P < 0.05): male patients, patients with alcohol consumption, diabetes, a family history of liver cancer, or liver cirrhosis, and patients with HBV DNA viral load ≥10⁴ copies/mL. The complicated group also had a higher BMI (P < 0.05). See Table 2.
Table 2. Univariate analysis of HBV patients complicated with primary liver cancer in the training set [n (%) / x̄ ± s]

Logistic regression analysis of factors influencing HBV patients complicated with primary liver cancer
Variables that differed significantly in the univariate analysis (Table 2) were entered into the logistic regression analysis, with the variable assignments shown in Table 3. In the multivariate analysis (Table 4), gender, BMI, alcohol consumption, diabetes, family history of liver cancer, liver cirrhosis, and HBV DNA viral load were all independent influencing factors for primary liver cancer in HBV patients (P < 0.05).
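In a logistic model, each coefficient β in Table 4 converts to an odds ratio exp(β), with a Wald 95% CI of exp(β ± 1.96·SE). The sketch below shows this conversion and the two-sided Wald P-value. The coefficient and standard error in the usage line are hypothetical and are not taken from Table 4.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Convert a logistic coefficient and its standard error into an OR with a Wald 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def wald_p(beta, se):
    """Two-sided P-value of the Wald test z = beta / se under the standard normal distribution."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))

# Hypothetical values for illustration only
print(odds_ratio_ci(0.9, 0.35), wald_p(0.9, 0.35))
```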
Table 3. Variable assignment table

Table 4. Multivariate logistic regression analysis

Construction of the nomogram for predicting primary liver cancer in HBV patients
Based on the significant variables in Table 4, a nomogram for predicting primary liver cancer in HBV patients was constructed in R (Figure 2). Each predictor is mapped proportionally onto a 0–100 point scale. To read the nomogram, draw a vertical line upward from the patient's value on each predictor axis to the ‘Points’ axis to obtain that predictor's score. Sum the scores, then draw a vertical line downward from the total score to read the corresponding predicted probability, i.e., the patient's risk of primary liver cancer.
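The nomogram arithmetic can be sketched as follows. The sketch assumes the default scaling of the R rms package's `nomogram` function: the predictor with the widest range of contribution to the linear predictor spans 0–100 points. This is an assumption, because the exact axis ranges in Figure 2 are not given in the text. All coefficients, ranges, and variable names below are hypothetical placeholders, not the fitted values from Table 4.

```python
import math

# Hypothetical coefficients and predictor ranges for illustration only (NOT Table 4 values)
INTERCEPT = -4.0
COEFS = {"male": 0.8, "cirrhosis": 1.5, "hbv_dna_high": 1.1, "bmi": 0.12}
RANGES = {"male": (0, 1), "cirrhosis": (0, 1), "hbv_dna_high": (0, 1), "bmi": (18.0, 35.0)}

def effect_span(name):
    lo, hi = RANGES[name]
    return abs(COEFS[name]) * (hi - lo)

# The predictor with the widest effect span is scaled to exactly 0-100 points
MAX_SPAN = max(effect_span(n) for n in COEFS)

def points(name, value):
    """Points for one predictor: its linear-predictor contribution rescaled to the 0-100 axis."""
    lo, hi = RANGES[name]
    beta = COEFS[name]
    ref = lo if beta >= 0 else hi  # the lowest-risk value of the predictor scores 0 points
    return 100 * beta * (value - ref) / MAX_SPAN

def probability_from_total(total_points):
    """Map total points back to the linear predictor, then to a probability (logistic function)."""
    min_lp = INTERCEPT + sum(COEFS[n] * (RANGES[n][0] if COEFS[n] >= 0 else RANGES[n][1]) for n in COEFS)
    lp = min_lp + total_points * MAX_SPAN / 100
    return 1 / (1 + math.exp(-lp))
```

Summing `points` over a patient's predictors and passing the total to `probability_from_total` gives the same probability as applying the logistic function directly to the regression equation. The nomogram is a graphical form of the fitted model, not a separate model.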
Internal validation of the prediction model for primary liver cancer in HBV patients
The ROC curves for the training set (Figure 3a) and the validation set (Figure 3b) gave AUCs of 0.882 (95% CI: 0.837–0.926) and 0.859 (95% CI: 0.810–0.909), respectively, indicating that the nomogram prediction model had high discriminative power.
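The AUC has a simple interpretation, made explicit in the sketch below. It is the probability that a randomly chosen patient with liver cancer receives a higher predicted risk than a randomly chosen patient without it, with ties counted as one half (the Mann–Whitney formulation). The function name and toy scores are illustrative. Confidence intervals such as those reported above require an additional method (e.g., DeLong or bootstrap), which is not shown.

```python
def auc_mann_whitney(scores_pos, scores_neg):
    """AUC as the fraction of (case, control) pairs in which the case has the higher predicted risk."""
    wins = 0.0
    for p in scores_pos:          # predicted risks of patients with liver cancer
        for n in scores_neg:      # predicted risks of patients without liver cancer
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5       # ties count as half a correct ordering
    return wins / (len(scores_pos) * len(scores_neg))

print(auc_mann_whitney([0.9, 0.3], [0.4, 0.1]))  # 0.75
```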

Figure 3. ROC curves (a, b) and calibration curves (c, d) of the prediction model for primary liver cancer in HBV patients in the training set and validation set.

Figure 4. Decision curve analysis diagram.
The calibration curves for the training set (Figure 3c) and the validation set (Figure 3d) showed good agreement between predicted and observed outcomes in both cohorts. The Hosmer–Lemeshow test supported this (training set: χ² = 2.648, P = 0.954; validation set: χ² = 4.117, P = 0.846), indicating that the nomogram was well calibrated.
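The Hosmer–Lemeshow statistic groups patients by predicted risk and compares observed with expected events in each group. The sum is referred to a χ² distribution with g − 2 degrees of freedom. The sketch below assumes g = 10 groups of roughly equal size (df = 8). The reported P-values are consistent with 8 degrees of freedom; for example, χ² = 2.648 gives P ≈ 0.954. Software packages handle ties at group boundaries differently, so this sketch may not reproduce SPSS or R output exactly. The function names are illustrative.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X >= x); exact closed form, valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic with `groups` bins of roughly equal size, sorted by predicted risk."""
    n = len(p)
    if n < groups:
        raise ValueError("need at least one patient per group")
    order = sorted(range(n), key=lambda i: p[i])  # sort patients by predicted risk
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        observed = sum(y[i] for i in idx)   # observed liver cancer cases in the bin
        expected = sum(p[i] for i in idx)   # expected cases = sum of predicted risks
        size = len(idx)
        # (O - E)^2 / (E * (1 - E / size)) combines the event and non-event cells;
        # bins whose predicted risks are all 0 or all 1 would need special handling
        stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)
```

A non-significant result (large P) means the test found no evidence of miscalibration. It does not prove perfect calibration, which is why the calibration curves are reported alongside it.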
Analysis of the clinical application value of the nomogram model
Decision curve analysis (Figure 4) showed that, at high-risk threshold probabilities from 0.07 to 0.95, the nomogram curve lay above both the ‘All’ and ‘None’ lines, indicating a positive net benefit and high clinical value. The ‘All’ line represents the net benefit if all hepatitis B patients receive intervention. The ‘None’ line represents the net benefit if no patient receives intervention, which is zero at every threshold.
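Net benefit at a threshold probability pt weighs true positives against false positives by the odds pt/(1 − pt):

NB = TP/n − (FP/n) · pt/(1 − pt)

The sketch below computes the model, ‘All’, and ‘None’ curves at a single threshold. The function names are illustrative.

```python
def net_benefit(y, p, threshold):
    """Net benefit of intervening in patients whose predicted risk is >= threshold."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    weight = threshold / (1 - threshold)  # odds at the threshold: harm-to-benefit exchange rate
    return tp / n - fp / n * weight

def net_benefit_treat_all(y, threshold):
    """'All' strategy: every patient receives intervention."""
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

def net_benefit_treat_none(threshold):
    """'None' strategy: no patient receives intervention, so the net benefit is always 0."""
    return 0.0
```

The ‘All’ curve depends on prevalence and reaches zero when the threshold equals the prevalence. The internal data have a 1:1 case–control composition (50% prevalence), whereas the external group has 15 cases among 446 patients (about 3.4%). The two settings therefore have different reference curves, which should be kept in mind when comparing their threshold ranges.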
External validation of the prediction model for primary liver cancer in HBV patients
The ROC curve for external validation showed an AUC of 0.863 (95% CI: 0.814–0.911), indicating good discriminatory ability of the nomogram model. The calibration curve demonstrated good agreement between predicted and actual outcomes, with a Hosmer–Lemeshow test result of χ² = 7.999 and P = 0.434. Decision curve analysis showed that within a high-risk threshold probability range of 0.14 to 0.95, the nomogram curve lay above the ‘All’ and ‘None’ lines, suggesting favourable net clinical benefit and high clinical utility (Figure 5).

Figure 5. ROC curve (a), calibration curve (b), and decision curve analysis (c) of the prediction model for primary liver cancer in HBV patients in the external validation group.
Discussion
HBV is a hepatotropic DNA virus of the family Hepadnaviridae. It replicates extensively within hepatocytes and interacts with various cellular proteins, causing hepatitis B in infected individuals and increasing their risk of liver cancer [Reference Guo, Zhao and Jiang8]. Prevention is the most effective way to control liver cancer and operates at three levels. Primary prevention is vaccination of the general population from birth. Secondary prevention is antiviral treatment of high-risk individuals with chronic HBV infection. Tertiary prevention also uses antiviral drugs, in this case to prevent recurrence in high-risk individuals with chronic HBV infection [Reference Singal, Kanwal and Llovet9]. It is therefore necessary to identify HBV patients at high risk of primary liver cancer on the basis of relevant influencing factors in order to guide clinical treatment.
Multivariate regression analysis in this study showed that gender, BMI, alcohol consumption, diabetes, family history of liver cancer, liver cirrhosis, and HBV DNA viral load are important risk factors for primary liver cancer in HBV patients.
(1) Gender and alcohol consumption: Xu et al. [Reference Xu, Cheng and Chen10] showed that aflatoxin exposure increases the risk of hepatocellular carcinoma in HBV patients and that male patients express genes related to aflatoxin metabolism at higher levels than female patients. Wang et al. [Reference Wang, S-H and Chen11] found that androgens enhance aflatoxin-induced genotoxicity and inflammation in male HBV patients, which may explain the higher risk of liver cancer in men. In addition, alcohol-related liver disease is a major cause of chronic liver disease worldwide and may contribute to metabolic dysfunction, thereby promoting the progression of liver cancer [Reference Mackowiak, Fu and Maccioni12].
(2) BMI: Fan et al. [Reference Fan and Hou13] found that central obesity is a risk factor for hepatocellular carcinoma in Asian patients with chronic HBV infection. Obesity, characterized by excessive fat accumulation in the body, is generally indicated by a high BMI. Fat accumulating in the liver promotes the secretion of pro-inflammatory cytokines, which act as carcinogenic signalling mediators in the progression of liver cancer [Reference Sohn, Lee and Lee14].
(3) Diabetes: Yip et al. [Reference Yip, Wong and Lai15] reported that the presence of diabetes in chronic HBV patients affects the accuracy of several liver cancer risk scores. This suggests that liver cancer risk profiles differ between chronic HBV patients with and without diabetes and that existing scores cannot adequately distinguish between them. Furthermore, Campbell et al. [Reference Campbell, Wang and McNaughton16] found that diabetes is an independent risk factor for liver cirrhosis and hepatocellular carcinoma in chronic HBV patients. Previous observational studies have shown that the coexistence of type 2 diabetes exacerbates the global burden of primary liver cancer, with a consistently higher burden in males across all age groups [Reference Xie, Lin and Fan17]. It is hypothesized that diabetes, which is closely associated with insulin resistance and hyperglycaemia, dysregulates intracellular signalling pathways and thereby promotes the development of liver cancer.
(4) Family history of liver cancer and HBV DNA viral load: Previous studies have found significant familial clustering of primary liver cancer, mainly attributable to genetic factors. During replication, HBV DNA can integrate into the hepatocyte genome and affect oncogenes, leading to the occurrence and progression of liver cancer [Reference Guo, Li and Zhao18]. A higher HBV DNA viral load reflects a higher level of viral replication and therefore a higher probability of primary liver cancer.
(5) Liver cirrhosis: Huang et al. [Reference Huang, Mathurin and Cortez-Pinto19] found that liver cirrhosis is a significant risk factor for primary liver cancer. HBV-induced cirrhosis, the end stage of progressive liver fibrosis, is accompanied by abnormal immune responses. These responses increase the expression of tumour-promoting factors and decrease that of anti-tumour factors, ultimately resulting in primary liver cancer [Reference Deng, Huang and Wong20].
In recent years, nomogram models have become widely used because they are convenient to build and can integrate multiple variables. A nomogram with an AUC above 0.7 is generally considered to have good discriminative power [Reference Li, Zheng and Gu21]. Wu et al. [Reference Wu, Zeng and Sun22] found that the REAL-B risk score had high discriminative power for predicting hepatocellular carcinoma in chronic HBV patients, with AUCs of 0.76 (3 years) and 0.75 (5 years) in internal and external validation. In this study, the performance of the nomogram was evaluated using ROC and calibration curves. The AUCs in the training and validation sets were 0.882 and 0.859, respectively, both above 0.7 and higher than those reported in the aforementioned study. The calibration curves also closely followed the reference line. We therefore preliminarily conclude that the nomogram has good discrimination and calibration and is useful for predicting liver cancer. Decision curve analysis showed that the nomogram yielded a net benefit across most reasonable high-risk threshold probabilities. This suggests that it could help clinicians detect disease progression promptly and take effective interventions to improve patients' quality of life. External validation using cases from other hospitals showed that the model maintained high discriminative ability (AUC 0.863), consistent with the internal validation results from our hospital. The calibration curve again showed good agreement between predicted and observed probabilities, indicating high calibration accuracy. Decision curve analysis confirmed that the nomogram continued to provide a high net benefit in external hospital settings, supporting its value in clinical decision-making.
However, this study has several limitations. First, it was a retrospective study with a relatively small sample, so inaccuracies and confounding biases are inevitable; prospective clinical studies are needed to verify the results. Second, the roles of some potential influencing factors (e.g., genetic status, biomarkers) remain unclear, and including them may further improve the nomogram's performance. Third, because of the limited timeframe of case inclusion, the nomogram is not yet optimal: AUCs of 0.882 and 0.859 leave room for further refinement. Fourth, the generalizability of the model requires further validation. The current dataset has no missing values and was subject to multiple exclusion criteria, which may limit the model's applicability to other datasets. Fifth, the validation sample size is relatively small, and the multivariate analysis does not meet the principle of at least 10 events per variable, which may reduce the statistical power of the model.
Conclusion
In conclusion, this study constructed a nomogram for predicting primary liver cancer in HBV patients based on an analysis of influencing factors. The nomogram integrates readily available baseline information (gender, BMI, alcohol consumption, diabetes, family history of liver cancer, liver cirrhosis, and HBV DNA viral load) and showed good discrimination and calibration. If the nomogram is further validated in external datasets covering a broader population that reflects real-world clinical practice, it could help clinicians identify high-risk individuals and implement targeted screening and personalized treatment.
Data availability statement
The datasets used and/or analysed during the present study are available from the corresponding author on reasonable request.
Author contribution
Qunmei Cao: Project development, Data collection, Manuscript writing. Yilin Zhou: Project development, Data collection. Changlong Wen: Data collection, Data analysis. Qinglan Li: Project development, Data collection, Data analysis, Manuscript editing.
Competing interests
The authors declare none.
Ethical standard
This study involving human participants was conducted in accordance with the ethical standards of the Medical Ethics Committee of the First Affiliated Hospital of Ganzhou People's Hospital (GZSRMYY2024070012) and with the 1964 Declaration of Helsinki. Written informed consent was obtained from each patient or their guardian.
Consent for publication
All authors give their consent for publication.