Prediction model of random forest for the risk of hyperuricemia in a Chinese basic health checkup test

Abstract
Objectives: The present study aimed to develop a random forest (RF) based prediction model for hyperuricemia (HUA) and compare its performance with the conventional logistic regression (LR) model. Methods: This cross-sectional study recruited 91,690 participants (14,032 with HUA, 77,658 without HUA). We constructed an RF-based prediction model in the training sets and evaluated it in the validation sets. Performance of the RF model was compared with that of the LR model by receiver operating characteristic (ROC) curve analysis. Results: The sensitivity and specificity of the RF models were 0.702 and 0.650 in males, and 0.767 and 0.721 in females. The positive predictive value (PPV) and negative predictive value (NPV) were 0.372 and 0.881 in males, and 0.159 and 0.978 in females. The AUC of the RF models was 0.739 (0.728–0.750) in males and 0.818 (0.799–0.837) in females. The AUC of the LR models was 0.730 (0.718–0.741) in males and 0.815 (0.795–0.835) in females. The predictive power of RF was slightly higher than that of LR, but the difference was not statistically significant in females (Delong tests, P=0.0015 for males, P=0.5415 for females). Conclusion: The good performance of RF in predicting HUA status, together with its tolerance of associations or interactions among features, shows its great potential for further application compared with LR. A prospective cohort is needed to predict the development of HUA. People with risk factors should be encouraged to control them actively to reduce the probability of developing HUA.

Keywords: Hyperuricemia (HUA), random forest (RF), prediction model

Introduction:
Uric acid (UA) is the final metabolite of purines in humans. Hyperuricemia (HUA) may result from under-excretion or over-production of UA, through congenital or acquired causes [1]. According to the National Health and Nutrition Examination Survey (NHANES) 2015-2016 in the USA, the prevalence of HUA was 20.1% (20.2% in males and 20.0% in females) and was stable during the last decade (p=0.24) [2]. A meta-analysis conducted in China, which pooled study results from 2000 to 2014, showed that the prevalence of HUA in China was 13.3% (19.4% in males and 7.9% in females) [3]. A higher prevalence of 16.4% was reported in a separate meta-analysis pooling studies from 2000 to 2019 in China (20.4% in males and 9.8% in females) [4]. Although the prevalence is lower than in the USA, it has climbed steadily, more than doubling over the last two decades, and HUA has become another common metabolic disease after diabetes mellitus (DM).
Extensive evidence has shown that, in addition to triggering gout, elevated serum UA is an independent risk factor for chronic kidney disease, hypertension, cardiovascular diseases, dyslipidemia and impaired glucose metabolism, and that it also plays an important role in the premature mortality observed in these diseases [5,6]. It has also been demonstrated that patients with asymptomatic HUA may have urate deposition over joints or even bone erosion, suggesting that the progression from asymptomatic HUA to gout is a continuous pathological process [7,8]. Therefore, it is of great significance to predict HUA early and conduct secondary prevention for high-risk groups.
Thus far, there are few prediction models for HUA. Cao et al. developed a Cox regression model using routine anthropometric and blood biomarkers in urban Han Chinese adults [9]. Zeng et al. developed an artificial neural network prediction model in Chinese adults based on dietary risk factors [10]. Lee et al. tried several machine learning algorithms to predict HUA status in Korean people over 40 [11]. A good clinical prediction model should have high predictive power and be easy to understand and convenient to operate. Random forest (RF) is an algorithm that integrates multiple decision trees through ensemble learning and is capable of representing high-order interactions. With further research on HUA, predictors of the model can also be supplemented and replaced without worrying about interaction or association between variables. In this study, based on the RF model, we established a gender-specific prediction model for HUA and compared its performance with the conventional logistic regression (LR) model, with the aim of developing a predictive model that can be easily generalized and of preventing further adverse health consequences of HUA.

Study population
The study protocol was approved by the Institutional Review Board and Ethics Committee of

Measurements and definitions
Anthropometric measurements and blood tests of the participants were performed during their visits to our institutions. Height and weight were measured in centimeters and kilograms, respectively. Body mass index (BMI) was calculated by dividing weight (kg) by the square of height (m²). Waist circumference (WC) was measured in centimeters. Systolic and diastolic blood pressures (SBP/DBP) were measured by an automated sphygmomanometer while the subjects were in a seated position after resting for 5 minutes. Blood tests were performed on an auto-analyzer (Hitachi Model 7600 analyzer, Hitachi, Tokyo, Japan). Alcohol intake and smoking history were obtained from the self-recorded questionnaires.
The diagnostic criterion for HUA was serum uric acid (SUA) >420 μmol/L in males and >360 μmol/L in females [12]. The diagnostic criterion for DM was fasting plasma glucose (FPG) ≥7.0 mmol/L [13].
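The definitions above can be sketched in a few lines of Python. This is only an illustration of the stated formulas and thresholds; the function and constant names are hypothetical and are not taken from the study's code.

```python
# Sex-specific serum uric acid (SUA) thresholds in umol/L, as stated in the text.
HUA_THRESHOLD_UMOL_L = {"male": 420.0, "female": 360.0}
# Fasting plasma glucose (FPG) threshold for DM in mmol/L, as stated in the text.
DM_FPG_THRESHOLD_MMOL_L = 7.0


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight (kg) divided by the square of height (m)."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def has_hua(sua_umol_l: float, sex: str) -> bool:
    """True when SUA is strictly above the sex-specific threshold."""
    return sua_umol_l > HUA_THRESHOLD_UMOL_L[sex]


def has_dm(fpg_mmol_l: float) -> bool:
    """True when FPG is at or above the DM threshold."""
    return fpg_mmol_l >= DM_FPG_THRESHOLD_MMOL_L
```

Note that the HUA criterion is a strict inequality while the DM criterion is inclusive, so a male with SUA of exactly 420 μmol/L is not classified as HUA.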

Statistical analysis
We tested all parameters for normality using the Kolmogorov-Smirnov test. Subject characteristics were compared using the Mann-Whitney U test. Differences were considered significant at P<0.05. Results are presented as median (quartile 1-quartile 3) unless otherwise stated. Odds ratios (OR) with 95% confidence intervals (CI) were calculated by LR. Statistical analysis was performed using SPSS version 26 (IBM Corporation). ROC curves were compared with the Delong test using MedCalc version 19. Models were developed using Python 3.7.

Model evaluation
The discriminatory power of the models was analyzed by ROC curves. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated. The cutoff score with the maximum sum of sensitivity and specificity was considered optimal [14].
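Choosing the cutoff with the maximum sum of sensitivity and specificity is equivalent to maximizing Youden's J statistic (sensitivity + specificity − 1). A minimal sketch of that search follows; it assumes that a score at or above the cutoff is called positive and that labels are coded 1 for HUA and 0 for non-HUA. The function names and the toy scores are hypothetical.

```python
from typing import Sequence, Tuple


def sens_spec(scores: Sequence[float], labels: Sequence[int],
              cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= cutoff is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores: Sequence[float],
                labels: Sequence[int]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing sens + spec.

    Every observed score is tried as a candidate cutoff; on ties the
    lowest cutoff is kept.
    """
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, cutoff)
        if best is None or sens + spec > best[1] + best[2]:
            best = (cutoff, sens, spec)
    return best


# Toy predicted probabilities and true labels (illustrative only).
toy_scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 1, 0, 0, 1, 1, 1]
print(best_cutoff(toy_scores, toy_labels))
```

Both classes must be present in the labels, otherwise one of the denominators is zero.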
Since the number of non-HUA participants was much larger than the number of HUA participants, simple down-sampling was applied to the non-HUA set, based on the sample size of the HUA set, to adjust for data imbalance. As a result, the sample distribution of the training sets was 9077/8986 for males and 2054/2033 for females (HUA/non-HUA). The validation sets retained the real-world population distribution: 2399/8123 (29.53%) for males and 502/7315 (6.86%) for females (HUA/non-HUA).
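Simple down-sampling of the majority class can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: it draws exactly as many non-HUA records as there are HUA records, whereas the training sets reported above are only approximately balanced, so the authors' exact sampling step may have differed. The record layout, the `hua` key and the function name are hypothetical.

```python
import random
from typing import Dict, List


def downsample_majority(records: List[Dict], label_key: str = "hua",
                        seed: int = 0) -> List[Dict]:
    """Randomly drop non-HUA records until both classes have equal size.

    Assumes label 1 (HUA) is the minority class and label 0 the majority.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    minority = [r for r in records if r[label_key] == 1]
    majority = [r for r in records if r[label_key] == 0]
    if len(majority) <= len(minority):
        return list(records)  # already balanced or reversed; nothing to drop
    kept = rng.sample(majority, k=len(minority))  # sample without replacement
    balanced = minority + kept
    rng.shuffle(balanced)
    return balanced


# Toy data: 5 HUA and 50 non-HUA records (illustrative only).
toy = [{"id": i, "hua": 1} for i in range(5)] + \
      [{"id": 100 + i, "hua": 0} for i in range(50)]
train = downsample_majority(toy)
print(len(train), sum(r["hua"] for r in train))
```

Down-sampling should be applied to the training data only, so that the validation set keeps the real-world class distribution, as described above.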
A total of 21 variables, including anthropometric measurements, blood tests, alcohol intake and smoking history, were first examined in the RF model (shown in the supplementary material), of which the 10 with the highest weights were selected for each gender (Table 2). All variables differed significantly between groups (p<0.001). Eight of the 10 variables were consistent across the gender models. Among HUA patients, age and TC were higher in females (p<0.001), and the other variables were significantly higher in males (p<0.001).

Evaluation of LR model predictive ability
We further incorporated these parameters into the conventional LR model. First, we conducted univariate LR in both gender groups, and all variables except FPG showed statistical significance (Table 3). Next, we conducted multiple LR for both groups. Since the LR model is sensitive to collinearity among variables, we screened the 10 variables in model 1 using tolerance=0.2 as the cutoff in each gender. For males, BMI, weight and WC showed significant collinearity, so only one of them was retained. Multiple LR suggested that all variables except WBC were statistically significant in the BMI-retained model (model 2). Interestingly, all variables were statistically significant when weight was the one retained (not shown in the table). For females, collinearity was found between weight and BMI. The BMI-retained model suggested that WC was not statistically significant, while the weight-retained model showed a contradictory result. The AUC of the LR models was 0.730 (0.718–0.741) for males and 0.815 (0.795–0.835) for females (Figure 1). The predictive power of RF was slightly higher than that of LR, although the difference was not statistically significant in females (Delong tests, p=0.0015 for males, p=0.5415 for females).
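For the collinearity screen described above, the tolerance of a predictor is 1 − R², where R² comes from regressing that predictor on the remaining predictors; a tolerance of 0.2 corresponds to a variance inflation factor of 5. The sketch below is a deliberate simplification: it handles only a single pair of predictors, in which case R² equals the squared Pearson correlation. The full screen over 10 variables would need a multiple regression per variable, which is not shown. The function names and the toy weight and BMI values are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise_tolerance(x: Sequence[float], y: Sequence[float]) -> float:
    """Tolerance (1 - R^2) for the two-predictor case, where R^2 = r^2."""
    r = pearson_r(x, y)
    return 1.0 - r * r


# Toy values (illustrative only): weight in kg and BMI in kg/m^2.
weight = [55, 62, 70, 78, 85, 93]
bmi_values = [20.1, 22.0, 24.3, 26.1, 28.4, 30.9]

tol = pairwise_tolerance(weight, bmi_values)
print(f"tolerance = {tol:.4f}, collinear at cutoff 0.2: {tol < 0.2}")
```

When a pair falls below the cutoff, only one of the two variables is kept in the LR model, which is the choice between the BMI-retained and weight-retained models discussed above.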

Discussion
In this study, we obtained a balanced training set by simple down-sampling, then developed a gender-specific RF model to predict HUA status and compared its performance with the most commonly used LR model. The results showed that the predictive power of RF was higher than that of LR in both genders, although the difference was not significant in females.
Classifiers based on medical data are frequently developed using class-imbalanced data, where the number of people with a particular disease is far smaller than the number of healthy people. Our study contained 91,690 participants, of which 14,032 were HUA and 77,658 were non-HUA. Standard classification methods applied to class-imbalanced data tend to produce classifiers biased towards the majority class [15]. Therefore, increasing attention has been paid to the between-class imbalance problem in order to improve the accuracy of minority class prediction [16]. However, methods for handling imbalanced data are not widely used in clinical practice [17]. In this study, given our relatively sufficient sample size, we tried simple down-sampling to ameliorate the data imbalance [17].
Machine learning has been very popular in various fields for years. Because it is complex to understand and operate, it has not been fully applied in clinical work. RF, as an ensemble learning algorithm, integrates multiple decision trees, each of which depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest [18]. Compared with conventional statistical methods, it has the following advantages: 1. RF is a high-throughput algorithm that can handle high-dimensional features without dimension reduction. 2. RF gives the weighted value of each feature in the classification, so that the impact of variables on the classification outcome can be quantified and compared. 3. RF can deal with imbalanced or missing data to some extent. 4. Interaction or association between variables does not have much impact on the classification, as RF can represent high-order interactions [19].
In this paper, we used two different methods, RF and LR, to predict the status of HUA. The variables selected by RF were significantly different between HUA and non-HUA participants (Table 2). The PPV in both gender groups was not high, which may be related to the prevalence of HUA in the population. When the sensitivity and specificity of a diagnostic test remain unchanged, a higher PPV will be obtained in a population with a higher prevalence. However, due to the relatively low prevalence in the real-world data, the number of true positives does not increase significantly even if the sensitivity increases, so adjusting sensitivity has little effect on improving the PPV. Increasing specificity can significantly reduce the number of false positives and thereby improve the PPV. However, the model is intended for screening HUA in the population, so sensitivity should not be abandoned while specificity is improved. Therefore, we made a trade-off in our study and chose the cutoff point with the maximum sum of sensitivity and specificity.
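The dependence of PPV on prevalence follows directly from Bayes' rule, and can be checked against the figures reported in this paper. The sketch below uses the reported sensitivity and specificity and the validation-set counts; the function names are hypothetical, and the "+0.10" scenarios are purely illustrative.

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """Positive predictive value from sensitivity, specificity, prevalence."""
    true_pos = sens * prev
    false_pos = (1.0 - spec) * (1.0 - prev)
    return true_pos / (true_pos + false_pos)


def npv(sens: float, spec: float, prev: float) -> float:
    """Negative predictive value from sensitivity, specificity, prevalence."""
    true_neg = spec * (1.0 - prev)
    false_neg = (1.0 - sens) * prev
    return true_neg / (true_neg + false_neg)


# Prevalence in the validation sets, from the HUA/non-HUA counts in the text.
prev_male = 2399 / (2399 + 8123)
prev_female = 502 / (502 + 7315)

print(f"male   PPV={ppv(0.702, 0.650, prev_male):.3f} "
      f"NPV={npv(0.702, 0.650, prev_male):.3f}")
print(f"female PPV={ppv(0.767, 0.721, prev_female):.3f} "
      f"NPV={npv(0.767, 0.721, prev_female):.3f}")

# Illustrative scenarios for females: raise one metric by 0.10, hold the other.
print(f"female PPV, sensitivity +0.10: {ppv(0.867, 0.721, prev_female):.3f}")
print(f"female PPV, specificity +0.10: {ppv(0.767, 0.821, prev_female):.3f}")
```

With these inputs the computed PPVs come out at approximately 0.372 for males and 0.159 for females, in line with the reported values. In the female scenarios, raising specificity improves the PPV more than raising sensitivity by the same amount, which is the point made above.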
Compared with LR, the AUC of RF was higher in both gender models, although the difference was not significant in females. However, as mentioned above, interaction or association between variables does not adversely affect RF classification, while it does affect LR. Since LR is easily affected by collinearity among variables, when building the LR model we took tolerance=0.2 as the cutoff and retained only one variable from each collinear set. We found that retaining different variables had different effects on the remaining variables. Indeed, when one confounding factor is adjusted for, the estimates for any related variable will also change. In this case, the choice of which variables to retain becomes difficult, especially when there are many variables. At this point, it is methodologically better to use RF, which is less sensitive to collinearity. To sum up, we believe that RF has great potential in HUA prediction.
As for the risk factors, in addition to the indicators discussed in previous studies, we found that the role of FPG is worth mentioning. FPG ranked 10th in males and 15th in females. After adjusting for other covariates, there was no significant association between FPG level and HUA in the normal FPG group (P=0.154), but a negative association was found in the DM group in males (OR=0.811, P=0.002), which was consistent with previous studies [20]. Previous studies have indicated that diabetic patients with glycosuria had a null prevalence of HUA and excreted more UA than those without glycosuria [21]. Chino et al. showed that glycosuria induced by SGLT2 inhibitors at the proximal tubule may inhibit UA reabsorption [21]. Glycosuria resulting from elevated glucose levels in diabetic patients might lead to a competitive inhibition of UA reabsorption. However, the relationship between FPG and serum UA remains controversial. Studies have reported that serum UA can cause pancreatic β-cell dysfunction and that HUA is an independent risk factor for type 2 DM [22,23]. In addition, the weight of FPG in the female classification is relatively low, and whether there is an interaction between FPG and hormones needs to be explored further.
The study also has some limitations. First, the dataset is based on a single-center cross-sectional study, and there may be selection bias. Second, information on the underlying diseases of participants from the program was not as detailed as hospital records, which might introduce confounding factors.

Conclusions:
In conclusion, we developed a prediction model of HUA status based on cross-sectional data, and the model achieved good results. Compared with LR, RF has its own irreplaceable advantages, and we believe that RF has great potential in HUA prediction. A further prospective cohort is needed to predict the development of HUA. Moreover, the variables in this study only included common anthropometric and blood biomarkers. Other variables such as lifestyle, education background and income may also be integrated to build a comprehensive epidemiological prediction model. This study is only a preliminary application of machine learning. We believe that it will provide a direction for the further study of machine learning in clinical research.