A random forest-based analysis of cassava mosaic disease-related factors affecting the on-farm livelihoods of cassava farmers
DOI: https://doi.org/10.21839/jaar.2024.v9.8993
Keywords: Cassava mosaic disease, Cassava farmers, Cassava income, Random Forest
Abstract
This study aimed to identify the key cassava mosaic disease (CMD)-related factors affecting Cameroonian cassava farmers' income from two sources: the sale of cassava cuttings (V215) and the sale of cassava roots (V216). To achieve this, nine CMD-related variables were used to train two Random Forest (RF) models independently. These models were then used for regression-based prediction of the two financial targets, V215 and V216. The RF-based mean absolute percentage errors for targets V215 and V216 were 0.19 and 1.25, respectively, and the corresponding RF-based mean Gaussian deviances were 0.07 and 0.51. Based on RF feature importance (RFFI) scores, the top three factors affecting income from the sale of cassava cuttings were: late appearance of symptoms as a difficulty associated with regular field monitoring (RFFI of 0.2594); removal of infected plants as a method of controlling the frequent occurrence of viral diseases in respondents' cassava fields (RFFI of 0.1633); and lack of healthy planting material due to the frequent occurrence of viral diseases in respondents' cassava fields (RFFI of 0.1495). The top three factors affecting income from the sale of cassava roots were: replacement of infected plants with healthy cuttings as a method of controlling the frequent occurrence of viral diseases in respondents' cassava fields (RFFI of 0.1974); decrease in yield due to the frequent occurrence of viral diseases in respondents' cassava fields (RFFI of 0.1530); and poor plant growth due to the frequent occurrence of viral diseases in respondents' cassava fields (RFFI of 0.1388).
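The modelling workflow summarized in the abstract can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, the nine inputs are assumed to be binary survey answers, the feature names `cmd_var_1` to `cmd_var_9` are hypothetical placeholders, and the hyperparameters (`n_estimators=200`, an 80/20 train/test split) are arbitrary choices. The sketch trains one `RandomForestRegressor` per income target, scores each with MAPE and the mean Gaussian deviance (`mean_tweedie_deviance` with `power=0`, which equals the mean squared error), and ranks the impurity-based (MDI) feature importances. Whether the study used MDI rather than another importance measure is an assumption here.

```python
# Minimal sketch: two independent Random Forest regressors sharing the same
# nine CMD-related inputs, one per income target (V215, V216).
# All data below are SYNTHETIC placeholders, not the study's survey data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error, mean_tweedie_deviance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_farmers, n_features = 500, 9

# Hypothetical feature names; the study's actual variable codes are not reproduced.
feature_names = [f"cmd_var_{i + 1}" for i in range(n_features)]

# Assumption: each CMD-related variable is a yes/no survey answer coded 0/1.
X = rng.integers(0, 2, size=(n_farmers, n_features)).astype(float)

# Synthetic, strictly positive incomes so that MAPE is well defined.
targets = {
    "V215": 1000 + X @ rng.uniform(50, 300, n_features) + rng.normal(0, 50, n_farmers),
    "V216": 5000 + X @ rng.uniform(100, 900, n_features) + rng.normal(0, 200, n_farmers),
}

results = {}
for name, y in targets.items():
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
    # Each target gets its own forest, trained independently of the other.
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # scikit-learn returns MAPE as a fraction (0.19 means 19%), not a percentage.
    mape = mean_absolute_percentage_error(y_test, y_pred)
    # power=0 selects the Gaussian (normal) deviance, i.e. the mean squared error.
    gauss_dev = mean_tweedie_deviance(y_test, y_pred, power=0)

    # Impurity-based (MDI) importances, normalised by scikit-learn to sum to 1.
    ranking = sorted(
        zip(feature_names, model.feature_importances_),
        key=lambda pair: pair[1],
        reverse=True,
    )
    results[name] = {"mape": mape, "gaussian_deviance": gauss_dev, "top3": ranking[:3]}

for name, res in results.items():
    print(name, f"MAPE={res['mape']:.3f}", f"deviance={res['gaussian_deviance']:.1f}")
    for feat, score in res["top3"]:
        print(f"  {feat}: {score:.4f}")
```

Two points matter when reading such metrics. MAPE divides by the true value, so it becomes unstable when incomes are close to zero. The Gaussian deviance is a squared error, so, unlike MAPE, its magnitude depends on the units and scale of the target.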