A random forest-based analysis of cassava mosaic disease-related factors affecting the on-farm livelihoods of cassava farmers

Authors

  • Dickmi Vaillam Claudette, Faculty of Agronomy and Agricultural Sciences, University of Dschang, P.O. Box 96, Dschang, West Region, Cameroon
  • Tchouamo Isaac Roger, Faculty of Agronomy and Agricultural Sciences, University of Dschang, P.O. Box 96, Dschang, West Region, Cameroon

DOI:

https://doi.org/10.21839/jaar.2024.v9.8993

Keywords:

Cassava mosaic disease, Cassava farmers, Cassava income, Random Forest

Abstract

This study aimed to identify the key cassava mosaic disease (CMD)-related factors affecting the income that Cameroonian cassava farmers earn from the sale of cassava cuttings (V215) and from the sale of cassava roots (V216). To this end, nine CMD-related variables were used to independently train two Random Forest (RF) models, one for each target, and these models were then used for regression-based prediction of the two financial targets, V215 and V216. The RF-based mean absolute percentage errors for V215 and V216 were 0.19 and 1.25, respectively, and the corresponding mean Gaussian deviances were 0.07 and 0.51. Based on RF feature importance (RFFI) scores, the top three factors affecting income from the sale of cassava cuttings were: late appearance of symptoms, reported as a difficulty associated with regular field monitoring (RFFI of 0.2594); removal of infected plants as a method of controlling the frequent occurrence of viral diseases in respondents’ cassava fields (RFFI of 0.1633); and lack of healthy planting material due to the frequent occurrence of viral diseases in respondents’ cassava fields (RFFI of 0.1495). The top three factors affecting income from the sale of cassava roots were: replacement of infected plants with healthy cuttings as a method of controlling the frequent occurrence of viral diseases in respondents’ cassava fields (RFFI of 0.1974); decrease in yield due to the frequent occurrence of viral diseases in respondents’ cassava fields (RFFI of 0.1530); and poor plant growth due to the frequent occurrence of viral diseases in respondents’ cassava fields (RFFI of 0.1388).
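The two error metrics and the top-three importance ranking reported in the abstract can be illustrated with a short sketch. It assumes, because the abstract does not say so, that the metrics follow the scikit-learn definitions: the mean absolute percentage error as the mean of |y - ŷ| / max(|y|, eps), expressed as a fraction rather than a percentage, and the mean Gaussian deviance as the mean unit deviance of a normal distribution, which reduces to the mean squared error (scikit-learn's `mean_tweedie_deviance` with `power=0`). The ranking helper assumes importance scores normalised to sum to 1, as `RandomForestRegressor.feature_importances_` are; whether the study's RFFI scores are impurity-based or permutation-based is not stated here. All function names, feature names and numbers below are hypothetical placeholders, not the study's code or survey data.

```python
"""Sketch of the evaluation metrics and importance ranking named in the abstract.

Assumptions (not taken from the study): metrics follow the scikit-learn
definitions, and every number below is a made-up placeholder, not survey data.
"""
import sys
from typing import Dict, List, Sequence, Tuple

# Machine epsilon; guards against division by a zero target, mirroring the
# epsilon scikit-learn uses in its MAPE implementation.
EPS = sys.float_info.epsilon


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    """Reject empty or mismatched inputs."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")


def mean_absolute_percentage_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of |y - y_hat| / max(|y|, eps), as a fraction (0.19 means 19 %)."""
    _check(y_true, y_pred)
    return sum(abs(t - p) / max(abs(t), EPS) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_gaussian_deviance(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean unit deviance of a normal distribution, (y - y_hat)^2, i.e. the MSE."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def top_k_features(importances: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Normalise importance scores to sum to 1 and return the k largest."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must have a positive sum")
    normalised = {name: score / total for name, score in importances.items()}
    return sorted(normalised.items(), key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical observed and predicted incomes (placeholder values).
    observed = [100.0, 200.0, 400.0]
    predicted = [110.0, 180.0, 400.0]
    print(mean_absolute_percentage_error(observed, predicted))  # (0.1 + 0.1 + 0.0) / 3, about 0.0667
    print(mean_gaussian_deviance(observed, predicted))          # (100 + 400 + 0) / 3, about 166.67

    # Hypothetical raw importance scores for three illustrative features.
    scores = {"feature_a": 2.0, "feature_b": 5.0, "feature_c": 3.0}
    print(top_k_features(scores, k=2))  # feature_b (0.5), then feature_c (0.3)
```

Swapping the hand-written helpers for the scikit-learn functions named above should give the same values on the same inputs; the pure-Python version only keeps the sketch dependency-free.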

Published

24-06-2024

Section

Articles