Integrating genetic markers and adiabatic quantum machine learning to improve disease resistance-based marker assisted plant selection
DOI:
https://doi.org/10.25081/jsa.2023.v7.8556
Keywords:
Plant disease resistance, Marker-assisted plant selection, Genetic markers, Adiabatic quantum computing
Abstract
The goal of this research was to create a more accurate and efficient method for selecting plants with disease resistance using a combination of genetic markers and advanced machine learning algorithms. A multi-disciplinary approach incorporating genomic data, machine learning algorithms and high-performance computing was employed. First, genetic markers highly associated with disease resistance were identified using next-generation sequencing data and statistical analysis. Then, an adiabatic quantum machine learning algorithm was developed to integrate these markers into a single predictor of disease susceptibility. The results demonstrate that the integrative use of genetic markers and adiabatic quantum machine learning significantly improved the accuracy and efficiency of disease resistance-based marker-assisted plant selection. By leveraging the power of adiabatic quantum computing and genetic markers, more effective and efficient strategies for disease resistance-based marker-assisted plant selection can be developed.
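The abstract does not spell out how the markers are combined, so the following is only a minimal sketch of one common way such a problem can be posed for adiabatic (annealing) hardware: marker selection written as a quadratic unconstrained binary optimization (QUBO) problem. It is not the authors' algorithm. The function names, the `redundancy_weight` parameter and the synthetic genotype data are all assumptions made for illustration, and exhaustive enumeration stands in for the quantum annealer.

```python
import itertools
import numpy as np


def build_qubo(genotypes, phenotype, redundancy_weight=0.5):
    """Build an upper-triangular QUBO matrix for marker selection.

    Diagonal terms reward markers correlated with the phenotype;
    off-diagonal terms penalize picking markers correlated with each other.
    (Hypothetical formulation, not taken from the paper.)
    """
    n_markers = genotypes.shape[1]
    corr = np.corrcoef(np.column_stack([genotypes, phenotype]), rowvar=False)
    relevance = np.abs(corr[:n_markers, -1])
    redundancy = np.abs(corr[:n_markers, :n_markers])
    Q = redundancy_weight * np.triu(redundancy, k=1)
    Q[np.diag_indices(n_markers)] = -relevance
    return Q


def solve_qubo_exhaustively(Q):
    """Minimize x^T Q x over binary x by brute force.

    Classical stand-in for an annealer; only feasible for a few markers.
    """
    n = Q.shape[0]
    best_x, best_energy = None, np.inf
    for bits in itertools.product([0, 1], repeat=n):
        x = np.array(bits)
        energy = float(x @ Q @ x)
        if energy < best_energy:
            best_x, best_energy = x, energy
    return best_x, best_energy


# Synthetic data (assumption): 200 plants, 6 markers coded 0/1/2,
# with markers 0 and 3 driving a resistance score.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(200, 6)).astype(float)
phenotype = genotypes[:, 0] + genotypes[:, 3] + rng.normal(0.0, 0.5, size=200)

Q = build_qubo(genotypes, phenotype)
selected, energy = solve_qubo_exhaustively(Q)
print("selected marker mask:", selected, "energy:", energy)
```

On an actual annealer the same matrix `Q` would be submitted to the device instead of being enumerated, and the selected markers could then feed a conventional predictor of resistance.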
Copyright (c) 2023 Enow Takang Achuo Albert, Ngalle Hermine Bille, Bell Joseph Martin, Ngonkeu Mangaptche Eddy Leonard
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.