Measuring the Performance of Supervised Machine Learning Algorithms for Optimizing Wheat Productivity Prediction Models: A Comparative Study
DOI:
https://doi.org/10.53560/PPASA(60-4)820
Keywords:
Model Optimizations, Machine Learning Algorithms, Prediction Models, Performance Measurement
Abstract
Precise crop prediction has gained worldwide attention amid food security concerns. This study compares four supervised machine learning (ML) algorithms, namely multiple linear regression (MLR), decision tree regression (DTR), random forest regression (RFR), and support vector regression (SVR), for predicting wheat productivity, and measures their performance to identify the optimized model. An updated dataset covering various agronomical constraints was collected from the Crop Reporting Service. Randomized data partitions, hyperparameter tuning, complexity analysis, cross-validation measures, learning curves, evaluation metrics, and prediction errors are used to select the optimized model. Each model is trained on 75% of the dataset and tested on the remaining 25%. On the training set, RFR achieved the highest R² value of 0.90, followed by DTR, MLR, and SVR. On the testing set, RFR again ranked first with an R² value of 0.74, followed by MLR, DTR, and SVR. The lowest prediction error (P.E.) was found for RFR, followed by DTR, MLR, and SVR. K-fold cross-validation measures also indicate that RFR is the optimized model when compared with DTR, MLR, and SVR.
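The evaluation protocol summarized in the abstract (a randomized 75/25 partition, R² on the training and testing sets, and K-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, a single-predictor least-squares line stands in for the four regressors, and every function name (`r2_score`, `fit_line`, `split_rows`, `k_fold`, `score`) is a hypothetical helper introduced only for illustration.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def split_rows(n_rows, test_fraction=0.25, seed=0):
    """Randomized partition of row indices into (train, test)."""
    rows = list(range(n_rows))
    random.Random(seed).shuffle(rows)
    n_test = round(n_rows * test_fraction)
    return rows[n_test:], rows[:n_test]


def k_fold(rows, k=5):
    """Yield (fit_rows, held_out_rows) pairs for K-fold cross-validation."""
    folds = [rows[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        fit_rows = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield fit_rows, held_out


def score(fit_rows, eval_rows, x, y):
    """Fit on fit_rows, then report R² on eval_rows."""
    slope, intercept = fit_line([x[r] for r in fit_rows], [y[r] for r in fit_rows])
    preds = [slope * x[r] + intercept for r in eval_rows]
    return r2_score([y[r] for r in eval_rows], preds)


# Synthetic stand-in data (assumption): a noisy linear relationship.
rng = random.Random(42)
x = [float(i) for i in range(40)]
y = [2.0 * xi + 1.0 + rng.gauss(0.0, 1.0) for xi in x]

train_idx, test_idx = split_rows(len(x))           # 75% train, 25% test
r2_train = score(train_idx, train_idx, x, y)       # fit quality on training rows
r2_test = score(train_idx, test_idx, x, y)         # generalization to unseen rows
cv_scores = [score(f, h, x, y) for f, h in k_fold(train_idx, k=5)]

print(f"train R2={r2_train:.3f}  test R2={r2_test:.3f}")
print(f"mean 5-fold CV R2={sum(cv_scores) / len(cv_scores):.3f}")
```

A gap between training and testing R², such as the 0.90 versus 0.74 reported for RFR, is the kind of difference this comparison is designed to expose. The cross-validation scores give a second estimate of generalization that does not depend on a single partition.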