Abstract
Because biomass characteristics are complex and many operating parameters are involved, it is challenging to predict the bioethanol yield (Ybeth, %) obtained from agricultural wastes by consolidated bioprocessing with a microbial consortium. In this study, two supervised machine learning methods, Gaussian Process Regression (GPR) and Artificial Neural Networks (ANN), were used to build predictive models of bioethanol yield from various agricultural wastes. Ninety-six experimental data points collected from the literature were preprocessed to remove noise and outliers. The Regression Learner App in MATLAB 2021a was then applied to the refined 50 original data points, using parallel computing and cross-validation, and the best model was selected. The squared exponential GPR model gave the best training and testing results, with R² approaching 1; RMSE, MSE, and MAE approaching 0; the shortest training time; and the highest prediction speed. Because a larger dataset generally gives a neural network more opportunity to learn and improve its performance, 3,500 synthetic data points were generated from 35 original seed data points using Gretel ACTGAN. These synthetic data were preprocessed using assumptions drawn from the seed data, which reduced them to 1,615 data points. For the ANN model trained on the refined synthetic dataset (1,615 data points), the MSE and the regression R were close to 0 and 1, respectively. Since consolidated bioprocessing is an economical method of producing bioethanol, further development of machine learning methods will aid in predicting and optimizing the conditions required for greater yields.
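The model comparison above rests on four standard error metrics (R², RMSE, MSE, and MAE). As a minimal sketch of how these are conventionally computed, and not the authors' MATLAB workflow, the following Python function evaluates them for paired observed and predicted yields. The function name `regression_metrics` and the sample values are hypothetical and were invented purely for illustration; they are not data from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R^2 for paired observed/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Mean squared error and mean absolute error of the residuals.
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n

    # R^2 = 1 - SS_res / SS_tot, with SS_tot taken about the observed mean.
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}


# Hypothetical bioethanol yields (%), invented for illustration only.
observed = [40.0, 55.0, 62.0, 71.0]
predicted = [41.0, 54.0, 63.0, 70.0]
metrics = regression_metrics(observed, predicted)
```

A model that fits well drives the three error terms toward 0 and R² toward 1, which is the pattern reported for the squared exponential GPR model.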
