Abstract:
Machine learning models for rapidly quantifying the composition of lignocellulosic multi-feedstock biomass were developed using partial least squares regression (PLSR) and artificial neural networks (ANN) trained on an augmented spectroscopic dataset (ASD). The ASD helps overcome the limited number of available spectral samples. The ANN models outperformed the PLSR models in predicting biomass composition. The optimized ANN models, developed using the ASD over the lignocellulosic biofingerprint region with 1051 variables, achieved coefficients of determination (R²) of 99.21%, 99.27%, and 99.23% for cellulose, hemicellulose, and lignin, respectively. Notably, ANN models built with only 68 spectral peaks from the same region, selected using a previously developed peak identification algorithm, performed almost as well, with R² values of 99.16%, 98.98%, and 99.10% for cellulose, hemicellulose, and lignin, respectively. These findings demonstrate that ANN models are applicable to rapid composition analysis of multi-feedstock biomass and could support the conversion of biomass to fuels and chemicals.
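The two quantitative ideas in this abstract, enlarging a small spectral dataset by augmentation and reporting fit quality as R² in percent, can be illustrated with a minimal standard-library sketch. The abstract does not describe how the ASD was built, so the augmentation below is an assumption: one generic scheme that perturbs each spectrum with a random multiplicative scale, a constant baseline offset, and per-variable Gaussian noise while keeping its composition labels. The function names `augment_spectra` and `r2_percent`, their parameters, and the toy values are hypothetical and are not the authors' method or data.

```python
import random


def r2_percent(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot, expressed in percent."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    if ss_tot == 0:
        raise ValueError("y_true has zero variance; R^2 is undefined")
    return 100.0 * (1.0 - ss_res / ss_tot)


def augment_spectra(spectra, labels, copies=5, noise_sd=0.005,
                    scale_sd=0.02, offset_sd=0.01, seed=0):
    """Hypothetical augmentation (an assumption, not the paper's ASD procedure).

    Keeps the original spectra, then appends `copies` perturbed versions of each
    one. Every copy gets a random multiplicative scale, a constant baseline
    offset, and independent Gaussian noise on each variable; its composition
    label is copied unchanged.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    aug_x, aug_y = list(spectra), list(labels)
    for spectrum, label in zip(spectra, labels):
        for _ in range(copies):
            scale = 1.0 + rng.gauss(0.0, scale_sd)
            offset = rng.gauss(0.0, offset_sd)
            aug_x.append([scale * a + offset + rng.gauss(0.0, noise_sd) for a in spectrum])
            aug_y.append(label)
    return aug_x, aug_y


# Toy usage with made-up numbers (not measurements).
spectra = [[0.10, 0.42, 0.37], [0.12, 0.40, 0.35]]    # toy absorbance values, 3 variables each
labels = [(40.0, 25.0, 20.0), (38.0, 26.0, 21.0)]     # toy (cellulose, hemicellulose, lignin) %
X, Y = augment_spectra(spectra, labels, copies=3)
print(len(X))  # 8: 2 originals + 2 * 3 augmented copies
print(r2_percent([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 100.0 for a perfect prediction
```

In a real workflow, the augmented spectra would train the PLSR or ANN regressors, and `r2_percent` would be applied separately to each constituent on held-out samples, which is how per-component figures such as those reported above are typically obtained.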