Bediaga, Harbil; Maria Isabel Moreno; Sonia Arrasate; Jose Luis Vilas; Lucia Orbe; Elias Unzueta; Juan Perez Mercader and Humberto Gonzalez-Diaz

Computational models may help reduce research costs by predicting the properties of alternative blends. Currently, most efforts focus on predicting a few properties for sets of gasoline samples. However, there are no reports of models able to classify gasoline samples with multiple output properties measured in real-life refinery plants. In this work, an algorithm combining Information Fusion (IF), Perturbation Theory (PT), and Machine Learning (ML), termed IFPTML, was used to model real production data with >230,000 outcomes gathered from a petroleum refinery plant. The IF pre-processing phase assembled the working dataset, with 44 physicochemical output properties vs. 574 input variables from 4 production lines distributed in 26 data blocks, including 14 different streams and 23 operations carried out in the plant. The PT calculation phase quantified the effect of perturbations (deviations) in all input variables using PT Operators. Last, the ML analysis phase involved training Linear Discriminant Analysis (LDA) and Artificial Neural Network (ANN) models. The IFPTML-LDA model presented AUROC = 0.936, with overall Sensitivity (Sn) and Specificity (Sp) of approximately 84-91% for the training and validation sets. In an internal control experiment, we obtained an IFPTML-FT-NIR model with similar Sn and Sp of approximately 86-97% for >25,000 values of 16 properties measured by the FT-NIR technique, demonstrating the robustness of the algorithm to changes in the experimental techniques used. This model could be useful for the design of new alternative blends (biofuels, refuse-derived fuels, etc.) with lower environmental impact.
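
The two quantities the abstract relies on, perturbation (deviation) terms and the Sn/Sp statistics, can be illustrated with a minimal sketch. The abstract does not give the exact form of the PT Operators, so the code below assumes, purely for illustration, that a perturbation term is the deviation of an input value from the mean of its data block. The function names `pt_deviations` and `sensitivity_specificity` and the toy data layout are hypothetical and are not taken from the original work.

```python
from collections import defaultdict


def pt_deviations(rows):
    """Return each value's deviation from the mean of its data block.

    ASSUMPTION: the deviation-from-block-mean form is illustrative only;
    the source does not specify how its PT Operators are defined.
    rows is a list of (block_id, value) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for block, value in rows:
        totals[block] += value
        counts[block] += 1
    means = {block: totals[block] / counts[block] for block in totals}
    return [value - means[block] for block, value in rows]


def sensitivity_specificity(y_true, y_pred):
    """Compute Sn = TP / (TP + FN) and Sp = TN / (TN + FP) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Guard against classes that are absent from y_true.
    sn = tp / (tp + fn) if (tp + fn) else float("nan")
    sp = tn / (tn + fp) if (tn + fp) else float("nan")
    return sn, sp


# Toy usage with made-up numbers (not data from the refinery plant).
deviations = pt_deviations([("A", 10.0), ("A", 14.0), ("B", 3.0)])
sn, sp = sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0])
```

In a setup like this, the deviation terms would serve as inputs to a classifier such as LDA, and Sn and Sp would be reported separately for the training and validation sets.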