PLS validation option using Thermo Method Generator software

JuanG's picture

Dear all,

For a short while now I have been using the Method Generator software to build PLS models on a handheld NIR. The software is easy to use for someone with a basic chemometrics background. However, when we start to look at the calculations in detail, that information is not easy to find.

My question concerns the last step in generating the PLS model file. If you are familiar with the software, you know that there is an option called PLS Validation in the last Model parameter pop-up window (see image attached).

This option, if I understood correctly, has the same aim as Hotelling's T2 and the X residuals. In this software the statistics used are called "scores (stdev)" and "Resid (stdev)". Based on that, I understand them to be the standard deviation of the scores (which ones?) and the standard deviation of the residuals. However, the default values are quite far from the values I expected: for example, the software proposes 5 for scores (stdev) and 15 for Resid (stdev), which is huge compared to the results in my data set. Of course, depending on these limits the predicted spectra can be considered valid or invalid, so they are quite important parameters.

I tried to find out how these values are calculated, but I did not find any literature explaining it.

I hope someone in the community can help me understand how these values are calculated, because there is nothing about it in the Thermo user manual or in the other information available.

Thank you in advance

Best Regards,

Juan G.



russell's picture

It's perhaps difficult for a new practitioner to digest, but here's a link to a classic published paper on the subject.


JuanG's picture

Thank you for your post russell,

The link you sent is useful and the paper is one of the pillars of chemometrics, but I am familiar with PLS and how everything is calculated, so that part is fine. My question is specific to this software and the terms it uses to define the limits.

When we use The Unscrambler, PLS Toolbox, SIMCA or Matlab, we know how the limits are calculated because they use quite standard methods and the information is accessible. But in the case of this software (Method Generator) there is no information on how these limits are calculated. The user manual only mentions that if the default values do not work you should increase them, but based on what? For example, a limit of 15 for the residuals seems completely strange to me, so this 15 probably means something other than the classical residuals, and that is what I am trying to understand.



miguelG's picture


I cannot find the manual for that software. Is it available online?

I think those limits, the 5 for scores and the 15 for residuals, are related to outliers: when the validation scores are more than 5 times the stdev of the calibration scores, the NIR spectrum is considered an outlier, although the value seems a bit high.

This is just my guess and can be far from the truth! If you can find the manual it might help, since I have never worked with that software.
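To make the guess above concrete, here is a minimal Python sketch of that interpretation: a new score (or residual) is expressed in units of the calibration standard deviation and compared with a limit such as 5 or 15. It only illustrates the hypothesis; nothing in the manual confirms that Method Generator works this way. The function names and the numbers are made up for illustration, and using the sample (n - 1) standard deviation is also an assumption.

```python
import statistics


def stdev_units(calibration_values, new_value):
    """Distance of new_value from the calibration mean, in calibration stdevs.

    Hypothetical helper: assumes the limit is applied to a simple
    z-score-like quantity, which is NOT confirmed for Method Generator.
    """
    mean = statistics.fmean(calibration_values)
    sd = statistics.stdev(calibration_values)  # sample stdev (n - 1), an assumption
    return abs(new_value - mean) / sd


def is_outlier(calibration_values, new_value, limit):
    """True if new_value lies more than `limit` stdevs from the calibration mean."""
    return stdev_units(calibration_values, new_value) > limit


if __name__ == "__main__":
    # Made-up calibration scores for one PLS factor (illustration only)
    cal_scores = [-2.0, -1.0, 0.0, 1.0, 2.0]
    for new_score in (3.0, 10.0):
        units = stdev_units(cal_scores, new_score)
        print(new_score, round(units, 2), is_outlier(cal_scores, new_score, limit=5.0))
```

If the distances computed this way for your own validation spectra are nowhere near the values the software reports, then this interpretation is probably wrong.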

JuanG's picture

Hi Miguel,

Thank you for your response. Yes, these values are definitely there to detect outliers. I would also think it is something like what you said: could it be 5 times the std limit, or something similar? The problem is that there is nothing in the user manual. I attached the pages concerning the PLS method. These limits are briefly mentioned on the 3rd page, in the sentence highlighted in yellow. As you can see, there is nothing that explains how to calculate and use them.



hlmark's picture

This is also a guess, but sometimes software is written so that it does all the calculations correctly except that the programmer failed to include the final square root, so that the result displayed and printed is the square of the actual standard deviation. But since you have the values of the scores and residuals, you can check the calculations by computing the respective standard deviations yourself, either with a short program that you write, or even by hand with a calculator. You have to be careful about typos in the data entry, but if your results match the computer's result to 4 or 5 places or more (either with or without the square root computation), then you can be pretty sure that you know what they did.
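The check described above can be sketched as a short Python program. It computes the standard deviation and its square (the variance), each with both the n - 1 and the n denominator, and reports which candidates agree with the value shown by the software to about 4 places. The function names, the tolerance, and the example numbers are assumptions chosen for illustration. Trying both denominators is an extra check beyond the square-root question.

```python
import math
import statistics


def candidate_statistics(values):
    """Candidate quantities the software might be reporting (hypothetical list)."""
    return {
        "sample stdev (n-1)": statistics.stdev(values),
        "population stdev (n)": statistics.pstdev(values),
        "sample variance (n-1)": statistics.variance(values),
        "population variance (n)": statistics.pvariance(values),
    }


def matching_candidates(values, reported, rel_tol=1e-4):
    """Names of the candidates that agree with the reported value within rel_tol."""
    return [
        name
        for name, value in candidate_statistics(values).items()
        if math.isclose(value, reported, rel_tol=rel_tol)
    ]


if __name__ == "__main__":
    # Made-up scores and a made-up "reported" value (illustration only)
    scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    reported_by_software = 4.0
    for name, value in candidate_statistics(scores).items():
        print(f"{name}: {value:.5f}")
    print("matches:", matching_candidates(scores, reported_by_software))
```

An empty list of matches would suggest that the displayed number is none of these four quantities, for example because it is scaled or computed on different data.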

miguelG's picture

Hi again,

At the beginning of page 27 there is a reference to a "point to point standard deviation". It might be related to that plot. Another guess.

Follow Mark's suggestion; it is good advice.