Calculation of SEC for PLS calibrations, n or n-k-1 as DF?

Hi

Browsing through textbooks, norms, standards and this forum, the consensus seems to be that for PLS models the SEC is calculated as

SEC = sqrt( sum_i (y_i - y_i^hat)^2 / df )

where df = n - k - 1 (n = number of calibration samples, k = number of PLS factors, and the 1 accounts for mean centering).
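A minimal Python sketch of the formula above, assuming plain lists of reference values y and predicted values y_hat. The function names sec and sec_pls are hypothetical, and the numbers in the usage lines are made up for illustration, not real calibration data:

```python
import math
from typing import Sequence


def sec(y: Sequence[float], y_hat: Sequence[float], df: float) -> float:
    """Standard error of calibration: sqrt(sum_i (y_i - y_hat_i)^2 / df)."""
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    if df <= 0:
        raise ValueError("df must be positive")
    # Square each residual before summing. For a mean-centered model the
    # calibration residuals sum to zero, so squaring the sum would give ~0.
    ssr = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    return math.sqrt(ssr / df)


def sec_pls(y: Sequence[float], y_hat: Sequence[float], k: int) -> float:
    """SEC with df = n - k - 1: k PLS factors plus 1 for mean centering."""
    return sec(y, y_hat, df=len(y) - k - 1)


# Made-up example values (not real calibration data).
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
print(sec_pls(y, y_hat, k=1))    # df = n - k - 1 = 3
print(sec(y, y_hat, df=len(y)))  # df = n, for comparison
```

Keeping df as an explicit argument makes it easy to compute both the df = n - k - 1 and the df = n variants from the same residuals.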

However, when I compare with popular software products, the convention seems to be df = n.

Unscrambler documents this in its Technical Reference, and as far as I have found, df = n appears to be the standard.

ISO 12099:2010 defines SEC as: "for a calibration model, an expression of the average difference between predicted and reference values for samples used to derive the model
NOTE As for definitions C.3.4 to C.3.7, in this statistic, this expression of the average difference refers to the square root of the sum of squared residual values divided by the number of values corrected for degrees of freedom, where 68 % of the errors are below this value."

My interpretation of this definition is that df = n should be used; otherwise the 68 % condition is not met.
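One way to check the 68 % condition empirically is to simulate a calibration, compute SEC with both divisors, and count how many calibration residuals fall within plus or minus SEC. This is a sketch under loudly stated assumptions: ordinary least squares with a single predictor is swapped in for PLS to keep it dependency-free (so k = 1, and unlike a PLS factor that predictor uses exactly one degree of freedom), the errors are assumed independent and normal, and the "true" relation and the helper names are hypothetical:

```python
import math
import random


def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x (stand-in for PLS); returns fitted values."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * xi for xi in x]


def coverage(residuals, df):
    """Fraction of residuals whose magnitude is within sqrt(SSR / df)."""
    s = math.sqrt(sum(e * e for e in residuals) / df)
    return sum(1 for e in residuals if abs(e) <= s) / len(residuals)


def simulate(n=20, k=1, reps=2000, seed=0):
    """Average calibration-set coverage for df = n and df = n - k - 1."""
    rng = random.Random(seed)
    total_n = total_corrected = 0.0
    for _ in range(reps):
        x = [rng.uniform(0.0, 10.0) for _ in range(n)]
        # Hypothetical "true" relation with unit-variance normal noise.
        y = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
        y_hat = fit_line(x, y)
        residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]
        total_n += coverage(residuals, df=n)
        total_corrected += coverage(residuals, df=n - k - 1)
    return total_n / reps, total_corrected / reps


if __name__ == "__main__":
    cov_n, cov_corrected = simulate()
    print(f"df = n:         {cov_n:.3f}")
    print(f"df = n - k - 1: {cov_corrected:.3f}")
```

Because the df = n - k - 1 divisor is smaller, that SEC is always at least as large, so its coverage of the calibration residuals can only be equal or higher.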

Academic papers seem to agree that one PLS factor consumes more than one degree of freedom, but they differ in how to determine that number.
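To see how sensitive SEC is to that choice, a small sketch can treat the degrees of freedom consumed per PLS factor as a parameter. The df_per_factor knob is a hypothetical parameterisation for illustration only (1.0 reproduces n - k - 1), not a method from any of the papers or software mentioned, and the SSR, n and k values are made up:

```python
import math


def sec_effective_df(ssr, n, k, df_per_factor=1.0):
    """SEC with df = n - df_per_factor * k - 1 (the 1 is for mean centering).

    df_per_factor is a hypothetical knob: 1.0 gives the usual n - k - 1,
    larger values mimic the view that a PLS factor costs more than one df.
    """
    df = n - df_per_factor * k - 1
    if df <= 0:
        raise ValueError("too few samples for the assumed model degrees of freedom")
    return math.sqrt(ssr / df)


# Made-up example: n = 50 samples, k = 5 factors, SSR = 2.0.
for c in (1.0, 1.5, 2.0):
    print(c, round(sec_effective_df(2.0, 50, 5, c), 4))
```

With these made-up numbers, going from one to two degrees of freedom per factor raises the SEC by roughly 6 %.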

So it appears that the theoretically and statistically correct choice is df = n - k - 1, whereas practical implementations use df = n.
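Since both variants share the same sum of squared residuals, they differ only by a constant factor. A short worked comparison, where the subscripts simply label the two divisors and the n and k values are arbitrary examples:

```latex
\mathrm{SEC}_{n} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}}, \qquad
\mathrm{SEC}_{n-k-1} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-k-1}}

\frac{\mathrm{SEC}_{n-k-1}}{\mathrm{SEC}_{n}} = \sqrt{\frac{n}{n-k-1}}

n = 50,\ k = 5:\quad \sqrt{50/44} \approx 1.066 \qquad
n = 500,\ k = 5:\quad \sqrt{500/494} \approx 1.006
```

The gap is about 7 % for a small calibration set and under 1 % for a large one.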

Does anyone know why that is?

I have considered whether it comes down to terminology (RMSEC vs SEC), but concluded that it does not. In any case, if anyone could enlighten me, it would be highly appreciated.

I may have missed an important point; if so, I would be happy to be made aware of it.

Stay safe and enjoy the holidays ahead.

/jakob