Quantitative Calibration --> No Useful Result in Prediction

Zahra

I performed a quantitative calibration (PLSR method) with an NIR spectrometer (Pharmatest IM 100) to measure the concentration (in the range 0.02-0.5 mg/ml) of Resveratrol as the drug in an Ethanol-Water (1:2) solution. The multiple correlation coefficient is 0.96 and the standard error of estimate is 0.05. I then ran a prediction on a new solution of known concentration to test the calibration, but the result is completely unacceptable: there is a big difference between the predicted and the actual values, and the model has no predictive ability at all.

I guessed that the calibration might be overfitted, so I repeated it with a smaller wavelength range, but the result was the same. Does anyone have an explanation?


Hope someone can help me.

Thank you in advance for your answer.


td

Hello Zahra,

Yes, you may have overfitted. How many calibration samples did you use, and how many factors did you retain?

The correct way of doing PLSR is to have three sets: a training set, a factor selection set and a validation set. You run the calibration and test it with the factor selection set. As you increase the number of factors, the prediction error on the selection set will decrease until it reaches a minimum. You should use the calibration produced with one factor fewer than the number that gave the minimum. Then you test that calibration with the validation set. If you are restricted in the number of samples available, you can use cross-validation to replace the factor selection set. The final validation samples must be samples that were not used in the calibration.
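td's three-set procedure can be sketched in Python. This is a minimal illustration, not what any particular calibration package does internally: it assumes NumPy, implements PLS1 with the standard NIPALS algorithm, and the function names (`pls1_fit`, `pls1_predict`, `choose_factors`) are hypothetical. The factor count follows td's rule of one factor fewer than the minimum of the selection-set error.

```python
import numpy as np


def pls1_fit(X, y, n_factors):
    """Fit a single-response PLS model (NIPALS). Returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # mean-centre spectra and concentrations
    W, P, Q = [], [], []
    for _ in range(n_factors):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)            # weight vector
        t = Xr @ w                           # scores
        tt = t @ t
        p = Xr.T @ t / tt                    # X loadings
        q = (yr @ t) / tt                    # y loading
        Xr = Xr - np.outer(t, p)             # deflate X and y
        yr = yr - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)   # regression vector b = W (P'W)^-1 q
    return coef, x_mean, y_mean


def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def choose_factors(X_train, y_train, X_sel, y_sel, max_factors):
    """Return (chosen number of factors, selection-set RMSE for 1..max_factors)."""
    errors = [rmse(y_sel, pls1_predict(pls1_fit(X_train, y_train, a), X_sel))
              for a in range(1, max_factors + 1)]
    n_at_minimum = int(np.argmin(errors)) + 1
    return max(1, n_at_minimum - 1), errors
```

Call `choose_factors` with the training and selection spectra (rows = samples, columns = wavelengths), refit with the chosen number of factors, and test once on the untouched validation set.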

Are there other ingredients in your calibration samples? Are all the solutions clear? Are the solution temperatures the same? Whose software are you using?

Hope this helps.

Best wishes,



Zahra

Thank you for taking the time to answer me.

I made 3 separate stock solutions with the same concentration (0.5 mg/ml Resveratrol in Ethanol-Water (1:2)). Then I made 11 different solutions from each of them, with concentrations between 0.002 and 0.5 mg/ml (33 samples in total). I took 25 spectra of each sample.

The number of factors is 9. 

I used only Resveratrol, absolute Ethanol and sterile Water to make the solutions, and the solutions looked clear. For the final validation I used spectra that were not in the calibration set and came from another, new stock solution.

I didn’t measure the temperature during the experiment. I'm using SL Calibration Workshop from SensoLogic.

jcg2000

Regarding predictive modeling, one thing many people ignore or are not aware of is that the samples you put in the calibration must be representative of the samples you want to make predictions for. You are not going to get good predictions if you use a model developed with white cats to predict black cats.
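One common way to check this point numerically, offered as a sketch rather than anything proposed in the thread, is to build a PCA model of the calibration spectra and flag prediction spectra that fall outside it: Hotelling's T² measures how far a sample lies within the model plane, and the Q residual measures how much of its spectrum the model cannot describe. NumPy is assumed, the function names are hypothetical, and comparing against the largest calibration value is only a crude limit (formal limits use F and chi-square approximations).

```python
import numpy as np


def fit_pca(X_cal, n_components):
    """PCA of mean-centred calibration spectra (rows = samples)."""
    mean = X_cal.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_cal - mean, full_matrices=False)
    loadings = Vt[:n_components].T
    score_var = ((X_cal - mean) @ loadings).var(axis=0, ddof=1)
    return mean, loadings, score_var


def hotelling_t2(model, X_new):
    mean, loadings, score_var = model
    t = (X_new - mean) @ loadings
    return np.sum(t ** 2 / score_var, axis=1)


def q_residual(model, X_new):
    mean, loadings, _ = model
    Xc = X_new - mean
    residual = Xc - (Xc @ loadings) @ loadings.T   # part not explained by the PCs
    return np.sum(residual ** 2, axis=1)


def outside_calibration(model, X_cal, X_new):
    """Crude flag: True where T2 or Q exceeds the largest calibration value."""
    t2_lim = hotelling_t2(model, X_cal).max()
    q_lim = q_residual(model, X_cal).max()
    return (hotelling_t2(model, X_new) > t2_lim) | (q_residual(model, X_new) > q_lim)
```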

Jerry Jin

hbrookechem

I think we would need more information to really help troubleshoot this issue: how many factors you are using, how many calibration samples, what sort of pre-processing, etc. On the surface, however, it does sound like overfitting.


Zahra

Thank you for taking the time to answer me. 

I made 3 separate stock solutions with the same concentration (0.5 mg/ml Resveratrol in Ethanol-Water (1:2)). Then I made 11 different solutions from each of the stock solutions, with concentrations between 0.002 and 0.5 mg/ml (33 samples in total).

I took 25 spectra of each sample.

Absorbance was used as the transformation for the spectra. The number of factors is 9 and the wavelength range for the calibration is 1042 to 1873 nm.
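For reference, the absorbance transformation is normally computed from the sample and reference (background) intensities as A = -log10(I/I0), i.e. log10(1/R) in reflectance or log10(1/T) in transmission. A minimal NumPy sketch follows; the optional dark-signal subtraction is an assumption and may not match what the instrument software does.

```python
import numpy as np


def to_absorbance(sample_intensity, reference_intensity, dark=None):
    """Convert raw intensities to absorbance, A = -log10(I / I0)."""
    sample = np.asarray(sample_intensity, dtype=float)
    reference = np.asarray(reference_intensity, dtype=float)
    if dark is not None:                   # optional dark-signal correction (assumed)
        sample = sample - dark
        reference = reference - dark
    return -np.log10(sample / reference)   # ratio is T (transmission) or R (reflectance)
```

Absorbance is used because, under Beer-Lambert conditions, it is proportional to concentration, which is what a linear PLS model assumes; raw intensity is not.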

jcg2000

The concentration you are trying to measure seems too low to be measured by NIR. The model you developed may not target the API, but some excipient. Check the loadings of your PLS factors to see whether they are similar to your API spectrum.
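The loading check can be quantified with a correlation between each PLS loading (or loading weight) vector and a spectrum of the pure API (active pharmaceutical ingredient, here resveratrol). The sketch below assumes NumPy and that both vectors cover the same wavelengths with the same preprocessing; it uses the absolute value because the sign of a PLS factor is arbitrary. `loading_similarity` is a hypothetical helper, not a function of any calibration package.

```python
import numpy as np


def loading_similarity(loadings, pure_spectrum):
    """|Pearson r| between each loading column and a pure-component spectrum."""
    pure = np.asarray(pure_spectrum, dtype=float)
    return np.array([abs(np.corrcoef(loadings[:, a], pure)[0, 1])
                     for a in range(loadings.shape[1])])
```

Factors that carry large regression weight but show little similarity to the API spectrum suggest the model is tracking something other than the API.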


Jerry Jin

Zahra

Thank you for taking the time to answer me. 

Can you please tell me what you mean by API?

dwhopkins

Hi Zahra,

Clearly, a calibration with an SEC value of 0.05 is not going to have any value below 0.05 mg/mL, and little value at 0.1 mg/mL. You need to look at the spectrum of one of the most concentrated samples, compare its absorbance to the instrument noise, and see whether the low concentrations have any chance of being detected. I doubt that you can detect the resveratrol at the low concentrations you have selected.
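This suggestion can be turned into a quick estimate. Assuming Beer-Lambert behaviour (net absorbance proportional to concentration), the net signal of the most concentrated sample (sample minus solvent blank) can be scaled down to the lowest concentration and divided by the noise standard deviation at each wavelength. The sketch assumes NumPy; the function name is hypothetical, and the factor of 3 is a common rule-of-thumb detection threshold rather than anything stated in this thread.

```python
import numpy as np


def expected_snr(a_sample, a_blank, noise_sd, c_sample, c_low, k=3.0):
    """Per-wavelength signal-to-noise expected at concentration c_low."""
    net = np.asarray(a_sample, float) - np.asarray(a_blank, float)
    scaled = net * (c_low / c_sample)            # Beer-Lambert scaling to c_low
    snr = np.abs(scaled) / np.asarray(noise_sd, float)
    return snr, snr > k                          # True where detection is plausible
```

If hardly any wavelength passes at 0.02 mg/ml, no amount of PLS tuning will recover the signal.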

Is there any chance you could use the UV?

Best wishes,

Dave Hopkins


Zahra

Thank you for taking the time to answer me. 

Can you please tell me how I can get the instrument noise? I would also like to know more about how to interpret the SEC value.

Using NIR is part of my master's thesis, therefore I can't use UV.


Zahra

Here is my calibration protocol.

dwhopkins

Hi Zahra,

The SEC value is essentially the standard deviation of the differences between the predicted values and the reference values for the calibration sample set. Therefore, the rules for the distribution of these differences apply: 95% of the observations should be within 2*SEC of the expected values, and 99% within 3*SEC. I recommend you look up the equations for SEC, SEP, and RMSEP in a standard text such as Naes, Isaksson, Fearn and Davies, "A User-friendly Guide to Multivariate Calibration and Classification", and study the section on PLS.
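The SEC definition can be written as a short function. Texts differ on the degrees of freedom; the version below divides by n - k - 1 for k PLS factors, which is one common convention and should be checked against the book above or your software's documentation. The second function turns an error into the approximate 95% band (±2 SEC) mentioned above; both names are hypothetical.

```python
import math


def sec(reference, predicted, n_factors):
    """Standard error of calibration, sqrt(sum(residual^2) / (n - k - 1))."""
    n = len(reference)
    ss = sum((p - r) ** 2 for r, p in zip(reference, predicted))
    return math.sqrt(ss / (n - n_factors - 1))


def approx_band(value, error, z=2.0):
    """Interval covering roughly 95% of predictions for z = 2."""
    return value - z * error, value + z * error


# Zahra's numbers: SEC = 0.05 mg/ml, sample at 0.02 mg/ml
low, high = approx_band(0.02, 0.05)   # roughly -0.08 to 0.12 mg/ml
```

An interval that spans zero shows why a calibration with this SEC cannot distinguish a 0.02 mg/ml sample from a blank.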

One easy way to look at the Absorbance noise is to form the difference of 2 scans of a blank spectrum (no analyte, or even better, no cuvette).  Take the standard deviation of the readings over various wavelength intervals, and you will discover that the noise is different over different spectral ranges.  This is one way to measure the noise.  A better way is to scan the blank 10 times, and calculate the SD at each wavelength for these 10 scans.

You can also calculate the SD at each wavelength for each of your 25 repeats of the 11 samples.  That would show you how the noise varies over your spectra.
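Both noise estimates described above take only a few lines of NumPy. This is a sketch under assumptions: spectra are arrays with wavelengths along the last axis, the window size is arbitrary, the function names are hypothetical, and the division by √2 in the two-scan method is added here because the difference of two independent scans carries twice the variance of a single scan.

```python
import numpy as np


def noise_from_two_scans(scan_a, scan_b, window):
    """SD of (scan_a - scan_b) in consecutive wavelength windows, scaled to one scan."""
    diff = np.asarray(scan_a, float) - np.asarray(scan_b, float)
    n_windows = len(diff) // window
    sds = [diff[i * window:(i + 1) * window].std(ddof=1) for i in range(n_windows)]
    return np.array(sds) / np.sqrt(2)


def noise_per_wavelength(scans):
    """SD at each wavelength over repeated scans, shape (n_scans, n_wavelengths)."""
    return np.asarray(scans, float).std(axis=0, ddof=1)


def pooled_repeat_noise(repeats):
    """Pooled SD per wavelength, shape (n_samples, n_repeats, n_wavelengths)."""
    return np.sqrt(np.asarray(repeats, float).var(axis=1, ddof=1).mean(axis=0))
```

`noise_per_wavelength` covers the 10-blank-scan method; `pooled_repeat_noise` applies the same idea to the 25 repeats of each sample, pooling the variances across samples.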

I would say that 9 factors for your calibration using 11 concentration levels is far too many.  It is not sufficient to increase the number of spectra in your calibration set by including such a large number of repeats.  We might expect that only 2 factors would be allowed, since you have solutions of a single analyte in a mixed solvent, created by dilutions from the same stock sample.  I suggest that you observe the plots of the factors, and use only those that look like reasonable pseudo spectra, with minimal noise spikes.
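The point about repeats can be made concrete: 33 solutions × 25 spectra gives 825 rows, but only 33 independent samples. A common remedy, sketched here with NumPy and hypothetical helper names, is to average the replicate spectra of each solution and to cross-validate by leaving out a whole stock solution at a time, so no replicate of a test solution is ever in the training data.

```python
import numpy as np


def average_replicates(spectra, sample_ids):
    """Average all spectra that share a sample id; returns (ids, mean spectra)."""
    spectra = np.asarray(spectra, float)
    sample_ids = np.asarray(sample_ids)
    ids = np.unique(sample_ids)
    means = np.array([spectra[sample_ids == s].mean(axis=0) for s in ids])
    return ids, means


def leave_one_group_out(groups):
    """Yield (group, train_indices, test_indices), one fold per group (e.g. stock)."""
    groups = np.asarray(groups)
    for g in np.unique(groups):
        yield g, np.flatnonzero(groups != g), np.flatnonzero(groups == g)
```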

Best wishes,

Dave Hopkins



dardenne

Hi Zahra,

Could you provide the data, both the calibration and validation sets? I like challenges!

Several experts could play with it and give you an actual answer. Better than words alone!

As Jerry said, the concentrations are very low: 500 ppm at the maximum. Some instruments can reach that, but not all.
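The unit conversions behind "500 ppm at the maximum" are simple arithmetic. A plain-Python sketch with hypothetical function names; the ppm conversion assumes a solution density of 1 g/ml, which is only approximately true for an ethanol-water mixture.

```python
def mg_per_ml_to_percent_wv(c_mg_per_ml):
    """1 mg/ml = 1 g/l = 0.1 g per 100 ml = 0.1 % (w/v)."""
    return c_mg_per_ml * 0.1


def mg_per_ml_to_ppm(c_mg_per_ml, density_g_per_ml=1.0):
    """ppm (w/w) = mg analyte per kg of solution."""
    return c_mg_per_ml / density_g_per_ml * 1000.0


for c in (0.5, 0.02):
    print(f"{c} mg/ml = {mg_per_ml_to_percent_wv(c):g} % w/v = {mg_per_ml_to_ppm(c):g} ppm")
```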

As David said, with 3 components, in theory 2 wavelengths (or 2 terms) would be enough.  

Looking on the internet at the instrument you used, it seems the scans are taken in reflection mode. Is that correct?


hlmark

Zahra - a point that has not been addressed is the question of the experimental conditions. Extraneous environmental variations, such as temperature changes, vibrations, etc. have been known to play havoc with analytical methods, especially when the concentrations are as low as you want to measure. Are you sure you have kept all such variations to a minimum, and are you certain that the conditions for the predictions are the same as for the calibration measurements?






sabki

Hi Zahra,

You should prepare your NIR calibration clearly, step by step:

  1. Number of samples for calibration (representative)
  2. Number of samples for validation (representative)
  3. Check the standard error of the reference data (SEL)
  4. The SOP used for calibration must be the same as for prediction.
  5. Using a low number of PLS factors will reduce the prediction error.
  6. Please check the RPD (ratio of performance to deviation) after NIR calibration (standard deviation of the reference data / standard error of prediction)
  7. What NIR software do you use in your lab?
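Item 6 can be computed directly. The sketch below uses only the Python standard library and hypothetical function names; it takes SEP as the bias-corrected standard deviation of the prediction errors (n - 1 denominator), which is one common definition, and RPD as the standard deviation of the reference values divided by SEP.

```python
import statistics


def sep(reference, predicted):
    """Bias-corrected standard error of prediction (SD of the errors)."""
    errors = [p - r for r, p in zip(reference, predicted)]
    return statistics.stdev(errors)


def rpd(reference, predicted):
    """Ratio of performance to deviation: SD(reference) / SEP."""
    return statistics.stdev(reference) / sep(reference, predicted)
```

An RPD close to 1 means the model predicts no better than simply quoting the mean of the reference values; higher values indicate a more useful model.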



gabiruth

We are talking about 0.5 mg/ml to 0.02 mg/ml, that is 0.5 mg/ml = 0.0005 g/ml = 0.05% (w/v), down to 0.02 mg/ml = 0.00002 g/ml = 0.002% (w/v). Before anyone bothers to do a calibration, the minimum I would expect is the following:
1. Collect spectra from a pure active.
2. Collect spectra from a 50% solution, and check whether the interaction of the phenolic OH groups with the water affects their peak positions and, if so, how. Although not expected, the positions of the C-H peaks may also shift due to the difference in the charge on the phenolic O.
3. Prepare solutions with 5, 1, 0.5 and 0.1% and follow the decrease in the resveratrol peaks, verifying that you can still detect them. If all or some of these peaks are still detectable above the noise in the 2nd derivative, there is reason to go down to 0.05% and verify that the peaks are still detectable above the noise.
If you don't see these peaks with sufficient strength above the noise, you will most likely waste your time.
4. If you insist on running regressions regardless of what you find in the preliminary study above, and you do get some regression, verify it by checking the loading weights of each PC to see whether you can identify a correlation with the peaks.
5. If you get a regression with an SEP (SECV) of 0.05 and your range is 0.05 to 0.002, what can you expect?
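Step 3 relies on seeing peaks above the noise in the second derivative. A Savitzky-Golay second derivative is one usual choice; the sketch below builds the filter coefficients from a local polynomial fit with NumPy (`scipy.signal.savgol_filter` with `deriv=2` does the same job if SciPy is available). The function names, window, polynomial order and 3×SD threshold are illustrative assumptions, and the second derivative of a solvent blank is taken as the noise reference.

```python
import math

import numpy as np


def savgol_coefficients(window, polyorder, deriv):
    """Savitzky-Golay coefficients (odd window) for the given derivative at the centre."""
    half = window // 2
    x = np.arange(-half, half + 1, dtype=float)
    vander = np.vander(x, polyorder + 1, increasing=True)   # columns 1, x, x^2, ...
    return np.linalg.pinv(vander)[deriv] * math.factorial(deriv)


def second_derivative(spectrum, window=11, polyorder=2, spacing=1.0):
    """SG second derivative; 'valid' mode drops half a window at each end."""
    coeffs = savgol_coefficients(window, polyorder, 2)
    y = np.asarray(spectrum, dtype=float)
    return np.correlate(y, coeffs, mode="valid") / spacing ** 2


def above_noise(d2_sample, d2_blank, k=3.0):
    """True where the sample's second derivative exceeds k times the blank's SD."""
    return np.abs(d2_sample) > k * np.std(d2_blank, ddof=1)
```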

My advice to newcomers is simple: the last thing you want to start with is any kind of regression; the first thing you should start with is the most down to earth: explore and investigate the spectra. PLS is just a mathematical toolbox. It doesn't know what it is using to create a correlation, it has no chemical knowledge, it doesn't use "logic" to calculate. It is like a robot. To deploy a robot properly, you need to know its capabilities and limitations. If you ignore its limitations, you will waste your time.

I would like to draw attention to another case where lots of money can be wasted using chemometrics: aqueous solutions of glucose, sucrose and fructose. These three sugars have distinctly different spectra in crystalline form. However, once in solution, there is no way to tell them apart. It is possible to create perfect regressions for all three in solutions containing different proportions of all three, but you will never predict correctly. The best way to understand this is to create a set of solutions with different proportions, build the regressions and then analyze the LW (loading weights) of all PCs. You will find that they are mirror images of each other.
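One way to see the underlying problem is collinearity: when pure-component spectra are nearly identical, the least-squares problem that separates them is ill-conditioned, so tiny amounts of noise produce large errors in the estimated proportions. The NumPy sketch below uses random placeholder spectra, not measured sugar spectra, and classical least squares rather than PLS, simply to show the noise amplification; `cls_estimate` is a hypothetical helper.

```python
import numpy as np


def cls_estimate(pure, mixture):
    """Classical least squares: solve mixture ≈ c @ pure for the proportions c."""
    c, *_ = np.linalg.lstsq(pure.T, mixture, rcond=None)
    return c


rng = np.random.default_rng(0)
base = rng.random(100)
distinct = np.vstack([base, rng.random(100), rng.random(100)])   # clearly different spectra
similar = np.vstack([base, base + 1e-3 * rng.random(100),
                     base + 1e-3 * rng.random(100)])              # almost identical spectra
c_true = np.array([0.2, 0.3, 0.5])
noise = 1e-3 * rng.standard_normal(100)                           # same small noise for both

err_distinct = np.abs(cls_estimate(distinct, c_true @ distinct + noise) - c_true).max()
err_similar = np.abs(cls_estimate(similar, c_true @ similar + noise) - c_true).max()
cond_distinct, cond_similar = np.linalg.cond(distinct), np.linalg.cond(similar)
```

In a PLS model the same instability plausibly shows up as large, mutually cancelling coefficients, which would be consistent with the mirror-image loading weights described above.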
I learned this a long time ago when I was a real rookie. Lessons learned the hard way aren't forgotten.

rcmartins


After gabiruth's suggestions, I would only point out that out-of-the-box PLS software should only be used if: i) the solutions are simple, or the compounds to quantify make up the majority of the sample; ii) very significant bands are observed; and iii) analytical accuracy is not the top priority.

If those conditions are met, PLS modelling is good enough for your expectations.