
I’m sure every member of this forum, including me, is grateful to be a beneficiary of your (and others’) experience. I can certainly appreciate it, since the date of your publication would’ve found me either playing Super Mario Bros. on the couch in my NY Yankees pajamas or building snow castles and having snowball fights in the backyard (I stopped building snow castles for personal entertainment quite a while ago… no comment on the snowball fights or Super Mario Bros.). I certainly wasn’t paying attention back then. But I digress, and hopefully the moderators won’t scold me… Anyway, this forum and its patrons are a very nice resource and certainly help to advance the field.
Reading your paper in conjunction with the PLSplus user’s guide has made it clearer that the GRAMS implementation of RMSG was intended, as you said, to correct for group size within the principal component factor space (analogous to group size in wavelength space in your paper). In particular, given the description in the manual, the implementation would make sense if it were intended for one PCA model (one f-dimensional factor space) constructed from several (n) classes of compounds. In that approach, it is conceivable that one RMSG would be calculated for each of the n classes, and each used separately to normalize the unknown’s M-distance in n comparisons. However, I don’t believe there is a way to calculate an RMSG for each class in a single PCA model within GRAMS. In fact, there doesn’t appear to be a way to specify classes within a single model at all (at least in my version). So one RMSG value is calculated and used per PCA model for discriminant analysis, and ideally a separate PCA model would then be built for each class. It would therefore seem that the implementation was incomplete, misunderstood, or done for another reason. Regardless, it may be serendipitous, since the normalization could account for the dependence of the M-distance on the number of PC factors and calibration samples (e.g., all things being equal, a model having more factors and/or fewer samples will necessarily have a greater average M-distance)…
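To make sure I’m describing the per-class scheme clearly, here is a minimal numpy sketch of it. This is only my reading of how such a normalization could work, not what GRAMS actually does; the class names, sample counts, and data below are all made up for illustration:

```python
import numpy as np

def mahalanobis_sq(X, mean, cov_inv):
    # Squared M-distance of each row of X from `mean`, given an inverse covariance.
    d = X - mean
    return np.einsum('ij,jk,ik->i', d, cov_inv, d)

def class_rmsg(scores):
    # RMS group size of one class in PCA score space: the root-mean-square
    # M-distance of the class's own calibration samples from their centroid.
    mean = scores.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(scores, rowvar=False, ddof=0))
    return np.sqrt(mahalanobis_sq(scores, mean, cov_inv).mean())

# Hypothetical calibration scores for two classes in an f = 3 factor space.
rng = np.random.default_rng(0)
f = 3
class_scores = {name: rng.normal(size=(40, f)) for name in ('A', 'B')}

# Normalize the unknown's M-distance by each class's own RMSG before comparing.
unknown = rng.normal(size=(1, f))
for name, scores in class_scores.items():
    mean = scores.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(scores, rowvar=False, ddof=0))
    d = np.sqrt(mahalanobis_sq(unknown, mean, cov_inv)[0])
    print(name, d / class_rmsg(scores))
```

In a real workflow the scores would of course come from projecting spectra onto the PCA loadings rather than from a random generator.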
I have two last questions (ignoring the way things are done in GRAMS):
Would you agree that RMSG normalization is a reasonable way to adjust for the number of factors in a PCA model (the model rank) when the discriminant rule takes the form "M-distance < some threshold"?
Given an approach where RMSG is calculated for one PCA model representing a single class (so that the relationship sum(Di^2) = number of variables, or factors, holds and covers the whole class), should one skip calculating RMSG via the sums and simply take the square root of the number of PCA factors, improving numerical precision and avoiding propagated arithmetic error?
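For what it’s worth, the identity behind that question is easy to check numerically. When the covariance used in the M-distance is estimated from the same calibration scores with a 1/n normalization, the mean squared M-distance is exactly f, so RMSG = sqrt(f) up to floating-point error (with a 1/(n-1) normalization it would be sqrt((n-1)f/n) instead). The sample counts here are made up:

```python
import numpy as np

# Hypothetical calibration set: n = 50 samples, f = 4 PCA factors.
rng = np.random.default_rng(1)
n, f = 50, 4
scores = rng.normal(size=(n, f))

d = scores - scores.mean(axis=0)
cov_inv = np.linalg.inv(d.T @ d / n)          # covariance with 1/n normalization
d2 = np.einsum('ij,jk,ik->i', d, cov_inv, d)  # squared M-distances

rmsg_from_sums = np.sqrt(d2.mean())
print(rmsg_from_sums, np.sqrt(f))  # both 2.0 (up to floating-point error)
```

So computing RMSG from the sums and taking sqrt(f) directly coincide to machine precision; the two only differ if the software divides the covariance by n-1 rather than n.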
The link to the old forum discussion:
I did some digging around and found that your paper, among some others, was cited in the manual (the pages of my manual were separated some time ago). I don’t know whether that counts as proper attribution, but I thought you might be pleased to know.