Psychometric Rigor Simplified: Understanding the Basics
by Roberta Hill

In my last column, I talked about the importance of ensuring that some psychometric rigor had been applied to the specific instrument you planned to use. I also encouraged you not to get caught up in the statistics of assessments. This time, I thought it would be useful to provide an overview of the three elements that make up psychometric rigor: reliability, validity and social desirability. People often ask, "How valid is this assessment?" The term "valid" itself can be misleading. Most dictionaries offer a number of definitions, including:

  • Legal efficacy - This becomes an important consideration if the test is being used for selection and recruitment, but it is less important in developmental coaching.
  • Well-grounded, meaningful and relevant - Most assessments attempt to measure some aspect of "personality" based on a particular theory of human development and interactions. If the underlying principles behind the tool are questioned, or if the tool measures something not pertinent to your clients, it may not be suitable for them.
  • Sound and convincing - No matter how "valid" a tool may be, if it is not perceived by the client as being correct, the information will usually be ignored.

What is Psychometric Rigor?

1.  Reliability (PRECISION) determines how consistently the items on a scale reflect the scale itself. In layman's terms, does it actually give the same results if taken by the same person on different occasions?

While there are a few different measures of reliability, the most common and easiest to measure is Test-Retest Reliability, also called Stability. This refers to how well an instrument yields consistent results, usually over a period of about six months. Scores range between 0.00 and 1.00. For simplicity's sake, think of this as a percentage measurement of how often the same results will occur. Look for scores above .75.
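To make the idea concrete, here is a minimal sketch, assuming test-retest reliability is reported as the Pearson correlation between two administrations of the same instrument. The scores and the .75 cutoff below are illustrative, not taken from any real instrument.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scale scores for five people, tested twice, six months apart.
time1 = [52, 61, 47, 70, 58]
time2 = [50, 64, 45, 72, 55]

r = pearson_r(time1, time2)
print(f"test-retest reliability: {r:.2f}")   # close to 1.00 means stable
print("acceptable" if r > 0.75 else "questionable")
```

Because the two sets of scores move up and down together, the correlation here comes out well above the .75 guideline.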

2.  Validity (ACCURACY) determines the extent to which the association among the scores represents the theory and model on which the instrument is based. In layman's terms, does it actually measure what it says it measures?

Validity establishes the confidence with which we can interpret any given result on a given test. Validity is obviously a much more complex question than reliability, and is ultimately a more important issue. Unfortunately, validity is not only more complex but also much more expensive to establish, so validity studies are not conducted as often as reliability studies. Note that there are at least five different types of validity:

  • Content Validity or Face Validity - a person's perception, usually from the appearance of the instrument, that it is likely to measure something meaningful. (Strictly speaking, face validity concerns appearance, while content validity concerns how well the items cover the domain being measured; the two are often grouped together.)
  • Predictive Validity - how well scores forecast a future outcome, such as later performance.
  • Concurrent Validity - how well scores relate to some other concrete criterion measured at about the same time.
  • Convergent Validity - results on individual instruments are compared to results obtained on other instruments that measure abilities in the same ability domain.
  • Divergent Validity - results on individual instruments are compared to results obtained on other instruments that measure abilities in different ability domains.
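The last two types can be illustrated with a small sketch, assuming both are checked by correlating scores across instruments: a new scale should correlate strongly with an established measure of the same trait (convergent) and weakly with a measure of an unrelated trait (divergent). All instrument names and scores below are hypothetical.

```python
import math

def r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (math.sqrt(sum((a - mx) ** 2 for a in x)) *
                  math.sqrt(sum((b - my) ** 2 for b in y)))

# Hypothetical scores for six clients on three instruments.
new_extraversion = [12, 18, 9, 22, 15, 20]
established_extraversion = [14, 19, 10, 21, 13, 22]  # same domain
numerical_reasoning = [20, 21, 19, 20, 22, 18]       # different domain

convergent = r(new_extraversion, established_extraversion)
divergent = r(new_extraversion, numerical_reasoning)

print(f"convergent r = {convergent:.2f}")  # expect high
print(f"divergent  r = {divergent:.2f}")   # expect near zero
```

A high convergent correlation alongside a near-zero divergent correlation is the pattern a validity study hopes to see.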

If this sounds too technical for you, don't worry. While I do understand most of what this means, it has been over 25 years since my last proper statistics class at university, and I can no longer decipher the mathematics myself. I just look to see what studies have been done and check whether or not the researchers claim that the results fall within acceptable norms.

Most assessment providers will claim high marks for content validity. This simply means that the person who took the instrument perceived that the results obtained were accurate. While not as relevant scientifically, this is an important component for coaches to consider. When using a tool, we want it to be personally meaningful and convincing for our clients, in the hope that they will be motivated to use the information provided.

3.  Social Desirability (TRANSPARENCY) determines how easy it is to fake the results. This third and final measurement component is very important when the assessment is used as a test for recruitment or selection. For most of the assessments used by coaches, however, this measure does not apply, since it is not relevant to our purposes.

As a coach, it is helpful to understand what is meant by psychometric rigor and to know whether the pertinent studies have been conducted. It is not compulsory for us to be able to interpret or explain what the statistics mean. Most people, including coaches, just want to feel confident that the instrument is based on some sound scientific principles and that it will prove meaningful in practice.

Roberta Hill, MBA, is a Professional Certified Coach (PCC), as well as a Professional Mentor Coach (PMC) and Certified Teleclass Leader with Corporate Coach U International. Roberta owns an online assessment provider with a network of more than 40 qualified coaches worldwide. Read more about Roberta in the WABC Coach Directory. Roberta may be reached by email.


Copyright (C) 2002-2006 WABC Coaches Inc. All rights reserved.