The self-development and personal effectiveness market is full of personality tests and self-assessment instruments that claim to measure who we are: our character, identity, values and drivers of behaviour. These instruments are usually based on a theoretical framework, such as the Enneagram, supported by psychometric design principles. When coaches and practitioners choose an assessment or profiling instrument to use with their clients, it is important that they are able to trust the tool and its results and reliably act on the conclusions suggested by the data. This is especially true in a coaching scenario, where giving a client an inaccurate profile might be extremely disruptive and confusing.
Best practice validity and reliability guidelines enable product development houses and questionnaire designers to have confidence in the quality of the data that their instrument produces and therefore to minimise measurement error. These measures allow us to test and demonstrate that a diagnostic instrument is valid and reliable. A diagnostic instrument that is valid actually tests what it is intended to, and data that is reliable and true enables us to have confidence that we can draw trustworthy conclusions.
At Integrative Enneagram Solutions, we invest in rigorous statistical analysis and ongoing improvement of our iEQ9 Questionnaire to ensure that we continue to offer the highest standard of reliability and accuracy in our Enneagram testing. To complement our ongoing internal processes to improve test reliability and validity, we periodically contract independent statisticians to offer their review and analysis of our data.
We recently commissioned research psychologist and psychometrist Dr Liezel Korf to conduct an independent psychometric study of our iEQ9 data, for the purposes of investigating the reliability and validity of the iEQ9 Questionnaire.
Dr Liezel Korf is a research psychologist with 27 years of research and assessment experience. She started as a researcher at the HSRC in the psychometric test development section. From there she moved to academia and lectured in Research Methodology, Psychometry and Statistics at various South African universities for more than 10 years. In 2001 she started consulting while teaching part-time at various institutions. During this time, together with a network of associates, she provided psychometric assessment services to a large number of companies. She also assisted in conducting surveys for various companies, investigating issues such as culture and climate, safety, retention, employee satisfaction and customer satisfaction. Liezel has assisted more than 500 post-graduate students with the statistical analyses for their dissertations. She is still involved in part-time lecturing and acts as external examiner and moderator for a number of courses in the fields of Research Methodology, Psychology and Industrial Psychology.
The iEQ9 statistical study
Our study drew on a sample of 5910 people to investigate the validity and reliability of the iEQ9 Questionnaire. All of the individuals included in the study were English-speaking, aged between 21 and 65, and drawn from various countries, organisations and industries such as Client Services, Medical, Insurance, Financial Services, and Technical Services.
The iEQ9 was subjected to a number of different analyses, which are summarised below. The full statistical report appears in the iEQ9 Statistical Manual, available from Integrative Enneagram Solutions.
Reliability analysis: Do the items in the iEQ9 measure consistently?
In psychometrics, reliability refers to how consistently an instrument performs. This analysis reviewed how consistently the iEQ9 measures Core Enneagram Type and whether it falls within an acceptable range of measurement error. It does so through internal consistency, which examines how closely responses to the questions for a given scale relate to each other. In other words, do people respond to similar items in a similar and consistent way?
The accepted measure of internal consistency is Cronbach’s alpha (Bryman and Bell, 2007). Values for Cronbach’s alpha can range from 0 to 1. Values of 0.80 and above are considered excellent, or ‘Gold standard’, values above 0.70 are considered ‘Good’ and values as low as 0.60 are considered acceptable for some purposes.
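| Scale | Cronbach’s alpha | Rating |
| --- | --- | --- |
| Ennea 1 Strict Perfectionist | 0.80 | Gold |
| Ennea 2 Considerate Helper | 0.73 | Good |
| Ennea 3 Competitive Achiever | 0.77 | Good |
| Ennea 4 Intense Creative | 0.73 | Good |
| Ennea 5 Quiet Specialist | 0.78 | Good |
| Ennea 6 Loyal Sceptic | 0.78 | Good |
| Ennea 7 Enthusiastic Visionary | 0.79 | Good |
| Ennea 8 Active Controller | 0.84 | Gold |
| Ennea 9 Adaptive Peacemaker | 0.81 | Gold |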
Results: All iEQ9 scales for the nine Enneagram Types showed Cronbach’s alpha values above 0.70, indicating that these scales are reliable. The scales for the E1 Strict Perfectionist, E8 Active Controller and E9 Adaptive Peacemaker all achieved the gold standard, with a value of 0.80 or above for internal consistency.
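To make the statistic concrete, here is a minimal sketch of how Cronbach’s alpha can be computed for a single scale, using the standard formula based on item variances and the variance of the total score. The response data and the function names `cronbach_alpha` and `rate_alpha` are hypothetical and for illustration only; this is not the code used in the study.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one list of item scores per respondent, all for the same scale."""
    k = len(responses[0])                         # number of items in the scale
    items = list(zip(*responses))                 # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

def rate_alpha(alpha):
    """Label an alpha value using the bands quoted in this article."""
    if alpha >= 0.80:
        return "Gold"
    if alpha > 0.70:
        return "Good"
    if alpha >= 0.60:
        return "Acceptable for some purposes"
    return "Below the acceptable range"

# Hypothetical ratings (1 to 5) from five people on a four-item scale.
sample = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(sample)
print(round(alpha, 3), rate_alpha(alpha))
```

A real scale with thousands of respondents will rarely produce a value as high as this toy data set does; the point is only to show which quantities enter the calculation.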
The analysis also tested inter-item correlation. This checks whether items that are theoretically related (e.g. all items linked to the Enneagram 4 profile) do in fact correlate with one another, and that items do not correlate where the profile gives no reason for them to.
The average inter-item correlation in a reliable scale should be between 0.1 and 0.5 (Clark and Watson, 1995). Item-total correlations show how strongly each item relates to the rest of its scale, and therefore whether items are double-measuring the same thing. These values should not be too high: results above 0.8 suggest redundant items.
| Scale | Average inter-item correlation | Mean item-total correlation |
| --- | --- | --- |
| Ennea Strict Perfectionist | 0.291 | 0.480 |
| Ennea Considerate Helper | 0.210 | 0.388 |
| Ennea Competitive Achiever | 0.244 | 0.429 |
| Ennea Intense Creative | 0.215 | 0.392 |
| Ennea Quiet Specialist | 0.261 | 0.446 |
| Ennea Loyal Sceptic | 0.256 | 0.442 |
| Ennea Enthusiastic Visionary | 0.276 | 0.563 |
| Ennea Active Controller | 0.344 | 0.535 |
| Ennea Adaptive Peacemaker | 0.301 | 0.495 |
Results: Average inter-item correlations for all nine Enneagram Type scales fall within the acceptable range, well within the 0.5 limit. Item-total correlations were also within an acceptable range: only 5 of the 90 possible item-total correlations were below 0.3, while the highest value was 0.662, well below the 0.8 level that would suggest redundant items. These results suggest that the iEQ9 is reliable in its measurements.
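The two statistics in this section can be sketched in a few lines. The example below assumes the "corrected" form of the item-total correlation, in which each item is correlated with the sum of the other items in its scale; the article does not say which variant the study used, so treat that choice, the function names and the sample data as assumptions.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_inter_item(responses):
    """Mean correlation over every pair of items in one scale."""
    items = list(zip(*responses))                 # one tuple of scores per item
    pairs = list(combinations(items, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

def item_total_correlations(responses):
    """Correlate each item with the total of the remaining items (corrected form)."""
    items = list(zip(*responses))
    totals = []
    for i, item in enumerate(items):
        rest = [sum(row) - row[i] for row in responses]
        totals.append(pearson(item, rest))
    return totals

# Hypothetical ratings (1 to 5) from six people on a four-item scale.
sample = [
    [4, 5, 3, 5],
    [2, 3, 3, 2],
    [3, 2, 4, 4],
    [5, 4, 4, 3],
    [1, 2, 2, 3],
    [3, 4, 2, 4],
]
avg_r = average_inter_item(sample)
item_totals = item_total_correlations(sample)
redundant = [i for i, r in enumerate(item_totals) if r > 0.8]   # items above the 0.8 guideline
print(round(avg_r, 3), [round(r, 3) for r in item_totals], redundant)
```

With a sample this small the values are unstable; the sketch only shows how each number is derived from the raw responses.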
Validity analysis: Do the items in the iEQ9 measure accurately?
The Enneagram is based on often-unconscious motivations rather than behaviour, and unconscious motivations are difficult to measure accurately. This difficulty has led to low validity in many computerised Enneagram typing tools, and overcoming it requires a sophisticated approach to measurement.
The statistical measure of validity refers to the extent to which a test serves its purpose and supports the conclusions we draw from its results. This kind of analysis explores how valid the questionnaire’s results are, asking questions like “how accurately do we measure Enneagram types with the iEQ9?” and “do the Types present in the way we expect them to, based on the model?” Construct validity indicates whether the test actually and accurately measures what it is supposed to measure.
Exploratory Factor Analysis is an accepted way of assessing and evaluating the validity of questionnaires like the iEQ9. Put simply, an exploratory factor analysis takes all of the items or variables from the iEQ9 and analyses their correlations. This analysis treats the data as if it had no predefined structure and goes looking for the patterns within it, identifying clusters of items that seem to belong together. If the Enneagram model holds true and if the iEQ9 measures the nine Types accurately, then the pattern found in the data should match the theoretical model of the Enneagram.
The best clustering found by the factor analysis was a nine-factor solution: nine clusters that map well to the existing nine Enneagram scales. A secondary IRT (Item Response Theory) factor analysis took a different approach to the calculations and also found that the factors offered a good approximation of the Enneagram scales, with a high percentage of the items intended to map onto a Type doing so. Further, there was limited cross-loading, that is, few items registered on more than one cluster or scale.
| Scale | Factor | EFA 1: Items loading on correct factor | EFA 1: % of remaining items loading on one factor | EFA 2 (IRT): Items loading on correct factor | EFA 2 (IRT): % of remaining items loading on one factor |
| --- | --- | --- | --- | --- | --- |
| | Factor 5 | 8 | 100.00% | 8 | 100.00% |
| | Factor 9 | 8 | 88.89% | 8 | 88.89% |
| | Factor 6 | 9 | 100.00% | 9 | 100.00% |
| | Factor 8 | 6 | 60.00% | 9 | 90.00% |
| | Factor 4 | 8 | 88.89% | 8 | 88.89% |
| | Factor 7 | 8 | 88.89% | 8 | 88.89% |
| | Factor 2 | 9 | 90.00% | 9 | 88.89% |
| | Factor 1 | 9 | 100.00% | 10 | 100.00% |
| | Factor 3 | 7 | 77.78% | 6 | 60.00% |
Note: Factor analysis depends on the specific data (or people) included. Further analysis of different samples could yield different results.
Results: These statistics indicate that the items meet their intention and effectively measure the Nine Types as distinct, clear scales, and that the iEQ9 reflects the Enneagram model.
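Once a factor analysis has produced a table of loadings, checking whether items land on their intended factor and whether they cross-load is a simple counting exercise. The sketch below illustrates that idea only: the loadings are invented, the 0.30 cut-off for a "salient" loading is an assumed rule of thumb that the article does not state, and the function name `summarise_scale` is hypothetical. It does not reproduce the study's procedure or its exact percentages.

```python
SALIENT = 0.30  # assumed cut-off for a salient loading; not taken from the study

def summarise_scale(loadings, intended):
    """Count items that load on the intended factor and items that cross-load.

    loadings: {item name: [loading on factor 0, loading on factor 1, ...]}
    intended: index of the factor the scale's items are meant to load on
    """
    on_intended = 0
    cross_loading = 0
    for row in loadings.values():
        strongest = max(range(len(row)), key=lambda j: abs(row[j]))
        salient = [j for j, value in enumerate(row) if abs(value) >= SALIENT]
        if strongest == intended and abs(row[strongest]) >= SALIENT:
            on_intended += 1
        if len(salient) > 1:          # salient on more than one factor
            cross_loading += 1
    return {
        "items": len(loadings),
        "on_intended_factor": on_intended,
        "cross_loading": cross_loading,
    }

# Invented loadings for four items of one scale across three factors.
example = {
    "q1": [0.62, 0.10, 0.05],
    "q2": [0.55, 0.35, 0.02],   # loads on the intended factor but also cross-loads
    "q3": [0.12, 0.48, 0.08],   # strongest loading is on the wrong factor
    "q4": [0.71, -0.04, 0.11],
}
summary = summarise_scale(example, intended=0)
print(summary)
```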
Correlation Analysis: Are the nine Enneagram Types independent from each other?
Correlation analysis checks whether the test adequately distinguishes between Types, or how much the different scales overlap. Although each of the nine Types is distinct, they share certain traits with other Types and so we may expect low to moderate correlation between Types.
The accepted measure is the correlation coefficient (r). Values of less than 0.1 are negligible, values from 0.1 to 0.3 are ‘small’ and values from 0.3 to 0.5 are ‘moderate’. Values over 0.5 are ‘large’ and may be problematic.
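| Correlation (r) | Type 1 | Type 2 | Type 3 | Type 4 | Type 5 | Type 6 | Type 7 | Type 8 | Type 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Type 1 | 1 | | | | | | | | |
| Type 2 | 0.237 | 1 | | | | | | | |
| Type 3 | 0.389 | 0.201 | 1 | | | | | | |
| Type 4 | 0.204 | 0.324 | 0.136 | 1 | | | | | |
| Type 5 | 0.344 | 0.065 | 0.137 | 0.268 | 1 | | | | |
| Type 6 | 0.517 | 0.188 | 0.252 | 0.312 | 0.520 | 1 | | | |
| Type 7 | -0.093 | 0.159 | 0.303 | 0.297 | -0.007 | -0.064 | 1 | | |
| Type 8 | 0.293 | -0.012 | 0.481 | 0.205 | 0.078 | 0.213 | 0.300 | 1 | |
| Type 9 | 0.054 | 0.304 | -0.126 | 0.241 | 0.370 | 0.221 | -0.077 | -0.456 | 1 |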
Results: The values indicate a mostly small to moderate correlation between Enneagram Types. This suggests that while some types have elements in common, each Enneagram type measured by the iEQ9 is adequately unique.
Large correlations were found between Types 5 and 6 (r = 0.520) and also between Types 1 and 6 (r = 0.517). These correlations may be explained by typing theory: Ennea Fives often share behaviours with Ennea Sixes, such as being intellectual, cautious, and planning. Similarly, Ennea Ones and Sixes share traits such as worry, diligence and a concern with rules and authority.
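The reading of the matrix can be checked mechanically. The sketch below takes the correlations reported in this article, sorts each one into the size bands quoted in the text by its absolute value, and lists the pairs of Types that fall in the ‘large’ band. How values exactly on a boundary (such as 0.300) are assigned is an assumption, since the article does not specify it.

```python
from collections import Counter

# Lower triangle of the Type-by-Type correlations reported in this article.
# Row i, column j holds r between Type j+1 and Type i+1; the diagonal is omitted.
LOWER = [
    [],
    [0.237],
    [0.389, 0.201],
    [0.204, 0.324, 0.136],
    [0.344, 0.065, 0.137, 0.268],
    [0.517, 0.188, 0.252, 0.312, 0.520],
    [-0.093, 0.159, 0.303, 0.297, -0.007, -0.064],
    [0.293, -0.012, 0.481, 0.205, 0.078, 0.213, 0.300],
    [0.054, 0.304, -0.126, 0.241, 0.370, 0.221, -0.077, -0.456],
]

def band(r):
    """Size band for a correlation; boundary values go to the higher band (assumption)."""
    size = abs(r)
    if size < 0.1:
        return "negligible"
    if size < 0.3:
        return "small"
    if size <= 0.5:
        return "moderate"
    return "large"

counts = Counter()
large_pairs = []
for i, row in enumerate(LOWER):
    for j, r in enumerate(row):
        counts[band(r)] += 1
        if band(r) == "large":
            large_pairs.append((j + 1, i + 1))   # (lower Type number, higher Type number)

print(dict(counts))
print(large_pairs)   # [(1, 6), (5, 6)]
```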
Conclusion
Computerised Enneagram tests struggle to reach statistically acceptable levels of validity and reliability, because they measure deep and often intangible aspects of who we are as people. Unreliable tools with low validity are more open to self-reporting biases and, as a result, more likely to mistype clients.
At Integrative Enneagram Solutions, we believe that the creation, analysis and ongoing improvement of reliable, valid measurement instruments is part of our contribution and we continuously invest in testing and improving our tools.
These most recent results indicate that these ongoing efforts have improved the validity and reliability of the iEQ9, exceeding our previous high standards, and that the iEQ9 offers a reliable, accurate reflection of Enneagram Type by psychometric standards.