Dissertation Update #3: The Pilot

Hi! Thanks for reading the penultimate (second-to-last) update of my dissertation study. My dissertation study concluded in March 2023 and I successfully defended in April 2023… so this post is a little delayed. But as they say, “better late than never.” 😆


In the first post I talked about how difficult psychological scale development is, and in the second post I discussed the importance of conducting ✨cognitive interviews✨ as part of the scale development process. Now that I am on the other side of this research study, I stand behind both sentiments: scale development is hard, and ✨cognitive interviews✨ matter. If you’re not including ✨cognitive interviews✨ in your scale development process, you’re doing it wrong and should reconsider.

The Pilot

After my last post, I piloted the scale with middle and high school students (n = 61). With the sample data, I performed an item analysis using Classical Test Theory to examine item endorsement and discrimination. While high or low item endorsement is not inherently “bad” 🙅‍♀️, following latent trait theory, items with a low or high endorsement index should be further examined 🔎 to understand whether they adequately measure students with low/high levels of the latent trait. In other words, there are no true cut-off points or index guidelines for a favorable endorsement index. If you have items with high/low endorsement, you should lean heavily on your understanding of the theory behind the latent trait you’re measuring, your population, and the findings from the previous phases of the scale development process (e.g., literature review, expert review panel, ✨cognitive interviews✨).
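To make the endorsement index a little more concrete, here’s a minimal sketch in Python/NumPy. The data here are completely made up (random Likert responses, not my actual pilot data), and the rescaled-item-mean version of the index is just one common CTT convention for polytomous items:

```python
import numpy as np

# Hypothetical responses: rows = students, columns = items,
# values = Likert ratings from 1 ("strongly disagree") to 5 ("strongly agree").
rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(61, 4))  # pretend pilot sample of n = 61

# Endorsement index (a CTT-style "item difficulty" for Likert items):
# the item mean rescaled to the 0-1 range, where values near 1 mean
# nearly everyone endorsed the item and values near 0 mean almost no one did.
endorsement = (responses.mean(axis=0) - 1) / (5 - 1)

# Flag items where over 50% of students chose "strongly agree" (a rating of 5).
prop_strongly_agree = (responses == 5).mean(axis=0)
flagged = np.where(prop_strongly_agree > 0.50)[0]
```

A flagged item isn’t automatically bad; as noted above, it’s a prompt to go back to theory, the population, and the earlier phases of the study.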

Let’s Talk About 🚩Red Flags🚩 (not those)

Based on the item analysis, items were flagged due to high item endorsement (over 50% of students selected ‘strongly agree’ for these items) and were cross-checked against performance difficulties from the expert review panel as well as the ✨cognitive interviews✨. Items that had performance issues across these data collection stages were removed and did not move on to the final scale development phase (more on this in the next blog post). Additionally, based on the item analysis, one item was removed because it poorly discriminated (or differentiated) between students who, theoretically, had high levels of the latent trait and those who had low levels of the latent trait.

For example, suppose an item (or question) on a pretend 💗Empathy Pretend Scale💗 had a poor discrimination index. We would then have reason to suspect that this item did not do a good job of separating students who scored high on the total 💗Empathy Pretend Scale💗 (highly empathetic individuals) from those who scored low (individuals with low empathy). If an item cannot differentiate between individuals with high and low levels of empathy, then it’s not doing a good job of measuring the latent trait, empathy, and should be removed from the scale, because the whole point of having an 💗Empathy Pretend Scale💗 is to… well… measure empathy. You know what I mean? Great.
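One common way to quantify discrimination under Classical Test Theory is the corrected item-total correlation: correlate each item with the total score of the *remaining* items, so the item can’t inflate its own index. Here’s a small sketch in Python/NumPy (a generic illustration, not the exact statistic from my dissertation):

```python
import numpy as np

def item_discrimination(responses):
    """Corrected item-total correlation for each item.

    Each item is correlated with the sum of all *other* items, so an
    item's own score doesn't artificially boost its discrimination index.
    """
    responses = np.asarray(responses, dtype=float)
    n_items = responses.shape[1]
    total = responses.sum(axis=1)
    return np.array([
        np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
        for j in range(n_items)
    ])
```

An item that tracks the rest of the scale gets a value well above zero; a value near zero (or negative) is the “poor discriminator” situation described above.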

Other statistics from the pilot scale were also examined, such as the inter-item correlations and Cronbach’s alpha, to “test the waters,” so to speak, on how the items were “hanging together” before entering the final scale development stage and conducting an exploratory factor analysis. Since we aim to develop a scale that measures a latent construct, we want items with a strong, but not too strong, inter-item correlation coefficient. Kind of like the porridge in the story of Goldilocks and the Three Bears: something that is just right 😇. If the correlations are too strong, there might be redundant items measuring the latent trait, and if they’re too low, there may not be a latent trait being measured at all. Overall, we want correlations that demonstrate the items are homogeneous but still have sufficient unique variance. So not too high, not too low, something that is justtttt right.
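Both of those statistics are easy to sketch by hand. Here’s a Python/NumPy version (in practice you’d likely use a dedicated psychometrics package, and the data below are hypothetical), using the standard formula for Cronbach’s alpha, α = k/(k−1) · (1 − Σ item variances / variance of the total score):

```python
import numpy as np

def cronbachs_alpha(responses):
    """Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / variance(total))."""
    responses = np.asarray(responses, dtype=float)
    k = responses.shape[1]
    item_vars = responses.var(axis=0, ddof=1)
    total_var = responses.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def mean_interitem_correlation(responses):
    """Average of the off-diagonal entries of the inter-item correlation matrix."""
    r = np.corrcoef(np.asarray(responses, dtype=float), rowvar=False)
    off_diag = r[~np.eye(r.shape[0], dtype=bool)]
    return off_diag.mean()
```

A mean inter-item correlation near 1 is the “redundant items” end of the porridge, one near 0 is the “maybe no latent trait here” end, and the Goldilocks zone sits in between.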


That concludes the pilot scale recap of my dissertation study. Stay tuned for the final blog post of the dissertation series (and eventually my published study).

Sarah Narvaiz, PhD
Research Scientist

My research interests include survey research, psychometrics, and QuantCrit theory.