
Coming Back to an Old Idea: Concept Analysis using Natural Language Processing

Like clothing styles, ideas in the academy come into and out of fashion. I wrote a paper back in 2012 that attracted very little interest at the time, even though it was funded by a small grant from a National Institutes of Health center and published in a top-tier nursing journal. The paper has been cited only 4 times, most recently in 2018. Literally no one has asked me about this paper in the past 10 years. I liked it at the time, but until recently it seemed like a dead letter. Despite having no new citations, though, I have started to get notifications from one of the scholarly tracking services that my article has been read again over the past 2 years -- and read more than it ever was back when it was new! Here's an "interest" graph from ResearchGate (the green line), which is based on readership and downloads rather than just the traditional citations metric:


The current interest score isn't all that high, but it's still high enough to put my 11-year-old article into the top half of all papers published in 2012. I should be clear that this is not the normal pattern of knowledge generation. Health sciences students are typically encouraged to consider only those publications that were written during the past 10 years. The only exception is for articles that are "classics," so influential that they continue to define the field. An aging article, particularly one that wasn't often cited even back when it was first released, usually just meanders off into the dusty journal stacks of the library and is never heard from again. So what's going on here?

To answer this question I need to tell you what the article was about. You may be familiar with the idea of a "mind map" or "concept map," which is a graphic organizer tool used in business or education. The picture at the top of this blog post gives you an example. This type of representation is intended to show what a particular term (i.e., a concept) means, and how it relates to other terms or concepts. Some relationships may be definitional: When we say A, we expect the attributes X, Y, and Z to be part of it. Other relationships may be correlational or causal: When we say A, that concept tends to be connected to other concepts B, C, and D. These other concepts can be connected to the concept of interest as causes (antecedents), as results (consequences), or as ways of measuring the concept (referents). My kids were taught how to create a mind-map diagram in elementary school. This type of visual mapping can help us to better understand our own ideas, organize content for writing, and prompt creativity.

In the discipline of nursing, a formal method called concept analysis is used to describe concepts and the relationships between them. Concept analysis is a tool for literature review, where researchers search relevant journals to find articles that use the conceptual term of interest. Then they compile lists of related concepts, and decide whether those related terms represent (a) attributes of the concept of interest, (b) antecedents, (c) consequences, or (d) referents. The visual mind-mapping tool isn't traditionally part of nursing concept analysis, but the goal of the procedure is very similar. The main benefit of concept analysis is that it helps a researcher to be clear about what they are studying, and avoid potentially duplicative work if someone else has been studying the phenomenon of interest under a different name. (Psychological research has an unfortunate history of creating new names for the same basic idea and studying it all over again. I wrote about the pre-paradigmatic state of our science and the need for a "periodic table of behavior" here). The identification of antecedents and consequences can also help researchers to decide what variables they need to measure when they are designing a study.

So now, back to the question of why my past-its-prime paper is suddenly getting a second look: My colleagues and I were writing for a "novel methods" call in the journal Nursing Research, about a project that we had done under a small grant to connect health researchers on the CU Anschutz campus with computer scientists on the CU Boulder campus. Our goal was to conduct a concept analysis in a different way, by looking at all the times that a particular concept word was used in connection with other concept words in a published article. The linkages between words were purely statistical -- e.g., "self-management" was X times more likely to also appear in the same article with "diabetes" than it was to appear in the same article with "alcohol use." The approach was called latent semantic analysis (LSA), and is a technique within the broader category of computational science approaches called natural language processing (NLP). Ah, now we're getting to something that might seem familiar!
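To make the "X times more likely" idea concrete, here is a minimal sketch in Python. It is not the INN and it is not full LSA (which goes further by applying a matrix decomposition to the word-by-document counts); it only counts how often two terms appear in the same article and compares the counts. The tiny corpus and the function names are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each "article" is reduced to the set of
# concept terms it mentions. These are invented, not real data.
articles = [
    {"self-management", "diabetes", "adherence"},
    {"self-management", "diabetes"},
    {"self-management", "diabetes", "alcohol use"},
    {"alcohol use", "relapse"},
    {"diabetes", "adherence"},
]


def cooccurrence_counts(docs):
    """Count how many documents mention each unordered pair of terms."""
    counts = Counter()
    for terms in docs:
        # Sorting gives every pair a single canonical order.
        for pair in combinations(sorted(terms), 2):
            counts[pair] += 1
    return counts


def cooccurrence(counts, a, b):
    """Look up the count for a pair, regardless of argument order."""
    return counts[tuple(sorted((a, b)))]


counts = cooccurrence_counts(articles)
with_diabetes = cooccurrence(counts, "self-management", "diabetes")
with_alcohol = cooccurrence(counts, "self-management", "alcohol use")

# How many times more often "self-management" shares an article with
# "diabetes" than with "alcohol use" in this toy corpus.
ratio = with_diabetes / with_alcohol
print(f"diabetes: {with_diabetes}, alcohol use: {with_alcohol}, ratio: {ratio}")
```

In this toy corpus, "self-management" shares three articles with "diabetes" and one with "alcohol use," so the ratio is 3. A real analysis would run over thousands of articles and would need to handle pairs that never co-occur, which this sketch does not.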

LSA is one possible way of quantifying the statistical likelihood of one word appearing in the context of others, but it's not the only way. In 2017, Google researchers published a new approach for calculating word-to-word associations that they called a "transformer." It's more context-aware than LSA models, and it also makes several guesses at the same time to produce a more accurate overall response. Transformer methods are the "T" in "ChatGPT" -- a generative pre-trained transformer system that predicts words from other words. The system that we used in our article, called the Inter-Nomological Network (INN, no longer available online), was essentially a smaller, weaker version of ChatGPT that allowed us to predict the likelihood of one conceptual term from another.

This, then, is the explanation for our dead-letter paper's apparent resuscitation: GPT models are a hot topic in 2023! As an early example, using a previous generation of technology, our paper may be of renewed interest because it shows that purely automated methods can in fact produce a decent version of a concept analysis or a mind map. That could be valuable to researchers, because a traditional concept analysis is a useful step to take early in a new program of research (it's often an assignment in first-semester doctoral nursing courses), but it requires a lot of time and effort to do well. Our experience also showed that statistical approaches to concept analysis can identify the meaning of concepts across disciplinary lines, which can help to overcome the problem of different groups of scientists using different words to mean essentially the same thing. And after we screened the results to weed out terms that didn't seem applicable, we found that the statistical model had largely replicated the results that we got using more traditional literature review methods. It even came up with some relevant terms that we missed in the usual human-intensive approach.

Our early effort using the INN had some weaknesses. First, after the language model produced a set of associations between words, it was still up to our research team to make sense of them. We manually classified each term as an attribute, antecedent, consequence, or referent of our concept of interest, which was the "transition to self-management" for patients with type 2 diabetes. The INN wasn't capable of providing structured results in the way that current large language models do (e.g., "define self-management in the form of a haiku" or "define the attributes of self-management, presenting your response in the form of a Socratic dialogue"). Second, our LSA-based model "hallucinated" some results, identifying terms that were clearly unrelated to self-management (e.g., business management terms like "organizational culture," and words related to other life transitions like "retirement"). Hallucinatory answers are also a well-known problem in current large language models. Finally, our concept analysis was based on a relatively small training dataset: 10 years of articles from the top 3 nursing journals, plus some business literature that was already included in the INN. Although adequate for LSA calculations by the standards of the time, this was in no way a "large" language model by 2023 standards, and our results were accordingly limited.

What's the future of automated approaches to concept analysis? Our 2012 methodology may be obsolete, but the general approach seems sound. I asked ChatGPT to conduct the same concept analysis in 2023, and here are the results. The large language model output is much more structured than our 2012 INN output, which was simply a list of words with numeric weights attached. The generative AI approach provides a nice outline format with headings. It also classifies the related concepts into attributes, antecedents, and consequences, something that my colleagues and I previously had to do by hand. It even gave me an example case, which is the final step in nursing concept analysis, and I thought its example was pretty solid! ChatGPT's response wasn't perfect, of course: It repeated some of the same terms under antecedents and attributes, which isn't allowed, and despite several tries with different prompts, I couldn't make the system understand that I was asking for measures of the concept (referents) instead of literature citations. Finally, I can't be sure that I'm capturing all relevant aspects of a concept, because ChatGPT responses are limited in length (perhaps I could have done separate queries asking for attributes, antecedents, and consequences, in order to circumvent that limitation). But the current results are a clear leap forward from our 2012 output, suggesting that contemporary AI will allow us to automate the process of concept analysis to an even greater degree.
