
Chatbot Changes and Challenges in 2023

I wrote last summer about artificial intelligence tools that are increasingly able to approximate human speech in free-form conversations. These tools then burst onto the public stage with the release of OpenAI's ChatGPT at the end of November last year. As you probably know by now, the acronym "GPT" stands for "generative pre-trained transformer," which highlights the three most important aspects of this technology: (1) it generates novel responses that aren't based on a specific algorithm or decision rule, but instead rely on pattern recognition; (2) it has been pre-trained by consuming massive amounts of writing from the Internet -- much more than a human could read in several lifetimes; and (3) it transforms those prior writing samples using a trial-and-error process that predicts the next word or phrase in a sequence until it has produced a response that seems intelligible to humans. ChatGPT works much like the auto-complete feature in your email or text messages, but on a far more sophisticated scale. Most importantly, it doesn't know what it is saying -- only that its response is what humans are likely to be looking for in that particular situation. Other AI tools generate other types of work using similar methods -- e.g., DALL-E for images, or several different AI music generators. If you are still working to understand how these new technologies produce their very impressive results, and how that differs from human intelligence, I particularly recommend this series from the New York Times.
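To make the "auto-complete at scale" analogy a bit more concrete, here is a minimal, purely illustrative sketch of next-word prediction using a simple frequency table. The tiny corpus and the generated sentence are invented for illustration; real GPT models apply the same predict-the-next-token principle, but with billions of learned parameters and attention over long contexts rather than a lookup of word counts.

```python
# Toy "autocomplete": predict the next word as the one that most often
# followed the current word in a (made-up) training text. Illustration only.
from collections import Counter, defaultdict

corpus = ("the patient reports chest pain . the patient reports mild fever . "
          "the doctor orders tests").split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

# Generate a short continuation, one predicted word at a time.
word, output = "the", ["the"]
for _ in range(4):
    word = predict_next(word)
    if word is None:
        break
    output.append(word)

print(" ".join(output))  # e.g. "the patient reports chest pain"
```

A large language model does the same thing at each step -- pick a plausible next token given everything written so far -- just with a vastly richer model of what "plausible" means.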

The capabilities of AI continue to evolve, and people are constantly devising clever new uses for these tools now that they are widely available. But I wanted to take a look at the impact these tools have already had in the few short months since most people gained access to them. In my world of scientific writing and university teaching, here are some ways AI is already re-shaping the landscape:

  • Professors are concerned about students writing term papers using AI. Colleges are responding with strategies like changing their honor codes or creating "signpost" assignments designed to circumvent AI (e.g., an assignment referring to Thomas Jefferson and the Declaration of Independence might ask: "As described on p. 4 of your textbook, what were the major influences on Jefferson's work, and what was novel about it?" A chatbot wouldn't have the textbook, so it couldn't produce a valid response.) Other suggestions for faculty are to ask students to create and then critique AI-generated responses, to comment on current events or their own experiences, or to refer to information that was only presented verbally in class. I'm not sure I agree with that last one, because it can create access issues for some learners. Used this way, AI may be particularly helpful in teaching skills like critical thinking, problem-solving, and self-reflection -- important meta-skills that have historically been hard to develop in learners. Some schools have blocked chatbot websites from their networks, although this strategy is more feasible in K-12 education than at the university level.
  • College students, on the other hand, have been early adopters of this new technology -- and not just for cheating. Like many people, they are using it to write short stories or silly songs, or generally to enhance creativity. (My colleague Kai Larsen at the University of Colorado Boulder used ChatGPT and DALL-E to write an instant children's book about his pet bird.) Enterprising students have also discovered some of the potentially legitimate ways to use chatbots in writing, as described below. Some faculty have embraced generative AI tools, either as a way of teaching students about artificial intelligence or as an initial idea generator that students can then riff on and adapt. Chatbots can also make effective tutors, helping students work through problems or understand difficult concepts. AI may accelerate the existing trend toward student-led learning, with faculty moving from the role of "sage on the stage" to the less-expert, more supportive role of "guide on the side."
  • New AI-detection tools have launched, including GPTZero and an AI-writing detector built into the popular anti-plagiarism program Turnitin, both designed to flag AI-generated student responses. Whether AI use actually constitutes plagiarism is an open question, one that will likely be decided eventually in contentious court cases over copyright law. In the meantime, of course, instructors are within their rights to ask students not to use this type of tool. But whether the AI-detection tools are up to the job is another question. (So far, studies suggest that professors cannot systematically detect AI-generated responses on their own.) As long as instructors take an adversarial stance toward AI use, I predict a sort of arms race between improving AI systems and improving AI detectors, in which both get better over time but neither gains a definitive edge.
  • On the practice side of academic health care, some providers have tried using ChatGPT to write progress notes on their patient encounters. This application of AI looks like a cost-saving alternative to the "medical scribe" role that some offices have created to document patient visits, and some patients might actually be more comfortable with an AI in the room than with another human being who isn't their doctor. Health care providers have also reported using generative AI to help them find ways to say things that patients might not want to hear, as an algorithmic aid in making challenging diagnoses or medical decisions, or to help personalize patient care beyond the average results produced by clinical guidelines that make the same recommendations for everyone in a particular demographic group. However, there are concerns about confidentiality, because publicly available AI tools are not currently HIPAA-compliant systems. The FDA has issued new guidance about how AI can be integrated into medical devices or decision support, an issue I also blogged about earlier this year in my post on the future of "Smart Health" tools. Providers also need to think carefully about their malpractice risk: They are ultimately responsible for any medical diagnoses, decisions, or advice, and they risk being misled by bad advice from AI. "The robots told me to do it" is clearly not going to be a successful malpractice defense.
  • For all of us as patients, generative AI can provide medical advice. One study found that, while not infallible, it does about as well as a physician in providing generic advice based on a set of symptoms. That of course doesn't mean it's always right (physicians aren't either). But as a way of serving up the "collective wisdom of the Internet," generative AI works pretty well and may provide useful suggestions on whether to seek care, from whom, and how soon. It may do similarly well in giving generic psychotherapeutic advice. There's no substitute for a personal relationship with a trained expert, of course, but AI tools might be a useful adjunct to other resources, for off-hours needs, or as a first step in seeking care.
  • For professors as writers, generative AI has a range of uses. First, generative AI functions as a competent editor, especially for making complex scientific writing understandable in everyday language. It can also help with standard English-language usage for people who don't speak English as a first language. You can give AI "style" prompts, such as "write this sentence in third-person objective language" or "write this at a 6th-grade level," if you struggle to strike the right level of formality or informality in your writing (a minimal sketch of this kind of prompt appears after this list). Second, passing drafts back and forth with a large language model can help you develop your writing skills, in the manner of a personal writing tutor. Third, AI tools write decent first drafts, especially on relatively non-specialized topics like my request for ChatGPT to write a commencement address. I also had ChatGPT write the first draft of an abstract for a Federal grant application (after I had already written my own and turned it in): The tool did about as well as I did on a first draft, and probably even a little better, although by the final version I had added enough local background and operational detail to make my abstract demonstrably better than anything the AI could have produced. Still, just having a first draft in hand as a document to edit and modify would probably have saved me an hour of work, so researchers who use AI tools can radically speed up their process and potentially submit more grants in the same amount of time. Finally, AI tools can function as a kind of search engine: They are great at generating lists or suggesting alternatives. Because current large language models have already read more material than a single researcher could in ten lifetimes, they are more likely to come up with oddball entries or ideas that might not have occurred to an individual human. So I have taken to asking AI to create lists on a topic, just to be certain that I haven't missed anything.
  • As writers speed up their process and automate parts of their work, so do grant and journal reviewers. The National Institutes of Health (NIH) is already using AI tools to simplify literature reviews and to associate specific research teams with their funding sources and resulting publications. NIH is also using AI to route applications to the appropriate scientific review panel of experts, and to identify overlap or redundancy when investigators submit several closely related applications in an attempt to game the system. In a recent statement, NIH unambiguously said that human reviewers may not use AI tools to complete their reviews, but this ruling was based on the confidentiality concerns raised by uploading a grant title, abstract, or aims page to a third-party company's server. What NIH did not do in that statement was forbid grant writers from using AI tools to create their grants -- instead, it said that it won't ask who wrote a grant, that authors are ultimately responsible for their content, and that people who use AI tools do so at their own risk. The Journal of the American Medical Association (JAMA) recently made a similar statement prohibiting the use of AI for reviews (because of confidentiality rules) but leaving the door open for the use of AI in writing, as long as writers disclose that use. We can therefore expect increasing numbers of applicants to use these tools as time goes on. And as AI authoring tools expand grant-writers' ability to generate truly novel grants, NIH will need even stronger AI review tools just to cope with the flood of new applications coming in. Eventually, I could foresee the first review of a grant application being done by an AI, which performs a set of basic quality checks and triage to determine whether the proposal gets sent on as part of a limited set for a finite pool of human experts to review. Someday, the grant-making process may consist of researchers' AIs sending proposals to government-review AIs, with decisions being made largely by the machines!
  • The greatest worry in all of these applications of new AI tools is the risk of "hallucinations": statements that generative language models produce with supreme confidence and plausibility, but that are demonstrably false. In one egregious example, the editor of a journal that I support recently got a phone call from someone trying to locate an article that had been published in our pages. The paper had been cited in a recent report, with the author's name, date, title, etc., but the person on the phone hadn't been able to find it on our journal's website. The catch: The article didn't exist! An AI "author" had made it up as a plausible-sounding source of information, but the study was completely nonexistent. A lawyer recently found to his dismay that his AI assistant had made up plausible-sounding but nonexistent cases as "precedents" for his argument in court (when the fabrication was discovered, he faced sanctions of his own). Because of the risk of hallucinations, the International Committee of Medical Journal Editors (ICMJE) recently put out a statement saying that AI tools can never be listed as authors of journal articles, because they cannot take responsibility for the work (they can, however, be mentioned in the authors' description of their methods). For our journal, the author statement is being revised to remind authors that they are always personally responsible for all content in their writing, including ensuring that citations are genuine and appropriate; that was always true, but in the new age of AI it doesn't hurt to say it explicitly. Beyond these concerns, IT experts have begun to raise questions about data security (how protected are the prompts that you give to AI systems?) and data integrity (what happens if a malicious actor tweaks the AI algorithm to give you intentionally misleading results?).
  • A lot of ink has been spilled on the question of whether generative AI will displace knowledge workers such as university faculty. This concern, I think, is overstated. Generative language models are good at creating standard text in response to scripted prompts. There are certain kinds of formulaic or technical writing -- creating a user manual for a piece of software, writing a Federal program grant application, or perhaps even drafting the Methods section of a manuscript -- where I believe AI will shine. In that sense, it can probably make knowledge workers faster and more productive -- always a desirable thing for time-starved university faculty! But I just don't see it replacing them completely. Too much contextual knowledge and disciplinary insight are needed to produce a viable scholarly work product, including things like lecture notes or lab manuals. Ad-copy writers may have reason to fear for their jobs, but university faculty probably don't; instead, AI will become another valuable tool that helps knowledge workers create viable rough drafts that can then be improved with human expertise.
  • Finally, in the abstract realm of the philosophy of mind and our understanding of human consciousness, generative AI has pushed the envelope like almost nothing before it. Previously esoteric thought experiments are now part of popular discourse: John Searle's Chinese Room, in which a person receives slips of paper in Chinese, looks them up in a huge library of reference books, copies down a response, sends a new slip of paper back out, and manages to carry on a conversation despite not speaking Chinese; or Mary's Room, in which a woman becomes a noted expert on the physiology of color perception even though she has spent her whole life locked in a room with a black-and-white monitor. These thought experiments were originally intended to help us think about how human consciousness works, but they now serve, to some extent, as literal descriptions of generative AI. Linguist Noam Chomsky has argued that generative AI can never be comparable to human intelligence because the technology doesn't understand what it is saying. Seemingly in contradiction to Chomsky's view, psychologists at Stanford found that newer AI models had spontaneously shown evidence of "theory of mind" -- a capacity to solve problems that involve false beliefs, like "Bob believes there is a cat in a box, but it is really a dog. Bob opens the box; what does he see?" (Earlier versions of the software said that he would see a cat; my guess is that the software now gets the right answer simply because it has seen this type of problem often enough to recognize the pattern.) Philosopher Daniel Dennett, who has argued that human consciousness is essentially a process of pattern recognition and response production, recently wrote about the dangers of AI systems creating "counterfeit humans," seemingly without recognizing that his own arguments about philosophical zombies suggest that all of us "real" humans do nothing more than what AI is now able to mimic. AI technology has not resolved longstanding debates about the "hard problem" of consciousness or free will, but it does provide a strong working model for how humans' Narrative minds might actually work. Of course, the Narrative mind is not the entire human mind -- and from the perspective of agency or causation it probably doesn't matter as much as we think it does. Still, AI has dramatically accelerated the conversation about consciousness and behavior by taking some previously pie-in-the-sky examples and giving them real-world existence.
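As promised in the item on professors as writers, here is a minimal sketch of the "style prompt" editing workflow: you hand the model a draft sentence plus a plain-language instruction about tone or reading level, and it returns a revision. The sketch assumes the pre-1.0 openai Python package (the interface current as of this writing) and an API key in the environment; the model name, the example sentence, and the temperature setting are illustrative assumptions, not recommendations.

```python
# Hedged sketch: ask a chat model to rewrite a draft sentence at a 6th-grade
# reading level. Assumes the pre-1.0 `openai` package and an API key in the
# OPENAI_API_KEY environment variable; the model name and draft text are
# placeholders chosen for illustration.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

draft = ("Utilization of the intervention was associated with a statistically "
         "significant reduction in symptom burden across the study cohort.")

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=[
        {"role": "system",
         "content": "You are a careful copy editor for scientific writing."},
        {"role": "user",
         "content": "Rewrite this at a 6th-grade reading level, "
                    "keeping the meaning intact:\n\n" + draft},
    ],
    temperature=0.3,  # keep the rewrite conservative rather than creative
)

print(response["choices"][0]["message"]["content"])
```

The same pattern works for the other style prompts mentioned above ("write this sentence in third-person objective language," and so on); only the instruction text changes.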
Generative AI tools are still new, and it's hard to know what they will be capable of six months or a year from now. But they are already changing the work of educators, clinicians, and researchers in demonstrable ways.
