Abstract
In the coming years, the proliferation of artificial intelligence (AI) will lead to changes and challenges to many traditional practices in school music and beyond, particularly related to student assessment and grading. At the same time, the AI revolution may also facilitate new and exciting directions for assessment and differentiation in music education. In this article, I offer a range of considerations and suggestions for music educators seeking to teach music effectively and ethically in the age of AI.
Photo of Brian P. Shaw courtesy of the author
What Is Artificial Intelligence?
Artificial intelligence (AI) has recently surged into seemingly every corner of American media and intellectual discourse. Although AI has no single definition, the catchall term “artificial intelligence” generally refers to computer programs that exhibit some signs of humanlike intelligence in the pursuit of goals. 1 Systems powered by AI range from search engines such as Google and Bing and personal assistants such as Apple’s Siri and Amazon’s Alexa to programs that best world champions at complex games, such as chess and Go, 2 and that independently sequence the genomes of pathogens. 3
There is probably no single class of AI tools that has generated as much recent conversation, particularly in education, as large language models (LLMs). LLMs use huge amounts of available language, mainly on the internet—books, journals, websites, videos, social media posts, even software code—to predict what words are likely to come next after other words. LLMs combine with natural language processing, which is the ability of a computer program to understand what the user is requesting without requiring a strict input syntax, to produce products colloquially known as chatbots. While ChatGPT is probably the best-known general-purpose chatbot today, others—currently including Claude, Bard, and Moonbeam—perform similar functions, with more platforms sure to have been released prior to the publication of this article. The power of LLMs comes from the enormous volume of information they have consumed and parsed. They return believably human and mostly correct answers to an almost infinite number of queries in any domain. As an illustration of chatbots’ abilities, I asked ChatGPT the following question: “What is the difference between balance and dynamics in music?” In seconds—less time than it would have taken me to write a response myself—I had the answer shown in Table 1.
Table 1. The Difference between Balance and Dynamics, Generated by ChatGPT, September 25, 2023
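The idea that LLMs "predict what words are likely to come next after other words" can be illustrated with a deliberately tiny sketch. The code below is only a toy that counts which word most often follows another in a short sample sentence; actual LLMs rely on neural networks trained on vastly more text, and the function names and sample sentence here are hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word in the text, which words follow it and how often."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following, word):
    """Return the word seen most often after `word`, or None if `word` was never seen."""
    candidates = following.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical sample text; a real model is trained on billions of words.
model = train_bigrams("the band plays the march and the band rests")
print(predict_next(model, "the"))   # band ("band" follows "the" twice, "march" once)
print(predict_next(model, "tuba"))  # None (the word never appears in the sample)
```

Even this toy shows why such systems reproduce whatever patterns dominate the text they were given.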
AI tools have been developed in fields other than text production. Image generators, such as DALL-E, Stable Diffusion, and Midjourney, can create high-quality images in response to natural language requests. And, of particular interest to music educators, there are AI tools for music as well. AI platforms can complete part-writing tasks, following the rules for voice leading. 4 Other tools exist to transcribe recordings, edit and splice audio and video, and so on. AI-driven platforms can even compose music in many genres based simply on users’ typed specifications, such as “Re-compose Mozart’s Rondo alla Turca in the style of Chopin.” 5
Clearly, these developments represent a paradigm shift in education and beyond. Even though AI tools are already powerful—ChatGPT is said to be able to outperform 90 percent of humans on the SAT 6 —their capabilities are predicted to continue to grow exponentially in the coming years. 7 Proponents tout AI tools’ potential economic and societal benefits, while critics emphasize their drawbacks and risks. 8 Whether schools are prepared or not, the US Department of Education recently reported that AI has the potential to profoundly change certain aspects of schooling in America. 9 While some practices in music classrooms will probably be less affected in the near term, AI is sure to bring change to music education as well, particularly with respect to student assessment. In this article, I describe three implications of the AI revolution for assessment in music classrooms: AI enables students to submit assessment tasks they did not complete themselves, AI introduces new complications, and AI may enable new possibilities for assessment.
AI Enables Students to Submit Assessment Tasks They Did Not Complete Themselves
The implicit premise underlying the evaluation of nearly all student work is that the task was completed by the student. Most teachers are familiar with the problem of students submitting work that was not their own but was represented as their own—that is, cheating. Some students copy others’ homework or test responses, some have a peer play on their audio performance recording, and some download entire papers for submission. While these challenges have probably existed for most of the history of education, AI escalates the ease with which students can submit an ever-increasing variety of assessment tasks that they did not themselves complete.
In education, cheating is undesirable for reasons beyond concerns about honesty and work ethic. While the purpose of an assignment is sometimes a completed product or artifact, student work is often primarily intended to help students learn through engagement in musical or conceptual processes. As the legendary education theorist Grant Wiggins wrote, assessment tasks “should teach, not just measure.” 10 When students cheat, they skip the critical steps of planning, contemplating, discovering, and revising, which inhibits their growth as thinkers and musicians. While cheating is detrimental to learning and to school culture, it can be difficult to detect. Therefore, it is wise to move beyond merely policing AI-related cheating toward avoiding it.
It may be helpful to review some assessment terminology. Assessment results can be classified as either formative or summative, depending on how they will be used. 11 Formative assessment is woven into instruction, intended to facilitate rather than judge student learning. Summative assessment is conducted at the conclusion of a unit or term, and its purpose is to facilitate determinations about which students accomplished the stated outcomes. A common recommendation is for teachers to base students’ grades on summative assessment results and to down-weight or not consider formative assessment in grading. 12
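For teachers who calculate grades in a spreadsheet or script, the recommendation to base grades on summative results and to down-weight or exclude formative results amounts to a weighted average. The sketch below is one possible way to express it; the function name, the 0 to 100 score scale, and the example weights are assumptions for illustration, not values drawn from the grading literature cited here.

```python
def course_grade(formative, summative, formative_weight=0.0):
    """Combine mean formative and mean summative scores (0-100) into one grade.

    formative_weight is the share of the grade given to formative work.
    The default of 0.0 means formative work is not counted at all.
    """
    if not summative:
        raise ValueError("at least one summative score is required")
    if not 0.0 <= formative_weight <= 1.0:
        raise ValueError("formative_weight must be between 0 and 1")
    summative_mean = sum(summative) / len(summative)
    if not formative or formative_weight == 0.0:
        return summative_mean
    formative_mean = sum(formative) / len(formative)
    return formative_weight * formative_mean + (1 - formative_weight) * summative_mean


# Hypothetical student: skipped one homework task but did well on unit assessments.
homework = [0, 50]
unit_assessments = [90, 80]
print(course_grade(homework, unit_assessments))                        # 85.0
print(round(course_grade(homework, unit_assessments, formative_weight=0.1), 1))  # 79.0
```

With the formative weight at zero, the missed homework carries no grade penalty, which is the incentive structure the recommendation aims for.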
One reason that students cheat on assignments, through AI or otherwise, is that there is a grade penalty if they do not complete the assessment task. 13 If teachers do not count formative assessment in grade calculations, students have less incentive to cheat on formative assessment tasks, like homework and study guides. When summative assessment underpins students’ grades, the incentives encourage students to master the content rather than complete the assignment, and content mastery is probably the teacher’s ultimate objective. 14 Formative assessment helps students because it provides insights to their teachers that facilitate individualized feedback and instruction. The consequence for cheating on formative assessment is that the student is denied this feedback and risks doing poorly on the more consequential summative assessment. If the student is still able to succeed with the summative task, the formative practice task was evidently unnecessary for that student. If summative assessment underpins student grades, students will probably cheat less on formative assessment, and even if they do cheat, it may not be a crisis.
However, the possibility that students will use AI to cheat on summative assessment tasks is potentially a crisis. Educators should be aware of the ability of LLMs to generate impressive-seeming work in a variety of common formats, such as short and long written answers, music composition, and part writing, while requiring little or no content knowledge or effort from students. Furthermore, AI is sure to expand relentlessly into new types of student work. One simple step music educators can take is to shift to assessment methods that are less amenable to AI interference. Students could complete summative assessment of knowledge on paper, through verbal question-and-answer assessment, or otherwise in person. For longer papers, intermediate deadlines, such as an outline leading to a rough draft and then a final version, might also reduce chatbot usage. This is not a perfect solution, as LLMs can generate prose from an outline, even while adhering to a given word count.
Requiring students to explain their thought processes, or how they have used or will use facts or concepts in the future, is another way to reduce chatbot involvement because AI tools do not always give the reasoning behind an answer. Teachers can also experiment with chatbots to become familiar with the content and feel of an AI-generated response. LLMs tend to have a characteristic “voice,” and sudden shifts in writing style, particularly between paragraphs, suggest their use. Another way to recognize work copied from AI is that LLMs occasionally “hallucinate” by providing confident-sounding but decidedly incorrect information. 15 Writing that is factually correct but disconnected from what happened in class is also suspect. As a result, I now use a criterion titled “Related to Our Class Readings and Discussions” in my rubrics for written work.
Yet another possible remedy would be the use of algorithms to detect algorithm-generated text. Similar to software that checks student work for plagiarism, AI checkers are websites or apps that evaluate student work as being more or less likely to have been written by LLMs. (Interestingly, LLMs do not technically plagiarize; they employ paraphrasing, just like students have always been asked to do.) Although they may improve in the future, and AI developers have floated the possibility of a “watermark” that is detectable by other algorithms, as of this writing, AI checkers are not yet highly reliable. 16
However, the most effective and most learning-focused way to sidestep the possibility for AI-driven cheating is to engage students in authentic assessment. Authentic assessment involves shifting away from assessment of discrete abilities with traditional tasks, such as worksheets and tests, and toward performance tasks that are “representative challenges within a given discipline.” 17 Grant Wiggins, who coined the term, noted that tests and other traditional tasks “frequently reinforce—unwittingly—the lesson that mere right answers, put forth by going through the motions, are adequate signs of ability.” 18 Instead of “going through the motions,” authentic assessment invites students to demonstrate their abilities in the course of completing worthy real-world undertakings. For example, ensemble music students might be tasked with selecting, arranging, preparing, and performing a solo or chamber piece. Another example is planning and leading a portion of a sectional or full-group rehearsal or being responsible for teaching a skill or concept to the class. Students in a listening-focused course could plan and record a radio hour or podcast episode with musical selections, personal commentary, and humorous public-service announcements inspired by the school or community. Any attempt to embed creativity, student voice, whimsy, and other personal touches can decrease—although not eliminate—the risk that AI can be used wholesale.
As of this writing, LLMs have the most potential for cheating with assignments asking students to “explain” or “summarize” (lower-level processes in the revised Bloom’s taxonomy 19) and less potential for cheating with assignments in which students will “evaluate,” “critique,” “generate,” or “plan,” which in the revised Bloom’s taxonomy are the higher-order cognitive processes. Using another taxonomy, known by the acronym KRSPD, 20 teachers might de-emphasize assessment of objectives classified as “knowledge” in favor of “reasoning,” “skills,” or creative “products” or artifacts. Not coincidentally, authentic assessment emphasizes these same kinds of abilities. As a heuristic, the more relevant, meaningful, useful, or authentic an assessment task is, the less likely it is to be directly copyable from chatbots. In summary, being strategic and intentional about how assessment is structured, combined with disclosure policies detailed in the next section, can promote honesty in students’ use of AI tools in music classrooms and lessen the risk of cheating in summative assessment.
AI Introduces New Complications
It may be tempting to reflexively declare that any amount of AI assistance with schoolwork is corrupt, but it is important to acknowledge that the line between an acceptable and unacceptable amount of help from AI or any other resource is and has always been fuzzy. For example, it is generally accepted that parents or other family members helping with homework is allowable. Tutors and private-lesson teachers are also frequently tasked with helping students complete school assignments. Similar to asking a parent or teacher a foundational factual question, or for help checking their completed work, students could ask a chatbot instead. Students have already been using internet searches and software such as the math app Wolfram Alpha this way for years.
What’s more, AI and other technological tools are already in use throughout education. Word-processing software, such as Microsoft Word and Google Docs, already uses AI to check spelling and suggest grammatical improvements. Using such suggestions is generally not regarded as cheating. Famous singers use computer programs to help their intonation, and nearly all performers splice performance segments together and use digital mastering when producing a recording. While there are some clear-cut cases, such as AI being used to generate an entire essay that is submitted as-is, there are many possible ways that students could use AI that are not easily described as either acceptable or unacceptable.
There have been repeated calls for governments and developers to issue regulations and policies about AI. 21 Because US schools seem far from adopting a consistent approach, teachers will, for now, need to establish and communicate their own guidelines for how AI tools may be used. AI policies should specify which uses are permitted and which are not and include appropriate procedures for disclosing how AI was used. One common approach is to require students to disclose any AI use with a description of how it was used and to mark any string of four or more consecutive words copied from AI in a different color. Students who are following the policy may shake up some wording to avoid having to disclose, but paraphrasing sources is a long-standing and commonly accepted practice.
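A student or teacher who has saved the chatbot's response could apply the "four or more consecutive words" rule mechanically. The following is a minimal sketch under that assumption; the function name, the simple word-splitting rule, and the sample sentences are hypothetical, and it is not a substitute for the AI checkers discussed earlier. It reports each stretch of the student's text in which every window of four consecutive words also appears in the AI text.

```python
import re


def shared_runs(student_text, ai_text, min_words=4):
    """Find runs of the student's words that match the AI text.

    A run is reported when every window of `min_words` consecutive words
    inside it also appears, in the same order, somewhere in the AI text.
    """
    def tokenize(text):
        # Lowercase and keep only letters, digits, and straight apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    student = tokenize(student_text)
    ai = tokenize(ai_text)
    ai_windows = {tuple(ai[i:i + min_words]) for i in range(len(ai) - min_words + 1)}

    runs = []
    i = 0
    while i <= len(student) - min_words:
        if tuple(student[i:i + min_words]) in ai_windows:
            end = i + min_words
            # Extend the run one word at a time while the newest window still matches.
            while end < len(student) and tuple(student[end - min_words + 1:end + 1]) in ai_windows:
                end += 1
            runs.append(" ".join(student[i:end]))
            i = end
        else:
            i += 1
    return runs


# Hypothetical example texts.
ai_answer = "In an ensemble, balance is how loud each part is relative to the others."
essay = "Balance is how loud each part is relative to the others in my view."
print(shared_runs(essay, ai_answer))
# ['balance is how loud each part is relative to the others']
```

Paraphrased passages produce no matches, which is consistent with the policy's treatment of paraphrasing as acceptable.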
Of course, like any policy, an approach that is standard across a school building will be most understandable to students. And, as all teachers know, a policy is not a panacea. Sustained effort will still be required to explain, enforce, and update expectations. Table 2 shows the policy I am using this year with my college courses, but each class will necessarily have its own considerations based on the nature of the summative assessment tasks that students will complete. Still, penetrating questions are likely to arise. For example, are students allowed to use software to help with their intonation on a recorded performance assessment or to splice clips together? What about using AI to check their writing for run-on sentences and mistakes with capitalization and punctuation? If students are assigned to compose and perform a song in guitar class, are digital “collaborators” allowed? Can students use AI to suggest an ending to their composition if they cannot think of one? What if a student uploads several of their previous compositions and asks an AI tool to compose more music in the same style? It may be that some technology is indeed appropriate; if it is used by professionals in a field, it may be suitable for students in the same field. Such conversations are likely to be fascinating to students and bring the existence of AI tools out into the open for transparent consideration. New tools and new questions will inevitably lead to changes with some long-standing assessment practices. However disorienting they might be, these changes are already underway and will inexorably continue. Music teachers can embrace certain advances while strategically eluding others, but it is critical for thoughtful practitioners to maintain a posture of engagement because the challenges brought by AI are more likely to accelerate than they are to dissipate. 
As a consortium of prestigious school leadership organizations recently summarized, “Essentially, you really only have two options: attempt to maintain current assessment approaches in a highly controlled, technology-free environment, or adapt your assessment methods.” 22
Table 2. Sample Artificial Intelligence (AI) Policy
An additional complication is concern about algorithmic bias, which is defined as “systematic, unwanted unfairness in how a computer detects patterns or automates decisions.” 23 As law professor Michele Gilman recently summarized, “AI systems can produce inaccurate, biased, and discriminatory outcomes, often because the data fed into these systems reflects historical and social inequities that exist in the real world.” 24 We cannot be sure that students’ cultural preferences, speech styles, singing voices, and so on are not caught up in AI tools’ decisions about what is “good” or “normal” in musical knowledge, performance, or preference. In fact, AI has been shown to underwrite assessment discrimination in other subjects. 25 This is perhaps not surprising given that, at least for now, teachers who use AI for assessment are effectively handing control of student evaluation over to anonymous and unaccountable employees of for-profit software companies. 26 Furthermore, similar to the broadband divide during the COVID-19 pandemic, AI has the potential to exacerbate existing inequity in the educational system, with students able to access its features being advantaged over others who cannot. 27
AI May Enable New Possibilities for Assessment
Despite their complications, AI tools also have the potential to facilitate assessment in music classrooms in novel and beneficial ways. Grading software is getting ever better at evaluating written knowledge assessment. Other platforms are increasingly capable of appraising the accuracy of individual students’ musical performances. Certainly, more capabilities will appear in the near future. To be clear, AI is not a substitute for a professional educator, and in fact there are well-founded concerns about AI as a tool of school corporatization. 28 Still, teachers can productively use chatbots and other software to speed up the process of creating and scoring certain kinds of assessment tasks. At least for now, teachers should be wary of uncritically adopting test questions, rubrics, or writing prompts from LLMs. However, LLMs do have the power to assist teachers’ practice by, for example, generating lists of terms about a topic or describing concepts in several different ways. 29 Several education-focused AI tools are also in development, 30 potentially enabling teachers to bring their own pedagogical ideas into conversation with those of others.
The fact that LLMs’ core function is predicting text based solely on existing text is, in this situation, again both an advantage and a disadvantage. It is possible that an AI-generated lesson plan will reproduce popular but ineffective, biased, or misguided teaching approaches. Furthermore, the best teachers tailor their lessons to their students, which AI cannot meaningfully do. The possibility exists that, with additional refinement and the involvement of professional education organizations, these tools will become more reliable and useful, but human teachers’ knowledge of and connection with their students, pedagogical insights, and professional judgment will remain indispensable components of quality teaching.
Another exciting possibility involves students who struggle with a particular mode of demonstrating their abilities and who could benefit from the freedom to use an alternative assessment method. 31 In many cases, teachers arbitrarily select an assessment method based on convenience, but computer automation may soon render some distinctions between aural and visual modalities obsolete and facilitate differentiated assessment on a scale previously unimaginable. AI can translate spoken or written materials between many languages, speak students’ written responses or those given in American Sign Language, dictate students’ spoken responses, verbally describe images, and narrate videos. 32 The possibility that AI tools may allow students with visual or hearing impairments, social anxiety, language hurdles, or other situations to be assessed in a way that foregrounds their abilities more than their presentation is a transformative opportunity to remove barriers and create accessible music classrooms.
Finally, it is worth considering the origin of the term “assessment”: the Latin assidēre, which translates to “sit beside” or “sit with.” Music teachers frequently “sit beside” their students in a class or lesson, directing successive attempts to sing, play, improvise, dictate, and so on. Many music educators also encourage their students to practice their skills away from the teacher’s direct guidance. While it is certainly important for students to cultivate their own practice skills, research suggests that students spend much of their practice time in suboptimal ways. 33 AI software may soon be able to more seamlessly “sit beside” students as they practice. Platforms have already been developed for both instrumental and vocal practice and music theory and will surely continue to expand and improve over time. 34 AI tools for guiding practice, checking homework, and similar functions could provide immediate feedback to students and free up teachers’ time for other, less automatable tasks.
Conclusion
AI is already present, at least in the background, in nearly every classroom. While Table 3 describes recommendations for the immediate future, it is too early to describe, or even confidently predict, how AI will change education in the long run. However, it is undeniable that it will increasingly affect music teachers’ practice. Leadership will be required from all corners of the field to navigate inevitable challenges and to develop and publicize recommended practices. Used judiciously, by professional educators rather than in place of professional educators, AI may soon revolutionize individual assessment and ultimately promote achievement in school music.
Table 3. Recommendations for Approaching Assessment Given Current Artificial Intelligence (AI) Capabilities
