Potentially Useful | Programmers should reject LLM-based coding assistants

The complexity of our world is beyond the limits of human comprehension. In spite of this, we generally feel like we understand what’s going on around us. Each of us achieves this feat of self-deception by cobbling together an assortment of abstractions, mental models, schemas, and metaphors¹. When life confronts us with yet another task that demands attention, we select the most appropriate of these and ignore the universe of details that are (hopefully) irrelevant. This approach generally works surprisingly well!

Computers are less complex than the whole world, but they still resist human comprehension. Computer practitioners once again rely on abstractions, etc., in order to muddle their way through things—it’s the only hope we’ve got². Programming languages are among the best tools in our arsenal, allowing us to transform human-written source code (which is a sort of mashup of human language and mathematical notation—another good tool for approximating the world) into the list of numbers comprising a CPU’s instructions. Sometimes our programs even work.

Some people truly love programming for its own sake, but there aren’t enough of them to fill all the jobs that require doing so. Further complicating matters, even these folks only really like writing certain kinds of code, which generally represents a minority of the code employers need. When taken together, these observations imply that most code is written begrudgingly—it is not exactly contributing to self-discovery or spiritual growth.

This is probably one reason that large language model-based coding assistants (LLMCAs) are becoming popular with some programmers. The most well-known of these tools is GitHub Copilot, developed by Microsoft and OpenAI³. LLMs work by learning representations of language (including, in the case of LLMCAs, programming languages) that result in good performance at predicting the next token in a sequence. A programmer using an LLMCA to help with their work experiences this as “auto-complete for code”. In short, these tools speed up the process of writing programs, and “writing programs” is the thing that programmers are paid to do.
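To make “predicting the next token” a little more concrete, here is a deliberately tiny sketch. It is not how Copilot or any real LLM works: it only counts which token most often follows another in a made-up corpus, whereas an LLM learns far richer representations. The function names and the corpus are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the training sequence."""
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, token):
    """Return the most frequently observed follower of `token`, or None if unseen."""
    if token not in follows:
        return None
    return follows[token].most_common(1)[0][0]


# Hypothetical, hand-made "training corpus" of whitespace-separated code tokens.
corpus = "for i in range ( n ) : total += i".split()
corpus += "for j in range ( m ) : count += 1".split()

model = train_bigram(corpus)

# "Auto-complete": suggest the token most likely to come after `in`.
suggestion = predict_next(model, "in")
print(suggestion)
```

Even this toy version shows the basic shape of the interaction: the tool has no notion of what the program is for, only of what usually comes next.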

There are ethical issues with the use of the LLMCAs that currently exist. Copilot specifically was trained on code that was posted to GitHub, and the authors of this code were not asked for their informed consent to have their work used this way⁴. LLM-based models are also particularly energy intensive, which is something that should concern anybody who cares about climate change⁵. LLMCAs are also probably illegal⁶, as Copilot is known to have violated the licenses of most open source code posted to GitHub. Especially damning is the use of “copyleft” code in its training corpus. Such code was licensed in a manner that allows for its adaptation and reuse (which is ultimately what Copilot is doing: adapting and reusing code at scale), but only when the resulting code is also shared under the same license. Whether or not you’d like to see the proliferation of Copilot-generated code result in all programs becoming copyleft, I don’t think that’s what its users (or their employers) intend to have happen.

But the above issues with LLMCAs are at least solvable in theory. Viz: a company as well-resourced as Microsoft could train its model using code that was collected with the authors’ explicit consent, and advances in energy infrastructure and algorithmic efficiency might bring the climate impact of coding assistants down to acceptable levels. However, there is an existential issue with LLMCAs that should inspire programmers to reject them out of hand: even though they address a real problem, they are the wrong tool for the job.

The real problem that LLMCAs attempt to address is that many programmers are ill-served by the rest of their tooling. I don’t have the personal experience with web programming to opine on the state of the JavaScript ecosystem, but there is an emerging recognition that the current status quo (which starts by reaching for the JavaScript framework du jour, and solves the problems that arise from using it by bolting on additional dependencies) is unpleasant and untenable. This approach to developing applications may generate a lot of code, but it isn’t really programming⁷; while bolting together disparate parts is sometimes an appropriate way to build something, it can’t be the only way we build things. As the early 20th century biologist Edward J. v. K. Menge noted⁸:

Breeding homing pigeons that could cover a given space with ever increasing rapidity did not give us the laws of telegraphy, nor did breeding faster horses bring us the steam locomotive.

Sometimes people get the opportunity to apply cleverness and creativity to find new solutions to problems. This usually starts by taking a step back and understanding the problem space in a holistic manner, and then finding a different way to think about it. Coders working with LLMCAs⁹ won’t be able to do this very often.

So what does a good solution to this tooling problem look like? Here I’ll share an example from the R world, since it’s the primary language I’ve programmed in for the past ten years. I’ve been doing statistics and “data science” for longer than that, and programming longer still, but two important things happened ten years ago that turned me into an “R programmer”: I started a new job that was going to require more statistical programming than I’d done in academia, and Hadley Wickham was hard at work on a new R package called dplyr (which was to become the centerpiece of a family of packages collectively called the Tidyverse¹⁰).

I used R before 2014, but I went to tremendous lengths to avoid actually programming in it. Instead, I would do all of my data wrangling in Python (in version 2, which was the style at the time) and then load “tidy data” into R to perform t-tests and ANOVA. In my experiments with R as a programming language, I found its native interface for manipulating data frames¹¹ (now frequently called “base R” to distinguish it from Tidyverse-dependent approaches) to be clunky and unintuitive. The Tidyverse changed all that; dplyr introduced a suite of “pure” functions¹² for data transformation. They had easy-to-remember names (all verbs, since they performed actions on data frames), consistent argument ordering, and were designed to work well with the forward pipe operator from the magrittr package.
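The design ideas in that last sentence (pure functions, verb names, data as the consistent first argument, and a pipe to chain steps) can be sketched outside of R. What follows is an illustration in plain Python rather than dplyr code: `filter_rows`, `mutate`, `arrange`, and `pipe` are hypothetical helpers loosely modeled on the style described above, a “data frame” is reduced to a list of dictionaries, and the `trials` data are made up.

```python
from functools import reduce


def filter_rows(rows, predicate):
    """Return a new list containing only the rows where predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def mutate(rows, **new_columns):
    """Return new rows with extra columns; each value is a function of the row."""
    return [
        {**row, **{name: fn(row) for name, fn in new_columns.items()}}
        for row in rows
    ]


def arrange(rows, key):
    """Return a new list of rows sorted by the named column."""
    return sorted(rows, key=lambda row: row[key])


def pipe(data, *steps):
    """Feed data through each step in order, loosely like a forward pipe."""
    return reduce(lambda acc, step: step(acc), steps, data)


# Made-up example data: one dict per row.
trials = [
    {"subject": "a", "rt_ms": 520, "correct": True},
    {"subject": "b", "rt_ms": 310, "correct": False},
    {"subject": "c", "rt_ms": 450, "correct": True},
]

result = pipe(
    trials,
    lambda d: filter_rows(d, lambda r: r["correct"]),       # keep correct trials
    lambda d: mutate(d, rt_s=lambda r: r["rt_ms"] / 1000),  # add a seconds column
    lambda d: arrange(d, "rt_s"),                           # sort fastest first
)
print(result)
```

Because every helper returns a new list instead of modifying its input, `trials` is unchanged after the pipeline runs, which is the practical payoff of the “pure” functions described in the footnote. The pipeline also reads top to bottom as a sequence of verbs, which is roughly the quality the forward pipe gives to Tidyverse code.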

Data wrangling in the Tidyverse just feels different (and better) than working with its predecessors. While doing a live coding exercise as I interviewed for a previous job, one of my then-future-colleagues—a die-hard Python user—commented on how “fluid” programming in the Tidyverse seemed. Compared to the syntax of Pandas, a Python data frame module that provides an interface not too different from base R’s, it’s a fundamentally different beast.

That stuff about metaphors and abstractions is relevant here, because these explain why the Tidyverse feels different. It operates on a different level of abstraction than base R’s data frame operations; i.e., it depends on a different mental model of the underlying data structures. Just to be clear: its advantages do come at some cost, and not everybody agrees that these trade-offs are justified. But based on the popularity of the Tidyverse, I am not alone in thinking they are. Almost everything we do on computers follows this pattern. Writing our data analyses in R and Python is much easier than using a “low-level” language like C, but this additional layer of abstraction can make our programs slower and less memory-efficient. For that matter, carefully optimized assembly code can outperform C, and I haven’t met anybody who analyzes data using assembly. Programming languages (and paradigms, libraries, frameworks, etc.) proliferate because they solve different problems, generally by working at different levels of abstraction.

LLMCAs also introduce trade-offs: for example, programmers can generate code more quickly, but they don’t understand it as deeply as if they had written it themselves. Rather than simply argue about when (if ever) this trade-off is worth making, I invite you to imagine that Copilot had come to R before the Tidyverse had. Instead of getting an interface that allows data scientists to work faster by operating at a more comfortable level of abstraction, we’d be writing base R at faster rates using its suggestions. Both approaches result in programming tasks being finished more quickly. However, the programmer using the Tidyverse knows exactly why and how their code works (at least at one level of abstraction) and enjoyed the time they spent on it. The programmer using Copilot would only have a sketchy sense that their code seems to work, and they probably didn’t have much fun getting there. This is why I fundamentally oppose LLMCAs: the challenges that drive programmers to use them would be better solved with their own “Tidyverses”.

From a business perspective, it might seem less risky to rent access to LLMCAs than to invest in the development of new tooling, but this is a mistake. The tools may be relatively inexpensive to use now, but that’s bound to change eventually. The cost of building and deploying a new LLMCA ensures that only a few Big Tech players can compete, and these companies have a track record of collusion¹³. I also find that many hiring managers underestimate how much more productive their workers can be when given a challenging but fun task than when asked to do something “easier” that’s boring.

I’m no business guy, so my call to action is directed primarily to my fellow “senior” developers. Don’t evangelize for LLMCAs; instead, push harder for the resources to develop better tooling. If you currently use LLMCAs yourself, identify the types of tasks that benefit the most from them, and note these as spaces in need of creative solutions. Encourage junior programmers to develop a deeper understanding of the tools they use currently, and insist that your colleagues at all levels imagine something better.

For its part, science can be a great tool for exposing the limitations of these mental models. But at the end of the day, it’s still only producing different, hopefully better models, operating at specific levels of abstraction. ↩︎
I invite any extremely hardcore coder who scoffs at my claim that computers are difficult to comprehend to reflect on the last time they were required to think about the physics of transistors or the capacitance of circuit traces when they were programming their web app or whatever. ↩︎
Like many LLM-based technologies, these are currently being marketed as “AI”. There’s no reason to believe that these machine learning technologies will bring us closer to “general” artificial intelligence. ↩︎
Somebody may argue that this sort of use was permitted by GitHub’s Terms of Service, but there are two flaws in this argument. First, the people posting code to GitHub are not necessarily the code’s authors; plenty of projects have been “mirrored” there by people who were only hoping to make them more accessible. The more glaring error in this argument is that it commits the cardinal sin of mistaking “not illegal” for “ethical”. Plenty of atrocious behaviors have been perfectly legal. Stealing people’s computer code is certainly not in the same class of behavior as slavery and genocide, but I still learned not to assume that those who are quick to point to a Terms of Service are taking ethical issues seriously. ↩︎
ChatGPT alone is already consuming the energy of 33,000 homes. ↩︎
I say “probably” because the courts have yet to rule on the matter. At least one lawsuit has already been filed, but I can’t say that I’m particularly optimistic that the courts would rule in favor of individual hobbyists against the interests of some of the wealthiest individuals and corporations in the world. ↩︎
Just to be clear, this isn’t a “kids these days” rant about the skills of junior programmers; if anybody deserves the blame here, it’s the managers and senior programmers who allowed this rotten situation to fester. ↩︎
Menge, E. J. v. K. (1930). Biological problems and opinions. The Quarterly Review of Biology, 5(3), 348-359. ↩︎
And also the doubly-unlucky coders who are neither allowed to use LLMCAs nor given the resources and opportunity to be clever and creative. ↩︎
There was a brief period where this was colloquially called “the Hadleyverse” in homage to Wickham, but he insisted on the new name. ↩︎
A “data frame” is another abstraction, used by many programming languages and libraries to represent observations about distinguishable units; it uses the same metaphor of rows and columns that spreadsheet programs like Microsoft Excel use. ↩︎
A “pure function” is one that “doesn’t have side effects”. In slightly plainer English, a pure function doesn’t do anything other than (potentially) return a new thing. Object methods that update object attributes aren’t pure functions, nor are any functions that modify their arguments or global variables. ↩︎
For instance, several Big Tech companies recently reached a settlement with workers alleging that they had engaged in illegal wage-fixing. ↩︎