Potentially Useful

Writing boring posts is okay

Just use the GPL


Tl;dr: if you choose to publish some code that you’ve written by yourself, and you want to attach an “open source” license to that code, you’ll probably be happiest if you choose the GPLv3.

There are many reasons to publish your code (or data analysis) with an open source license (maybe it’s something you’ve made just for fun), and there are nearly as many licenses to choose from. Twenty years ago, when the free and open source software (FOSS) phenomenon was starting to penetrate the public consciousness, the GNU GPL was practically synonymous with “open source”. But in contemporary practice, more permissive FOSS licenses are more popular (due in part, I suspect, to the encouragement of organizations such as Microsoft-owned GitHub).

The GPL is a “copyleft” license, designed to use copyright law to spread open source across the computing ecosystem; people who combine GPL-licensed code with code of their own and distribute the result must release it under the same license. The MIT license, which was the most popular license for GitHub projects as of 2015, is an example of a “permissive” license; it mainly protects the original code author from claims of liability, and allows people to use the code however they wish, as long as they keep the copyright and license notice with it. You can incorporate MIT-licensed code into a closed-source project without asking for permission, or even informing the original author of the code.

I’m not here to evangelize FOSS, or argue that any license (or class of license) is better than any other in general. The most successful and important FOSS projects are community efforts, and the choice of license should be made by the community. But while it might be true that most FOSS code is written as part of such projects, I would guess that most FOSS projects are small one-off efforts undertaken by individuals who happen to have chosen to publish their code in a manner that allows its reuse. If you’re thinking of doing this, I recommend that you license your code under the GPL.
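In practice, licensing a small project under the GPL mostly comes down to two things: putting a copy of the license text in the repository (commonly in a file named `LICENSE` or `COPYING`), and putting a short notice at the top of each source file. Here is a rough sketch of what such a file might look like in Python. The author name, the year, and the toy `mean` function are placeholders of mine, and the “or later” wording is one option among several (you may prefer to pin the license to version 3 only).

```python
# SPDX-License-Identifier: GPL-3.0-or-later
#
# Copyright (C) 2024 Jane Example
# (placeholder name and year; substitute your own)
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <https://www.gnu.org/licenses/>.

"""Toy analysis helper; stands in for whatever small project you publish."""


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Tiny demonstration so the file does something when run directly.
    print(mean([1, 2, 3]))
```

The notice does not replace the full license text; it just makes each file's terms clear when the file gets copied out of the repository on its own.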

The solo programmer (or data scientist) releasing a small project of their own is best served by the GPL because it offers a compromise between maintaining full control of their code (as does traditional copyright) and supporting other FOSS projects. You might argue, “I’m not an open-source zealot, I don’t mind some projects being closed source!” That’s not a problem; if you release your own code under the GPL, you remain its sole copyright holder, so you can still offer it under different licenses on a case-by-case basis. If somebody contacts you asking to use your code in their closed-source project (and you should encourage people to do so), you get to decide whether their product, company, and business model are things that you want to support. The only people who are allowed to use your code without checking in with you are other open-source developers releasing FOSS of their own. As a programmer considering open-sourcing your project, this is presumably something that you support.

I’m actually not a fan of “intellectual property” restrictions; I believe that copyright and patents do more harm than good for science, the arts, and even commerce. However, as long as we live in a world with copyright laws, I believe that we should use them to ensure that our values can flourish. I haven’t bothered changing the license on all of my (small and simple) public projects—I’m not even sure it’s worth the effort. However, moving forward, almost everything I post will be available under the GPL.