Clearing up license confusion (post-mortem)
For those of you who missed the action, a large number of people recently showed up in a flash mob to discuss and/or complain about license issues in pynose (and some of my other repos). They mainly came from Reddit and 4Chan. (Source: "GitHub Insights")
Some big questions: Were the claims justified? Was I unfairly targeted? Are there other popular repos with the same issues? Let's go through the points that were brought up, organized around a few helpful questions:
(Question) Can a repo fork/copy from another repo while removing history? (The part in question: 5b7314a, where pynose was created from a modified version of nose.)
(Answer) As it turns out, yes, that's legal. That's how Microsoft created Playwright from Google's Puppeteer: see microsoft/playwright@9ba375c, which was made from a modified copy of Puppeteer (https://github.com/puppeteer/puppeteer).
(Question) Can a repo change its license to MIT from something else?
(Discussion) The Puppeteer license is Apache: https://github.com/puppeteer/puppeteer/blob/main/LICENSE. However, when Microsoft created Playwright, they changed the original license to MIT: microsoft/playwright@794b59c. If Microsoft can do it, it certainly looks legal. As it turns out, maybe it wasn't OK, because they later changed it back: microsoft/playwright@562e6f5. So even with code reviews and a very large legal team to double-check things, big companies can sometimes get licensing wrong. If that's the case, then smaller teams (or even individual repo maintainers) can certainly get licensing wrong too. They may also be unable to tell correct licensing from incorrect licensing if the repos they're learning from didn't get it right either. I ended up "pulling a Microsoft" by switching a license to MIT from a non-MIT license. I got it fixed though: #30. I also fixed a secondary license issue: #34. In the process of that secondary fix, I learned that there was another repo (not mine) that also had a license issue: https://github.com/pdbpp/pdbpp. After I pointed it out, someone opened a ticket for it.
(More Discussion) As it turns out, licensing issues are quite common: https://github.blog/2015-03-09-open-source-license-usage-on-github-com/ (it may be an old article, but it says that only 20% of repos have a license, or 30% for forked ones, and that "Open source simply isn't open source without a proper license."). So although there was a licensing issue with pynose (now fixed), the response directed at my GitHub was disproportionate. Lots of people disrespected not only me, but also one of the original nose maintainers who came to help. (They downvoted him because he thanked me for resurrecting nose. If you look through the other comments on the thread, anyone who said positive things about me got downvoted.)
For some context (as not everyone here knows), nose is (or once was) a very popular Python unit-testing framework that hasn't been maintained in over 8 years:
Major companies around the world still depend on it. Unfortunately, nose stopped working when Python 3.10 came out. Although it was easy to patch at that point, the number of things that broke in nose increased rapidly with the releases of Python 3.11 and Python 3.12. People either didn't want to fix it, or didn't know how to fix it. Although I'm quite busy with a lot of other things, I decided to fix it because I knew how to do it. (I've been using Python ever since working at ITA Software, which was acquired by Google.) So I took on that "burden" and created pynose. Major companies that were still dependent on nose began using it, including big names like Mozilla, Intel, DocuSign, Wikimedia, and SAP:
Some of my fixes for nose were shipped with Alpine Linux (meaning that they would be found on Azure, Google Cloud, AWS, and Docker instances around the world). For example: https://github.com/alpinelinux/aports/blob/5fb0b96b79977fd89ee20f1d2bd3367762df67a1/community/py3-nose/python-nose-py312.patch
Once people found out about the popularity of pynose, they came by and then called in others from 4Chan, Reddit, Mastodon, etc. While some people did offer constructive criticism, many others came by either just to rant or to wave torches & pitchforks. The ones with the torches & pitchforks mostly came from https://boards.4chan.org/g/thread/101339536. Some of the people there made comments that were way out of line and very offensive. (You can read their long thread and make your own assessments.) There were lots of extremely hostile messages on 4Chan, and calls for people to come downvote my pynose posts.
Some tickets opened in pynose were more helpful than others.
For example, this one was helpful: #33 (it laid out clear points about licensing rules, so the problems could be described in detail and fixed accordingly).
This earlier one was not as helpful: #28 (it had fewer details, plus a request to preserve the Git history of the original repo, which isn't necessary, as the Microsoft example I mentioned earlier, microsoft/playwright@9ba375c, shows).
Eventually, I separated necessary changes from unnecessary demands by using Microsoft's Playwright repo as a case study. Both pynose and playwright made similar decisions / mistakes, as described earlier. (The mistakes that needed to be corrected have already been fixed.)
(Question) Was pynose not giving credit to nose?
(Answer) The ReadMe clearly stated at the top that "pynose is an updated version of nose, originally made by Jason Pellerin." Credit was definitely acknowledged and given. (But for some, the ReadMe didn't count because they only cared about the license.)
One of the three official maintainers of nose spoke positively about pynose fixing nose and keeping it alive:
For reference, here are the three official nose maintainers according to PyPI:
Let's get back to the "Questions":
(Question) Can a license be slightly modified from the original to include new maintainers for a forked / copied project?
(Answer) Yes, Microsoft added their name when they modified Google's code: microsoft/playwright@9ba375c#diff-0a2cb6528fb78d67f03776f9e443ba3b811ecb8cab767af904e48604197c922b
If that's legal, then I can also add my name when modifying code. (Context: mdmintz/tabcompleter#11, where someone was trying to tell me that I can't do that after I had restored the original license.)
(Question) Can I put a license for a specific file directly in the file itself, rather than including it in the main LICENSE file?
(Answer) Yes, Microsoft did it: microsoft/playwright@9ba375c#diff-647cd6d72ffd0e5a5e9ba4f459fb9d36bb7b9aa621723e0eb7b221e1d9bc67bcR2
- "Copyright 2017 Google Inc., PhantomJS Authors All rights reserved." is in the file itself.
- The main licenses did not include any mention of PhantomJS. (Source: https://github.com/microsoft/playwright/blob/71a668eb863ca44e269f8353bfb055d7e0d4e583/LICENSE. It also wasn't in their ThirdPartyNotices.txt file: https://github.com/microsoft/playwright/blob/71a668eb863ca44e269f8353bfb055d7e0d4e583/packages/playwright/ThirdPartyNotices.txt)
Someone came after one of my repos without knowing that putting specific licenses directly into files was OK:
The files were copied directly from their CDN links, so the license would be included in each file unless it was already missing from the CDN version. Here's an example of that:
Therefore, the license would only be missing from a file if the CDN link didn't include it. (That would be a CDN issue if the license wasn't uploaded along with the JS or CSS code there.) That covers the JS and CSS file copies, as well as any SeleniumBase Chrome extension zip files included directly in the repo. Here's another example of the license in the file: https://github.com/seleniumbase/resource-files/blob/main/js/hopscotch/hopscotch.min.js. I deleted a few of his invalid tickets on that topic (he hadn't realized that a license can be included within the files themselves), which is why you might not find the ticket from the email notification I posted above. For fairness' sake, I didn't delete his other tickets when they raised valid points, e.g., mdmintz/tabcompleter#10. (He did complain later on social media that I deleted a few of his tickets.)
On the topic of SeleniumBase, although the https://github.com/mdmintz org falls under my responsibility, my https://github.com/seleniumbase org falls under the special protection of the Software Freedom Conservancy (due to being part of the Selenium umbrella of frameworks). This means that if anyone has a license issue or any other legal issue with a repo in the SeleniumBase org, they need to go through the Software Freedom Conservancy instead of going directly through me. For regular SeleniumBase issues (non-licensing stuff), you can go directly through me by opening a regular ticket. For any possible license issues that you may have with SeleniumBase, go directly to the Software Freedom Conservancy (https://sfconservancy.org/news/2011/feb/02/selenium-joins/). As written there: "By joining the Conservancy, Selenium obtains the benefits of a formal non-profit organizational structure while keeping the project focused on software development and documentation. Some benefits of joining the Conservancy include the ability to collect donations, hold assets on behalf of the project, and some protection of the lead developers of the project from personal liability when engaging in the activities of the project."
So specifically for SeleniumBase, they have my back.
So in summary, open source license rules can get very complicated. Even big corporations can make mistakes. If a big company does something incorrect with respect to licensing, it's easy for individual developers learning from those repos to make the same mistakes without realizing it. Sometimes, even the people coming to complain about a license issue may get some things wrong (e.g., thinking that the history of a forked/copied repo needs to be preserved, which clearly isn't the case: in microsoft/playwright@9ba375c, Google's Puppeteer Git history was removed during the creation of Microsoft's Playwright repo). Also, some people are more helpful than others in resolving things (by providing useful, actionable feedback). Then there are others out there who are just trying to mess with other people's reputations. The GitHub ecosystem should be a welcoming space for all developers.
For anyone skipping right to the end of this long message, all outstanding requests have been resolved, people are happy with the results, and pynose will continue to be shipped with Linux distributions around the world.
And now people know me a bit better. In particular, they know I'm the guy who fixes unmaintained Python packages that businesses still depend on, e.g., pynose, as well as others like pdbp (not to be confused with pdb or pdbpp). And they know I'm the guy who does a lot with web automation (SeleniumBase). With all the work I do, one would think that I don't get much of a chance to go outside, but I did manage to attend ballroom dance class a few evenings this week, and I recently went to a Star Trek convention where I survived for a whole three days without opening my laptop (https://www.youtube.com/watch?v=BwHc4lIS5z8). There, I partied on the set of the original Enterprise with Jonathan Frakes, and I had a fun conversation with LeVar Burton.
OK, back to work, everyone! There's lots of Python code to write!