Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using Assert as input validation is wrong and needs to be changed #1084

Open
dhvalden opened this issue Jul 22, 2024 · 7 comments · May be fixed by #1085
Open

Using Assert as input validation is wrong and needs to be changed #1084

dhvalden opened this issue Jul 22, 2024 · 7 comments · May be fixed by #1085
Labels
status:need more info More information needed type:discussion Discussion or feedback about the lesson type:enhancement Propose enhancement to the lesson

Comments

@dhvalden
Copy link

What is the problem?

The problem has been reported several times already, so I'm late to the party (here fore example #979).

To recap, assert keyword is not to be used as proposed in the example of the episode of Defensive Programming. The reason has been pointed out several times with various degrees of detail, but it boils down to that is basically a debug and testing phase tool, that can be bypassed at runtime by the interpreter. Python documentation states that basically evaluates to:

if __debug__:
    if not expression: raise AssertionError

The episode of Defensive Programming focuses mainly on input validation. Input validation needs to stop the program every time an incorrect input is passed. if raise blocks are the standard and canonical way of doing this. For an example see this function in the scipy repository:

https://github.com/scipy/scipy/blob/v1.14.0/scipy/optimize/_minimize.py#L53-L775

I do not propose to also add the idea of exceptions neither try except blocks, since a novice user will not need this, I simple propose to replace assert blocks with if raise blocks. I will provide a pull request of this in the coming days.

Location of problem (optional)

https://github.com/swcarpentry/python-novice-inflammation/blob/main/episodes/10-defensive.md

@AlbertoImg AlbertoImg added type:discussion Discussion or feedback about the lesson type:enhancement Propose enhancement to the lesson labels Jul 22, 2024
@AlbertoImg
Copy link
Contributor

AlbertoImg commented Jul 22, 2024

Hi @dhvalden,
Thanks for your suggestions. It is a valid point, but since it is involving a big part of the episode I will call the curriculum-advisors for reviewing the suggestions as well. @swcarpentry/curriculum-advisors, @swcarpentry/governance-committee do you have an option on that?

@dhvalden
Copy link
Author

Dear @AlbertoImg,

Of course! No worries. But just to reiterate. The use of assert to sanitise inputs is a bad practice. That is the job of if... raise Error blocks, and we should not be promoting bad practices in the courses.

@dhvalden dhvalden linked a pull request Jul 31, 2024 that will close this issue
@dhvalden
Copy link
Author

Dear @AlbertoImg, I just created the promised pull request.

@AlbertoImg
Copy link
Contributor

Hi @dhvalden, thanks for the pull request! we will look into it with the curriculum advisors.

@martinosorb
Copy link
Contributor

martinosorb commented Aug 1, 2024

Answering as a member of the CAC, but not representing the rest of the CAC.

I can agree that one should not use assert as input validation (although not all examples of assertions in the code of this episode are input validation). However these changes are much, much wider than that, and I think they would all need to be discussed. Just a few examples:

  • Do we really want to silently remove test-driven development? I think it's a nice concept to mention
  • Do we really want to teach try: / except:? In my experience, beginners abuse it, making messy code. In ten years programming in Python, I can't think of a single case where I really needed a try statement. To be honest, I have a personal beef against try and tend to think that the code is not very well written as soon as I see one.
  • Do we really not want to teach assert at all? Asserting things is good practice to catch bugs. This episode is called 'defensive programming'. Error handling is not what I call defensive programming. It's also important, but it's a different thing.

In summary: one thing is removing the use of assert as input validation, one thing is completely rewriting the episode.

@dhvalden
Copy link
Author

dhvalden commented Aug 1, 2024

Hey! @martinosorb

To comment on your points:

  1. Why would we want to teach tests if we haven't learned how to raise errors correctly first? It is a nice concept to mention, yes, after people know how to raise errors and learn about exceptions. AssertionError is a special class of exception. It follows naturally after general errors.

  2. I don't know how I can address this comment if you have a personal beef with try but this is the pythonic way of handling errors. It is particularly common and useful in API calls, where the payload can change unexpectedly, APIs are from where many researchers get their data. It's also common when handling user inputs, if for example you want to try to transform an invalid user inputs (e.g. coerce a string to numbers, a list to array, etc). Finally, if beginners will abuse something, better abuse the pythonic way of handling errors, than the incorrect way of input validation, right?

  3. It's alright, we/you can teach assertions later and change the episode name. But how would you teach assertions, that use heavily the exception AssertionError, without teaching what an exception is and how to raise errors first?

Finally, correct input validation and correct error handling is a much more "beginner need" than test driven development. The former will be useful all the time in simple scripts, whereas the latter becomes important in a larger interconnected project, like a library or app. Even the Official Python Tutorial thinks this way, with Errors and Exceptions handling being the 8th lesson https://docs.python.org/3/tutorial/errors.html whereas assert is not mentioned and unit test is only briefly mentioned as part of the standard library in chapter 10.

@AlbertoImg AlbertoImg added the status:need more info More information needed label Aug 4, 2024
@chillenzer
Copy link

I can see both side's points here but I'm also leaning more towards @dhvalden and want to add a few aspects to the discussion.

  1. I personally like test-driven development but I think that concept is much richer than "test first". When teaching that section for the first time, I was really disappointed that most of the things that are important to me in TDD are not mentioned at all. So, while I prefer it to get mentioned at all, I would prefer the current version of the lecture to steer away from the term "TDD" and use something more humble like "test first". I think it might even be dangerous because if new learners afterwards attempt to use TDD, their knowledge about it might make them ineffective and lead to them abandoning that term without ever fully understanding (let alone using) it. On the other hand, if input validation should drop out due to this discussion, it would be a good opportunity to actually elaborate on TDD and provide some actual guidance how to effectively use it.

  2. I strongly agree that try... except... is a very pythonic construct. As opposed to other languages, it is not associated with a design flaw or a performance penalty but simply a "Ask for forgiveness not permission" attitude. There are a few best practices around that but it is a much shallower concept than good testing, let alone TDD.

  3. Personally, I never use assert outside of tests, so I was quite surprised to see it in the lectures at all. For any form of errors, including input validation, there exist more tailored exceptions and it is much more expressive if you simply derive your own exceptions for more specialised cases. Also, while try... except... in general is very pythonic, I'd personally frown if I'd see a except AssertionError for precisely the reason that it might not get thrown at all.

To summarise, that section always felt a bit off in my teaching, too. Both concepts are important but the current lecture doesn't do either of them a favour with using bad practice for input validation and leaving a lot of room for misinterpretation concerning TDD. I think focusing on one would benefit the learner. Which one, is for a bigger audience to decide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status:need more info More information needed type:discussion Discussion or feedback about the lesson type:enhancement Propose enhancement to the lesson
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants