Delete numpy.delete #1010

Open
chillenzer opened this issue Sep 15, 2022 · 2 comments

@chillenzer

Hi everybody,

While reading through Episode 2, I stumbled across the use of numpy.delete. In my experience, there is not a single example where using numpy.delete is justified, because it creates a new array instead of returning a view.

In the best case, the performance penalty does not matter and you never look at your code again thinking "Hey, that element was deleted from the original array, so it won't be there any more." Well, it is! You are bound to be surprised at least once by the fact that your original array didn't change at all, and only if you are very lucky will that not translate into a hard-to-track bug where the numbers are slightly off all the time.

In the worst case, an inexperienced Python user writes a for-loop to delete all the elements they don't want instead of using a mask or proper slicing (basic slicing returns a view; a boolean mask still copies, but only once). If you do that on any reasonably sized data, you will immediately abandon Python for being almost as slow as doing it by hand, or maybe even run into memory issues with multiple copies of large data in memory. And then again, they might forget to assign the created copy (as opposed to an in-place change) back to the original variable.
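To make the two pitfalls above concrete, here is a minimal sketch. The array `data` and the values being dropped are made up for illustration; they are not taken from the lesson.

```python
import numpy as np

data = np.arange(10)  # illustrative data: [0 1 2 3 4 5 6 7 8 9]

# Pitfall 1: the returned copy is discarded, so `data` is unchanged.
np.delete(data, 0)
still_has_first = data[0] == 0  # the "deleted" element is still there

# Pitfall 2: deleting in a loop. Every np.delete call allocates
# and fills a complete new array.
looped = data
for value in (2, 5, 7):  # hypothetical values to drop
    looped = np.delete(looped, np.flatnonzero(looped == value))

# Alternative: build one boolean mask and select once (a single copy).
keep = ~np.isin(data, [2, 5, 7])
masked = data[keep]
```

Both approaches end with the same elements, but the loop makes one full copy per iteration while the mask makes one copy in total.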

So, I would be very interested in hearing about justifications I might have overlooked. But for the time being, IMHO the best that can be done is to remove that part or, if someone insists on mentioning numpy.delete, to clearly state its pitfalls and recommend against using it unless you really, really know what you are doing.

Best,
Julian

@chillenzer

PS: I came up with one somewhat reasonable scenario for using numpy.delete: functional programming! From a purist's perspective, a function should have no side effects, and that is what numpy.delete achieves by returning a copy. Whether pure functional programming is very (num)pythonic is for others to decide.
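A small sketch of that functional-programming argument, assuming two hypothetical helpers (`drop_first` and `drop_first_view` are illustrative names, not part of NumPy or the lesson):

```python
import numpy as np

def drop_first(values):
    """Hypothetical pure helper: returns a copy without element 0."""
    return np.delete(values, 0)

def drop_first_view(values):
    """Hypothetical helper based on basic slicing: returns a view."""
    return values[1:]

original = np.array([10, 20, 30])

# Mutating the copy returned by np.delete leaves the input untouched.
result = drop_first(original)
result[0] = -1
after_pure = original.tolist()

# Mutating the view writes through to the input array.
view = drop_first_view(original)
view[0] = -1
after_view = original.tolist()
```

Here `after_pure` still holds the original values, while `after_view` shows that the write through the view changed `original`.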

@yueyangu

Good points on numpy.delete! Here’s a concise comparison and examples to clarify:

numpy.delete Behavior
numpy.delete creates a new array, leaving the original unchanged. This can be confusing and lead to bugs if one assumes in-place deletion:

import numpy as np

arr = np.array([1, 2, 3, 4, 5])
new_arr = np.delete(arr, 2)
print("Original:", arr)  # Original: [1 2 3 4 5]
print("New:", new_arr)   # New: [1 2 4 5]

Preferred Alternatives
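A boolean mask or slice-and-concatenate expresses the same removal. Both still allocate a new array (only basic slicing such as arr[:2] returns a view), but a mask can drop many elements in one vectorized step:
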
mask = np.arange(len(arr)) != 2
masked_arr = arr[mask]  # [1, 2, 4, 5]

sliced_arr = np.concatenate((arr[:2], arr[3:]))  # [1, 2, 4, 5]
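One way to check which of these operations copy and which return a view is np.shares_memory. The sketch below reuses the small example array from above; the variable names are only illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

basic_slice = arr[:2]                               # basic slicing
masked = arr[np.arange(len(arr)) != 2]              # boolean mask
joined = np.concatenate((arr[:2], arr[3:]))         # slice and concatenate
deleted = np.delete(arr, 2)                         # numpy.delete

# True only when the result reuses the buffer of `arr`.
slice_is_view = np.shares_memory(arr, basic_slice)
mask_is_view = np.shares_memory(arr, masked)
joined_is_view = np.shares_memory(arr, joined)
deleted_is_view = np.shares_memory(arr, deleted)
```

Only the basic slice shares memory with `arr`; the mask, the concatenation, and numpy.delete each produce an independent array.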

Performance Impact
numpy.delete in loops can be slow and memory-heavy, because every call copies the whole array. A single boolean mask on a large array is usually faster and more memory-efficient:

arr = np.random.rand(10**6)
filtered_arr = arr[arr >= 0.5]
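If numpy.delete is used anyway, the loop can be avoided by passing all indices in one call. A minimal sketch, assuming a seeded generator and a smaller array purely so the result is reproducible:

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen only for reproducibility
arr = rng.random(1000)

# Boolean mask: keep everything at or above the threshold.
filtered_arr = arr[arr >= 0.5]

# Single np.delete call: remove all unwanted indices at once.
drop_idx = np.flatnonzero(arr < 0.5)
deleted_arr = np.delete(arr, drop_idx)
```

Both results contain the same elements in the same order, and each approach makes one copy instead of one copy per removed element.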

I hope this helps.

Thanks,
Yueyan
