iterate over scoped uops once [run_process_replay] #5255
Merged
vpachkov pushed a commit to vpachkov/tinygrad that referenced this pull request on Jul 8, 2024
ignaciosica added a commit to ignaciosica/tinygrad that referenced this pull request on Jul 9, 2024
commit 51882c8efcd033fb13b3c50f1a80d4be4672a99a Author: p4sscode <p4ssenger.developer@gmail.com> Date: Tue Jul 9 14:53:41 2024 -0300 Squashed commit of the following: commit 1678199 Author: nimlgen <138685161+nimlgen@users.noreply.github.com> Date: Tue Jul 9 20:44:44 2024 +0300 add update_copy to hcq spec (tinygrad#5348) * add update_copy to hcq spec * fix amd commit 9504db1 Author: chenyu <chenyu@fastmail.com> Date: Tue Jul 9 12:28:52 2024 -0400 remove the realize in _rebuild_tensor_v2 (tinygrad#5347) no longer needed commit 1f5de80 Author: qazal <77887910+Qazalin@users.noreply.github.com> Date: Tue Jul 9 17:20:17 2024 +0300 multi reduce Tensor.var passing verify_lazyop (tinygrad#5346) * what about this * reset late gate commit 3d45219 Author: kormann <49917710+DKormann@users.noreply.github.com> Date: Tue Jul 9 15:38:39 2024 +0200 [bug fix] nested commutative pattern _match [run_process_replay] [no_assert] (tinygrad#5340) * deep pat test * lint * min diff * min lines * nothing * is res extra * cleanup2 * add res back * reduce lines * type anno --------- Co-authored-by: qazal <qazal.software@gmail.com> commit e815c57 Author: nimlgen <138685161+nimlgen@users.noreply.github.com> Date: Tue Jul 9 15:56:06 2024 +0300 use hcq_profile in nv/amd program (tinygrad#5344) commit bee96a1 Author: qazal <77887910+Qazalin@users.noreply.github.com> Date: Tue Jul 9 15:24:56 2024 +0300 fuzz uop schedules (tinygrad#5345) * basic blocks + cleanups * fixups * elif is better for future me * fuzz_schedule_max_paths * fix linter commit d5a68ae Author: Ian Paul <46036106+ianpaul10@users.noreply.github.com> Date: Tue Jul 9 10:48:42 2024 +0000 Simple abstractions3.py fix (tinygrad#5343) * abstractions3.py fix * Add abstractions3.py to CI tests commit a2a9bfd Author: nimlgen <138685161+nimlgen@users.noreply.github.com> Date: Tue Jul 9 10:39:39 2024 +0300 nv correct error messages with ptx (tinygrad#5341) * nv correct error messages with ptx * return compile error commit c13da83 Author: George 
Hotz <72895+geohot@users.noreply.github.com> Date: Mon Jul 8 21:23:19 2024 -0700 tests from lowerer branch (tinygrad#5339) * tests from lowerer branch * Update test_image_dtype.py * Update test_image_dtype.py * Update test_image_dtype.py commit 4ceab5d Author: chenyu <chenyu@fastmail.com> Date: Mon Jul 8 22:25:03 2024 -0400 fix PTX match rule for gated LOAD (tinygrad#5338) * test padto sum with bool tensor and bool acc dtype make sure bool tensor acc with gate is handled correctly * broken in PTX * fix ptx commit a80f2df Author: chenyu <chenyu@fastmail.com> Date: Mon Jul 8 21:33:05 2024 -0400 fix some PTX tests (tinygrad#5337) fix broken PTX tests in test_linearizer and test_uops. there are tests that were skipped and broken because it runs only with CUDA=1 and we run PTX with NV=1 now commit 9150a6b Author: wozeparrot <wozeparrot@gmail.com> Date: Tue Jul 9 00:45:40 2024 +0000 tensor metadata (tinygrad#5271) commit 7f642aa Author: chenyu <chenyu@fastmail.com> Date: Mon Jul 8 19:19:20 2024 -0400 minor PTX matcher cleanup [run_process_replay] (tinygrad#5336) * minor PTX matcher cleanup [run_process_replay] uop.cast syntatic sugar and some newline/space cleanup * comment commit 0f09402 Author: chenyu <chenyu@fastmail.com> Date: Mon Jul 8 18:15:04 2024 -0400 fix Tensor.all and Tensor.any for PTX (tinygrad#5335) supported boolean acc and boolean phi. 
and rewrite boolean max to uint8 max commit 053c706 Author: Roelof van Dijk <3604013+roelofvandijk@users.noreply.github.com> Date: Mon Jul 8 20:47:34 2024 +0200 refactor: expr_view on View (tinygrad#5315) commit 2349d83 Author: kormann <49917710+DKormann@users.noreply.github.com> Date: Mon Jul 8 20:46:15 2024 +0200 Fix scope order in graph toposort [run_process_replay] (tinygrad#5330) * fix * test * nothing commit 631bc97 Author: chenyu <chenyu@fastmail.com> Date: Mon Jul 8 14:00:28 2024 -0400 raise line count limit to 8500 (tinygrad#5331) commit bb77469 Author: Timmy <96938750+0xtimmy@users.noreply.github.com> Date: Mon Jul 8 10:28:55 2024 -0700 multireduce scheduler tests (tinygrad#5141) Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com> commit bb2222e Author: nimlgen <138685161+nimlgen@users.noreply.github.com> Date: Mon Jul 8 19:01:27 2024 +0300 nv default for ampere & ada (tinygrad#5329) commit 51d6f37 Author: nimlgen <138685161+nimlgen@users.noreply.github.com> Date: Mon Jul 8 18:25:05 2024 +0300 nv get classes based on device (tinygrad#5325) * nv get classes * support in mockgpu * choose sm based on gpu * fix * fix * fix arch commit 7d049fc Author: chenyu <chenyu@fastmail.com> Date: Mon Jul 8 10:51:56 2024 -0400 move getting 0 and min value of a dtype to dtype.py (tinygrad#5328) cleanup getting base case for reduce ops [run_process_replay] commit b0c5c58 Author: nimlgen <138685161+nimlgen@users.noreply.github.com> Date: Mon Jul 8 17:24:33 2024 +0300 nv rm_control to rmctrl type (tinygrad#5327) * nv rm_control to rmctrl type * fix commit 73bddc4 Author: Elias Wahl <82230675+Eliulm@users.noreply.github.com> Date: Mon Jul 8 15:07:44 2024 +0200 Fix fake dataloader (tinygrad#5326) commit 43ec8d7ed8dfd420af72875730232a5b2faf3d6b Author: p4sscode <p4ssenger.developer@gmail.com> Date: Tue Jul 9 14:44:59 2024 -0300 remove comments commit d1874a53b51859f5d74a7919773c99a535e89c44 Author: p4sscode <p4ssenger.developer@gmail.com> Date: Mon Jul 8 14:13:29 
2024 -0300 remove unnecessary code commit ef5497f141c2f5b48d89707cc77b1b6b02587105 Author: p4sscode <p4ssenger.developer@gmail.com> Date: Mon Jul 8 13:27:50 2024 -0300 remove simple_relu and env commit 61be81b3fa1d57ea612a97c72e53430f369bf59f Author: p4sscode <p4ssenger.developer@gmail.com> Date: Mon Jul 8 13:26:37 2024 -0300 working with accs commit 8803069c0459a276d373d4c0c0feeb0b3b057761 Author: p4sscode <p4ssenger.developer@gmail.com> Date: Mon Jul 8 11:02:05 2024 -0300 make amx an OptOps commit 49119c2b62d89eaa0a2825714b44b549f7145edb Merge: 97faa1c3 6856f91 Author: p4sscode <p4ssenger.developer@gmail.com> Date: Sun Jul 7 15:55:03 2024 -0300 merge master commit 6856f91 Author: chenyu <chenyu@fastmail.com> Date: Sun Jul 7 14:36:00 2024 -0400 Tensor.any and Tensor.all (tinygrad#5320) does not work in ptx yet due to how boolean tensor is handled commit 2029cb7 Author: chenyu <chenyu@fastmail.com> Date: Sun Jul 7 13:04:22 2024 -0400 support passing None to Tensor.clip (tinygrad#5319) passing None for no upper bound or no lower bound commit 296a1a3 Author: chenyu <chenyu@fastmail.com> Date: Sun Jul 7 12:10:39 2024 -0400 update Tensor.round doc and example (tinygrad#5318) document rounding half to even and update examples to show commit c1e330f Author: chenyu <chenyu@fastmail.com> Date: Sun Jul 7 11:52:58 2024 -0400 Tensor.int and Tensor.bool (tinygrad#5317) commit 778d1cd Author: nimlgen <138685161+nimlgen@users.noreply.github.com> Date: Sun Jul 7 17:34:49 2024 +0300 nv allocate local memory dynamically (tinygrad#5277) * nv allocate local memory dynamically * fix * linter * linter 2 * linter * fixes commit ae10e93 Author: qazal <77887910+Qazalin@users.noreply.github.com> Date: Sun Jul 7 10:49:08 2024 +0300 UOps.VECTORIZE cleanups [run_process_replay] (tinygrad#5314) * still render_cast * one extra line ok * these are all just vectorize * save space * behavior change can go in a different diff commit 77b2ce9 Author: greg-niemeyer 
<152219575+greg-niemeyer@users.noreply.github.com> Date: Sat Jul 6 23:59:57 2024 -0700 Add UOps.VECTORIZE [run_process_replay] (tinygrad#5289) * Add UOps.VECTORIZE to core * Update vectorized cast tests * Addresses code review comments - Removes VECTORIZE from LLVMRenderer - Add line breaks to unduly long lines - Add noop CAST rule back - Update asserts and add render_vectorize in CSytleLanguage renderer * Add missing const folding rule for VECTORIZE Also adds corresponding test * Fixes test_const_vectorize_fold and add assert - Use sane types with VECTORIZE in test_const_vectorize_fold - Add assert that sanity checks the types for VECTORIZE * Rename test_cast_vectorized_fold Renames test_cast_vectorized_fold to test_noop_vectorize_fold because the test targets a very specific rule and there are other tests for VECTORIZE. * Revert unrelated changes --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com> Co-authored-by: qazal <qazal.software@gmail.com> commit 2a7282c Author: qazal <77887910+Qazalin@users.noreply.github.com> Date: Sun Jul 7 09:12:49 2024 +0300 test: delete the extra cast in cstyle load [run_process_replay] [no_assert] (tinygrad#5310) * test: delete the extra cast in cstyle load [run_process_replay] [no_assert] * assert buf_uop * ImageDType * ptx is actually a 64bit address commit cededd8 Author: chenyu <chenyu@fastmail.com> Date: Sat Jul 6 21:55:59 2024 -0400 minor multi cleanup (tinygrad#5311) add type, move around and some newlines commit 8a99514 Author: qazal <77887910+Qazalin@users.noreply.github.com> Date: Sun Jul 7 00:06:30 2024 +0300 generalize the uops toposort spec to ptx (tinygrad#5309) * generalize spec to ptx * redundant assert * extra print commit ca0ef17 Author: chenyu <chenyu@fastmail.com> Date: Sat Jul 6 12:47:27 2024 -0400 use precise::sin in metal (tinygrad#5307) commit 5c2ca7b Author: qazal <77887910+Qazalin@users.noreply.github.com> Date: Sat Jul 6 19:28:47 2024 +0300 remove UOps.SPECIAL rendering from llvm 
(tinygrad#5306) commit 356e5d2 Author: chenyu <chenyu@fastmail.com> Date: Sat Jul 6 11:54:12 2024 -0400 touchup multi dtype in elementwise (tinygrad#5305) only need to check real once, also added type annotation commit 7ddda9f Author: qazal <77887910+Qazalin@users.noreply.github.com> Date: Sat Jul 6 14:13:58 2024 +0300 hotfix: cache seen graphs in fusion (tinygrad#5302) commit 11dfb19 Author: qazal <77887910+Qazalin@users.noreply.github.com> Date: Sat Jul 6 12:39:31 2024 +0300 track seen graphs in recursive group (tinygrad#5301) * track seen * maybe never add realized * ahh it needs to track sts * delete extra check * cache typings * minor cleanup commit d813617 Author: qazal <77887910+Qazalin@users.noreply.github.com> Date: Sat Jul 6 12:04:03 2024 +0300 prescheduling refactor (tinygrad#5300) * p1 * refactor tuple commit c1e166c Author: qazal <77887910+Qazalin@users.noreply.github.com> Date: Sat Jul 6 11:36:40 2024 +0300 fix dtype mismatch for bool ops in multi (tinygrad#5299) commit fc03fc0 Author: chenyu <chenyu@fastmail.com> Date: Fri Jul 5 14:52:09 2024 -0400 enable sin on METAL in test_dtype_alu (tinygrad#5298) commit b369e75 Author: qazal <77887910+Qazalin@users.noreply.github.com> Date: Fri Jul 5 21:14:38 2024 +0300 refactor schedule creation (tinygrad#5297) commit 5292d37 Author: qazal <77887910+Qazalin@users.noreply.github.com> Date: Fri Jul 5 19:43:50 2024 +0300 LoadOps.VIEW in the scheduler spec (tinygrad#5296) * refactor to allow_buffer_view * tests * fix multi commit 1ab7a4c Author: hikettei <88639579+hikettei@users.noreply.github.com> Date: Sat Jul 6 01:16:44 2024 +0900 Handling Multiple UnaryOps.BITCAST in Function for Proper Kernel Fusion [run_process_replay] (tinygrad#5172) * [Patch] added an option not to ignore view replacing when doing bitcast * added the testcase * [Add] reproduced bitcast cannot be fused into a single kernel in the unittest --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com> commit 43c3f73 Author: 
chenyu <chenyu@fastmail.com> Date: Fri Jul 5 11:01:20 2024 -0400 handcode_bert_opt.py (tinygrad#5295) similar to handcode_resnet50_opt.py, one file to check bert kernels without dataset. commit d7835a7 Author: nimlgen <138685161+nimlgen@users.noreply.github.com> Date: Fri Jul 5 16:53:40 2024 +0300 hotfix: fix metal with vars (tinygrad#5294) * hotfix: fix metal with vars * one more place commit 8a548b0 Author: nimlgen <138685161+nimlgen@users.noreply.github.com> Date: Fri Jul 5 16:13:05 2024 +0300 metal support offset (tinygrad#5293) commit 1cefbb3 Author: qazal <77887910+Qazalin@users.noreply.github.com> Date: Fri Jul 5 13:00:01 2024 +0300 uop graph tests + type_verify cleanup (tinygrad#5292) * test_cast_alu_fold * test_double_cast_fold + these should assert commit 341c4a2 Author: qazal <77887910+Qazalin@users.noreply.github.com> Date: Fri Jul 5 11:29:35 2024 +0300 hotfix: use dtype.scalar() for rendering cast [run_process_replay] [no_assert] (tinygrad#5290) commit 87d27c4 Author: chenyu <chenyu@fastmail.com> Date: Thu Jul 4 14:25:24 2024 -0400 minor _broadcast cleanup (tinygrad#5286) `any(x==0 for x in y)` is `0 in y`. 
also `get_args(ConstType)` instead of hard coded `float, int, bool` commit 8c03816 Author: SnakeOnex <romakcz@gmail.com> Date: Thu Jul 4 17:15:07 2024 +0200 fix README example (tinygrad#5284) * fixed README example * README test * changed py -> python markdown code flags in REAME commit 2778b60 Author: nimlgen <138685161+nimlgen@users.noreply.github.com> Date: Thu Jul 4 18:06:04 2024 +0300 new memory scheduler (tinygrad#5278) * new memory schedule algo * works * fix * fix * linter * tiny fixes * do not optimize copy buffers * mpre comments * tiny cleanups commit 84b3e3b Author: nimlgen <138685161+nimlgen@users.noreply.github.com> Date: Thu Jul 4 13:29:21 2024 +0300 hcq exec no embedded signal (tinygrad#5142) commit 0c3a35e Author: Tobias Fischer <tobiasfischer17@gmail.com> Date: Wed Jul 3 22:47:10 2024 -0400 Stable Diffusion v2 Inference (tinygrad#5283) * model implementation * clip fix, more qol options commit e5ba385 Author: chenyu <chenyu@fastmail.com> Date: Wed Jul 3 19:42:56 2024 -0400 remove first contiguous in multi from_sharded (tinygrad#5121) second contiguous guarantees lbs are contiguous going into MultiLazyBuffer, don't need the first contiguous commit f1ff65e Author: chenyu <chenyu@fastmail.com> Date: Wed Jul 3 17:52:50 2024 -0400 remove "no-nans-fp-math"="true" for LLVM (tinygrad#5282) fixed isnan for llvm (still have issue with < nan) commit 97faa1c307ee9322a88de83cf6c2d2ae8918db56 Author: p4sscode <p4ssenger.developer@gmail.com> Date: Wed Jul 3 16:01:58 2024 -0300 minor changes commit 3929a9d Author: chenyu <chenyu@fastmail.com> Date: Wed Jul 3 14:59:05 2024 -0400 fix UOp.cmp_tuple for ALU (tinygrad#5280) * fix UOp.cmp_tuple for ALU for ALU, use self.arg instead of self.op to compare * skip that? 
commit a9d6a6c Author: qazal <77887910+Qazalin@users.noreply.github.com> Date: Wed Jul 3 20:15:42 2024 +0300 verify_lazyop with multi reduce (tinygrad#5276) * outsource the assert to the implicit movement op check * tests commit 16e3b8b Author: George Hotz <72895+geohot@users.noreply.github.com> Date: Wed Jul 3 09:40:00 2024 -0700 uops work from lowerer [run_process_replay] (tinygrad#5279) commit 622b7bd Author: chenyu <chenyu@fastmail.com> Date: Wed Jul 3 12:28:53 2024 -0400 simpler TinyJit inside TinyJit detection (tinygrad#5219) * simpler TinyJit inside TinyJit detection suggested in tinygrad@73395b9#commitcomment-143660402 * cannot repro... * clear the way out * finally clear commit 04ef0fd Author: gip <gip@users.noreply.github.com> Date: Wed Jul 3 09:07:09 2024 -0700 fix: message when applegpu tools missiong (tinygrad#5236) commit d3e244d Author: reddyn12 <72528507+reddyn12@users.noreply.github.com> Date: Wed Jul 3 12:06:01 2024 -0400 prev speed improvements (tinygrad#5252) Co-authored-by: reddyn <nikidsniper@gmail.com> commit 8c90b4d4b0f4d0ba97d3e9ee12d3253523e69bca Author: p4sscode <p4ssenger.developer@gmail.com> Date: Wed Jul 3 11:25:30 2024 -0300 vec is done before store commit 21d41f0 Author: nimlgen <138685161+nimlgen@users.noreply.github.com> Date: Wed Jul 3 11:34:10 2024 +0300 nv follows HCQCompatAllocRes protocol (tinygrad#5275) * nv follows HCQCompatAllocRes protocol * fix amd commit d3e4e21 Author: Vyacheslav Pachkov <slava.pach@gmail.com> Date: Wed Jul 3 10:25:44 2024 +0300 add return type for HCQCompatAllocator _alloc (tinygrad#5267) Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com> commit 191463a Author: chenyu <chenyu@fastmail.com> Date: Tue Jul 2 23:29:54 2024 -0400 add timing to SDXL (tinygrad#5273) commit b2c3a28 Author: chenyu <chenyu@fastmail.com> Date: Tue Jul 2 21:39:01 2024 -0400 nn.RMSNorm (tinygrad#5272) the norm itself has no significant value to add to Tensor method, but we would want Tensor.normalize commit 
9a2a82a Author: chenyu <chenyu@fastmail.com> Date: Tue Jul 2 21:37:52 2024 -0400 test stable diffusion unet in ci (tinygrad#5268) unet is parameterized now so can test a smaller one is ci commit ce52b10 Author: chenyu <chenyu@fastmail.com> Date: Tue Jul 2 20:01:11 2024 -0400 add a flag DISABLE_LOOP_COLLAPSE (tinygrad#5270) workaround if user encountered UNMUL error commit e53b164 Author: George Hotz <72895+geohot@users.noreply.github.com> Date: Tue Jul 2 15:03:54 2024 -0700 small changes from lowerer (tinygrad#5266) commit 7be776f Author: nimlgen <138685161+nimlgen@users.noreply.github.com> Date: Tue Jul 2 23:35:39 2024 +0300 add _alloc_signal/_free_signal to hcq (tinygrad#5264) * add _alloc_signal/_free_signal api * oops, revert this * linter commit 9a25ee0 Author: Tobias Fischer <tobiasfischer17@gmail.com> Date: Tue Jul 2 12:40:27 2024 -0400 pixed unet call params (tinygrad#5262) commit b4cb32bac18b7ca2426e243c40fc90fe90b014ad Author: p4sscode <p4ssenger.developer@gmail.com> Date: Tue Jul 2 10:41:20 2024 -0300 squash merge amx_support for third revision commit 59bc837 Author: qazal <77887910+Qazalin@users.noreply.github.com> Date: Tue Jul 2 15:13:10 2024 +0300 refactor gated load rendering [run_process_replay] (tinygrad#5259) * refactor gated load rendering [run_process_replay] * hotfix: extra line * remove llvm diff commit e050603 Author: nimlgen <138685161+nimlgen@users.noreply.github.com> Date: Tue Jul 2 13:57:46 2024 +0300 nv close fds after mapping (tinygrad#5246) commit d3cfb6c Author: qazal <77887910+Qazalin@users.noreply.github.com> Date: Tue Jul 2 13:48:47 2024 +0300 refactor UOps.LOAD barrier [run_process_replay] (tinygrad#5258) commit a1044e6 Author: qazal <77887910+Qazalin@users.noreply.github.com> Date: Tue Jul 2 09:21:09 2024 +0300 iterate over scoped uops once [run_process_replay] (tinygrad#5255) commit dfbee4f Author: wozeparrot <wozeparrot@gmail.com> Date: Tue Jul 2 02:33:58 2024 +0000 feat: add blobfile to testing (tinygrad#5254) commit 8c9c1cf 
Author: Tobias Fischer <tobiasfischer17@gmail.com> Date: Mon Jul 1 22:33:01 2024 -0400 Pulled CLIP and UNet into Seperate Files (tinygrad#5253) * pulled clip and unet into seperate files * reference cleanup, lru cache fix * better pool indexing commit 5808c37 Author: chenyu <chenyu@fastmail.com> Date: Mon Jul 1 15:00:47 2024 -0400 hotfix disable flaky llama3 beam benchmark on green (tinygrad#5249) commit b9122ec Author: chenyu <chenyu@fastmail.com> Date: Mon Jul 1 14:43:47 2024 -0400 revert stable diffusion validation with threefry (tinygrad#5248) * Revert "use threefry in stable diffusion benchmark (tinygrad#4988)" This reverts commit 44dfa37. * sdxl and validation fix * relax threshold
This diff removes a small redundancy, and it turns out to be faster too.
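The actual diff is not reproduced on this page, so the following is only a generic sketch of the kind of change the title describes: replacing a repeated walk over a list with a single pass. Everything in it is hypothetical: the `(name, scope)` pairs stand in for uops, and `group_rescanning` / `group_single_pass` are illustrative names, not tinygrad functions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a uop: (name, enclosing scope or None).
# This is NOT tinygrad's UOp class.
Node = Tuple[str, Optional[str]]

def group_rescanning(nodes: List[Node]) -> Dict[str, List[str]]:
    """Redundant version: find the scopes, then re-walk the list once per scope."""
    scopes: List[str] = []
    for _, scope in nodes:
        if scope is not None and scope not in scopes:
            scopes.append(scope)
    # Each scope triggers another full pass over `nodes`.
    return {s: [name for name, scope in nodes if scope == s] for s in scopes}

def group_single_pass(nodes: List[Node]) -> Dict[str, List[str]]:
    """Same result, but every node is visited exactly once."""
    grouped: Dict[str, List[str]] = {}
    for name, scope in nodes:
        if scope is not None:
            grouped.setdefault(scope, []).append(name)
    return grouped

nodes: List[Node] = [
    ("define_acc", None),
    ("load_a", "loop0"),
    ("mul", "loop0"),
    ("load_b", "loop1"),
    ("store", None),
]
result = group_single_pass(nodes)
```

Both functions return the same grouping. The single-pass version does work proportional to the number of nodes, while the rescanning version multiplies that by the number of scopes, which is one plausible reason a redundancy removal of this kind would also run faster.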