Move glu to Aten(CPU) #33179
Conversation
I also removed some dead code in TH.
I suggested some changes from Python-style code to C++-style code.
Wow, impressive cleanup. I will run bigger tests to make sure there are no dependencies.
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Sorry for the really slow response on #26687 -- this looks pretty good. Some thoughts:
@VitalyFedyunin, I just changed the code style according to @xuhdev's suggestion and moved the THNN doc to THCUNN. Please re-land this PR. Thanks!
I didn't use ghstack; I will try it. Yes, there is an overlap with #26687. For the CUDA part, perhaps we can port the forward and backward code together.
WOAH this kills THNN. Nice work!!
Just rebased the code.
@gchanan, can you please advise how to resolve the merge conflicts with your other PR?
@gchanan, please tell me when all your PRs are merged. Thanks!
Force-pushed from 5f244a0 to 6af02a8.
Code rebased.
@VitalyFedyunin, @ezyang, the code was rebased.
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Overall this looks good; I still have a couple of files to check and internal tests to run. But anyway, almost 3k removed lines - this deserves a medal!
There are a few internal dependencies on THNN/generic/THNN.h. I will clean them up and land this PR afterwards.
Btw, there are around 34k lines of TH code, so you are killing a little less than 10%!
Yes, there is still a lot of code to be killed. Could you update the CPU ops status?
@VitalyFedyunin merged this pull request in b678256.
Summary:
This PR moves glu to ATen (CPU).

Test script:

```
import torch
import torch.nn.functional as F
import time

torch.manual_seed(0)

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"

# warm up
for n in [10, 100, 1000, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n // 2, device=device)
    for i in range(1000):
        output = F.glu(input)
        output.backward(grad_output)

for n in [10, 100, 1000, 10000]:
    fwd_t = 0
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n // 2, device=device)
    for i in range(10000):
        t1 = _time()
        output = F.glu(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 - t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backward avg time is %.2f (ms)." % (n, fwd_avg, bwd_avg))
```

Test device: **skx-8180.**

Before:

```
input size(128, 10) forward time is 0.04 (ms); backward avg time is 0.08 (ms).
input size(128, 100) forward time is 0.06 (ms); backward avg time is 0.14 (ms).
input size(128, 1000) forward time is 0.11 (ms); backward avg time is 0.31 (ms).
input size(128, 10000) forward time is 1.52 (ms); backward avg time is 2.04 (ms).
```

After:

```
input size(128, 10) forward time is 0.02 (ms); backward avg time is 0.05 (ms).
input size(128, 100) forward time is 0.04 (ms); backward avg time is 0.09 (ms).
input size(128, 1000) forward time is 0.07 (ms); backward avg time is 0.17 (ms).
input size(128, 10000) forward time is 0.13 (ms); backward avg time is 1.03 (ms).
```

Fix pytorch#24707, pytorch#24708.

Pull Request resolved: pytorch#33179

Differential Revision: D19839835

Pulled By: VitalyFedyunin

fbshipit-source-id: e4d3438556a1068da2c4a7e573d6bbf8d2a6e2b9
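For context, `glu` itself is a small elementwise op: it splits the input in half along a dimension and gates the first half with the sigmoid of the second half. A minimal Python sketch of that semantics, for illustration only (this is not the ATen kernel added in this PR):

```
import torch
import torch.nn.functional as F

x = torch.randn(128, 10)
a, b = x.chunk(2, dim=-1)       # split the last dimension into two halves
manual = a * torch.sigmoid(b)   # GLU: first half gated by the sigmoid of the second half
assert torch.allclose(manual, F.glu(x, dim=-1))
```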