Move glu to Aten(CPU) #33179

Closed
wants to merge 4 commits into from

Conversation

XiaobingSuper
Collaborator

This PR moves glu to ATen (CPU).
Test script:

```
import torch
import torch.nn.functional as F
import time

torch.manual_seed(0)

def _time():
    # Make sure all queued GPU work has finished before reading the clock.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"

# warm up
for n in [10, 100, 1000, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n // 2, device=device)
    for i in range(1000):
        output = F.glu(input)
        output.backward(grad_output)

for n in [10, 100, 1000, 10000]:
    fwd_t = 0
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n // 2, device=device)
    for i in range(10000):
        t1 = _time()
        output = F.glu(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 - t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000  # average per-iteration time in ms
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward avg time is %.2f (ms); backward avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
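(Note that input.grad is never zeroed, so gradients accumulate across iterations; this does not meaningfully affect the timings, which only bracket the forward and backward calls.)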

Test device: skx-8180.
Before:

```
input size(128, 10) forward avg time is 0.04 (ms); backward avg time is 0.08 (ms).
input size(128, 100) forward avg time is 0.06 (ms); backward avg time is 0.14 (ms).
input size(128, 1000) forward avg time is 0.11 (ms); backward avg time is 0.31 (ms).
input size(128, 10000) forward avg time is 1.52 (ms); backward avg time is 2.04 (ms).
```

After:

```
input size(128, 10) forward avg time is 0.02 (ms); backward avg time is 0.05 (ms).
input size(128, 100) forward avg time is 0.04 (ms); backward avg time is 0.09 (ms).
input size(128, 1000) forward avg time is 0.07 (ms); backward avg time is 0.17 (ms).
input size(128, 10000) forward avg time is 0.13 (ms); backward avg time is 1.03 (ms).
```
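For reference, the biggest win is at n = 10000, where the forward pass goes from 1.52 ms to 0.13 ms (about 11.7x) and the backward pass from 2.04 ms to 1.03 ms (about 2x); the smaller sizes improve by roughly 1.5-2x.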

Fix #24707, #24708.
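For context, glu (the gated linear unit) splits its input in half along a dimension and gates the first half with the sigmoid of the second. Below is a minimal sketch of that reference semantics, checked against F.glu; the shape is illustrative, not taken from the PR's tests:

```
import torch
import torch.nn.functional as F

x = torch.randn(128, 10, requires_grad=True)

# Reference: split the last dimension in half and gate the first half
# with the sigmoid of the second half.
a, b = x.chunk(2, dim=-1)
ref = a * torch.sigmoid(b)

out = F.glu(x, dim=-1)
assert torch.allclose(out, ref)

# The backward path (the CPU part this PR ports to ATen) agrees as well.
out.backward(torch.ones_like(out))
grad_out = x.grad.clone()
x.grad = None
ref.backward(torch.ones_like(ref))
assert torch.allclose(grad_out, x.grad)
```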

@dr-ci

dr-ci bot commented Feb 11, 2020

💊 CircleCI build failures summary and remediations

As of commit 049547b:

Commit 049547b was recently pushed. Waiting for builds...



@XiaobingSuper
Collaborator Author

I also removed some dead code in TH.

Collaborator

@xuhdev xuhdev left a comment

I suggested some changes from Python-style code to C++-style code.

Review comments (outdated, resolved):
aten/src/ATen/native/cpu/Activation.cpp
aten/src/ATen/native/Activation.cpp
@gchanan gchanan added the module: porting Issues related to porting TH/THNN legacy to ATen native label Feb 11, 2020
@VitalyFedyunin
Contributor

Wow, impressive cleanup. I will run bigger tests to make sure there are no dependencies.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@gchanan
Contributor

gchanan commented Feb 11, 2020

Sorry for the really slow response on #26687 -- this looks pretty good.

Some thoughts:
Do you use ghstack? It would be nice to be able to separate out the implementation cleanup from the code cleanup in a nicer way. In particular, we have this kind of unfortunate overlap where #26687 moves CUDA forward and this moves CPU backward, and it would be nice to combine those in a way that is separate from the larger cleanup stuff here.

@ezyang ezyang removed their request for review February 12, 2020 01:30
@XiaobingSuper
Collaborator Author

@VitalyFedyunin, I just changed the code style according to @xuhdev's suggestion and moved the THNN doc to THCUNN. Please re-land this PR. Thanks!

@XiaobingSuper
Collaborator Author

Sorry for the really slow response on #26687 -- this looks pretty good.

Some thoughts:
Do you use ghstack? It would be nice to be able to separate out the implementation cleanup from the code cleanup in a nicer way. In particular, we have this kind of unfortunate overlap where #26687 moves CUDA forward and this moves CPU backward, and it would be nice to combine those in a way that is separate from the larger cleanup stuff here.

I didn't use ghstack; I will try it. Yes, there is an overlap with #26687; for the CUDA part, perhaps we can port the forward and backward code together.

@zhangguanheng66 zhangguanheng66 added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Feb 12, 2020
@ezyang
Contributor

ezyang commented Feb 12, 2020

WOAH this kills THNN. Nice work!!

@XiaobingSuper
Collaborator Author

Just rebased the code.

@ezyang
Contributor

ezyang commented Feb 14, 2020

@gchanan, can you please advise on how to resolve the merge conflicts with your other PR?

@XiaobingSuper
Collaborator Author

@gchanan, please tell me when all your PRs are merged. Thanks!

@XiaobingSuper XiaobingSuper force-pushed the glu branch 2 times, most recently from 5f244a0 to 6af02a8 Compare February 23, 2020 09:50
@XiaobingSuper XiaobingSuper requested a review from ezyang February 23, 2020 09:54
@XiaobingSuper
Collaborator Author

Code rebased.

@XiaobingSuper
Collaborator Author

@VitalyFedyunin

@XiaobingSuper
Collaborator Author

@VitalyFedyunin, @ezyang, the code was rebased.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@VitalyFedyunin
Contributor

Overall looks good; I still have a couple of files to check and internal tests to run. But anyway, almost 3k removed lines; this deserves a medal!

@VitalyFedyunin
Contributor

There are a few internal dependencies on THNN/generic/THNN.h; I will clean them up and land this PR afterwards.

@VitalyFedyunin
Contributor

Btw, there are around 34k lines of TH code; you are killing a little less than 10%!

@XiaobingSuper
Collaborator Author

Btw, there are around 34k lines of TH code; you are killing a little less than 10%!

Yes, there is still a lot of code to be killed. Could you update the CPU ops status in #24507? I will check which ones I can do. Thanks!

@facebook-github-bot
Contributor

@VitalyFedyunin merged this pull request in b678256.

ttumiel pushed a commit to ttumiel/pytorch that referenced this pull request Mar 4, 2020
Summary: This PR moves glu to ATen (CPU); the test script and benchmark results are the same as in the PR description above.

Fix pytorch#24707, pytorch#24708.
Pull Request resolved: pytorch#33179

Differential Revision: D19839835

Pulled By: VitalyFedyunin

fbshipit-source-id: e4d3438556a1068da2c4a7e573d6bbf8d2a6e2b9
Labels
Merged
module: porting (Issues related to porting TH/THNN legacy to ATen native)
open source
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Development

Successfully merging this pull request may close these issues.

Migrate glu from the TH to Aten (CPU)
9 participants