Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #2642: Fix all Data module Tests and add them to run on Github Actions CI check #2916

Merged

Conversation

prayutsu
Copy link
Contributor

@prayutsu prayutsu commented Mar 15, 2021

Explanation

Fixes #2642:
All the flaky data module tests are fixed and added to run on CI.

Checklist

  • The PR title starts with "Fix #bugnum: ", followed by a short, clear summary of the changes. (If this PR fixes part of an issue, prefix the title with "Fix part of #bugnum: ...".)
  • The PR explanation includes the words "Fixes #bugnum: ..." (or "Fixes part of #bugnum" if the PR only partially fixes an issue).
  • The PR follows the style guide.
  • The PR does not contain any unnecessary auto-generated code from Android Studio.
  • The PR is made from a branch that's not called "develop".
  • The PR is made from a branch that is up-to-date with "develop".
  • The PR's branch is based on "develop" and not on any other branch.
  • The PR is assigned to an appropriate reviewer in both the Assignees and the Reviewers sections.
Screenshots of all passing tests (Each test file is run 100 times)

QuestionPlayerServiceTest
Screenshot from 2021-04-01 10-19-44

FeedbackReportingServiceTest
Screenshot from 2021-04-01 10-15-49

SubtopicServiceTest
Screenshot from 2021-04-01 10-32-42

TopicServiceTest
Screenshot from 2021-04-01 10-28-37

StoryServiceTest
Screenshot from 2021-04-01 10-24-39

ExplorationServiceTest
Screenshot from 2021-04-01 10-11-46

ConecptCardServiceTest
Screenshot from 2021-04-01 10-07-55

ClassroomServiceTest
Screenshot from 2021-04-01 10-00-07

NetworkInterceptorTest
Screenshot from 2021-04-01 10-41-08

PersisitentCacheStoreTest
Screenshot from 2021-04-01 10-42-50

JsonNetwrokInterceptortest
Screenshot from 2021-04-27 13-37-50

RemoteAuthNetwrokInterceptortest
Screenshot from 2021-04-27 13-38-30

FeedbackReportingServiceTest was always failing.
Screenshot from 2021-03-15 16-34-55

Fixed FeedbackReportingServiceTest -
Screenshot from 2021-04-01 10-15-49

@prayutsu prayutsu requested a review from BenHenning as a code owner March 15, 2021 11:05
@prayutsu prayutsu requested a review from jcqli March 15, 2021 11:06
@prayutsu
Copy link
Contributor Author

@BenHenning @jcqli PTAL. All test cases of the data module are now fixed except for the MockFeedbackReportingTest which has been recently added in #2838, it is always failing regardless of setting failurePercent of the NetworkBehavior to 0 (please also refer to another screenshot in the PR description).

Screenshot from 2021-03-15 16-40-55

@BenHenning I have reformatted main.yml, but not sure if it is required or not.

@prayutsu
Copy link
Contributor Author

Screenshot from 2021-03-15 17-37-41
I added the missing JSON attributes and corrected some names to fix this test. However, I am not sure if the corresponding values added are best for this test. Please add your suggestions if you have any.

Copy link
Contributor

@jcqli jcqli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All other fields look good, thanks for fixing this!

data/src/test/assets/api_mocks/feedback_reporting.json Outdated Show resolved Hide resolved
data/src/test/assets/api_mocks/feedback_reporting.json Outdated Show resolved Hide resolved
Copy link
Contributor

@jcqli jcqli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM :)

Copy link
Member

@BenHenning BenHenning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @prayutsu! I'm not sure I follow how this fixes the flaky tests, so I'm hoping to better understand that.

@jcqli please also take a look. It seems that the newly introduced feedback reporting test might have been failing prior to this PR. Was it passing for you before your PR was merged?

@BenHenning
Copy link
Member

Ah, just saw the comments above.

@BenHenning BenHenning assigned prayutsu and unassigned BenHenning Mar 16, 2021
@prayutsu
Copy link
Contributor Author

prayutsu commented Mar 16, 2021

@BenHenning
This was the previous behavior of all the test cases -
Screenshot from 2021-03-16 11-11-53

They were failing only 4-5 time out of 100 times and the cause of failure was -

retrofit2.mock.MockRetrofitIOException: Failure triggered by MockRetrofit's NetworkBehavior

And, here is some part of the declaration of the NetworkBehavior class -

/**
 * A simple emulation of the behavior of network calls.
 * <p>
 * This class models three properties of a network:
 * <ul>
 * <li>Delay – the time it takes before a response is received (successful or otherwise).</li>
 * <li>Variance – the amount of fluctuation of the delay to be faster or slower.</li>
 * <li>Failure - the percentage of operations which fail (such as {@link IOException}).</li>
 * </ul>
 * Behavior can be applied to a Retrofit interface with {@link MockRetrofit}. Behavior can also
 * be applied elsewhere using {@link #calculateDelay(TimeUnit)} and {@link #calculateIsFailure()}.
 * <p>
 * By default, instances of this class will use a 2 second delay with 40% variance. Failures
 * will occur 3% of the time. HTTP errors will occur 0% of the time.
 */
public final class NetworkBehavior {
  private static final int DEFAULT_DELAY_MS = 2000; // Network calls will take 2 seconds.
  private static final int DEFAULT_VARIANCE_PERCENT = 40; // Network delay varies by ±40%.
  private static final int DEFAULT_FAILURE_PERCENT = 3; // 3% of network calls will fail.
  private static final int DEFAULT_ERROR_PERCENT = 0; // 0% of network calls will return errors.
  
  
  
  ....// Some more code 

By default, NetworkBehavior is supposed to fail almost 3 out of 100 times and which is almost the result we are getting in our tests, to ensure that it never fails, I set the failurePercent to 0.

Also here is some more reference to the documentation of setFailurePercent() method -

  /** Set the percentage of calls to {@link #calculateIsFailure()} that return {@code true}. */
  public void setFailurePercent(int failurePercent) {
    checkPercentageValidity(failurePercent, "Failure percentage must be between 0 and 100.");
    this.failurePercent = failurePercent;
  }
  
  /**
   * Randomly determine whether this call should result in a network failure in accordance with
   * configured behavior. When true, {@link #failureException()} should be thrown.
   */
  public boolean calculateIsFailure() {
    return random.nextInt(100) < failurePercent;
  }
  
  private static void checkPercentageValidity(int percentage, String message) {
    if (percentage < 0 || percentage > 100) {
      throw new IllegalArgumentException(message);
    }
  }

So, first, we check if the failure percent is between 0 and 100 (default is 3). And if it between 3 and 100 we use it in the method calculateIsFailure() which is used in the BehaviorCall class.
(Have a look at the onRun() method below in the BehaviorCall class where we require NetworkBehavior)

/** @BenHenning Please go below near onRun() method, as we are not interested in this. */

final class BehaviorCall<T> implements Call<T> {
  final NetworkBehavior behavior;
  final ExecutorService backgroundExecutor;
  final Call<T> delegate;

  private volatile Future<?> task;
  volatile boolean canceled;
  @GuardedBy("this")
  private boolean executed;

  BehaviorCall(NetworkBehavior behavior, ExecutorService backgroundExecutor, Call<T> delegate) {
    this.behavior = behavior;
    this.backgroundExecutor = backgroundExecutor;
    this.delegate = delegate;
  }

  @SuppressWarnings("CloneDoesntCallSuperClone") // We are a final type & this saves clearing state.
  @Override public Call<T> clone() {
    return new BehaviorCall<>(behavior, backgroundExecutor, delegate.clone());
  }

  @Override public Request request() {
    return delegate.request();
  }

  @SuppressWarnings("ConstantConditions") // Guarding public API nullability.
  @Override public void enqueue(final Callback<T> callback) {
    if (callback == null) throw new NullPointerException("callback == null");

    synchronized (this) {
      if (executed) throw new IllegalStateException("Already executed");
      executed = true;
    }
    task = backgroundExecutor.submit(new Runnable() {
      boolean delaySleep() {
        long sleepMs = behavior.calculateDelay(MILLISECONDS);
        if (sleepMs > 0) {
          try {
            Thread.sleep(sleepMs);
          } catch (InterruptedException e) {
            callback.onFailure(BehaviorCall.this, new IOException("canceled"));
            return false;
          }
        }
        return true;
      }
      
      
      /** @BenHenning We are interested here. */
      
      @Override public void run() {
        if (canceled) {
          callback.onFailure(BehaviorCall.this, new IOException("canceled"));
        } else if (behavior.calculateIsFailure()) {
          if (delaySleep()) {
            callback.onFailure(BehaviorCall.this, behavior.failureException());
          }
        } else if (behavior.calculateIsError()) {
          if (delaySleep()) {
            //noinspection unchecked An error response has no body.
            callback.onResponse(BehaviorCall.this, (Response<T>) behavior.createErrorResponse());
          }
        } else {
          delegate.enqueue(new Callback<T>() {
            @Override public void onResponse(Call<T> call, Response<T> response) {
              if (delaySleep()) {
                callback.onResponse(call, response);
              }
            }

            @Override public void onFailure(Call<T> call, Throwable t) {
              if (delaySleep()) {
                callback.onFailure(call, t);
              }
            }
          });
        }
      }
    });
  }

So, if the calculateIsFailure() returns true, we trigger a failure. That's why the tests were only failing 4-5 times.

…so refactor the corresponding filed of the data class GaeFeedbackReportingSystemContext.
@prayutsu prayutsu assigned BenHenning and unassigned prayutsu Mar 16, 2021
Copy link
Member

@BenHenning BenHenning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's an excellent find @prayutsu! Might it be possible for us to introduce a new test module (e.g. in testing/ package) that creates & provides the mock retrofit object and can be shared among all of these tests? It'd be nice to set this configuration in one place so that we don't actually hit it in the future when we add more services.

@jcqli I think is working on something similar in a related PR, so it might make sense to collaborate to determine which PR should have that module in it.

data/src/test/assets/api_mocks/feedback_reporting.json Outdated Show resolved Hide resolved
@BenHenning BenHenning assigned prayutsu and unassigned BenHenning Mar 17, 2021
@jcqli
Copy link
Contributor

jcqli commented Mar 17, 2021

Ah, that's an excellent find @prayutsu! Might it be possible for us to introduce a new test module (e.g. in testing/ package) that creates & provides the mock retrofit object and can be shared among all of these tests? It'd be nice to set this configuration in one place so that we don't actually hit it in the future when we add more services.

@jcqli I think is working on something similar in a related PR, so it might make sense to collaborate to determine which PR should have that module in it.

For clarification, we would also want to move the mock services to the new module, right? So then both the retrofit mock and the services can be shared with each other

@prayutsu prayutsu requested a review from anandwana001 as a code owner March 17, 2021 10:12
… fix-data-module-tests-and-add-them-to-ci
@prayutsu prayutsu assigned BenHenning and unassigned prayutsu and anandwana001 Apr 28, 2021
@prayutsu
Copy link
Contributor Author

@anandwana001 @BenHenning Data module's tests are failing locally for me on Bazel but they are passing in CI.
Can you please verify if they are passing locally for you?

Copy link
Member

@BenHenning BenHenning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prayutsu I can't check right now if they are working locally for me, but I suspect they will if they're passing on CI.

The failures are either the known compute affected tests flake tracked in #2691 & the other failure is #2032. No failures seem tied to this PR.

Otherwise, the PR LGTM. Just have 1 nit + 1 existing comment to address before approving.

data/BUILD.bazel Outdated Show resolved Hide resolved
kt_android_library(
name = "retrofit_module",
srcs = [
"RetrofitModule.kt",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Do you have JDK 8 installed locally? Bazel uses an embedded JDK 11 for actions & building, but for running it uses whatever version of JDK you have locally (we generally recommend Java 8, though CI uses 9 I believe). This seems to be a class version incompatibility, so installing JDK 8 & setting it as your default may fix this issue.

If you don't want to set it as your default, you can set JAVA_HOME temporarily before using bazel test and it should use that version. There might be a way to configure this via bazelrc, as well, but I'm not sure.

@BenHenning BenHenning assigned prayutsu and unassigned BenHenning Apr 29, 2021
@prayutsu prayutsu assigned BenHenning and unassigned prayutsu Apr 29, 2021
@prayutsu
Copy link
Contributor Author

@anandwana001 PTAL, all the checks are passing now.

Copy link
Member

@BenHenning BenHenning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @prayutsu! The PR underwent a bunch of iterations, but I think it landed in a really nice place. LGTM!

Also as an aside: please don't mark any conversations in PRs resolved unless you authored them (it makes it harder for the person creating the conversation to make sure the changes were addressed).

@BenHenning BenHenning removed their assignment Apr 30, 2021
@prayutsu
Copy link
Contributor Author

Thanks @prayutsu! The PR underwent a bunch of iterations, but I think it landed in a really nice place. LGTM!

Also as an aside: please don't mark any conversations in PRs resolved unless you authored them (it makes it harder for the person creating the conversation to make sure the changes were addressed).

@BenHenning Sure, I think I only resolved the conversations initiated by me, maybe I would have accidentally resolved some conversation authored by you, will take care of in the future.

Copy link
Contributor

@anandwana001 anandwana001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all test files LGTM

@prayutsu prayutsu assigned BenHenning and unassigned prayutsu Apr 30, 2021
@prayutsu
Copy link
Contributor Author

@BenHenning PR is ready to merge.

@BenHenning
Copy link
Member

Ah @prayutsu I think this was my mistake. You're right--the comment of mine I saw disappear was because it was a conversation thread you started. It looks like you did follow the process entirely--thanks! :)

Really happy to see this getting merged. Merging it now.

@BenHenning BenHenning merged commit 524799e into oppia:develop May 1, 2021
@prayutsu prayutsu deleted the fix-data-module-tests-and-add-them-to-ci branch May 1, 2021 03:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix data module tests & add them to CI
4 participants