[Pollux, Reproducibility, Inquiry] Are dataset-fetching mechanisms broken? #110

Open
@stet-stet

Description

Hi, I am trying to run the Pollux benchmark with a custom workload on a different cluster (one that is not on AWS), to evaluate how Pollux behaves in a variety of situations. However, I cannot pull from your Docker registry at registry.petuum.com, which is needed to assemble the containers for each of the six models. (See this directory, for example.)

Below is part of the output of kubectl describe pods for the dataset pod, after I successfully launched the three kinds of sched pods.

Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  2m2s                default-scheduler  Successfully assigned default/datasets-jxz86 to elsa-05
  Normal   Pulling    53s (x3 over 2m)    kubelet            Pulling image "registry.petuum.com/dev/esper-datasets:latest"
  Warning  Failed     38s (x3 over 104s)  kubelet            Failed to pull image "registry.petuum.com/dev/esper-datasets:latest": rpc error: code = Unknown desc = Error response from daemon: Get https://registry.petuum.com/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  Failed     38s (x3 over 104s)  kubelet            Error: ErrImagePull
  Normal   BackOff    9s (x4 over 104s)   kubelet            Back-off pulling image "registry.petuum.com/dev/esper-datasets:latest"
  Warning  Failed     9s (x4 over 104s)   kubelet            Error: ImagePullBackOff

I also tried pulling the image directly, and got the result shown below. I am starting to think that some undocumented procedure (e.g., registration) may be required to access registry.petuum.com.

> ping registry.petuum.com
PING ec2-54-245-165-47.us-west-2.compute.amazonaws.com (54.245.165.47) 56(84) bytes of data.

^C
> sudo docker pull registry.petuum.com/dev/esper-datasets:latest

Error response from daemon: Get https://registry.petuum.com/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
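
A timeout and a registration requirement would normally look different from the client side, so it may help to separate the two. The following is a minimal sketch, not a definitive diagnostic. It assumes the registry follows the standard Docker Registry HTTP API v2, where an endpoint that requires credentials answers `/v2/` with HTTP 401, and it relies on curl reporting `000` as the status code when no HTTP response arrives at all. The function names `classify_registry_status` and `probe_registry` are hypothetical helpers introduced only for this illustration.

```shell
#!/bin/sh
# Map the HTTP status returned by a registry's /v2/ endpoint to a rough diagnosis.
# Assumption: the registry speaks the standard Docker Registry HTTP API v2.
classify_registry_status() {
  case "$1" in
    000) echo "unreachable" ;;    # curl got no HTTP response (timeout, refused, DNS failure)
    200) echo "open" ;;           # endpoint answered without asking for credentials
    401) echo "auth-required" ;;  # endpoint is up but expects authentication
    *)   echo "unexpected" ;;     # any other status needs a closer look
  esac
}

# Hypothetical helper: probe a registry host and print the diagnosis.
# --max-time bounds the whole request so the call cannot hang indefinitely.
probe_registry() {
  host="$1"
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "https://$host/v2/")
  classify_registry_status "$code"
}
```

Calling `probe_registry registry.petuum.com` from the affected node would print one of the four labels. A result of `unreachable` would be consistent with the Client.Timeout error above and would point at network reachability (firewall, security group, or the server being down) rather than at a missing registration step, although it cannot rule the latter out.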

I googled a bit, and tested some of the more common solutions:

Regrettably, the former did not work, and it turns out the latter is not an option given my circumstances.

How can I proceed if I want to pull images from your server, and/or download the datasets you used in the evaluations in the paper?

Thank you in advance!
