Cannot Connect to Pod's Exposed Public IP & Port from Pod within same Region #337

Open
cblmemo opened this issue Aug 29, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@cblmemo

cblmemo commented Aug 29, 2024

Describe the bug
Ports exposed through a TCP public IP cannot be reached from inside other pods in the same region.

To Reproduce

  1. Use this script to create two pods in the same region:
import runpod
import base64
import os
from rich import print

def create(name: str, region: str):
    with open(os.path.expanduser('~/.ssh/id_rsa.pub'), 'r', encoding='utf-8') as f:
        public_key = f.read().strip()
    setup_cmd = (
        # Setting up SSH here
        'prefix_cmd() '
        '{ if [ $(id -u) -ne 0 ]; then echo "sudo"; else echo ""; fi; }; '
        '$(prefix_cmd) apt update;'
        'export DEBIAN_FRONTEND=noninteractive;'
        '$(prefix_cmd) apt install openssh-server rsync curl patch -y;'
        '$(prefix_cmd) mkdir -p /var/run/sshd; '
        '$(prefix_cmd) '
        'sed -i "s/PermitRootLogin prohibit-password/PermitRootLogin yes/" '
        '/etc/ssh/sshd_config; '
        '$(prefix_cmd) sed '
        '"s@session\\s*required\\s*pam_loginuid.so@session optional '
        'pam_loginuid.so@g" -i /etc/pam.d/sshd; '
        'cd /etc/ssh/ && $(prefix_cmd) ssh-keygen -A; '
        '$(prefix_cmd) mkdir -p ~/.ssh; '
        '$(prefix_cmd) chown -R $(whoami) ~/.ssh;'
        '$(prefix_cmd) chmod 700 ~/.ssh; '
        f'$(prefix_cmd) echo "{public_key}" >> ~/.ssh/authorized_keys; '
        '$(prefix_cmd) chmod 644 ~/.ssh/authorized_keys; '
        '$(prefix_cmd) service ssh restart; '
        '[ $(id -u) -eq 0 ] && echo alias sudo="" >> ~/.bashrc;'
        # Starting a test HTTP server
        'python3 -m http.server 9000'
    )
    encoded = base64.b64encode(setup_cmd.encode('utf-8')).decode('utf-8')
    pod = runpod.create_pod(
        name=name,
        image_name="runpod/base:0.0.2",
        gpu_type_id="NVIDIA RTX A4000",
        country_code=region,
        ports="22/tcp,9000/tcp",
        support_public_ip=True,
        docker_args=f'bash -c \'echo {encoded} | base64 --decode > init.sh; bash init.sh\''
    )
    return pod['id']

rp1_id = create("rp1", "CA")
rp2_id = create("rp2", "CA")

print(f"rp1_id = '{rp1_id}'")
print(f"rp2_id = '{rp2_id}'")
  2. Use this script to get test commands:
import runpod

def get_cmd(pod_id: str):
    pod_stat = runpod.get_pod(pod_id)
    runtime = pod_stat.get('runtime') or {}
    ports_info = runtime.get('ports', [])
    if not ports_info:
        raise ValueError(f"Pod {pod_id} is not ready.")
    ssh_cmd = None
    curl_cmd = None
    for p in ports_info:
        if p['isIpPublic']:
            if p['privatePort'] == 22:
                ssh_cmd = f'ssh -i ~/.ssh/id_rsa -p {p["publicPort"]} root@{p["ip"]}'
            if p['privatePort'] == 9000:
                curl_cmd = f'curl http://{p["ip"]}:{p["publicPort"]}'
    assert ssh_cmd is not None and curl_cmd is not None, f"Pod {pod_id} is not ready."
    return ssh_cmd, curl_cmd

# Fill in the pod IDs printed by the previous script
rp1_id = 'qi5a6pnu01x2zl'
rp2_id = '3k3hy87mtr2old'

rp1_ssh, rp1_curl = get_cmd(rp1_id)
rp2_ssh, rp2_curl = get_cmd(rp2_id)

print(rp1_curl)
print(rp2_curl)

print(f'{rp1_ssh} {rp2_curl}')
print(f'{rp2_ssh} {rp1_curl}')

Example output:

curl http://69.30.85.69:22145
curl http://69.30.85.69:22186
ssh -i ~/.ssh/id_rsa -p 22144 root@69.30.85.69 curl http://69.30.85.69:22186
ssh -i ~/.ssh/id_rsa -p 22185 root@69.30.85.69 curl http://69.30.85.69:22145
  3. Run the four commands printed by the script. The first two (run from the laptop that makes the runpod API calls) succeed, but the third and fourth (which run curl inside a pod) fail.
$ curl http://69.30.85.69:22145
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
...

$ curl http://69.30.85.69:22186
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
...

$ ssh -i ~/.ssh/id_rsa -p 22144 root@69.30.85.69 curl http://69.30.85.69:22186
The authenticity of host '[69.30.85.69]:22144 ([69.30.85.69]:22144)' can't be established.
ECDSA key fingerprint is SHA256:8wlRef+5KXU62d7TkPvMan6bkdkyUgPxt4qP4WyWFrw.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[69.30.85.69]:22144' (ECDSA) to the list of known hosts.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to 69.30.85.69 port 22186: Connection refused

$ ssh -i ~/.ssh/id_rsa -p 22185 root@69.30.85.69 curl http://69.30.85.69:22145
The authenticity of host '[69.30.85.69]:22185 ([69.30.85.69]:22185)' can't be established.
ECDSA key fingerprint is SHA256:8wlRef+5KXU62d7TkPvMan6bkdkyUgPxt4qP4WyWFrw.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[69.30.85.69]:22185' (ECDSA) to the list of known hosts.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to 69.30.85.69 port 22145: Connection refused
  4. The same commands work when the two pods are in different regions (tested with CA and SE).
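
The second script above raises ValueError or AssertionError while a pod is still starting, so it may need to be rerun by hand. A minimal sketch of a polling wrapper follows; `wait_until_ready` and its parameters are hypothetical names introduced here, not part of the runpod library.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def wait_until_ready(
    fetch: Callable[[], T],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    retry_on: Tuple[Type[BaseException], ...] = (ValueError, AssertionError),
) -> T:
    """Call fetch() repeatedly until it stops raising one of retry_on.

    Hypothetical helper: the defaults are guesses, not values
    recommended by RunPod.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            return fetch()
        except retry_on as exc:
            # Give up once the deadline has passed, keeping the last error as the cause.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"not ready after {timeout_s} seconds") from exc
            time.sleep(interval_s)
```

Assuming get_cmd from the script above is in scope, it could be used as: rp1_ssh, rp1_curl = wait_until_ready(lambda: get_cmd(rp1_id))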

Expected behavior
The exposed endpoint is accessible from anywhere, including other pods started by runpod.

Screenshots
Please see the console logs above.

Desktop (please complete the following information):

$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye
$ pip show runpod       
Name: runpod
Version: 1.7.0
Summary: 🐍 | Python library for RunPod API and serverless worker SDK.
Home-page: https://runpod.io
Author: RunPod
Author-email: RunPod <engineer@runpod.io>, Justin Merrell <justin.merrell@runpod.io>
License: MIT License
Location: /home/memory/install/miniconda3/envs/sky/lib/python3.9/site-packages
Requires: aiohttp, aiohttp-retry, backoff, boto3, click, colorama, cryptography, fastapi, inquirerpy, paramiko, prettytable, py-cpuinfo, requests, tomli, tomlkit, tqdm-loggable, urllib3, watchdog
Required-by:

Additional context
None

@keyboardAnt

I also need help with connecting to pods. Connecting via "Basic SSH Terminal" works, but "SSH over exposed TCP" doesn't. I checked the ~/.ssh/authorized_keys file on the pod, and it matches the public key for the private key I use when SSHing. The error I receive is:

ssh: connect to host 213.173.108.100 port 12157: Connection refused
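
Both reports in this thread end in "Connection refused". For anyone reproducing them, a small probe run from each vantage point (laptop, same-region pod, other-region pod) can separate a refused connection from one that times out or succeeds. This is only a sketch using the Python standard library; `probe_tcp` is a hypothetical helper name, and the host and port would come from the pod's port mapping.

```python
import socket


def probe_tcp(host: str, port: int, timeout_s: float = 5.0) -> str:
    """Attempt one TCP connection and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "open"
    except ConnectionRefusedError:
        # The connection attempt was actively rejected.
        return "refused"
    except socket.timeout:
        # No answer arrived within timeout_s.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. an unreachable network or a DNS failure.
        return f"error: {exc}"
```

For example, probe_tcp("69.30.85.69", 22186) could be run on the laptop and again inside rp1 to compare the two results.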


3 participants