How to get remote webSocketDebuggerUrl (or connect without GUID) #940

Closed
davidwindell opened this issue Oct 2, 2017 · 13 comments

@davidwindell

davidwindell commented Oct 2, 2017

We're using puppeteer.connect to connect to a remote Chrome instance:

puppeteer.connect(
  { "browserWSEndpoint": "ws://{remoteip}:9222/devtools/browser/fa60c034-422d-4f2c-bbeb-17a2cfd690f2"}
);

Is there any way we can avoid needing to know the GUID? Perhaps by a flag when starting the remote Chrome (it's on a secure internal network).

Alternatively, what would be the best way in node to look up the browserWSEndpoint from http://{remoteip}:9222/json/version?

Or are we approaching this the wrong way?

@kdekooter

kdekooter commented Oct 4, 2017

As to your second question: http://{remoteip}:9222/json/version returns JSON. You could use the request library to call this URL and do a JSON.parse() on the response which gives you an object with the id as an attribute.

@davidwindell
Author

Thanks @kdekooter, I've put together the below script which is working (I'm new to node so feel free to comment on my appalling structure and handling of async):

generate.js
/*
 * A puppeteer script to generate a PDF from stdin using a remote chrome instance.
 * The generated PDF content is returned to stdout.
 *
 * Usage: node generate.js <host:port> <format>
 */
'use strict';

const puppeteer = require('puppeteer');
const http = require('http');

var html     = '';
var endpoint = process.argv[2];
var format   = process.argv[3] ? process.argv[3] : 'A4';

process.stdin.setEncoding('utf8');

process.stdin.on('readable', () => {
    const chunk = process.stdin.read();
    if (chunk !== null) {
        html += chunk;
    }
});

process.stdin.on('end', () => {
    http.get(`http://${process.argv[2]}/json/version`, (res) => {
        res.setEncoding('utf8');
        let rawData = '';
        res.on('data', (chunk) => { rawData += chunk; });
        res.on('end', () => {
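            try {
                const parsedData = JSON.parse(rawData);
                endpoint = parsedData.webSocketDebuggerUrl;
            } catch (e) {
                console.error(`Unable to parse /json/version response: ${e.message}`);
                return;
            }

            (async () => {
                const browser = await puppeteer.connect({browserWSEndpoint: endpoint});
                const page    = await browser.newPage();

                await page.setJavaScriptEnabled(false);
                // Encode the HTML so characters such as '#' or '%' do not truncate or corrupt the data URL.
                // 'networkidle' is the value accepted by Puppeteer at the time of this post;
                // later releases use 'networkidle0' or 'networkidle2'.
                await page.goto(`data:text/html,${encodeURIComponent(html)}`, {waitUntil: 'networkidle'});

                const pdf = await page.pdf({format: format});

                process.stdout.write(pdf);

                // Note: close() shuts down the remote Chrome; browser.disconnect() would leave it running.
                await browser.close();
            })().catch((e) => {
                // Rejections from the async function are not seen by a surrounding try/catch.
                console.error(`Unable to generate PDF: ${e.message}`);
            });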
        });
    }).on('error', (e) => {
        console.error(`Unable to connect to Chrome: ${e.message}`);
    });
});

For example: echo "<html>foo</html>" | node generate.js 127.0.0.1:9222 > foo.pdf

If anyone has any suggestions to speed this up and/or avoid the browserWSEndpoint lookup I'd love to hear them 👍

@aslushnikov
Contributor

> Is there any way we can avoid needing to know the GUID? Perhaps by a flag when starting the remote Chrome (it's on a secure internal network).

@davidwindell why do you want to avoid this? Since you're the one who starts chrome, you can get its GUID and share with all interested parties.

@davidwindell
Author

@aslushnikov we launch the Chrome instance automatically in a disposable Docker container, and communicating/broadcasting the GUID just wouldn't work. A reboot of the host means all dependent services have to look it up again.

@aslushnikov
Contributor

@davidwindell I see, thanks. This seems to be a general service discovery problem, but yes, you can hack this in with the script you've suggested.

@drorweissweiss

@davidwindell, @kdekooter
I'm also trying to discover the endpoint's browser id so I can use browserWSEndpoint, but I don't see any id field in the object returned from http://{remoteip}:9222/json/version:

{
"Browser": "HeadlessChrome/61.0.3163.91",
"Protocol-Version": "1.2",
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/61.0.3163.91 Safari/537.36",
"V8-Version": "6.1.534.37",
"WebKit-Version": "537.36 (@2f544949507dc5330b714cb017e3f584e791a1bf)"
}

I'm starting my headless-chrome instance using these flags:

--headless
--remote-debugging-address=0.0.0.0
--remote-debugging-port=9222
--no-sandbox
--disable-gpu
--disable-sync
--disable-translate
--disable-extensions
--disable-default-apps
--disable-background-networking
--safebrowsing-disable-auto-update
--mute-audio
--no-first-run
--hide-scrollbars
--metrics-recording-only
--ignore-certificate-errors

Am I missing a step needed to detect the id? Did you perhaps start your headless Chrome instance with a special flag that exposes this information?

@davidwindell
Author

@drorweissweiss this is only in the latest versions of Chrome (62+ I think, certainly 63). Caught me out at first!

@seeekr

seeekr commented Apr 12, 2018

FYI: In Chrome 65+ (maybe earlier) there is a webSocketDebuggerUrl property instead of id in the /json/version endpoint's json response.
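
To make the version difference described above concrete, here is a minimal sketch of reading the endpoint from an already parsed /json/version response. `pickBrowserWSEndpoint` is a hypothetical helper name, not part of Puppeteer, and the sample object is made up apart from the `webSocketDebuggerUrl` property name and the GUID format quoted earlier in this thread. It fails loudly on older builds (such as the 61.x response shown above) that do not report the property.

```javascript
'use strict';

// Hypothetical helper (not a Puppeteer API): return the browser-level
// WebSocket endpoint from a parsed /json/version response, or throw if
// the Chrome build does not report one.
function pickBrowserWSEndpoint(versionInfo) {
  const endpoint = versionInfo && versionInfo.webSocketDebuggerUrl;
  if (typeof endpoint !== 'string' || endpoint.length === 0) {
    const browser = (versionInfo && versionInfo.Browser) || 'unknown browser';
    throw new Error(`No webSocketDebuggerUrl in /json/version response from ${browser}`);
  }
  return endpoint;
}

// Illustrative response only; the version string and GUID are placeholders.
const newerChrome = {
  Browser: 'HeadlessChrome/65.0.0.0',
  webSocketDebuggerUrl: 'ws://127.0.0.1:9222/devtools/browser/fa60c034-422d-4f2c-bbeb-17a2cfd690f2',
};

console.log(pickBrowserWSEndpoint(newerChrome));
```

The returned string is what the script earlier in the thread passes to puppeteer.connect as browserWSEndpoint.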

@ja8zyjits

@aslushnikov for the new version, Chromium 69, is there a way to connect to a remote headless Chromium without knowing the GUID? I know there is a way through service discovery via /json, but since I will have a few Chromium instances running behind my load balancer, I would like the LB to redirect dynamically.

Possible Way

Is there a way to start Chromium with a custom GUID? My supervisord could then autorestart it with the same GUID that my puppeteer client tries to access, and the LB could redirect without worrying about GUID mismatches.

Thanks

@ginuerzh

> @aslushnikov for the new version, Chromium 69, is there a way to connect to a remote headless Chromium without knowing the GUID? I know there is a way through service discovery via /json, but since I will have a few Chromium instances running behind my load balancer, I would like the LB to redirect dynamically.
>
> Possible Way
>
> Is there a way to start Chromium with a custom GUID? My supervisord could then autorestart it with the same GUID that my puppeteer client tries to access, and the LB could redirect without worrying about GUID mismatches.
>
> Thanks

@ja8zyjits did you find any solution to the dynamic GUID when deploying multiple Chrome instances?
I want to deploy headless Chrome on k8s as a Deployment, but the inconsistent URLs make it impossible to scale up.

@puneetjindal

@ginuerzh I would also be interested in an answer to this!

@qiankunxienb

Does anyone have a solution to this question?

@Jaqenhghar

> @davidwindell, @kdekooter I'm also trying to discover the endpoint's browser id so I can use browserWSEndpoint, but I don't see any id field in the object returned from http://{remoteip}:9222/json/version:
>
> {
> "Browser": "HeadlessChrome/61.0.3163.91",
> "Protocol-Version": "1.2",
> "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/61.0.3163.91 Safari/537.36",
> "V8-Version": "6.1.534.37",
> "WebKit-Version": "537.36 (@2f544949507dc5330b714cb017e3f584e791a1bf)"
> }
>
> I'm starting my headless-chrome instance using these flags:
>
> --headless
> --remote-debugging-address=0.0.0.0
> --remote-debugging-port=9222
> --no-sandbox
> --disable-gpu
> --disable-sync
> --disable-translate
> --disable-extensions
> --disable-default-apps
> --disable-background-networking
> --safebrowsing-disable-auto-update
> --mute-audio
> --no-first-run
> --hide-scrollbars
> --metrics-recording-only
> --ignore-certificate-errors
>
> Am I missing a step needed to detect the id? Did you perhaps start your headless Chrome instance with a special flag that exposes this information?

It is in /json/list
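
For anyone following that pointer, here is a rough sketch of picking ids out of a parsed /json/list response. It assumes the response is an array of target objects carrying `id`, `type`, `url` and `webSocketDebuggerUrl` fields; `pageTargetEndpoints` is a hypothetical helper name and the sample data is invented. Note that these are per-target (page-level) ids and URLs, whereas the puppeteer.connect call in the original post uses the browser-level URL, so this may not be a drop-in replacement for the /json/version lookup.

```javascript
'use strict';

// Hypothetical helper: from a parsed /json/list response (assumed to be an
// array of target objects), keep only "page" targets that expose a
// WebSocket URL and return their id, url and webSocketDebuggerUrl.
function pageTargetEndpoints(targets) {
  return targets
    .filter((t) => t.type === 'page' && typeof t.webSocketDebuggerUrl === 'string')
    .map((t) => ({ id: t.id, url: t.url, webSocketDebuggerUrl: t.webSocketDebuggerUrl }));
}

// Invented sample data, for illustration only.
const sampleTargets = [
  {
    id: 'AAAA1111',
    type: 'page',
    url: 'about:blank',
    webSocketDebuggerUrl: 'ws://127.0.0.1:9222/devtools/page/AAAA1111',
  },
  { id: 'BBBB2222', type: 'service_worker', url: 'https://example.com/sw.js' },
];

console.log(pageTargetEndpoints(sampleTargets));
```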


10 participants