Running in AWS Lambda Containers #289

Open
tomardern opened this issue Dec 7, 2020 · 3 comments

Comments

@tomardern

Hi,

Now that AWS supports containers in Lambda, is there a plan to support them, or has anyone managed to get this repo working in a container instead of using the provided binaries/layers?

Thanks,

@JasperHG90

JasperHG90 commented Mar 11, 2021

Yep. I got it to work.

Use "/tmp" for downloads, and call the 'driverEnableHeadlessDownloads' function below if you want headless Chrome to be able to download files (see the link in its docstring for the source). I pinned Selenium to selenium==3.141 and use Python 3.7.9. The chromedriver and headless Chrome versions are also pinned (see the Dockerfile below for versions).

I use the following Python/Selenium functions to set up the driver:

import logging
import pathlib

from selenium import webdriver

logger = logging.getLogger(__name__)


def driverEnableHeadlessDownloads(driver: webdriver.Chrome, downloadDir: str) -> webdriver.Chrome:
    """
    Need this voodoo function to allow serverless chrome downloads.
    From: https://github.com/shawnbutton/PythonHeadlessChrome/blob/master/driver_builder.py
    Parameters
    ----------
    driver: selenium webdriver
    downloadDir: directory used for downloads
    Returns
    -------
    selenium webdriver
    """
    # Register the chromedriver endpoint that forwards DevTools commands
    driver.command_executor._commands["send_command"] = (
        "POST",
        "/session/$sessionId/chromium/send_command",
    )
    # Tell Chrome to allow downloads and where to store them
    params = {
        "cmd": "Page.setDownloadBehavior",
        "params": {"behavior": "allow", "downloadPath": downloadDir},
    }
    driver.execute("send_command", params)
    return driver

def makeDefaultChromeOptions() -> webdriver.ChromeOptions:
    """
    Set up default chrome options
    Returns
    -------
    webdriver.ChromeOptions with the default arguments set
    """
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")
    options.add_argument("--window-size=1280,1696")
    options.add_argument("--disable-application-cache")
    options.add_argument("--disable-infobars")
    options.add_argument("--no-sandbox")
    options.add_argument("--hide-scrollbars")
    options.add_argument("--enable-logging")
    options.add_argument("--log-level=0")
    options.add_argument("--single-process")
    options.add_argument("--ignore-certificate-errors")
    options.add_argument("--disable-dev-shm-usage")
    options.add_argument("--homedir=/var/task")
    options.add_argument(
        "user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/61.0.3163.100 Safari/537.36"
    )
    return options
    
class Driver:
    def __init__(self, chromeDriver: str, prefs: dict, headlessChromeBinary: str):
        if not pathlib.Path(chromeDriver).exists():
            raise FileNotFoundError(f"Chrome driver not found at {chromeDriver}")
        self.chromeDriver = chromeDriver
        self.prefs = prefs
        self.options = makeDefaultChromeOptions()
        self.options.add_experimental_option("prefs", prefs)
        self.options.binary_location = headlessChromeBinary
        self.driver = None

    def __enter__(self):
        logger.info(
            f"Setting up headless chrome-based browser with preferences {self.prefs}"
        )
        self.driver = webdriver.Chrome(self.chromeDriver, options=self.options)
        driverEnableHeadlessDownloads(self.driver, "/tmp")
        return self.driver

    def __exit__(self, excType, excVal, excTb):
        logger.info("Shutting down driver")
        # quit() ends the session and stops chromedriver; close() only closes the window
        self.driver.quit()
        
# chromeDownloadPath is the download directory ("/tmp", as noted above)
chromePrefs = {
    "download.default_directory": chromeDownloadPath,
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "safebrowsing.enabled": False,
}
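
As a rough sketch of the "/tmp" advice, the prefs could be built by a small helper that rejects any download directory outside /tmp, since /tmp is the only location a Lambda function can write to at runtime. The helper name make_chrome_prefs and the check itself are hypothetical and only for illustration; they are not part of Selenium or of this repo.

```python
import pathlib
import posixpath

# Assumption: inside Lambda only /tmp is writable at runtime.
LAMBDA_WRITABLE_ROOT = pathlib.PurePosixPath("/tmp")


def make_chrome_prefs(download_dir: str = "/tmp") -> dict:
    """Build the Chrome prefs dict (hypothetical helper, not a Selenium API)."""
    # Collapse "." and ".." segments so "/tmp/../etc" is not accepted
    path = pathlib.PurePosixPath(posixpath.normpath(download_dir))
    if path != LAMBDA_WRITABLE_ROOT and LAMBDA_WRITABLE_ROOT not in path.parents:
        raise ValueError(f"{download_dir} is not under {LAMBDA_WRITABLE_ROOT}")
    return {
        "download.default_directory": str(path),
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "safebrowsing.enabled": False,
    }
```

The returned dict would be passed as the prefs argument of the Driver class above, for example Driver(chromeDriver, make_chrome_prefs("/tmp"), headlessChromeBinary).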

This is the Dockerfile I use for deployment:

FROM public.ecr.aws/lambda/python:3.7

RUN mkdir -p /opt/bin && mkdir -p /opt/extensions && mkdir /var/task/.downloads \
        && curl -SL https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-55/stable-headless-chromium-amazonlinux-2017-03.zip \
         > /opt/bin/headless-chromium.zip \
        && unzip /opt/bin/headless-chromium.zip -d /opt/bin && rm /opt/bin/headless-chromium.zip \
        && curl -SL https://chromedriver.storage.googleapis.com/2.43/chromedriver_linux64.zip > /opt/bin/chromedriver.zip \
        && unzip /opt/bin/chromedriver.zip -d /opt/bin && rm /opt/bin/chromedriver.zip \
        && chmod 777 /opt/bin/chromedriver

# Add poetry files
ADD poetry.lock /var/task
ADD pyproject.toml /var/task

RUN pip install --upgrade pip \
        && pip install poetry --no-cache-dir \
        # Export requirements from poetry project
        && poetry export -f requirements.txt --output /var/task/requirements.txt \
        && pip uninstall -y poetry \
        && pip install -r requirements.txt --target /var/task --no-cache-dir \
        && pip install awslambdaric --target /var/task --no-cache-dir

ADD awsLambda /var/task

CMD [ "main.handler" ]

And this is the Pulumi code I use to create the Lambda function:

from pulumi_aws import lambda_

# `role` is an IAM role defined elsewhere in the Pulumi program
lambdaFunction = lambda_.Function(
    resource_name="myLambda",
    image_uri="XXXXXXXXX.dkr.ecr.XXXXX.amazonaws.com/myLambda:latest-prod",
    memory_size=1024,
    role=role.arn,
    package_type="Image",
    description="This lambda does things.",
    timeout=500,
    tags={
        "environment": "prod",
        "creator": "pulumi",
        "project": "myLambda",
        "project-url": "https://github.com/XXXXXXX/XXXXXXX",
        "maintainer": "myname",
        "maintainer-email": "mymail@myprovider.com",
    },
)

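I test the Lambda function locally with the awslambdaric Python module and the AWS Lambda Runtime Interface Emulator (aws-lambda-rie), which is mounted into the container from ~/.aws-lambda-rie. After building the image, I run: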

docker run -d -v ~/.aws-lambda-rie:/aws-lambda -p 9000:8080 \
  --entrypoint /aws-lambda/aws-lambda-rie \
  --env-file .temp/.env \
   docker.io/myorg/myimg \
   /var/lang/bin/python -m awslambdaric main.handler # 'main' is my Lambda module, 'handler' is the handler function in it

Running curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}' in a terminal invokes the Lambda. To read the logs, I run docker ps, note the container ID, and then run docker logs on it.
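
The same invocation can also be done from Python, which is handy in a test script. This is only a sketch using the standard library; the function names build_invoke_request and invoke are hypothetical, and the host and path are copied from the curl command above.

```python
import json
import urllib.request

# Invocation path taken from the curl command above
INVOKE_PATH = "/2015-03-31/functions/function/invocations"


def build_invoke_request(payload: dict, host: str = "http://localhost:9000") -> urllib.request.Request:
    """Build the POST request that the local emulator expects."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(host + INVOKE_PATH, data=body, method="POST")


def invoke(payload: dict, host: str = "http://localhost:9000", timeout: float = 600.0) -> str:
    """Send the event to the running container and return the raw response body."""
    request = build_invoke_request(payload, host)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the container from the docker run command running, invoke({}) sends the same empty event as the curl call. The long default timeout is an assumption, chosen because a browser-driven Lambda can take a while to respond.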

Hope this helps someone!

@umihico

umihico commented Apr 15, 2021

@tomardern
I got this working. Please see my repository: https://github.com/umihico/docker-selenium-lambda

@kajrolkar

chromePrefs = {
            "download.default_directory": chromeDownloadPath,
            "download.prompt_for_download": False,
            "download.directory_upgrade": True,
            "safebrowsing.enabled": False,
        }

May I know which download path I have to provide?
Is it /var/task/.Download?
