
Job configuration setting via JSON file #1094

Open
DatPhanTien opened this issue Jul 23, 2018 · 3 comments

Comments

@DatPhanTien

Spark 2.3.1

Spark JobServer 0.8.0

Spark Standalone mode

Dear all,

We are using the Spark Hidden REST API for managing (launching, killing, monitoring, ...) our jobs.
We recently learned that Spark JobServer provides many more features that could really make our lives easier.

After reading the Spark JobServer documentation, I noticed that it does not seem to offer the same way of configuring a Spark job. With the Spark Hidden REST API, one can configure a Spark job with a JSON file like:

{
  "action": "CreateSubmissionRequest",
  "clientSparkVersion" : "2.3.1",
  "appResource": "/app/otrace/test/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
  "appArgs": [ "1" ],
  "mainClass": "org.apache.spark.examples.SparkPi",
  "environmentVariables" : {
    "SPARK_ENV_LOADED" : "1"
  },
  "sparkProperties": {
    "spark.jars": "/app/otrace/test/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
    "spark.executor.memory": "2048m",
    "spark.cores.max" : "4",
    "spark.driver.memory": "1024m",
    "spark.executor.cores": "2",
    "spark.submit.deployMode":"cluster",
    "spark.app.name": "SparkPi",
    "spark.mesos.fetcherCache.enable" : "false",
    "spark.master": "mesos://zk://10.81.149.187:2181/mesos",
    "spark.mesos.executor.home" : "/app/otrace/test/spark-2.3.1-bin-hadoop2.7/"
  }
}

Submitting this with an HTTP POST, such as "curl -X POST --data @config.json http://my_spark_server", starts a job with the above configuration.

Our questions are:

  1. Does Spark JobServer allow passing job/context configuration via a JSON file, as shown above? As far as we understand, all configuration seems to be passed via the HTTP query string, for instance "http://my_spark_server/jobs?configName1=value1&configName2=value2". This does not seem practical if we have tens of different configuration fields to set.

  2. If it does, is the JSON format the same or not?

Best
Tien Dat

@bsikander
Contributor

As far as I know, jobserver currently only allows query-string parameters. Some of the configuration can be set directly in the Spark configuration, some in the jobserver default configuration, and the remaining ones via the query string. Generally, this works out well.
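
For illustration (host, app name, and class path are hypothetical; 8090 is the jobserver default port), a submission driven purely by the standard /jobs query-string parameters (appName, classPath, context, sync) might look like:

curl -d "" 'http://my_spark_server:8090/jobs?appName=my_app&classPath=com.example.MyJob&context=my_context&sync=false'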

As far as passing JSON in the body is concerned, you will either have to write a wrapper on top of jobserver or contribute to jobserver by enhancing its REST API (shouldn't be too difficult).

@DatPhanTien
Author

Thanks for your response.

One extra point to my first comment:
Since jobserver allows a context to run permanently, the aforementioned configuration actually targets two different objects:
1- The Spark context: settings such as spark.cores.max, spark.driver.memory, ...
2- The running jobs: settings such as appResource, appArgs, ...
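
For the context side, a sketch of what that could look like (host and context name are hypothetical; num-cpu-cores and memory-per-node are the parameters shown in the jobserver README) is creating the long-running context explicitly via the /contexts endpoint:

curl -d "" 'http://my_spark_server:8090/contexts/my_context?num-cpu-cores=4&memory-per-node=512m'

Jobs submitted to /jobs with context=my_context would then reuse that context.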

@bsikander
Contributor

To add to my previous comment: I was wrong. JobServer does allow you to pass JSON in the body of the /jobs endpoint.

JobServer internally uses the parseString function of ConfigFactory, which accepts both the HOCON and JSON formats:

HOCON: configName1=value1, configName2=value2
JSON: {"configName1": "value1", "configName2": "value2"}

So, just pass your configs in the body in either of the above formats, and you will be able to access them in your job using the config.getXXXX() methods.
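
For example (hypothetical host, app name, and class path; same /jobs parameters as before), these two calls should be equivalent:

curl -d 'configName1=value1, configName2=value2' 'http://my_spark_server:8090/jobs?appName=my_app&classPath=com.example.MyJob'
curl -d '{"configName1": "value1", "configName2": "value2"}' 'http://my_spark_server:8090/jobs?appName=my_app&classPath=com.example.MyJob'

Inside the job, config.getString("configName1") would then return "value1".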

Sorry for the confusion.
