
Job configuration setting via JSON file #1094

Open
DatPhanTien opened this issue Jul 23, 2018 · 3 comments

Comments

@DatPhanTien

Spark 2.3.1

Spark JobServer 0.8.0

Spark Standalone mode

Dear all,

We are using the Spark Hidden REST API for managing (launching, killing, monitoring, ...) our jobs.
We recently learned that Spark JobServer provides many more features that could really make our lives easier.

After reading the Spark JobServer documentation, I noticed that it does not seem to offer the same way of configuring a Spark job. With the Spark Hidden REST API, one can configure a Spark job with a JSON file like:

{
  "action": "CreateSubmissionRequest",
  "clientSparkVersion" : "2.3.1",
  "appResource": "/app/otrace/test/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
  "appArgs": [ "1" ],
  "mainClass": "org.apache.spark.examples.SparkPi",
  "environmentVariables" : {
    "SPARK_ENV_LOADED" : "1"
  },
  "sparkProperties": {
    "spark.jars": "/app/otrace/test/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
    "spark.executor.memory": "2048m",
    "spark.cores.max" : "4",
    "spark.driver.memory": "1024m",
    "spark.executor.cores": "2",
    "spark.submit.deployMode":"cluster",
    "spark.app.name": "SparkPi",
    "spark.mesos.fetcherCache.enable" : "false",
    "spark.master": "mesos://zk://10.81.149.187:2181/mesos",
    "spark.mesos.executor.home" : "/app/otrace/test/spark-2.3.1-bin-hadoop2.7/"
  }
}

Submitting this with an HTTP POST, such as "curl -X POST --data @config.json http://my_spark_server", starts a job with the above configuration.

Our questions are:

  1. Does Spark JobServer allow passing job/context configuration via a JSON file, as shown above? As far as we understand, all configuration seems to be passed via the HTTP query string, for instance "http://my_spark_server/jobs?configName1=value1&configName2=value2". This does not seem practical if we have tens of different configuration fields to set.

  2. If it does, is the JSON format the same or not?

Best
Tien Dat

@bsikander
Contributor

As far as I know, jobserver currently only allows query-string parameters. Some of the configuration can be set directly in the Spark configuration, some in the jobserver default configuration, and the remaining ones via the query string. Generally, this works out well.
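
For illustration (host, app name, and class path are hypothetical; 8090 is the jobserver default port), a submission driven purely by the standard /jobs query-string parameters (appName, classPath, context, sync) might look like:

curl -d "" 'http://my_spark_server:8090/jobs?appName=my_app&classPath=com.example.MyJob&context=my_context&sync=false'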

As far as passing JSON in the body is concerned, you will either have to write a wrapper on top of jobserver or contribute to jobserver by enhancing its REST API (shouldn't be too difficult).

@DatPhanTien
Author

Thanks for your response.

One extra point to my first comment:
Since jobserver allows a context to run permanently, the aforementioned configuration actually targets two different objects:
1- The Spark context: settings such as spark.cores.max, spark.driver.memory, ...
2- The running jobs: settings such as appResource, appArgs, ...
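
For the context side, a sketch of what that could look like (host and context name are hypothetical; num-cpu-cores and memory-per-node are the parameters shown in the jobserver README) is creating the long-running context explicitly via the /contexts endpoint:

curl -d "" 'http://my_spark_server:8090/contexts/my_context?num-cpu-cores=4&memory-per-node=512m'

Jobs submitted to /jobs with context=my_context would then reuse that context.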

@bsikander
Contributor

To add to my previous comment: I was wrong. JobServer does allow you to pass JSON in the body of the /jobs endpoint.

JobServer internally uses the parseString function of ConfigFactory, which accepts both the HOCON and JSON formats:

HOCON: configName1=value1, configName2=value2
JSON: {"configName1": "value1", "configName2": "value2"}

So, just pass your configs in the body in either of the above formats, and you will be able to access them in your job using the config.getXXXX() methods.
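
For example (hypothetical host, app name, and class path; same /jobs parameters as before), these two calls should be equivalent:

curl -d 'configName1=value1, configName2=value2' 'http://my_spark_server:8090/jobs?appName=my_app&classPath=com.example.MyJob'
curl -d '{"configName1": "value1", "configName2": "value2"}' 'http://my_spark_server:8090/jobs?appName=my_app&classPath=com.example.MyJob'

Inside the job, config.getString("configName1") would then return "value1".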

Sorry for the confusion.
