Rework API #2071

snkas · 2024-07-19T16:17:39Z

API rework which makes the program part of the pipeline and the connectors defined from the SQL.
Removes services fully.

Progress back-end:

Progress UI:

Merge the UI into this PR
Remove the old UI

Progress others:

Either temporarily disable or fix all existing demos
CHANGELOG entry

Try it out:
Start the pipeline-manager (note that it will delete all existing pipeline-related tables):

RUST_BACKTRACE=1 RUST_LOG=debug,tokio_postgres=info cargo run --package=pipeline-manager --features pg-embed --bin pipeline-manager -- --dev-mode

... followed by a Kafka instance:

docker compose -f deploy/docker-compose.yml -f deploy/docker-compose-dev.yml up redpanda --build

... followed by running the demo:

cd demo/simple-count
python3 run.py --api-url http://localhost:8080 --kafka localhost:19092

Is this a user-visible change (yes/no): yes

gz · 2024-07-30T20:40:36Z

Great work, some feedback from giving it a first try:

Serve the new UI under / and remove the build and serving the old UI, also delete all of web-console/
left navigvation should be open by default at first:

Needs some text how about: "Start a New Pipeline"

This should have some more spacing between heading and description
Clicking this doesn't do anything yet? Best case can we configure it to fetch a file from github and populate the pipeline editor with that?

it seems odd that an empty SQL program is ready to run?

It's odd that we call this a "PROGRAM ERR" now maybe the state for a failed rust compilation is called something else? Otherwise maybe we should say "SYNTAX ERROR"

It's a bit confusing that the pipeline only shows up with a delay when creating a new one

Screencast.from.07-30-2024.01.25.22.PM.webm

The whole space should be clickable to go to the pipeline not just the "test'

There are these ERROR in the log that probably shouldn't be ERROR

2024-07-30 20:27:51 ERROR [manager] [HTTP error response] PipelineNotRunningOrPaused: Pipeline 0191054a-5ae2-7362-9993-c8c097f5a56b is not currently running or paused.

Sometimes clicking the play button doesn't have any effect

Screencast.from.07-30-2024.01.28.57.PM.webm

remove ad-hoc query and query-plan tabs for now
The errors only show up seconds after the program failed to compile here
Screencast from 07-30-2024 01:31:16 PM.webm
if I use the mini-view in the text editor to slide up in the text and let the mouse go on the slack button it sends me to the community slack (note I just let the mouse button go I didn't click the community slack link or anything -- this is on firefox)

Screencast.from.07-30-2024.01.33.27.PM.webm

-if autosave is off and turned back on, if the state has changed the default action should be to save
Screencast from 07-30-2024 01:35:16 PM.webm

having this set to null seems wrong?

If there is an error in the json, we should grey out the save button and add a tooltip to it to make sure to inform the user that there is a json syntax error
Screencast from 07-30-2024 01:37:17 PM.webm

snkas · 2024-07-31T12:08:11Z

Thanks a ton for the feedback, a few replies already:

There are these ERROR in the log that probably shouldn't be ERROR

The return code should be an error (issuing an action which is not valid in a certain status).

having this set to null seems wrong?

These are the current defaults, I agree it would be good to change to None, though for this PR not planning on changing that part of the API.

lalithsuresh · 2024-07-31T15:25:17Z

web-console-sveltekit/src/lib/services/pipelineManager.ts

+    .with(['Initializing', P.any, P.nullish, P._], () => 'Initializing' as const)
+    .with(['Paused', P.any, P.nullish, 'CompilingSql'], () => 'Compiling sql' as const)
+    .with(['Paused', P.any, P.nullish, 'Pending'], () => 'Queued' as const)
+    .with(['Paused', P.any, P.nullish, 'CompilingRust'], () => 'Compiling bin' as const)


Compiling Rust

lalithsuresh · 2024-07-31T15:25:22Z

web-console-sveltekit/src/lib/services/pipelineManager.ts

+    .with(['Shutdown', 'Running', P.nullish, P._], () => 'Running' as const)
+    .with(['Provisioning', P.any, P.nullish, P._], () => 'Starting up' as const)
+    .with(['Initializing', P.any, P.nullish, P._], () => 'Initializing' as const)
+    .with(['Paused', P.any, P.nullish, 'CompilingSql'], () => 'Compiling sql' as const)


Compiling SQL

lalithsuresh · 2024-07-31T15:25:30Z

web-console-sveltekit/src/lib/services/pipelineManager.ts

+    .with(['Paused', P.any, P.nullish, 'CompilingRust'], () => 'Compiling bin' as const)
+    .with(['Paused', P.any, P.nullish, P._], () => 'Paused' as const)
+    .with(['Running', P.any, P.nullish, P._], () => 'Running' as const)
+    .with(['ShuttingDown', P.any, P.nullish, P._], () => 'ShuttingDown' as const)


Shutting Down

lalithsuresh · 2024-07-31T15:25:52Z

web-console-sveltekit/src/lib/services/pipelineManager.ts

+    .with(['Shutdown', P.any, P.nullish, 'Success'], () => 'Shutdown' as const)
+    .with(['Shutdown', P.any, P.select(P.nonNullable), P.any], () => 'Shutdown' as const)
+    .with(['Shutdown', 'Running', P.nullish, P._], () => 'Running' as const)
+    .with(['Provisioning', P.any, P.nullish, P._], () => 'Starting up' as const)


=> 'Provisioning' sounds clearer

gz · 2024-07-31T16:21:23Z

The return code should be an error (issuing an action which is not valid in a certain status).

I think for logging we should only log error! if something goes really wrong, returning an error status code to the user that's not due to an unexpected thing happening in the manager should probably just be warn! or info!

(I think there were cases even before this where we logged error! in the manager when we shouldn't have so maybe it's still the same issue)

lalithsuresh · 2024-07-31T16:41:21Z

Yeah I noticed this too after coming back to the pipeline manager after a while. We shouldn't be spamming error! logs when the user isn't doing anything wrong. So either these are not legitimate errors, and shouldn't be logged, or the UI shouldn't be doing whatever it was that's causing these error logs.

snkas · 2024-07-31T17:04:43Z

We could distinguish that error-level log messages should only occur when there was truly an error at the api-server.

In this case, the UI is issuing an invalid request (we can diagnose why), so it should get back an HTTP error though (maybe log level should just be info of it though, or even debug).

gz · 2024-07-31T17:16:03Z

In this case, the UI is issuing an invalid request (we can diagnose why),

For a user it's confusing to see errors in a log if they aren't really errors for him -- for debugging our own code a lower log-level should suffice. Sending HTTP errors back is totally fine, just doesn't need to be logged as an error.

gz

I read every character!

gz · 2024-07-31T21:40:50Z

bruno_api_preset/bruno.json

@@ -0,0 +1,9 @@
+{


can we move this out of basedir into web-console?

gz · 2024-07-31T21:41:25Z

crates/adapters/src/format/parquet/test.rs

@@ -117,7 +118,13 @@ fn parquet_output() {
    let mut encoder = ParquetEncoder::new(
        Box::new(consumer),
        config,
-        Relation::new("TestStruct2", false, TestStruct2::schema(), false),
+        Relation::new(


probably should make this a method on teststruct2, but not so important

gz · 2024-07-31T21:42:07Z

crates/pipeline-types/src/config.rs

@@ -44,6 +41,7 @@ fn default_tracing() -> bool {

 /// Pipeline configuration specified by the user when creating
 /// a new pipeline instance.
+/// TODO: change this description


can it be fixed now?

Yes, changed to:

/// Overall pipeline configuration. /// /// It is generated upon the deployment of a pipeline and contains /// the shape of the overall pipeline configuration. /// /// Its input and output endpoints are generated based on the schema /// of the compiled program. The runtime configuration is directly /// provided by the user. Storage configuration, if applicable, /// is set by the runner.

gz · 2024-07-31T21:43:02Z

crates/pipeline-types/src/config.rs

@@ -74,6 +72,16 @@ pub struct PipelineConfig {
    pub outputs: BTreeMap<Cow<'static, str>, OutputEndpointConfig>,
 }

+impl PipelineConfig {
+    pub fn from_yaml(s: &str) -> Self {


doesn't need to be fixed now but maybe we can make an issue to settle on one config format, e.g., keep json everywhere or switch everything to a human friendly format, but ideally not have both

Yes, currently it uses YAML in the database but JSON for the API, I'm in favor of the latter I think

Will make a note of it, but likely not this PR

gz · 2024-07-31T21:45:50Z

crates/pipeline_manager/src/api/pipeline.rs

-    version: Version,
+// REGULAR ENDPOINTS
+
+/// Extended pipeline descriptor with code being optionally included.


maybe it should just always have the code?

The motivation is that with many pipelines could become a large response if polled regularly.

It would simplify it though -- let's see in terms of API poll performance how we fare

gz · 2024-07-31T21:48:19Z

crates/pipeline_manager/src/api/pipeline.rs

+            , body = ErrorResponse
+            , example = json!(examples::error_duplicate_name())),
+        (status = BAD_REQUEST
+            , description = "Pipeline is not shutdown"


Suggested change

, description = "Pipeline is not shutdown"

, description = "Pipeline needs to be shutdown to be modified"

Done, updated in three places (full update, partial update, and delete).

gz · 2024-07-31T21:52:27Z

web-console-sveltekit/README.md

Adding general comment here but we should rm -rf web-console and mv web-console-sveltekit to web-console

snkas · 2024-08-01T12:08:22Z

Rebased on latest main

lalithsuresh · 2024-08-01T14:56:19Z

openapi.json

@@ -2913,7 +2913,7 @@
            }
          }
        ],
-        "description": "Pipeline configuration specified by the user when creating\na new pipeline instance.\nTODO: change this description\n\nThis is the shape of the overall pipeline configuration. It encapsulates a\n[`RuntimeConfig`], which is the publicly exposed way for users to configure\npipelines."
+        "description": "Overall pipeline configuration.\n\nIt is generated upon the deployment of a pipeline and contains\nthe shape of the overall pipeline configuration.\n\nIts input and output endpoints are generated based on the schema\nof the compiled program. The runtime configuration is directly\nprovided by the user. Storage configuration, if applicable,\nis set by the runner."


Is this for the PipelineConfig? The first two sentences are redundant and don't tell the user much about the object. Just say: "Pipeline's runtime configuration, including the number of workers, resource allocations, and connectors used. It represents configuration entries provided by the user (...) and entries derived from the program and system deployment (connectors...)"

It is part of the OpenAPI spec because it is a generated field deployment_config, which the user has in the extended pipeline descriptor. I agree I wrote it a bit verbose, will adjust the text 👍

gz · 2024-08-01T16:42:58Z

@Karakatiza666 the pipeline settings no longer shows after click for me on latest commit

Screencast.from.08-01-2024.09.41.06.AM.webm

lalithsuresh · 2024-08-02T02:22:40Z

@snkas Some bugs to fix.

When following the REST API tutorial, I noticed a few things. I created the following file:

➜  /tmp cat program.json
{
    "name": "sc-program",
    "description": "Supply Chain program",
    "runtime_config": {"workers": 4},
    "program_config": {},
    "program_code": "
CREATE TABLE vendor (                                    \n
    id BIGINT NOT NULL PRIMARY KEY,                      \n
    name VARCHAR,                                        \n
    address VARCHAR                                      \n
);                                                       \n
                                                         \n
CREATE TABLE part (                                      \n
    id bigint NOT NULL PRIMARY KEY,                      \n
    name VARCHAR                                         \n
);                                                       \n
                                                         \n
CREATE TABLE price (                                     \n
    part BIGINT NOT NULL,                                \n
    vendor BIGINT NOT NULL,                              \n
    price DECIMAL                                        \n
);                                                       \n
                                                         \n
CREATE VIEW low_price                                    \n
    (part, price)                                        \n
    AS                                                   \n
    SELECT                                               \n
        price.part AS part,                              \n
        MIN(price.price) AS price                        \n
    FROM price                                           \n
    GROUP BY price.part;                                 \n
                                                         \n
CREATE MATERIALIZED VIEW preferred_vendor                \n
    (part_id, part_name, vendor_id, vendor_name, price)  \n
    AS                                                   \n
    SELECT                                               \n
        part.id AS part_id,                              \n
        part.name AS part_name,                          \n
        vendor.id AS vendor_id,                          \n
        vendor.name AS vendor_name,                      \n
        price.price AS price                             \n
    FROM                                                 \n
        price, part, vendor, low_price                   \n
    WHERE                                                \n
        price.price = low_price.price AND                \n
        price.part = low_price.part AND                  \n
        part.id = price.part AND                         \n
        vendor.id = price.vendor;                        \n
"
}

I tried to use PUT /v0/pipelines/{name}. I started without "name", or the two config fields. I was forced to add each of them by consecutive error messages that asked for the missing fields. The config fields should be optional.
Minor: After that however, I get the following error for PUT, which was confusing (note the names is off in the URL vs the program which I fixed afterwards). PUT doesn't usually signal "rename" to a user, so perhaps just give a better error message here:

➜  /tmp curl -i -X PUT http://localhost:8080/v0/pipelines/sc-pipeline \
-H 'Content-Type: application/json' \
-d @program.json
HTTP/1.1 400 Bad Request
content-length: 119
vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers
content-type: application/json
access-control-allow-credentials: true
date: Fri, 02 Aug 2024 00:32:47 GMT

{"message":"Cannot rename a pipeline which does not exist","error_code":"CannotRenameNonExistingPipeline","details":{}}%

Source position error when there is a connector misconfiguration. I kept going where the editor directed me to (line 6) but what the error message actually means is line 6 within the create table statement on line 13.

snkas · 2024-08-02T10:00:42Z

The config fields should be optional.
@lalithsuresh

The biggest downside to making them optional is that any typos will go unnoticed as the property is now optional (e.g., runtiem_config: { workers: 10 } would just be ignored and instead the default used). We could address this by adding disallowing unknown fields, but that would be inconsistent behavior with all the other serialized types in the nesting of runtime_config and program_config, and as well might affect compatibility in future.

The primary form factor where this is an inconvenience is doing direct API calls, which now that the CURL tutorial is rewritten (thank you :) ) people will use as basis in any case. I'm leaning towards keeping it explicit for now ({} = set to default) and see, what do you think?

snkas · 2024-08-02T12:02:59Z

error for PUT, which was confusing
@lalithsuresh

Replaced with the error:

The pipeline name in the request body does not match the one provided in the URL path. This is not allowed when no pipeline with the name provided in the URL path exists.

Source position error when there is a connector misconfiguration
@lalithsuresh

I've added the relation name now to the error and added that the position is within the string, for example:

 ConnectorGenerationError:
relation 'example': property 'connectors' has value '[{
        "transport2": {
         "name": "datagen",
         "config": {
            "plan": [ { "rate": 1000 } ]
         }
      }
    }]' which is invalid: deserialization failed: missing field `transport` at line 8 column 5 (position is within the string itself)

Exact positional indication in the SQL program will require additional information from the SQL compiler.

mihaibudiu · 2024-08-02T14:24:37Z

The compiler can provide the starting point in the source sql program for the connectors string. If that's enough I can add a property to the Metadata json

snkas · 2024-08-02T14:31:02Z

Ideally it would be just for each property such that in the future if we have new or different property expectations, it line/column number can be used for those as well.

We can for now have a dedicated one for the connectors key, where in the ProgramSchema is Metadata located though? I cannot seem to find it?

mihaibudiu · 2024-08-02T14:39:37Z

I call everything metadata including the properties themselves

Signed-off-by: Simon Kassing <simon.kassing@feldera.com>

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>

Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>