
How perspective interacts with data coming from a c program #1009

Closed
stevedanomodolor opened this issue Apr 7, 2020 · 3 comments

@stevedanomodolor
Support Question

This question is just to help me understand how what I want to do would work with Perspective, because I am a bit lost at the moment. I am planning to use Perspective to stream data coming from a C application I wrote. I still don't understand how Arrow connects my C program to Perspective and lets me stream this data. Can someone please explain how that would work? After reading the Perspective page and looking at the examples, I assume that Perspective reads data in an Arrow format and displays it, which means we would have to constantly create and update an Arrow file. How do the reads and writes work in this case? Are you reading from the Arrow file?
Also, is it possible for Perspective to read from an Arrow table created in C code, without the need to create a file?

I would be really grateful if someone could answer my questions, because there is a concept I am missing that is keeping me from understanding.

@sc1f
Contributor

sc1f commented Apr 7, 2020

Perspective reads data in multiple formats: Arrow files and streams, but also row- and column-oriented data, Pandas DataFrames, and CSV.

I would start by going through the Concepts documentation as well as the quickstart user guides for JavaScript and Python.

In terms of interfacing with data generated from a C program, the easiest way would be to expose your data through a web API, either as an Arrow binary or as properly formatted JSON, which can then be loaded by Perspective's JavaScript or Python libraries. Perspective reads both file- and stream-formatted Arrow, but you will have to implement the transport layer regardless of which data format you choose.
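As a rough illustration of the "properly formatted JSON" option, the same records can be serialized either row-oriented (a list of dicts) or column-oriented (a dict of lists) using only the standard library; the column names and values here are invented for the example:

```python
import json

# A few example rows, as they might come out of a C program
# (field names and values are made up for illustration).
rows = [
    {"symbol": "AAPL", "price": 120.5, "size": 100},
    {"symbol": "MSFT", "price": 210.1, "size": 50},
]

# Row-oriented JSON: a list of records, one object per row.
row_oriented = json.dumps(rows)

# Column-oriented JSON: one list of values per column.
columns = {key: [row[key] for row in rows] for key in rows[0]}
column_oriented = json.dumps(columns)

print(row_oriented)
print(column_oriented)
```

Either shape can be served over a web API; the column-oriented form is closer to how Arrow lays out data internally.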

@stevedanomodolor
Author

Thank you very much. You have cleared up most of the doubts I had, but I still have some trouble with the concepts you mentioned. Thank you very much, you are really being helpful.

You mentioned .arrow files and streams. I was able to display data from an Arrow-format file, but I don't understand what you mean by a stream. Are you referring to sending the data without creating a file?

When you refer to stream-formatted Arrow, do you mean the binary Arrow file, like a .arrow file (superstore.arrow, for example)?

One last concept you mentioned is the transport layer. What do you mean by that, and could you please point me to resources to understand it better, like examples or documentation?

Thank you very much.

@sc1f
Contributor

sc1f commented Apr 7, 2020

Using Pyarrow as an example:

import pyarrow as pa


def make_arrow(names, data, types=None, legacy=False):
    """Create an Arrow binary that can be loaded and manipulated from memory.

    Args:
        names (list): a list of str column names
        data (list): a list of lists containing data for each column
        types (list): an optional list of `pyarrow.type` function references.
            Types will be inferred if not provided.
        legacy (bool): if True, use the legacy IPC format (pre-pyarrow 0.15).
            Defaults to False.

    Returns:
        bytes: a bytes object containing the Arrow-serialized output.
    """
    stream = pa.BufferOutputStream()
    arrays = []

    for idx, column in enumerate(data):
        # only apply a type if one was provided for this column
        kwargs = {}
        if types:
            kwargs["type"] = types[idx]
        arrays.append(pa.array(column, **kwargs))

    batch = pa.RecordBatch.from_arrays(arrays, names)
    table = pa.Table.from_batches([batch])
    writer = pa.RecordBatchStreamWriter(
        stream, table.schema, use_legacy_format=legacy)

    writer.write_table(table)
    return stream.getvalue().to_pybytes()

This method (which we use in our Python test suite to exercise all combinations of Arrow data without writing anything to a file) generates a bytes object containing Arrow data. For specifics I'd suggest looking at the Arrow documentation, but basically yes - in this case the stream does not write anything to disk.

Given that stream, you would be able to write a webserver that serves binary data (over HTTP or websockets) without writing any .arrow files.
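A minimal sketch of such a transport layer, using only Python's standard library (the payload here is a placeholder; in practice it would be the bytes produced by something like make_arrow above):

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Placeholder payload; in practice this would be the Arrow-serialized
# bytes produced by make_arrow() or equivalent.
ARROW_BYTES = b"...serialized arrow data..."


class ArrowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the in-memory binary payload; no .arrow file on disk.
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(ARROW_BYTES)))
        self.end_headers()
        self.wfile.write(ARROW_BYTES)
```

Running `ThreadingHTTPServer(("localhost", 8080), ArrowHandler).serve_forever()` would then serve the bytes at http://localhost:8080/, where a JavaScript or Python client could fetch them and load them into Perspective.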

We don't have any specific docs for the transport layer, as I'm just using that term to refer to the API from which you serve data into Perspective. That depends on how you want to set up the data server.
