
How perspective interacts with data coming from a c program #1009

Closed
stevedanomodolor opened this issue Apr 7, 2020 · 3 comments

@stevedanomodolor
Support Question

This question is just to help me understand how what I want to do would work with Perspective, because I am a bit lost at the moment. I am planning to use Perspective to stream data coming from a C application I wrote. I still don't understand how Arrow connects my C program to Perspective and lets me stream this data. Can someone please explain how that would work? After reading the Perspective page and looking at the examples, I assume that Perspective reads data in an Arrow format and displays it, which means we would have to constantly create and update an Arrow file. How do the reads and writes work in this case? Are you reading from the Arrow file?
Also, is it possible for Perspective to read from an Arrow table created in C code, without the need to create a file?

I would be really grateful if someone could answer my questions, because there is a concept I am missing that is keeping me from understanding.

@sc1f
Contributor

sc1f commented Apr 7, 2020

Perspective reads data in multiple formats: Arrow files and streams, but also row- and column-oriented data, Pandas DataFrames, and CSV.

I would start by going through the Concepts documentation as well as the quickstart user guides for JavaScript and Python.

In terms of interfacing with data generated from a C program, the easiest way would be to expose your data through a web API, either as an Arrow binary or as properly formatted JSON, which can then be loaded by Perspective's JavaScript or Python libraries. Perspective reads both file- and stream-formatted Arrow, but you will have to implement the transport layer regardless of which data format you choose.
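As a rough illustration of the "properly formatted JSON" option, the same records can be serialized either row-oriented (a list of dicts) or column-oriented (a dict of lists) using only the standard library; the column names and values here are invented for the example:

```python
import json

# A few example rows, as they might come out of a C program
# (field names and values are made up for illustration).
rows = [
    {"symbol": "AAPL", "price": 120.5, "size": 100},
    {"symbol": "MSFT", "price": 210.1, "size": 50},
]

# Row-oriented JSON: a list of records, one object per row.
row_oriented = json.dumps(rows)

# Column-oriented JSON: one list of values per column.
columns = {key: [row[key] for row in rows] for key in rows[0]}
column_oriented = json.dumps(columns)

print(row_oriented)
print(column_oriented)
```

Either shape can be served over a web API; the column-oriented form is closer to how Arrow lays out data internally.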

@stevedanomodolor
Author

Thank you very much. You have cleared up most of the doubts I had, but I still have some trouble with the concepts you mentioned. Thank you very much, you are really being helpful.

You mentioned .arrow files and streams. I was able to display data from an Arrow-format file, but I don't understand what you mean by a stream. Are you referring to sending the data without creating a file?

When you refer to stream-formatted Arrow, do you mean the binary Arrow file, like a .arrow file (superstore.arrow, for example)?

One last concept you mentioned is the transport layer. What do you mean by that, and could you please point me to resources to understand it better, like examples or documentation?

Thank you very much.

@sc1f
Contributor

sc1f commented Apr 7, 2020

Using Pyarrow as an example:

import pyarrow as pa


def make_arrow(names, data, types=None, legacy=False):
    """Create an Arrow binary that can be loaded and manipulated from memory.

    Args:
        names (list): a list of str column names
        data (list): a list of lists containing data for each column
        types (list): an optional list of `pyarrow.type` function references.
            Types will be inferred if not provided.
        legacy (bool): if True, use the legacy IPC format (pre-pyarrow 0.15).
            Defaults to False.

    Returns:
        bytes: a bytes object containing the Arrow-serialized output.
    """
    stream = pa.BufferOutputStream()
    arrays = []

    for idx, column in enumerate(data):
        # only apply a type if one was provided for this column
        kwargs = {}
        if types:
            kwargs["type"] = types[idx]
        arrays.append(pa.array(column, **kwargs))

    batch = pa.RecordBatch.from_arrays(arrays, names)
    table = pa.Table.from_batches([batch])
    writer = pa.RecordBatchStreamWriter(
        stream, table.schema, use_legacy_format=legacy)

    writer.write_table(table)
    return stream.getvalue().to_pybytes()

This method (which we use in our Python test suite to exercise all combinations of Arrow data without writing anything to a file) generates a bytes object containing Arrow data. For specifics I'd suggest looking at the Arrow documentation, but basically yes - in this case the stream does not write anything to disk.

Given that stream, you would be able to write a webserver that serves binary data (over HTTP or websockets) without writing any .arrow files.
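A minimal sketch of such a transport layer, using only Python's standard library (the payload here is a placeholder; in practice it would be the bytes produced by something like make_arrow above):

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Placeholder payload; in practice this would be the Arrow-serialized
# bytes produced by make_arrow() or equivalent.
ARROW_BYTES = b"...serialized arrow data..."


class ArrowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the in-memory binary payload; no .arrow file on disk.
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(ARROW_BYTES)))
        self.end_headers()
        self.wfile.write(ARROW_BYTES)
```

Running `ThreadingHTTPServer(("localhost", 8080), ArrowHandler).serve_forever()` would then serve the bytes at http://localhost:8080/, where a JavaScript or Python client could fetch them and load them into Perspective.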

We don't have any specific docs for the transport layer, as I'm just using that term to refer to the API from which you serve data into Perspective. That depends on how you want to set up the data server.
