Fixes to fix for FFI & GC interaction in tcp_connection.pony and file.pony #2526

Closed
@slfritchie

Description

While I was working on a Wallaroo data corruption problem, chats with @dipinhora revealed the cause of the corruption and how to fix it. The Wallaroo code at https://github.com/WallarooLabs/wallaroo/blob/master/lib/wallaroo/core/sink/tcp_sink/tcp_sink.pony#L105 (and elsewhere in that file) is closely related to code in packages/net/tcp_connection.pony and packages/files/file.pony.

The _pending array in packages/net/tcp_connection.pony appears to be write-only: the only calls made on it are .push(), .shift() (whose return value is ignored), and .clear(). However, that data structure is necessary to prevent a bad interaction between the Pony FFI and GC systems: without it, data written to the TCP socket can be corrupted immediately before the writev(2) system call executes. As I learned very recently, the race condition does indeed appear if I remove the seemingly write-only _pending data structure.
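
One plausible reading of the hazard above can be sketched in C, the language on the other side of the FFI. This is a minimal illustration under stated assumptions, not code from either repository: `struct write_queue`, `wq_push`, and `wq_flush` are hypothetical names, and because C has no garbage collector, `free` stands in here for the Pony GC reclaiming a buffer. The sketch shows that a `struct iovec` carries only a raw pointer and a length, so something else has to own each buffer until writev(2) has returned.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>   /* struct iovec, writev(2) */

#define MAX_PENDING 16

/* Hypothetical write queue. iov[] is all that writev(2) ever sees;
 * owned[] plays the role that _pending plays in tcp_connection.pony. */
struct write_queue {
    struct iovec iov[MAX_PENDING];  /* raw pointer + length, no ownership */
    void *owned[MAX_PENDING];       /* owning references to the same buffers */
    int count;
};

void wq_init(struct write_queue *q)
{
    q->count = 0;
}

/* Queue a private copy of data.
 * Returns 0 on success, -1 if the queue is full or allocation fails. */
int wq_push(struct write_queue *q, const void *data, size_t len)
{
    if (q->count >= MAX_PENDING)
        return -1;

    void *copy = malloc(len ? len : 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, data, len);

    q->iov[q->count].iov_base = copy;
    q->iov[q->count].iov_len = len;
    q->owned[q->count] = copy;      /* keeps the buffer alive until flush */
    q->count++;
    return 0;
}

/* Hand every queued buffer to writev(2), and only then release them.
 * Returns writev's result. Short writes and errors are not retried in
 * this sketch; a real implementation would keep the unsent data queued. */
ssize_t wq_flush(struct write_queue *q, int fd)
{
    ssize_t n = writev(fd, q->iov, q->count);

    for (int i = 0; i < q->count; i++)
        free(q->owned[i]);
    q->count = 0;
    return n;
}
```

In C the `owned` array is redundant with `iov_base`; it is kept separate only to mirror the two roles. Freeing each copy right after `wq_push` instead of in `wq_flush` would leave `iov` pointing at released memory by the time writev(2) runs, which is the C analogue of removing _pending.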

Wallaroo's commit WallarooLabs/wally@75f1c39 has a work-around for the data corruption problem. Dipin also put this bugfix into this repo's tcp_connection.pony, which is on the master branch today.

I recommend:

  1. Adding a warning to tcp_connection.pony about _pending's purpose.

  2. Adding the _pending fix to packages/files/file.pony. Its code uses a buffering scheme very similar to tcp_connection.pony's, and is therefore very likely vulnerable to the same data-corrupting race condition.

Metadata

    Labels: triggers release (major issue that, when fixed, results in an "emergency" release)
