
[sharding] Remote shard operations via gRPC #505

Merged
merged 10 commits into master from remote-shard-operation-grpc on Apr 29, 2022

Conversation

@agourlay (Member) commented on Apr 22, 2022:

This PR enables the remote shards to perform gRPC calls to resolve the shard operations (#500).

Most of the code is data type conversion between gRPC types and domain types.

The main design choice is to start with a naive approach where the network channel is recreated for each call.
Channel pooling will be introduced later to keep the complexity of this PR down.
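
For context, here is a minimal sketch (not the PR's actual code) of what "recreate the channel for each call" means with tonic; the client named in the comment below is only illustrative of whichever generated gRPC client a given shard operation needs.

```rust
use tonic::transport::{Channel, Uri};

/// Establish a fresh channel to a peer for a single remote shard operation.
/// The channel is dropped when the call completes; no pooling yet.
async fn fresh_channel(peer_uri: Uri) -> Result<Channel, tonic::transport::Error> {
    Channel::builder(peer_uri).connect().await
}

// Hypothetical call site:
// let channel = fresh_channel(peer_uri).await?;
// let mut client = points_internal_client::PointsInternalClient::new(channel);
// let response = client.upsert(request).await?;
```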

@agourlay changed the title from "Remote shard operations via gRPC" to "WIP [sharding] Remote shard operations via gRPC" on Apr 22, 2022
@agourlay force-pushed the remote-shard-operation-grpc branch from 3f8b9fe to 5de4cf1 on April 25, 2022 12:48
@agourlay changed the title from "WIP [sharding] Remote shard operations via gRPC" to "[sharding] Remote shard operations via gRPC" on Apr 26, 2022
@agourlay marked this pull request as ready for review on April 26, 2022 16:02
@agourlay (Member, Author) commented:

This will need a conflict resolution once #509 is merged, as it changes some moving parts regarding the address lookup for peers.

fn from(value: segment::types::GeoRadius) -> Self {
    Self {
        center: Some(value.center.into()),
        radius: value.radius as f32, // TODO lossy ok?
    }
}
@agourlay (Member, Author) commented:

Do we need to document this loss of precision from f64 to f32 somewhere?

@e-ivkov (Contributor) commented on Apr 27, 2022:

Is it possible to declare them with the same type (e.g. both f64 or both f32)? Or are there some additional limitations?

@agourlay (Member, Author) commented:

I believe we use an f32 and not an f64 here because it corresponds to the float type in the proto definition:
https://developers.google.com/protocol-buffers/docs/proto3

@e-ivkov (Contributor) commented:

Yeah, but they also have double there, which is f64 as far as I understand.

@e-ivkov (Contributor) commented:

Anyway, this can be handled in a separate issue; maybe @generall also has some thoughts on this.

@agourlay (Member, Author) commented:

I agree, I'd rather not change this API contract in the middle of this giant PR 👍
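
For illustration only, a tiny standalone example of the lossy f64 → f32 cast under discussion (the value is made up):

```rust
fn main() {
    // The domain type stores the radius as f64; proto3 `float` maps to Rust's f32.
    let radius: f64 = 1234.000_000_1;
    let over_grpc = radius as f32; // same cast as `value.radius as f32` above
    // The extra precision does not survive the round trip.
    assert_ne!(radius, over_grpc as f64);
    println!("{radius} -> {over_grpc}");
}
```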

@agourlay force-pushed the remote-shard-operation-grpc branch 2 times, most recently from fc9d64b to 0df92b9 on April 27, 2022 17:45
@agourlay (Member, Author) commented:

Rebased to master to integrate ip_to_address using Uri instead of SocketAddr.

@agourlay (Member, Author) commented:

The channel pooling will be introduced later via #518
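
As a rough illustration only (not what #518 actually implements): tonic channels are cheap to clone and multiplex requests over a single connection, so a pool can be as simple as caching one connected Channel per peer Uri.

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use tonic::transport::{Channel, Error, Uri};

/// Sketch of a per-peer channel cache; the real follow-up may differ.
#[derive(Default)]
struct ChannelPool {
    channels: RwLock<HashMap<Uri, Channel>>,
}

impl ChannelPool {
    async fn get_or_connect(&self, peer_uri: Uri) -> Result<Channel, Error> {
        // Fast path: reuse an already established channel.
        if let Some(channel) = self.channels.read().unwrap().get(&peer_uri) {
            return Ok(channel.clone());
        }
        // Slow path: connect once and cache the channel for later calls.
        let channel = Channel::builder(peer_uri.clone()).connect().await?;
        self.channels
            .write()
            .unwrap()
            .insert(peer_uri, channel.clone());
        Ok(channel)
    }
}
```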

@agourlay linked an issue on Apr 28, 2022 that may be closed by this pull request
Comment on lines 626 to 637
#[cfg(not(feature = "consensus"))]
{
collection
.scroll_by(
request,
self.segment_searcher.deref(),
shard_selection,
&HashMap::new(),
)
.await
.map_err(|err| err.into())
}
Member (reviewer) commented:

Instead of duplicating everything, can't we just return HashMap::new() from peer_address_by_id under the feature condition?
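
A sketch of that suggestion, reusing the peer_address_by_id signature quoted further down in this thread (a fragment only, not the PR's actual code):

```rust
#[cfg(feature = "consensus")]
pub fn peer_address_by_id(&self) -> Result<PeerAddressById, StorageError> {
    Ok(self.raft_state.lock()?.peer_address_by_id().clone())
}

#[cfg(not(feature = "consensus"))]
pub fn peer_address_by_id(&self) -> Result<PeerAddressById, StorageError> {
    // Without the consensus feature there are no remote peers to address.
    Ok(PeerAddressById::default())
}
```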

}

pub async fn update_from_client(
&self,
operation: CollectionUpdateOperations,
wait: bool,
ip_to_address: &HashMap<u64, Uri>,
Member (reviewer) commented:

I feel like ip_to_address should be a part of self; otherwise it's a leak of abstraction. As a developer, I should not have to care about cluster state to make an update.

@agourlay (Member, Author) commented:

We discussed this choice as well during the design phase with @e-ivkov.

The source of truth is the RaftState on the toc:

raft_state: Arc<std::sync::Mutex<PersistentRaftState>>,

exposed via

pub fn peer_address_by_id(&self) -> Result<PeerAddressById, StorageError> {
     Ok(self.raft_state.lock()?.peer_address_by_id().clone())
}

Given that the RemoteShard in the collection module does not see any of the RaftState types from storage, we decided to go with a more explicit approach, propagating only the HashMap at the call site.

What would be your preferred way to encapsulate that state safely, in the RemoteShard struct or somewhere else?

Member (reviewer) commented:

The simplest alternative is to have an RwLock-protected or lock-free shared copy of peer_address_by_id, which is updated on changes in raft state.
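
A minimal sketch of that alternative, assuming PeerAddressById is the peer-id-to-Uri map: the consensus side keeps a clone of the Arc and writes through it whenever raft state changes, while remote shards only read from it.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use tonic::transport::Uri;

type PeerId = u64;
type PeerAddressById = HashMap<PeerId, Uri>;

/// Shared, read-mostly view of the peer address table.
#[derive(Clone, Default)]
struct SharedAddresses(Arc<RwLock<PeerAddressById>>);

impl SharedAddresses {
    /// Called from the consensus/raft side when the peer set changes.
    fn update(&self, peers: PeerAddressById) {
        *self.0.write().unwrap() = peers;
    }

    /// Called from a RemoteShard right before issuing a gRPC call.
    fn address_of(&self, peer: PeerId) -> Option<Uri> {
        self.0.read().unwrap().get(&peer).cloned()
    }
}
```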

@agourlay (Member, Author) commented:

Alright, I will give it a try 👍

@agourlay (Member, Author) commented:

I have the change in 9e451ac.

As we are not building RemoteShard yet, I left a TODO mentioning that the Arc should be shared in the toc.

@agourlay (Member, Author) commented:

In a future PR, we could introduce Egor's idea of a read-only lock to prevent the remote shards from modifying the mappings. I decided to keep it out for now for simplicity.

@agourlay force-pushed the remote-shard-operation-grpc branch from e1cd954 to cc3456f on April 29, 2022 09:06
@@ -55,7 +56,7 @@ impl UnappliedEntries
 pub struct Persistent {
     state: RaftStateWrapper,
     unapplied_entries: UnappliedEntries,
-    peer_address_by_id: PeerAddressByIdWrapper,
+    peer_address_by_id: Arc<std::sync::RwLock<PeerAddressByIdWrapper>>, // TODO pass to remote shard
@agourlay (Member, Author) commented:

The RemoteShard instances are not yet created in the code base, so I left a TODO.

@@ -333,7 +333,7 @@ pub mod collections_client {
 where
     T: tonic::client::GrpcService<tonic::body::BoxBody>,
     T::Error: Into<StdError>,
-    T::ResponseBody: Default + Body<Data = Bytes> + Send + 'static,
+    T::ResponseBody: Body<Data = Bytes> + Send + 'static,
@agourlay (Member, Author) commented:

The diff in the generated code must come from #520.

@agourlay requested a review from generall on April 29, 2022 13:42
@agourlay merged commit 6f93964 into master on Apr 29, 2022
@agourlay deleted the remote-shard-operation-grpc branch on April 29, 2022 17:41
Linked issue that may be closed by this pull request: [sharding] Implement ShardOperation for RemoteShard using gRPC
3 participants