[sharding] Remote shard operations via gRPC #505
Conversation
Force-pushed from 3f8b9fe to 5de4cf1
This will need conflict resolution once #509 is merged, as it changes some moving parts regarding the address lookup for peers.
fn from(value: segment::types::GeoRadius) -> Self {
    Self {
        center: Some(value.center.into()),
        radius: value.radius as f32, // TODO lossy ok?
    }
}
Do we need to document this loss of precision from f64 to f32 somewhere?
Is it possible to declare them with the same type (e.g. both f64 or both f32)? Or are there some additional limitations?
I believe we use an f32 and not an f64 here because it corresponds to the float in the proto3 definition: https://developers.google.com/protocol-buffers/docs/proto3
Yeah, but they also have double there, which is f64 as far as I understand.
Anyway, this can be handled in a separate issue; maybe @generall also has some thoughts on this.
I agree, I'd rather not change this API contract in the middle of this giant PR 👍
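As a side note, a minimal sketch of the precision loss under discussion (the values are made up for illustration):

fn main() {
    // GeoRadius stores the radius as f64, while the proto3 `float` field is f32.
    let radius: f64 = 12345.678901234567;
    let lossy = radius as f32; // the cast silently drops precision
    println!("f64: {}", radius); // 12345.678901234567
    println!("f32: {}", lossy); // 12345.679, only about 7 significant digits survive
}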
Force-pushed from 5d903e6 to f6d4068
Force-pushed from fc9d64b to 0df92b9
Rebased to master to integrate.
Force-pushed from 0df92b9 to e1cd954
The channel pooling will be introduced later via #518.
#[cfg(not(feature = "consensus"))]
{
    collection
        .scroll_by(
            request,
            self.segment_searcher.deref(),
            shard_selection,
            &HashMap::new(),
        )
        .await
        .map_err(|err| err.into())
}
Instead of duplicating everything, can't we just return HashMap::new() from peer_address_by_id based on the feature condition?
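A hypothetical sketch of that suggestion (assuming PeerAddressById implements Default; the consensus variant mirrors the existing accessor quoted later in this thread):

#[cfg(feature = "consensus")]
pub fn peer_address_by_id(&self) -> Result<PeerAddressById, StorageError> {
    Ok(self.raft_state.lock()?.peer_address_by_id().clone())
}

// Without the consensus feature there are no known peers, so an empty
// mapping can be returned here instead of duplicating every call site.
#[cfg(not(feature = "consensus"))]
pub fn peer_address_by_id(&self) -> Result<PeerAddressById, StorageError> {
    Ok(PeerAddressById::default())
}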
lib/collection/src/lib.rs (outdated)
}

pub async fn update_from_client(
    &self,
    operation: CollectionUpdateOperations,
    wait: bool,
    ip_to_address: &HashMap<u64, Uri>,
I feel like ip_to_address should be part of self; otherwise it's a leak of abstraction. As a developer, I should not have to care about cluster state when I want to make an update.
We discussed this choice as well during the design phase with @e-ivkov.
The source of truth is the RaftState on the toc:

raft_state: Arc<std::sync::Mutex<PersistentRaftState>>,

exposed via:

pub fn peer_address_by_id(&self) -> Result<PeerAddressById, StorageError> {
    Ok(self.raft_state.lock()?.peer_address_by_id().clone())
}

Given that the RemoteShard in the collection module does not see any of the RaftState types in the storage, we decided to go with a more explicit approach, propagating only the HashMap at the call site.
What would be your preferred way to encapsulate that state safely in the RemoteShard struct, or somewhere else?
The simplest alternative is to have an RwLock-protected or lock-free shared copy of peer_address_by_id, which should be updated on changes in the Raft state.
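A minimal sketch of that pattern, with hypothetical stand-in types for the real ones in the codebase:

use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical stand-ins for the real PeerId / address types.
type PeerId = u64;
type PeerAddressById = HashMap<PeerId, String>;

struct ConsensusState {
    // Shared, read-mostly copy of the peer address mapping.
    peer_address_by_id: Arc<RwLock<PeerAddressById>>,
}

impl ConsensusState {
    // The consensus side updates the map whenever the Raft state changes...
    fn set_peer_address(&self, peer: PeerId, address: String) {
        self.peer_address_by_id.write().unwrap().insert(peer, address);
    }

    // ...while remote shards hold a clone of the Arc and only read from it.
    fn shared_addresses(&self) -> Arc<RwLock<PeerAddressById>> {
        Arc::clone(&self.peer_address_by_id)
    }
}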
Alright, I will give it a try 👍
I have the change in 9e451ac.
As we are not building RemoteShard instances yet, I left a TODO mentioning that the Arc should be shared in the toc.
In a future PR, we could introduce Egor's idea of having a read-only lock to prevent the remote shards from modifying the mappings.
I decided to keep it out for now for simplicity.
Force-pushed from e1cd954 to cc3456f
@@ -55,7 +56,7 @@ impl UnappliedEntries {
 pub struct Persistent {
     state: RaftStateWrapper,
     unapplied_entries: UnappliedEntries,
-    peer_address_by_id: PeerAddressByIdWrapper,
+    peer_address_by_id: Arc<std::sync::RwLock<PeerAddressByIdWrapper>>, // TODO pass to remote shard
The RemoteShard instances are not yet created in the codebase, so I left a TODO.
@@ -333,7 +333,7 @@ pub mod collections_client {
     where
         T: tonic::client::GrpcService<tonic::body::BoxBody>,
         T::Error: Into<StdError>,
-        T::ResponseBody: Default + Body<Data = Bytes> + Send + 'static,
+        T::ResponseBody: Body<Data = Bytes> + Send + 'static,
The diff in the generated code must come from #520.
This PR enables the remote shards to perform gRPC calls to resolve the shard operations (#500).
Most of the code is data type conversion between gRPC types and domain types.
The main design choice is to start with a naive approach where the network channel is recreated for each call.
Channel pooling will be introduced later to reduce the complexity of this PR.
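For illustration, a minimal sketch of the naive per-call channel approach (the helper name is hypothetical; only the tonic channel handling mirrors the description above):

use tonic::transport::Channel;

// Hypothetical helper: establish a fresh gRPC channel for a single remote
// shard call and let it drop afterwards; pooling is deferred to #518.
async fn fresh_channel(address: &str) -> Result<Channel, Box<dyn std::error::Error>> {
    // A new connection is set up for every call: simple, but it pays the
    // connection cost each time, which the later channel pooling avoids.
    let channel = Channel::from_shared(address.to_string())?.connect().await?;
    Ok(channel)
}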