Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace go-etcd with etcd/client #11962

Closed
bcwaldon opened this issue Jul 28, 2015 · 43 comments
Closed

Replace go-etcd with etcd/client #11962

bcwaldon opened this issue Jul 28, 2015 · 43 comments
Labels
priority/backlog Higher priority than priority/awaiting-more-evidence. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.

Comments

@bcwaldon
Copy link

The go-etcd client library (github.com/coreos/go-etcd) is deprecated in favor of the client library that lives directly in the etcd repo (github.com/coreos/etcd/client). This new library is receiving a lot more love from the core etcd development team, and is the current blessed set of golang etcd bindings.

@bcwaldon
Copy link
Author

/cc @yichengq @xiang90 @barakmich

@bprashanth
Copy link
Contributor

Does it reuse tcp/http connections or do we have to port our patch? https://github.com/GoogleCloudPlatform/kubernetes/tree/master/third_party/forked/coreos/go-etcd

@bcwaldon
Copy link
Author

I can't tell you for sure, but I do see that the client sets a default KeepAlive value on a customer net.Dialer: https://github.com/coreos/etcd/blob/master/client/client.go#L43. Maybe the etcd team can chime in here.

@bprashanth
Copy link
Contributor

@lavalamp @wojtek-t

@yichengq
Copy link
Contributor

Does it reuse tcp/http connections or do we have to port our patch?

@bprashanth You still need to set MaxIdleConnsPerHost for Transport in client.Config. But you don't need to fork anything, and you could set the transport used when creating the client.

The benefits I could see:

  1. cleaner API. keysAPI has only 7 calls now(http://godoc.org/github.com/coreos/etcd/client#KeysAPI), and it is easily extensible.
  2. It support members and auth API.
  3. it is thread-safe.

Possible drawbacks:

  1. It doesn't fasten its unmarshal function yet. Here is the issue to track it: use fast unmarshalJson code in etcd/client etcd-io/etcd#3193

@wojtek-t
Copy link
Member

I think that we should wait until etcd-io/etcd#3193 is fixed (it doesn't seem to be a lot of work). Once this is fixed we should be able to migrate to the new library relatively quickly.

@timothysc
Copy link
Member

2 questions:

  1. Do we have a time-horizon on this integration?
  2. Given that the change could be destabilizing, do we have a vetting plan, e.g. (-devel) vs. 1.1? The previous client had a lot of cycles under its belt prior to 1.0, and I would feel far more comfortable putting such changes into a -devel series.

cc @smarterclayton @bgrant0607 @dchen1107

@bgrant0607 bgrant0607 added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. priority/backlog Higher priority than priority/awaiting-more-evidence. and removed team/master labels Aug 13, 2015
@bgrant0607
Copy link
Member

What problems would this solve? What performance gain could be expected? What bugs would it fix? Will the existing library be deprecated or deleted at some point?

If we need to make the change in the 1.1 timeframe, earlier would be better. If someone could do that and run all the e2e tests that would help us assess the stability risk.

cc @lavalamp

@timothysc
Copy link
Member

@bgrant0607 The key issue of interest is the thread safety issue outlined in #7566 .
This effects HA, and read-perf in a HA env.

@bgrant0607
Copy link
Member

Someone just needs to try it and run all of our e2e tests and benchmarks.

@timothysc
Copy link
Member

@yichengq do you have an ETA?

@yichengq
Copy link
Contributor

@timothysc I plan to finish the left performance issue etcd-io/etcd#3193 in the next week.

@wojtek-t
Copy link
Member

I can try to pick it up then.

@timothysc
Copy link
Member

Per our discussion today, the client is available:
https://github.com/coreos/etcd/tree/master/client + godoc:
https://godoc.org/github.com/coreos/etcd/client

@wojtek-t is isolating the dependencies which appear to be in a lot of locations, including a forked client in server.go.?.?.?

$ grep -r "go-etcd" * | wc -l
39

I'm starting from the helper and updating.

/cc @danmcp

@wojtek-t
Copy link
Member

wojtek-t commented Sep 4, 2015

Thanks @timothysc

One PR slightly reducing dependencies on go-etcd is already LGTMed:
#13509

I have another one pretty close - should send it out for review today and will post its number here.

@wojtek-t
Copy link
Member

wojtek-t commented Sep 4, 2015

But I think that we will need to convert few things anyway. These are:

  • the whole pkg/storage
  • references to it under pkt/tool
  • few reverences to the client itself under test/integration, cmd/ and cluster/addons

@wojtek-t
Copy link
Member

wojtek-t commented Sep 4, 2015

No I didn't touch server.go

However, once #13584 is merged it will be much better:

$ git grep "go-etcd" | wc -l
23

(And changing half of them should be trivial)

@timothysc
Copy link
Member

Deps list for those watching: etcd-io/etcd#3471

@smarterclayton
Copy link
Contributor

I'd like more context behind the second bullet - that doesn't make a lot of
sense to me to require a database to be running to run the bulk of our unit
tests that touch the storage path.

On Wed, Oct 14, 2015 at 12:51 PM, Timothy St. Clair <
notifications@github.com> wrote:

Just to give a complete update..

  • 1st commit around contexts is in.
  • We've agreed upon removing the FakeClient and stading up etcd per
    unit test that needs it. Currently block on godep issues
  • Code exists to transition to new client, but there is some runtime
    error that is masked by the FakeClient and only visible by the integration
    tests.


Reply to this email directly or view it on GitHub
#11962 (comment)
.

@timothysc
Copy link
Member

The SIG agreed that the FakeClient doesn't make much sense going forwards as the semantics change with both the current mod, as well as the upcoming etcd3 mods. We are basically fixing the fake client to match the shift, while it still masks behavioral differences between clients.

Thus standing up etcd programmatically is relatively low cost and can be done on an as needed basis.

#15392

@smarterclayton
Copy link
Contributor

Can you explain why FakeClient doesn't make sense? Emulating deep etcd
behavior I agree is not productive. But higher level tests having to start
etcd concerns me more.

On Wed, Oct 14, 2015 at 1:17 PM, Timothy St. Clair <notifications@github.com

wrote:

The SIG agreed that the FakeClient doesn't make much sense going forwards
as the semantics change with both the current mod, as well as the upcoming
etcd3 mods. We are basically fixing the fake client to match the shift,
while it still masks behavioral differences between clients.

Thus standing up etcd programmatically is relatively low cost and can be
done on an as needed basis.

#15392 #15392


Reply to this email directly or view it on GitHub
#11962 (comment)
.

@timothysc
Copy link
Member

Can you explain why FakeClient doesn't make sense? Emulating deep etcd behavior I agree is not productive.

The updated client libraries change both the api and behaviour. On those modifications, it forces a shift in the FakeClient to match on both fronts. So shift now + shift later ...

But higher level tests having to start etcd concerns me more.

We counted ~15 files, and not all tests within those files. In general, the code has not properly abstracted the storage layer, and that concerns me even more.

Removing the FakeClient ensures that we are testing reality as this changes. Ideally we hide behind the storage layer for all of this.

@smarterclayton
Copy link
Contributor

Ok, thanks for the clarification.

On Wed, Oct 14, 2015 at 3:12 PM, Timothy St. Clair <notifications@github.com

wrote:

Can you explain why FakeClient doesn't make sense? Emulating deep etcd
behavior I agree is not productive.

The updated client libraries change both the api and behaviour. On those
modifications, it forces a shift in the FakeClient to match on both fronts.
So shift now + shift later ...

But higher level tests having to start etcd concerns me more.

We counted ~15 files, and not all tests within those files. In general,
the code has not properly abstracted the storage layer, and that concerns
me even more.

Removing the FakeClient ensures that we are testing reality as this
changes. Ideally we hide behind the storage layer for all of this.


Reply to this email directly or view it on GitHub
#11962 (comment)
.

@timothysc
Copy link
Member

For update, code needed to replace fakeClient is now I'm in...
Currently in flight on replacement, and will remove /tool dependencies and shift to storage as we go.

@timothysc
Copy link
Member

Storage layer is nearly completely isolated only a couple of minor changes needed now.

@wojtek-t
Copy link
Member

There are at least two places where this should be relatively simple to get rid of the dependency:

  • contrib/mesos
  • pkg/master/master_test.go
    I will try to do at least second one tomorrow.

@wojtek-t
Copy link
Member

Also - I would like to eliminate tools.EtcdClient interface completely,

@timothysc
Copy link
Member

agreed

@liggitt
Copy link
Member

liggitt commented Dec 9, 2015

any update on this? I'm interested in being able to plumb a quorum read in a couple particular cases, and only the new client exposes that option

@timothysc
Copy link
Member

I'll get the PR in this week, we've finished off the isolation to the storage layer this week.

Pending merge on: #18383 & #18163

@yuvipanda
Copy link
Contributor

/me registers interest in this since that'll probably allow easier usage of etcd ACLs

@timothysc
Copy link
Member

@wojtek-t we can finally close this one now.

@wojtek-t
Copy link
Member

I think there is two more actions to do here

  • replace go-etcd with new etcd client in cluster/addons/dns/kube2sky
  • removed go-etcd dependency completely.

@timothysc
Copy link
Member

There is a separate issue for that. #18791

@wojtek-t
Copy link
Member

OK - then I guess we can close this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/backlog Higher priority than priority/awaiting-more-evidence. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.
Projects
None yet
Development

No branches or pull requests