When the consumer fails to connect to ZooKeeper, it creates a large number of connections in a short time. #170
Description
I found that when the consumer client fails to connect to ZooKeeper, it retries almost immediately, so it creates a large number of connections in a short time.
In this state the consumer client uses up too many file handles and starts to log errors like these:
2015/12/08 07:45:02 Failed to connect to 29.1.1.85:3351: dial tcp 29.1.1.85:3351: too many open files
2015/12/08 07:45:02 Failed to connect to 29.1.1.87:3351: dial tcp 29.1.1.87:3351: too many open files
2015/12/08 07:45:02 Failed to connect to 29.1.1.87:3351: dial tcp 29.1.1.87:3351: too many open files
This error prevents other processes on the same system from working normally.
This is what I see in my logs.
First there are timeouts like these, so ZooKeeper itself may have had a problem:
2015/12/08 19:39:00 Failed to connect to 29.1.1.85:3351: dial tcp 29.1.1.85:3351: i/o timeout
2015/12/08 19:39:00 Failed to connect to 29.1.1.85:3351: dial tcp 29.1.1.85:3351: i/o timeout
2015/12/08 19:39:00 Failed to connect to 29.1.1.85:3351: dial tcp 29.1.1.85:3351: i/o timeout
The log also contains a panic:
panic: zk: could not connect to a server
goroutine 1 [running]:
github.com/stealthly/go_kafka_client.NewConsumer(0xc2080e8000, 0xc2080ba110)
/home/zt/git/upload/ops-mgr/src/paas-sdk/src/github.com/stealthly/go_kafka_client/consumer.go:89 +0x628
policy_engine/datamanager/datacollector.NewCollector(0xc2080e8240, 0x7f52712724c8)
/tmp/tmp.jvHyXSdhIY/gopath/src/policy_engine/datamanager/datacollector/datacollector.go:39 +0x378
main.main()
/tmp/tmp.jvHyXSdhIY/gopath/src/policy_engine/worker/heartbeat/main.go:48 +0x307
How can I avoid this problem?
I suspect that the ZooKeeper connection is not closed in some error paths and that the client tries to create a new connection too quickly.
I am using an old version, from about two months ago. If this problem has already been fixed, that would be good to know.