cluster computing - How to deal with stale data when doing service discovery with etcd on CoreOS?


I am tinkering with CoreOS and creating a cluster based upon it. So far, my experience with CoreOS on a single host has been quite smooth. Things get a little hazy when it comes to service discovery. Somehow I don't get the overall idea, hence I am asking here for help.

What I want is to have two Docker containers running, where the first relies on the second. If we are talking pure Docker, I can solve this using linked containers. So far, so good.
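On a single host, that link mechanism can be sketched as follows. The image names `redis` and `myapp` are illustrative, and the `docker` CLI is stubbed with a shell function so the sketch runs without a Docker daemon:

```shell
# Sketch of single-host container linking ("redis" and "myapp" are
# hypothetical image names). The docker CLI is stubbed so the commands
# can be demonstrated without an actual Docker daemon.
docker() { echo "docker $*"; }

docker run -d --name redis redis
# --link makes the redis container's address available to "web"
# via environment variables and /etc/hosts
docker run -d --name web --link redis:redis myapp
```

Against a real daemon you would simply drop the stub function; the commands themselves are standard Docker CLI usage.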

But this approach does not work across machine boundaries, because Docker cannot link containers across multiple hosts. I am wondering how to do this.

What I've understood so far is that CoreOS's idea of how to deal with this is to use the etcd service, a distributed key-value store that is accessible on each host locally via port 4001, so that you (as a consumer of etcd) do not have to deal with networking details: you access localhost:4001 and you're fine.

So, in my head, I have the idea that this means that when a Docker container that provides a service spins up, it registers itself (i.e. its IP address and port) in the local etcd, and etcd takes care of distributing that information across the network. This way, you end up with key-value pairs such as:

redisservice => 192.168.3.132:49236 

Now, when another Docker container needs to access redisservice, it gets the IP address and port from its own local etcd, at least once the information has been distributed across the network. So far, so good.
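The lookup side of that idea can be sketched like this. The key name follows the example above; `etcdctl` is stubbed to return a canned value so the flow is runnable here without a live cluster:

```shell
# Sketch of a consumer resolving a service through its local etcd.
# etcdctl is stubbed to return a canned value for illustration; against
# a real cluster the stub would simply be removed.
etcdctl() { echo "192.168.3.132:49236"; }

# Key layout follows the redisservice example above
REDIS_ADDR=$(etcdctl get /services/redisservice)
echo "connecting to redis at $REDIS_ADDR"
```

The point is that the consumer only ever talks to its local etcd endpoint; etcd's replication is what makes the value available on every host.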

But here is the question I cannot answer, and which has puzzled me for a few days: what happens when a service goes down? Who cleans up its data inside etcd? If the data is not cleaned up, clients will try to access a service that is no longer there.

The only (reliable) solution I can think of at the moment is making use of etcd's TTL feature for the data, but that involves a trade-off: either you have quite high network traffic, as you need to send a heartbeat every few seconds, or you have to live with stale data. Neither is fine.
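The TTL idea could be sketched as follows: refresh the key at an interval shorter than the TTL, so the entry expires on its own if the writer dies. The 60-second TTL and 45-second interval are illustrative, and `etcdctl` is stubbed so the sketch runs without a cluster:

```shell
# Heartbeat sketch: repeatedly re-set the key with a TTL, sleeping for
# less than the TTL, so the key stays alive while the writer is alive
# and expires on its own when the writer is not.
# etcdctl is stubbed for illustration.
etcdctl() { echo "etcdctl $*"; }

heartbeat() {
  for i in 1 2 3; do   # a real sidekick would loop forever
    etcdctl set /services/redisservice '192.168.3.132:49236' --ttl 60
    # sleep 45         # refresh well before the 60 s TTL lapses
  done
}
heartbeat
```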

The other, well, "solution" I can think of is to make a service deregister itself when it goes down. That works for planned shutdowns, but not for crashes, power outages, …

So, how do you solve this?

There are a few different ways to solve this: the sidekick method, using ExecStopPost, and removing on failure. I'm assuming a trio of CoreOS, etcd and systemd, but these concepts apply elsewhere too.

The sidekick method

This involves running a separate process next to your main application that heartbeats to etcd. On the simple side, this is just a loop that runs forever. You can use systemd's BindsTo to ensure that when the main unit stops, the service registration unit stops too. In ExecStop you can explicitly delete the key you're setting. We're also setting a TTL of 60 seconds to handle any ungraceful stoppage.

[Unit]
Description=Announce nginx1.service
# Binds this unit and nginx1 together. When nginx1 is stopped, this unit is stopped too.
BindsTo=nginx1.service

[Service]
ExecStart=/bin/sh -c "while true; do etcdctl set /services/website/nginx1 '{ \"host\": \"10.10.10.2\", \"port\": 8080, \"version\": \"52c7248a14\" }' --ttl 60; sleep 45; done"
ExecStop=/usr/bin/etcdctl delete /services/website/nginx1

[Install]
WantedBy=local.target

On the more complex side, this could be a container that starts up and hits a /health endpoint that your app provides, to run a health check before sending the data to etcd.
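A health-checked variant could look like this sketch: the registration is only refreshed while the app's /health endpoint answers with HTTP 200, so an unhealthy instance drops out of etcd once its TTL lapses. The port and key names follow the unit above; `curl` and `etcdctl` are stubbed so the sketch runs standalone:

```shell
# Health-gated registration sketch: refresh the etcd key only while the
# app reports healthy, so a sick instance expires out of discovery.
# curl and etcdctl are stubbed for illustration.
curl() { echo "200"; }        # pretend /health answered HTTP 200
etcdctl() { echo "etcdctl $*"; }

refresh_if_healthy() {
  status=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health)
  if [ "$status" = "200" ]; then
    etcdctl set /services/website/nginx1 '{ "host": "10.10.10.2", "port": 8080 }' --ttl 60
  fi
}
refresh_if_healthy
```

A real sidekick would run this in the same kind of loop as the simple version, with the stubs removed.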

ExecStopPost

If you don't want to run something beside your main app, you can have etcdctl commands within the main unit that run on start and stop. Be aware that this won't catch all failures, as you mentioned.

[Unit]
Description=MyWebApp
After=docker.service
Requires=docker.service
After=etcd.service
Requires=etcd.service

[Service]
ExecStart=/usr/bin/docker run --rm --name myapp1 -p 8084:80 username/myapp command
ExecStartPost=/usr/bin/etcdctl set /services/myapp/%H:8084 '{ "host": "%H", "port": 8084, "version": "52c7248a14" }'
ExecStopPost=/usr/bin/etcdctl rm /services/myapp/%H:8084

[Install]
WantedBy=local.target

%H is a systemd variable that substitutes in the hostname of the machine. If you're interested in more variable usage, check out the CoreOS Getting Started with systemd guide.

Removing on failure

On the client side, you could remove an instance that you have failed to connect to more than X times. If you get a 500 or a timeout from /services/myapp/instance1, run something like the following, keep increasing the failure count, and try to connect to the other hosts in the /services/myapp/ directory.

etcdctl set /services/myapp/instance1 '{ "host": "%H", "port": 8084, "version": "52c7248a14", "failures": 1 }' 

When you hit the desired threshold, remove the key with etcdctl.
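Putting the failure counting together, here is a client-side sketch. The threshold of 3 is arbitrary, `FAILURES` stands in for the count parsed out of the JSON value, and `etcdctl` is stubbed so the logic can be shown without a cluster:

```shell
# Sketch of client-side eviction: bump a failure counter on each failed
# connect, and delete the key once a threshold is crossed.
# etcdctl is stubbed; FAILURES stands in for the count parsed from the
# JSON value stored in etcd. The threshold of 3 is arbitrary.
etcdctl() { echo "etcdctl $*"; }
THRESHOLD=3

record_failure() {
  FAILURES=$((FAILURES + 1))
  if [ "$FAILURES" -ge "$THRESHOLD" ]; then
    etcdctl rm /services/myapp/instance1
  else
    etcdctl set /services/myapp/instance1 "{ \"failures\": $FAILURES }"
  fi
}

FAILURES=0
record_failure; record_failure; record_failure
```

The third call crosses the threshold and removes the key, after which clients fall back to the other hosts listed under /services/myapp/.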

Regarding the network traffic that heartbeating causes: in most cases you should be sending this traffic over a local private network that your provider runs, so it should be free and fast. etcd is heartbeating with its peers constantly anyway, so this is only a small increase in traffic.

Hop into #coreos on Freenode if you have any other questions!

