Noah Levitt
71221dbe54
minimize impact of down server
...
The last approach was not good: a timeout of 0.1 seconds was too short.
A bunch of stuff has to happen within the timeout period inside
rethinkdb.connect(), and it doesn't offer a way to set only the socket
timeout. Even a timeout of 0.5 seconds results in a noticeable error
rate.
The new approach is to put a server in the penalty box for 5 minutes
when it errors. While the server is in the penalty box, we don't try to
connect to it, unless all the servers are in the penalty box, in which
case we try the server that errored least recently.
2018-11-02 18:05:18 +00:00
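A minimal sketch of the penalty box idea described in commit 71221dbe54, for illustration only. The class and method names (ServerPicker, mark_error, candidates, pick) are hypothetical and not taken from the library, and choosing the first healthy server in list order is an assumption; the commit message only specifies the 5-minute penalty and the fallback to the server that errored least recently.

```python
import time

PENALTY_SECONDS = 5 * 60  # 5 minutes, as stated in the commit message


class ServerPicker:
    """Hypothetical helper illustrating the penalty box; not the library's API."""

    def __init__(self, servers, clock=time.monotonic):
        self.servers = list(servers)
        self.clock = clock  # injectable so the behavior can be tested
        self.last_error = {}  # server -> time of its most recent error

    def mark_error(self, server):
        # Put the server in the penalty box, starting now.
        self.last_error[server] = self.clock()

    def candidates(self):
        # Servers that never errored, or whose penalty has expired.
        now = self.clock()
        return [
            s for s in self.servers
            if s not in self.last_error
            or now - self.last_error[s] >= PENALTY_SECONDS
        ]

    def pick(self):
        healthy = self.candidates()
        if healthy:
            # Assumption: take the first healthy server in list order.
            return healthy[0]
        # Every server is in the penalty box: fall back to the one
        # that errored least recently (oldest error timestamp).
        return min(self.servers, key=lambda s: self.last_error[s])
```

A caller would invoke mark_error() whenever a connection attempt to the picked server fails, then call pick() again.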
Noah Levitt
72d6e4d39b
bump version number
2018-11-01 19:23:28 +00:00
Noah Levitt
8ec2d853b6
connect quickly when a server is down
...
Another tweak to that end. We have observed that when a rethinkdb server
is offline, an attempt to connect to it takes a second or two to time
out. On the other hand, if the host is up but the port is not open
(rethinkdb is not running or something like that), the connection
failure happens very quickly.
To achieve good performance in case a rethinkdb server is down, we are
now setting a timeout on the connect() call. The timeout starts at
0.1 sec, for quick retry, and backs off up to 10 sec in case of repeated
failures.
2018-11-01 19:17:50 +00:00
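A rough sketch of the timeout schedule described in commit 8ec2d853b6: start at 0.1 sec and back off up to 10 sec on repeated failures. The doubling factor, the attempt limit, and the function names (backoff_timeouts, connect_with_backoff) are assumptions made for illustration. The connect argument stands in for any callable that accepts a timeout keyword, which keeps the example runnable without a database driver.

```python
def backoff_timeouts(initial=0.1, maximum=10.0, factor=2.0):
    """Yield connect timeouts: 0.1, 0.2, 0.4, ... capped at `maximum`.

    The doubling factor is an assumption; the commit message only
    gives the starting value and the cap.
    """
    timeout = initial
    while True:
        yield timeout
        timeout = min(timeout * factor, maximum)


def connect_with_backoff(connect, max_attempts=8):
    """Call `connect(timeout=...)` with growing timeouts until it succeeds.

    `connect` is a hypothetical callable supplied by the caller.
    Re-raises the last OSError if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_exc = None
    for _, timeout in zip(range(max_attempts), backoff_timeouts()):
        try:
            return connect(timeout=timeout)
        except OSError as e:
            last_exc = e
    raise last_exc
```

With the defaults, eight attempts use timeouts of 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4 and 10.0 seconds.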
Noah Levitt
59c8e7b8cd
bump version after merge
2018-10-29 16:58:04 -07:00
jkafader
8f772d9d29
Merge pull request #11 from internetarchive/conn-backoff
...
retry quickly on first connection failure then...
2018-10-29 14:05:08 -07:00
Noah Levitt
e188c83063
retry quickly on first connection failure then...
...
... back off on each subsequent retry
2018-10-29 13:34:45 -07:00
Noah Levitt
edf68edaa2
pass through extra args to run()
2018-09-28 12:27:13 -07:00
Noah Levitt
a43fb74464
bump version after merge
2018-09-27 13:40:01 -07:00
jkafader
acf9d8918a
Merge pull request #10 from internetarchive/err-iterating-2
...
fail after 20 "recoverable" exceptions in iterator
2018-09-27 13:22:03 -07:00
Noah Levitt
c5b1b0a620
fail after 20 "recoverable" exceptions in iterator
...
It turns out that errors that are recoverable when running a query are
sometimes (always?) not recoverable when they happen while iterating
over results, so we've been ending up in infinite loops.
2018-09-27 12:56:30 -07:00
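One way to picture the cap introduced in commit c5b1b0a620 is a wrapper that retries next() on the result iterator but gives up after 20 recoverable errors. This is a sketch under assumptions, not the library's code: the name bounded_iter, the is_recoverable predicate, and counting errors in total rather than consecutively are all hypothetical choices.

```python
def bounded_iter(cursor, is_recoverable, max_errors=20):
    """Yield items from `cursor`, tolerating a bounded number of errors.

    `is_recoverable` is a hypothetical predicate supplied by the caller
    that decides whether an exception is worth retrying. After
    `max_errors` recoverable errors the latest one is re-raised, so a
    permanently failing iterator cannot cause an infinite loop.
    """
    it = iter(cursor)
    errors = 0
    while True:
        try:
            item = next(it)
        except StopIteration:
            return  # results exhausted normally
        except Exception as e:
            if not is_recoverable(e):
                raise
            errors += 1
            if errors >= max_errors:
                raise  # give up instead of retrying forever
            continue
        yield item
```

Note that this only helps with iterators that can be advanced again after raising; a plain Python generator is finished once it raises.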
Noah Levitt
a9f764fb45
bump version after merge
2018-09-18 15:58:09 -07:00
jkafader
99398a83ba
Merge pull request #9 from internetarchive/err-iterating
...
handle recoverable errors that happen while iterating over result
2018-09-18 15:56:55 -07:00
Noah Levitt
f66656fa77
MagicMock is not an iterable in pypy (2) either
2018-09-17 13:28:36 -07:00
Noah Levitt
f347407c5b
making tests work better and pass
...
but not in python 2.7. mock.MagicMock is not an iterator there
apparently :(
2018-09-17 13:24:59 -07:00
Noah Levitt
968513cdb5
handle recoverable errors that happen while
...
iterating over results!
2018-09-17 12:03:51 -07:00
Noah Levitt
95c4cff838
test rethinker error handling, exposing fact that
...
recoverable errors that happen while iterating over results are not
caught
2018-09-17 11:58:44 -07:00
Noah Levitt
7692992676
fix dumb bug
2018-03-22 16:01:21 -07:00
Noah Levitt
efa01d40ac
test exposing dumb bug
2018-03-22 16:00:44 -07:00
Noah Levitt
5cbfe18f9e
make service registry table name configurable
2017-10-10 11:05:32 -07:00
Noah Levitt
c02c4b7d2c
new api parse_rethinkdb_url()
2017-10-09 17:22:26 -07:00
Noah Levitt
3ad24d8a08
bump dev version number after merging pull request
2017-10-03 16:41:10 -07:00
Noah Levitt
e5b2e2c327
Merge pull request #8 from internetarchive/adds-cron-garbage-collector
...
Adds cron garbage collector
2017-10-03 16:36:38 -07:00
James Kafader
e1b4153712
clean up small items, typos, change command name, clean up tests in re: exit code testing.
2017-10-03 16:13:42 -07:00
James Kafader
df7c0b8e32
added tests for purging stale services and minimal tests for command line tool
2017-10-03 14:38:31 -07:00
James Kafader
dd5b2122cf
improve the git diff here so this runs
2017-10-03 13:56:52 -07:00
James Kafader
a57b4484d3
initial (failing) version of tests file for CLI, changes to CLI to get it minimally working
2017-10-03 13:56:32 -07:00
James Kafader
8f5232ac73
a few more revisions after consultation with noah.
2017-09-26 17:00:17 -07:00
James Kafader
872ef2d93b
changed after reviewing merge request
2017-09-26 16:43:37 -07:00
James Kafader
a877fa0fd8
Adds cron garbage collector
2017-09-26 15:51:11 -07:00
Noah Levitt
03e641549e
use new index "role" in service registry
2017-09-06 17:25:35 -07:00
Noah Levitt
43cbcdf644
try again travis-ci
2017-06-27 11:15:41 -07:00
Noah Levitt
7cf33a81ea
retry in case of another type of recoverable error from a rethinkdb operation
2017-06-27 10:58:30 -07:00
Noah Levitt
b063fdc1fb
fix the KeyError bug in unique_service()
2017-05-26 14:52:36 -07:00
Noah Levitt
492c97ad31
have test expose bug in unique_service()
2017-05-26 14:48:18 -07:00
Noah Levitt
9194085d0c
a note about connection pooling
2017-05-22 14:28:22 -07:00
Noah Levitt
def44503bf
bump version after merging pr #5
2017-05-17 12:29:55 -07:00
Noah Levitt
3dbd3f8ae1
Merge pull request #5 from internetarchive/rename-heartbeat-interval-to-ttl
...
rename "heartbeat_interval" -> "ttl", simplify mathematics.
2017-05-17 12:29:07 -07:00
Noah Levitt
d33695e40b
tweak some docs
2017-05-17 12:28:48 -07:00
Noah Levitt
03e9d4eeef
Merge branch 'master' into rename-heartbeat-interval-to-ttl
...
* master:
  bump version for pull request just merged and tweak run-tests.sh
  avoid database transaction to get current time
  make sure this variable is actually defined
  correct comment
  standardize the concept of 'now' to ensure that the same view of the service is returned from the read and update queries.
2017-05-17 12:15:44 -07:00
Noah Levitt
20857c4e7a
bump version for pull request just merged and tweak run-tests.sh
2017-05-17 12:13:10 -07:00
Noah Levitt
9b8b708c8c
Merge pull request #4 from internetarchive/fixes-no-unique-service-after-nomination
...
Fixes no unique service after nomination
2017-05-17 12:11:59 -07:00
Noah Levitt
158923d88b
avoid database transaction to get current time
2017-05-17 12:11:26 -07:00
James Kafader
a0d17151fe
make sure this variable is actually defined
2017-05-16 18:16:54 -07:00
James Kafader
e1b9451a6c
forgot to multiply the constants by 3
2017-05-16 18:02:03 -07:00
James Kafader
6dc3967bd6
rename "heartbeat_interval" -> "ttl", simplify mathematics.
2017-05-16 14:31:39 -07:00
James Kafader
55331083b3
correct comment
2017-05-16 11:33:15 -07:00
James Kafader
5fbedb0443
Merge branch 'master' into fixes-no-unique-service-after-nomination
2017-05-16 11:31:54 -07:00
James Kafader
78c26186b0
standardize the concept of 'now' to ensure that the same view of the service is returned from the read and update queries.
2017-05-16 11:31:03 -07:00
Noah Levitt
28b8c2eaac
remove accidentally added file __init__.pyc and allow travis-ci failures on non-stable versions of python
2017-05-01 19:56:52 -07:00
Noah Levitt
406a617d01
generalize regex to handle another exception message "Cannot perform read: The primary replica isn't connected to a quorum of replicas. ..."
2017-05-01 15:29:27 -07:00