Another tweak to that end. We have observed that when a rethinkdb server
is offline, an attempt to connect to it takes a second or two to time
out. On the other hand, if the host is up but the port is not open
(rethinkdb is not running or something like that), the connection
failure happens very quickly.
To achieve good performance in case a rethinkdb server is down, we are
now setting a timeout on the connect() call. The timeout starts at
0.1 sec, for quick retry, and backs off up to 10 sec in case of repeated
failures.
it turns out that when iterating over results sometimes (always?) errors
that are recoverable when running a query are not recoverable, so we've
been ending up in infinite loops