* wip-postfetch-chain:
postfetch chain info for /status and service reg
batch for at least 2 seconds
batch storing for trough dedup
fixes to make tests pass
use batch postfetch processor for stats
don't keep next processor waiting
include RunningStats raw stats in status info
make --profile work again
improve batching, make tests pass
batch trough dedup loader
fix running_stats thing
make run-benchmarks.py work (with no args)
keep running stats
shut down postfetch processors
tests are passing
slightly less incomplete work on new postfetch processor chain
very incomplete work on new postfetch processor chain
Update CdxServerDedup unit test
Check writer._fname in unit test
Configurable CdxServerDedup urllib3 connection pool size
roll over idle warcs on time
Yet another unit test fix
Change the writer unit test
fix github problem with unit test
Another fix for the unit test
Fix writer unit test
Add WarcWriter warc_filename unit test
Fix warc_filename default value
Configurable WARC filenames
fix logging.notice/trace methods which were masking file/line/function of log message
update test_svcreg_status to expect new fields
change where RunningStats is initialized and fix tests
more stats available from /status (and in rethinkdb services table)
timeouts for trough requests to prevent hanging
dropping claim of support for python 2.7 (not worth hacking around tempfile.TemporaryDirectory to make tests pass)
implementation of special prefix "-" which means "do not archive"
test for special warc prefix "-" which means "do not archive"
if --profile is enabled, dump results every ten minutes, as well as at shutdown
* master:
fix running_stats thing
Update CdxServerDedup unit test
Check writer._fname in unit test
Configurable CdxServerDedup urllib3 connection pool size
Yet another unit test fix
Change the writer unit test
fix github problem with unit test
Another fix for the unit test
Fix writer unit test
Add WarcWriter warc_filename unit test
Fix warc_filename default value
Configurable WARC filenames
Update the test to work correctly with the new way we init
``CdxServerDedup.http_pool``: use ``mock.MagicMock`` instead of
``mock.patch``. The unit test logic remains entirely the same.
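
A minimal sketch of the ``mock.MagicMock`` approach, assuming the pool is
created in the constructor and can be replaced on the instance afterwards.
``FakeCdxServerDedup``, its ``lookup`` method and the URLs are hypothetical
stand-ins for illustration, not the real warcprox code:

```python
from unittest import mock


class FakeCdxServerDedup:
    """Hypothetical stand-in for a dedup class that owns an http_pool."""

    def __init__(self, cdx_url):
        self.cdx_url = cdx_url
        # The real class would build a urllib3 pool here; placeholder only.
        self.http_pool = object()

    def lookup(self, url):
        # Query the CDX server through whatever pool the instance holds.
        resp = self.http_pool.request('GET', self.cdx_url, fields={'url': url})
        return resp.data if resp.status == 200 else None


def lookup_with_mocked_pool():
    dedup = FakeCdxServerDedup('http://localhost/cdx')
    # Replace the attribute on the instance instead of patching a module path.
    dedup.http_pool = mock.MagicMock()
    dedup.http_pool.request.return_value = mock.MagicMock(
        status=200, data=b'cdx line')
    result = dedup.lookup('http://example.com/')
    return dedup, result
```

Because the pool lives on the instance, swapping the attribute does not
depend on where or how the constructor builds it.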
For some reason this test previously failed in github. Maybe it has to
do with the temporary files I need to create there... in any case, I
changed the check so that it now evaluates ``writer._fname`` against the
correct filename format.
The urllib3 connection pool defaults to ``maxsize=1``, see
http://urllib3.readthedocs.io/en/latest/advanced-usage.html.
We need to set a higher value because we get warnings like this:
```
2018-01-15 20:04:10,044 18436 WARNING WarcWriterThread030(tid=18502)
urllib3.connectionpool._put_conn(connectionpool.py:277) Connection pool
is full, discarding connection: wwwb-dedup
```
We set the value: ``cdxserver_maxsize = args.writer_threads or 200``.
Note that ideally we would use this:
https://github.com/internetarchive/warcprox/blob/master/warcprox/main.py#L284
but it is initialized after dedup, so because of that dependency we
cannot use it.
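
A rough sketch of the sizing rule described above, not the actual warcprox
code. The ``--writer-threads`` flag spelling and the helper names are
assumptions for illustration; ``urllib3.PoolManager(maxsize=...)`` is the
standard urllib3 way to raise the per-host pool size:

```python
import argparse

DEFAULT_POOL_MAXSIZE = 200  # fallback value from the commit message


def cdxserver_maxsize(args):
    """Use the writer thread count if given, otherwise fall back to 200."""
    return args.writer_threads or DEFAULT_POOL_MAXSIZE


def make_http_pool(maxsize):
    # Imported lazily so the sizing logic above runs without urllib3.
    import urllib3
    # maxsize is how many connections per host the pool keeps for reuse.
    return urllib3.PoolManager(maxsize=maxsize)


# Hypothetical parser; the real option definition lives in warcprox/main.py.
parser = argparse.ArgumentParser()
parser.add_argument(
    '--writer-threads', dest='writer_threads', type=int, default=None)
```

With ``or``, an unset (or zero) thread count falls back to 200, so the pool
is never left at the urllib3 default of 1.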