For some reason this test previously failed on GitHub. Maybe it has to
do with the temporary files I need to create there... In any case, I
changed what we check: we now evaluate ``write._fname`` for the correct
filename format.
The urllib3 pool has a default of ``maxsize=1`` (see
http://urllib3.readthedocs.io/en/latest/advanced-usage.html).
We need to set a higher value because otherwise we get warnings like this:
```
2018-01-15 20:04:10,044 18436 WARNING WarcWriterThread030(tid=18502)
urllib3.connectionpool._put_conn(connectionpool.py:277) Connection pool
is full, discarding connection: wwwb-dedup
```
We set ``cdxserver_maxsize = args.writer_threads or 200``.
Ideally we would use
https://github.com/internetarchive/warcprox/blob/master/warcprox/main.py#L284
but it is initialized after dedup; because of that dependency we
cannot use it.
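The sizing logic above can be sketched as follows. This is a minimal
illustration, not warcprox's actual code: the ``writer_threads``
variable stands in for ``args.writer_threads``, and the ``PoolManager``
construction is an assumption about how the larger ``maxsize`` would be
passed to urllib3.

```python
import urllib3

# Stand-in for args.writer_threads; None means the option was not set.
writer_threads = None

# With urllib3's default maxsize=1, concurrent writer threads exhaust
# the pool and urllib3 logs "Connection pool is full, discarding
# connection". Fall back to 200 when no thread count is configured.
cdxserver_maxsize = writer_threads or 200

# Hypothetical pool construction showing where maxsize would be applied.
http = urllib3.PoolManager(maxsize=cdxserver_maxsize)
print(cdxserver_maxsize)
```

Setting ``maxsize`` at least as large as the number of threads sharing
the pool means each thread can keep its own connection alive instead of
discarding and reopening connections.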
* master:
fix test in py<=3.4
fix failing test, and change response code from 500 to more appropriate 502
failing test for correct handling of "http.client.RemoteDisconnected: Remote end closed connection without response" from remote server
fix oops
better error message for bad WARCPROX_WRITE_RECORD request
fix mistakes in warc write thread profile aggregation
aggregate warc writer thread profiles much like we do for proxy threads
have --profile profile proxy threads as well as warc writer threads
hacky way to fix problem of benchmarks arguments getting stale
* trough-dedup:
py2 fix
automatic segment promotion every hour
move trough client into separate module
pypy and pypy3 are passing at the moment, so why not :)
more cleanly separate trough client code from the rest of TroughDedup
update payload_digest reference in trough dedup for changes in commit 3a0f6e0947
hopefully fix test failing occasionally apparently due to race condition by checking that the file we're waiting for has some content
fix payload digest by pulling calculation up one level where content has already been transfer-decoded
new failing test for correct calculation of payload digest
missed a spot handling case of no warc records written
eh, don't prefix sqlite filenames with 'warcprox-trough-'; logging tweaks
not gonna bother figuring out why pypy regex is not matching https://travis-ci.org/internetarchive/warcprox/jobs/299864258#L615
fix failing test just committed, which involves running "listeners" for all urls, including those not archived; make adjustments accordingly
make test_crawl_log expect HEAD request to be logged
fix crawl log handling of WARCPROX_WRITE_RECORD request
modify test_crawl_log to expect crawl log to honor --base32 setting and add tests of WARCPROX_WRITE_RECORD request and HEAD request (not written to warc)
bump dev version number
add --crawl-log-dir option to fix failing test