mirror of https://github.com/internetarchive/warcprox.git (synced 2025-01-18 13:22:09 +01:00)
* master:
  Update docstring
  Move Warcprox-Meta header construction to warcproxy
  Improve test_writer tests
  Replace timestamp parameter with more generic request/response syntax
  Return capture timestamp
  Swap fcntl.flock with fcntl.lockf
  Unit test fix for Python2 compatibility
  Test WarcWriter file locking when no_warc_open_suffix=True
  Rename writer var and add exception handling
  Acquire and exclusive file lock when not using .open WARC suffix
  Add hidden --no-warc-open-suffix CLI option
  Fix missing dummy url param in bigtable lookup method
  back to dev version number
  version 2.2 for pypi to address https://github.com/internetarchive/warcprox/issues/42
  Expand comment with limit=-1 explanation
  Drop unnecessary split for newline in CDX results
  fix benchmarks (update command line args)
  Update CdxServerDedup lookup algorithm
  Pass url instead of recorded_url obj to dedup lookup methods
  Filter out warc/revisit records in CdxServerDedup
  Improve CdxServerDedup implementation
  Fix minor CdxServerDedup unit test
  Fix bug with dedup_info date encoding
  Add mock pkg to run-tests.sh
  Add CdxServerDedup unit tests and improve its exception handling
  Add CDX Server based deduplication
  cryptography lib version 2.1.1 is causing problems
  Revert changes to test_warcprox.py
  Revert changes to bigtable and dedup
  Revert warc to previous behavior
  Update unit test
  Replace invalid warcfilename variable in playback
  Stop using WarcRecord.REFERS_TO header and use payload_digest instead
  greatly simplify automated test setup by reusing initialization code from the command line executable; this also has the benefit of testing that initialization code
  avoid TypeError: 'NoneType' object is not iterable exception at shutdown
  wait for rethinkdb indexes to be ready
  Remove deleted ``close`` method call from test.
  bump dev version number after merging pull requests
  Add missing "," in deps
  Remove tox.ini, move warcio to test_requires
  allow very long request header lines, to support large warcprox-meta header values
  Remove redundant stop() & sync() dedup methods
  Remove redundant close method from DedupDb and RethinkDedupDb
  Remove unused imports
  Add missing packages from setup.py, add tox config.
  fix python2 tests
  don't use http.client.HTTPResponse.getheader() to get the content-type header, because it can return a comma-delimited string
  no SIGQUIT on windows, so no SIGQUIT handler
  https://github.com/internetarchive/warcprox/pull/32 warrants a version bump
  fix --size option (https://github.com/internetarchive/warcprox/issues/31)
  fix --playback-port option (https://github.com/internetarchive/warcprox/issues/29)
  fix zero-indexing of warc_writer_threads so they can be disabled via empty list