19 Commits

Author SHA1 Message Date
Adam Miller
1e3d22aba4 Better handle non-ascii urls for crawl log hop info 2022-04-20 22:48:28 +00:00
Noah Levitt
d834ac3e59 only run tests in py3 2018-05-16 14:21:18 -07:00
Noah Levitt
ed49eea4d5 Merge branch 'master' into trough-dedup
* master:
  Update docstring
  Move Warcprox-Meta header construction to warcproxy
  Improve test_writer tests
  Replace timestamp parameter with more generic request/response syntax
  Return capture timestamp
  Swap fcntl.flock with fcntl.lockf
  Unit test fix for Python2 compatibility
  Test WarcWriter file locking when no_warc_open_suffix=True
  Rename writer var and add exception handling
  Acquire an exclusive file lock when not using .open WARC suffix
  Add hidden --no-warc-open-suffix CLI option
  Fix missing dummy url param in bigtable lookup method
  back to dev version number
  version 2.2 for pypi to address https://github.com/internetarchive/warcprox/issues/42
  Expand comment with limit=-1 explanation
  Drop unnecessary split for newline in CDX results
  fix benchmarks (update command line args)
  Update CdxServerDedup lookup algorithm
  Pass url instead of recorded_url obj to dedup lookup methods
  Filter out warc/revisit records in CdxServerDedup
  Improve CdxServerDedup implementation
  Fix minor CdxServerDedup unit test
  Fix bug with dedup_info date encoding
  Add mock pkg to run-tests.sh
  Add CdxServerDedup unit tests and improve its exception handling
  Add CDX Server based deduplication
  cryptography lib version 2.1.1 is causing problems
  Revert changes to test_warcprox.py
  Revert changes to bigtable and dedup
  Revert warc to previous behavior
  Update unit test
  Replace invalid warcfilename variable in playback
  Stop using WarcRecord.REFERS_TO header and use payload_digest instead
2017-11-02 16:34:52 -07:00
Vangelis Banos
59e995ccdf Add mock pkg to run-tests.sh 2017-10-19 22:22:14 +00:00
Noah Levitt
828a2c3dcf get all the tests to pass with ./tests/run-tests.sh 2017-10-13 15:54:05 -07:00
Noah Levitt
369dc5c124 install and run trough in docker container for testing 2017-10-11 17:28:47 -07:00
Noah Levitt
d177b3b80d change rethinkdb-related command line options to use "rethinkdb urls" (parser just added to doublethink) to reduce the proliferation of rethinkdb options, and add --rethinkdb-trough-db-url option 2017-10-11 12:06:19 -07:00
Noah Levitt
ca7625b18d set via header on request and response, record request via in warc (because it is sent to the remote site), do not record response via in warc (because it is not sent by the remote site) 2017-04-28 11:07:33 -07:00
Noah Levitt
35d7ccd12e add seconds_behind to service registry and status api, which is the length of time the next url to be written to warc has been waiting in the queue 2017-03-30 15:54:19 -07:00
Noah Levitt
e5f2c348e2 fix dockerized automated tests now that phusion/baseimage is ubuntu xenial 2016-11-15 12:09:09 -08:00
Noah Levitt
2c65ff89fa add license headers 2016-04-06 19:37:55 -07:00
Noah Levitt
df31068c80 improve test running script 2016-01-26 18:47:08 -08:00
Noah Levitt
c9f5b72fd7 really run tor in docker container for tests 2016-01-26 18:47:08 -08:00
Noah Levitt
f38ce708bf set PYTHONDONTWRITEBYTECODE in one place 2016-01-26 18:47:08 -08:00
Noah Levitt
2ecd2facd9 surt 0.3b2 is in pypi now, no need for devpi 2016-01-26 18:47:08 -08:00
Noah Levitt
4930cc2d24 try to avoid conflicts with *.pyc files from outside of the docker tests 2016-01-26 18:47:08 -08:00
Noah Levitt
03c506dade stop after first failing test, use py.test -s 2016-01-26 18:47:08 -08:00
Noah Levitt
a41c426b0a giving up on using git revision in version number :( latest issue is when installing a package that calls git to compute a version number, but cwd is some other git project, you get the wrong thing 2016-01-26 18:47:08 -08:00
Noah Levitt
28d213fb18 spin up rethinkdb in docker, run tests in there 2016-01-26 18:47:08 -08:00