Noah Levitt
47680cc17d
let test_choose_a_port_for_me pass when service registry is missing, i.e. when not running with rethinkdb
2017-04-17 12:05:39 -07:00
Noah Levitt
3d87ed61be
whoops, stop warcprox and join thread in test_choose_a_port_for_me
2017-04-17 11:47:22 -07:00
Noah Levitt
1900dfac08
test choosing port 0 which means, let the system choose one for me, and fix a bug in service registry reporting of the port
2017-04-17 11:45:37 -07:00
Noah Levitt
21a9a26f51
fix some obsolete calls
2017-04-17 11:00:43 -07:00
Noah Levitt
e9d6a8fcf4
override mitmproxy.PooledMixIn.get_request to put a cap on the number of open file handles
2017-04-11 16:35:25 -07:00
Noah Levitt
cbefa37fd9
make --queue-size and --max-threads hidden options work
2017-04-11 16:29:57 -07:00
Noah Levitt
f17584836e
add another field to status api and service registry, "threads", the size of the proxy server thread pool
2017-03-30 16:18:50 -07:00
Noah Levitt
35d7ccd12e
add seconds_behind to service registry and status api, which is the length of time the next url to be written to warc has been waiting in the queue
2017-03-30 15:54:19 -07:00
Noah Levitt
da26b25ac3
accept failures from the tor test
2017-03-28 12:55:30 -07:00
Noah Levitt
1c035153de
shut down immediately on disk full error
2017-03-28 12:39:41 -07:00
Noah Levitt
73d934d0a4
turn down kafka log level
2017-03-27 22:42:46 +00:00
Noah Levitt
89643b7497
make the status api test pass in python 2
2017-03-23 10:13:14 -07:00
Noah Levitt
8caae0d7d3
new api, http://{warcprox_host}:{port}/status returns status info json
2017-03-23 09:56:51 -07:00
Noah Levitt
a2f11f4e66
damn it dude get it right
2017-03-15 12:38:38 -07:00
Noah Levitt
a3016227b4
oops, that surt needs to be a string for rethinkdb
2017-03-15 12:22:27 -07:00
Noah Levitt
fed8dfa978
fix buglet
2017-03-15 12:01:34 -07:00
Noah Levitt
f1d07ad921
use urlcanon library for canonicalization, surtification, scope match rules
2017-03-15 09:33:50 -07:00
Noah Levitt
f30160d8ee
avoid stack trace in case of urls without host
2017-03-02 15:23:50 -08:00
Noah Levitt
842bfd651c
rethinkstuff -> doublethink
2017-03-02 15:06:26 -08:00
Noah Levitt
3a80fde50c
back to dev version number
2017-02-14 13:57:32 -08:00
Noah Levitt
098b5d27ab
2.0.1 for pypi to include fixes for the last two critical bugs
2.0.1
2017-02-14 13:56:42 -08:00
Noah Levitt
7c1d5796a3
fix problem in python 2 where warcprox was always single-threaded, because of "old-style" class inheritance issues
2017-02-06 10:56:54 -08:00
Noah Levitt
adb264b40e
treat limit value of null, zero, or negative as meaning "unlimited"
2017-02-03 16:20:15 -08:00
Noah Levitt
1c7564ee6a
really fix tests for python2
2017-02-02 10:09:03 -08:00
Noah Levitt
859c93f390
comment out unused code that fails in py2
2017-02-01 15:42:02 -08:00
Noah Levitt
ddb60876a3
WARCPROX_WRITE_RECORD is exempt from method filter
2017-02-01 15:30:22 -08:00
Noah Levitt
f5498e1822
back to dev version number
2017-01-31 11:00:01 -08:00
Noah Levitt
acc6be7eb8
it's been stable running, and relatively stable in terms of code churn, time to mint 2.0
2.0
2017-01-31 10:58:21 -08:00
Noah Levitt
629795d617
update readme with latest --help output
2017-01-31 10:56:18 -08:00
Noah Levitt
907e519af0
python 3.6 is out now
2017-01-23 13:53:01 -08:00
Noah Levitt
884aa45066
be more robust and flexible updating the rethinkdb captures table
2017-01-23 13:33:06 -08:00
Noah Levitt
af74959864
add slack notification
2017-01-16 12:26:45 -08:00
Noah Levitt
4b505c524b
new flag dedup_ok and warcprox-meta field dedup-ok which can be used to prevent deduplication against particular entries in the rethinkdb big captures table
2017-01-13 17:29:05 -08:00
Noah Levitt
5bfdbc3d95
back to dev version number
2016-11-21 15:21:02 -08:00
Noah Levitt
564a058a9e
call this 2.0b2
2.0b2
2016-11-21 15:19:52 -08:00
Noah Levitt
ff3c9f2b72
trying to make the copyright lines look better in the readme
2016-11-21 15:19:02 -08:00
Noah Levitt
d31cae2d51
two different measures of size in the big captures table, record_length and wire_bytes
2016-11-21 15:17:50 -08:00
Noah Levitt
2918a73a3b
warcprox runs in python 2.7 too
2016-11-21 15:16:35 -08:00
Noah Levitt
de7a23325b
a test for alex's method filter
2016-11-15 12:42:25 -08:00
Noah Levitt
f948850692
Merge pull request #21 from nla/method-filter
add --method-filter option
2016-11-15 12:12:53 -08:00
Noah Levitt
e5f2c348e2
fix dockerized automated tests now that phusion/baseimage is ubuntu xenial
2016-11-15 12:09:09 -08:00
Noah Levitt
3b167459e3
change tested idns to valid idna2008 now that requests 2.12.0 enforces that (for better or worse, see https://github.com/kennethreitz/requests/issues/3687 )
2016-11-15 12:08:07 -08:00
Alex Osborne
90031a2058
add --method-filter option
2016-11-15 23:26:13 +11:00
Noah Levitt
41bd6c72af
for big captures table, do insert with conflict="replace"
We're doing this because one time this happened:
rethinkdb.errors.ReqlOpIndeterminateError: Cannot perform write: The primary replica isn't connected to a quorum of replicas....
and on the next attempt this happened:
{'errors': 1, 'inserted': 1, 'first_error': 'Duplicate primary key `id`: ....
When we got ReqlOpIndeterminateError the operation actually succeeded
partially, one of the records was inserted. After that the batch insert
failed every time because it was trying to insert the same entry. With
this change there will be no error from a duplicate key.
2016-10-25 16:54:07 -07:00
Noah Levitt
1671080755
handle case of unlimited resource limits and cap max_threads at 5000
2016-10-20 17:31:52 -07:00
Noah Levitt
fa1e8d3af4
allow travis-ci failures for python-nightly and also test 3.6-dev (but allow failures);
enable the onion site tor test because apparently travis-ci is allowing me to
install tor now, see https://travis-ci.org/internetarchive/warcprox/jobs/169101744
although https://github.com/travis-ci/apt-package-whitelist/issues/1753 is still open
2016-10-19 18:24:25 -07:00
Noah Levitt
8001dd09b3
travis-ci svg badge looks nicer
2016-10-19 17:30:53 -07:00
Noah Levitt
de3c81fdc8
Merge pull request #17 from internetarchive/2.x
2.x
2016-10-19 15:34:49 -07:00
Noah Levitt
719380e612
refactor some general mitm proxy stuff into mitmproxy.py
2016-10-19 15:32:58 -07:00
Noah Levitt
15eeaebde5
fix for connection hang on https urls missing a content-length http response header
2016-10-19 13:45:46 -07:00