365 Commits

Author SHA1 Message Date
Noah Levitt
35d7ccd12e add seconds_behind to service registry and status api, which is the length of time the next url to be written to warc has been waiting in the queue 2017-03-30 15:54:19 -07:00
Noah Levitt
da26b25ac3 accept failures from the tor test 2017-03-28 12:55:30 -07:00
Noah Levitt
1c035153de shut down immediately on disk full error 2017-03-28 12:39:41 -07:00
Noah Levitt
73d934d0a4 turn down kafka log level 2017-03-27 22:42:46 +00:00
Noah Levitt
89643b7497 make the status api test pass in python 2 2017-03-23 10:13:14 -07:00
Noah Levitt
8caae0d7d3 new api, http://{warcprox_host}:{port}/status returns status info json 2017-03-23 09:56:51 -07:00
Noah Levitt
a2f11f4e66 damn it dude get it right 2017-03-15 12:38:38 -07:00
Noah Levitt
a3016227b4 oops, that surt needs to be a string for rethinkdb 2017-03-15 12:22:27 -07:00
Noah Levitt
fed8dfa978 fix buglet 2017-03-15 12:01:34 -07:00
Noah Levitt
f1d07ad921 use urlcanon library for canonicalization, surtification, scope match rules 2017-03-15 09:33:50 -07:00
Noah Levitt
f30160d8ee avoid stack trace in case of urls without host 2017-03-02 15:23:50 -08:00
Noah Levitt
842bfd651c rethinkstuff -> doublethink 2017-03-02 15:06:26 -08:00
Noah Levitt
3a80fde50c back to dev version number 2017-02-14 13:57:32 -08:00
Noah Levitt
098b5d27ab 2.0.1 for pypi to include fixes for the last two critical bugs (tag: 2.0.1) 2017-02-14 13:56:42 -08:00
Noah Levitt
7c1d5796a3 fix problem in python 2 where warcprox was always single-threaded, because of "old-style" class inheritance issues 2017-02-06 10:56:54 -08:00
Noah Levitt
adb264b40e treat limit value of null, zero, or negative as meaning "unlimited" 2017-02-03 16:20:15 -08:00
Noah Levitt
1c7564ee6a really fix tests for python2 2017-02-02 10:09:03 -08:00
Noah Levitt
859c93f390 comment out unused code that fails in py2 2017-02-01 15:42:02 -08:00
Noah Levitt
ddb60876a3 WARCPROX_WRITE_RECORD is exempt from method filter 2017-02-01 15:30:22 -08:00
Noah Levitt
f5498e1822 back to dev version number 2017-01-31 11:00:01 -08:00
Noah Levitt
acc6be7eb8 it's been stable running, and relatively stable in terms of code churn, time to mint 2.0 (tag: 2.0) 2017-01-31 10:58:21 -08:00
Noah Levitt
629795d617 update readme with latest --help output 2017-01-31 10:56:18 -08:00
Noah Levitt
907e519af0 python 3.6 is out now 2017-01-23 13:53:01 -08:00
Noah Levitt
884aa45066 be more robust and flexible updating the rethinkdb captures table 2017-01-23 13:33:06 -08:00
Noah Levitt
af74959864 add slack notification 2017-01-16 12:26:45 -08:00
Noah Levitt
4b505c524b new flag dedup_ok and warcprox-meta field dedup-ok which can be used to prevent deduplication against particular entries in the rethinkdb big captures table 2017-01-13 17:29:05 -08:00
Noah Levitt
5bfdbc3d95 back to dev version number 2016-11-21 15:21:02 -08:00
Noah Levitt
564a058a9e call this 2.0b2 (tag: 2.0b2) 2016-11-21 15:19:52 -08:00
Noah Levitt
ff3c9f2b72 trying to make the copyright lines look better in the readme 2016-11-21 15:19:02 -08:00
Noah Levitt
d31cae2d51 two different measures of size in the big captures table, record_length and wire_bytes 2016-11-21 15:17:50 -08:00
Noah Levitt
2918a73a3b warcprox runs in python 2.7 too 2016-11-21 15:16:35 -08:00
Noah Levitt
de7a23325b a test for alex's method filter 2016-11-15 12:42:25 -08:00
Noah Levitt
f948850692 Merge pull request #21 from nla/method-filter
add --method-filter option
2016-11-15 12:12:53 -08:00
Noah Levitt
e5f2c348e2 fix dockerized automated tests now that phusion/baseimage is ubuntu xenial 2016-11-15 12:09:09 -08:00
Noah Levitt
3b167459e3 change tested idns to valid idna2008 now that requests 2.12.0 enforces that (for better or worse, see https://github.com/kennethreitz/requests/issues/3687) 2016-11-15 12:08:07 -08:00
Alex Osborne
90031a2058 add --method-filter option 2016-11-15 23:26:13 +11:00
Noah Levitt
41bd6c72af for big captures table, do insert with conflict="replace"
We're doing this because one time this happened:
rethinkdb.errors.ReqlOpIndeterminateError: Cannot perform write: The primary replica isn't connected to a quorum of replicas....
and on the next attempt this happened:
{'errors': 1, 'inserted': 1, 'first_error': 'Duplicate primary key `id`: ....

When we got ReqlOpIndeterminateError the operation had actually partially
succeeded: one of the records was inserted. After that the batch insert
failed every time because it was trying to insert the same entry again. With
this change there will be no error from a duplicate key.
2016-10-25 16:54:07 -07:00
Noah Levitt
1671080755 handle case of unlimited resource limits and cap max_threads at 5000 2016-10-20 17:31:52 -07:00
Noah Levitt
fa1e8d3af4 allow travis-ci failures for python-nightly and also test 3.6-dev (but allow failures);
enable the onion site tor test because apparently travis-ci is allowing me to
install tor now, see https://travis-ci.org/internetarchive/warcprox/jobs/169101744
although https://github.com/travis-ci/apt-package-whitelist/issues/1753 is still open
2016-10-19 18:24:25 -07:00
Noah Levitt
8001dd09b3 travis-ci svg badge looks nicer 2016-10-19 17:30:53 -07:00
Noah Levitt
de3c81fdc8 Merge pull request #17 from internetarchive/2.x
2.x
2016-10-19 15:34:49 -07:00
Noah Levitt
719380e612 refactor some general mitm proxy stuff into mitmproxy.py 2016-10-19 15:32:58 -07:00
Noah Levitt
15eeaebde5 fix for connection hang on https urls missing a content-length http response header 2016-10-19 13:45:46 -07:00
Noah Levitt
314be33707 new test that reveals connection hang on https urls missing a content-length http response header (not chunked and server leaves connection open) -- reported by Alex Osborne 2016-10-19 13:43:44 -07:00
Noah Levitt
6000237c47 workaround for nasty python/ssl deadlock that has been affecting warcprox, same issue as https://github.com/pyca/cryptography/issues/2911 2016-09-23 15:54:31 +01:00
Noah Levitt
5d44859ba8 keep trying to connect to kafka and don't let connection failure interfere with other warcprox operations 2016-09-07 13:43:01 -07:00
Noah Levitt
504af2fb0f try to avoid ever blocking when sending messages to kafka 2016-09-07 13:01:11 -07:00
Noah Levitt
1ddebbc50e bump up to next dev version number 2016-07-21 19:12:46 -05:00
Noah Levitt
fdd6086d65 version 2.0b1 for upload to pypi (tag: 2.0b1) 2016-07-21 19:09:35 -05:00
Noah Levitt
a5d6d634d8 enable pypy and pypy3 travis-ci tests, but allow failures 2016-07-11 11:23:53 -05:00