Noah Levitt
|
99dd840d20
|
use "ttl" for updated doublethink svc reg api
|
2017-05-23 10:37:39 -07:00 |
|
Noah Levitt
|
aca0b881c6
|
make sure records are written to warc in a predictable order to make tests pass consistently
|
2017-05-19 16:34:27 -07:00 |
|
Noah Levitt
|
ef5dd2e4ae
|
multiple warc writer threads (hacked in with little thought to code organization)
|
2017-05-19 16:10:44 -07:00 |
|
Noah Levitt
|
338e5cd878
|
comment out debug logging thing
|
2017-04-28 11:08:41 -07:00 |
|
Noah Levitt
|
ca7625b18d
|
set via header on request and response, record request via in warc (because it is sent to the remote site), do not record response via in warc (because it is not sent by the remote site)
|
2017-04-28 11:07:33 -07:00 |
|
Noah Levitt
|
47680cc17d
|
let test_choose_a_port_for_me pass when service registry is missing, i.e. when not running with rethinkdb
|
2017-04-17 12:05:39 -07:00 |
|
Noah Levitt
|
3d87ed61be
|
whoops, stop warcprox and join thread in test_choose_a_port_for_me
|
2017-04-17 11:47:22 -07:00 |
|
Noah Levitt
|
1900dfac08
|
test choosing port 0 which means, let the system choose one for me, and fix a bug in service registry reporting of the port
|
2017-04-17 11:45:37 -07:00 |
|
Noah Levitt
|
21a9a26f51
|
fix some obsolete calls
|
2017-04-17 11:00:43 -07:00 |
|
Noah Levitt
|
f17584836e
|
add another field to status api and service registry, "threads", the size of the proxy server thread pool
|
2017-03-30 16:18:50 -07:00 |
|
Noah Levitt
|
35d7ccd12e
|
add seconds_behind to service registry and status api, which is the length of time the next url to be written to warc has been waiting in the queue
|
2017-03-30 15:54:19 -07:00 |
|
Noah Levitt
|
da26b25ac3
|
accept failures from the tor test
|
2017-03-28 12:55:30 -07:00 |
|
Noah Levitt
|
89643b7497
|
make the status api test pass in python 2
|
2017-03-23 10:13:14 -07:00 |
|
Noah Levitt
|
8caae0d7d3
|
new api, http://{warcprox_host}:{port}/status returns status info json
|
2017-03-23 09:56:51 -07:00 |
|
Noah Levitt
|
f1d07ad921
|
use urlcanon library for canonicalization, surtification, scope match rules
|
2017-03-15 09:33:50 -07:00 |
|
Noah Levitt
|
842bfd651c
|
rethinkstuff -> doublethink
|
2017-03-02 15:06:26 -08:00 |
|
Noah Levitt
|
1c7564ee6a
|
really fix tests for python2
|
2017-02-02 10:09:03 -08:00 |
|
Noah Levitt
|
859c93f390
|
comment out unused code that fails in py2
|
2017-02-01 15:42:02 -08:00 |
|
Noah Levitt
|
ddb60876a3
|
WARCPROX_WRITE_RECORD is exempt from method filter
|
2017-02-01 15:30:22 -08:00 |
|
Noah Levitt
|
4b505c524b
|
new flag dedup_ok and warcprox-meta field dedup-ok which can be used to prevent deduplication against particular entries in the rethinkdb big captures table
|
2017-01-13 17:29:05 -08:00 |
|
Noah Levitt
|
de7a23325b
|
a test for alex's method filter
|
2016-11-15 12:42:25 -08:00 |
|
Noah Levitt
|
3b167459e3
|
change tested idns to valid idna2008 now that requests 2.12.0 enforces that (for better or worse, see https://github.com/kennethreitz/requests/issues/3687)
|
2016-11-15 12:08:07 -08:00 |
|
Noah Levitt
|
fa1e8d3af4
|
allow travis-ci failures for python-nightly and also test 3.6-dev (but allow failures);
enable the onion site tor test because apparently travis-ci is allowing me to
install tor now, see https://travis-ci.org/internetarchive/warcprox/jobs/169101744
although https://github.com/travis-ci/apt-package-whitelist/issues/1753 is still open
|
2016-10-19 18:24:25 -07:00 |
|
Noah Levitt
|
314be33707
|
new test that reveals connection hang on https urls missing a content-length http response header (not chunked and server leaves connection open) -- reported by Alex Osborne
|
2016-10-19 13:43:44 -07:00 |
|
Noah Levitt
|
46c24833ff
|
emoji idn fails with python 2.7, so test with a BMP unicode character
|
2016-06-29 17:16:50 -05:00 |
|
Noah Levitt
|
33775d360a
|
comment out segfaulting test
|
2016-06-29 16:47:54 -05:00 |
|
Noah Levitt
|
a59871e17b
|
idn support, at least for domain limits (getting a segfault in tests on mac however, let's see what happens on travis-ci)
|
2016-06-29 15:54:40 -05:00 |
|
Noah Levitt
|
c9e403585b
|
switching from host limits to domain limits, which apply in aggregate to the host and subdomains
|
2016-06-29 14:56:14 -05:00 |
|
Noah Levitt
|
2c8b194090
|
really only apply host limits to the host
|
2016-06-28 15:53:29 -05:00 |
|
Noah Levitt
|
04c4b63f03
|
renaming scope rule "host" to "domain" to make it less confusing, since rules apply to subdomains as well
|
2016-06-28 15:35:02 -05:00 |
|
Noah Levitt
|
320df0565e
|
support "soft limits" which result in a different response code (430) than regular (hard) limits (which result in a 420)
|
2016-06-27 16:07:20 -05:00 |
|
Noah Levitt
|
fabd732b7f
|
couple of fixes for host limits
|
2016-06-24 21:58:37 -05:00 |
|
Noah Levitt
|
2fe0c2f25b
|
support for tallying substats of a configured bucket by host, and enforcing host limits using those stats, with tests
|
2016-06-24 20:04:27 -05:00 |
|
Noah Levitt
|
d48e2c462d
|
add a start() method to the two classes that save data to rethinkdb periodically in batches, instead of starting the timer in __init__
|
2016-06-16 00:04:59 +00:00 |
|
Noah Levitt
|
4bb3556709
|
implement enforcement of Warcprox-Meta header block rules; includes automated tests
|
2016-05-10 23:11:47 +00:00 |
|
Noah Levitt
|
6f10e2708d
|
disable tor test to give travis build a chance to pass tests (waiting on https://github.com/travis-ci/apt-package-whitelist/issues/1753)
|
2016-04-06 19:39:28 -07:00 |
|
Noah Levitt
|
2c65ff89fa
|
add license headers
|
2016-04-06 19:37:55 -07:00 |
|
Noah Levitt
|
42a81d8f8f
|
fix bug where two warc-payload-digest headers were written to revisit records
|
2016-03-15 06:27:21 +00:00 |
|
Noah Levitt
|
4bb7e043d4
|
wait longer for stats to be updated in test_limits(), now that rethinkdb stats are pushed only every 2.0 seconds
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
00dc9eed84
|
new option --onion-tor-socks-proxy, host:port of tor socks proxy, used only to connect to .onion sites
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
18cc818cf0
|
more timing tweaks to make sure tests pass, improved logging etc
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
e67c7be5bc
|
service registry init
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
95e611a5d0
|
update stats in RethinkDB asynchronously, since profiling shows this to be a bottleneck in WarcWriterThread (which in turn makes it a bottleneck for the whole app)
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
4c380dcc41
|
move tests out of installed package dir
|
2016-01-26 18:47:08 -08:00 |
|