Noah Levitt
|
828a2c3dcf
|
get all the tests to pass with ./tests/run-tests.sh
|
2017-10-13 15:54:05 -07:00 |
|
Noah Levitt
|
369dc5c124
|
install and run trough in docker container for testing
|
2017-10-11 17:28:47 -07:00 |
|
Noah Levitt
|
d177b3b80d
|
change rethinkdb-related command line options to use "rethinkdb urls" (parser just added to doublethink) to reduce the proliferation of rethinkdb options, and add --rethinkdb-trough-db-url option
|
2017-10-11 12:06:19 -07:00 |
|
Noah Levitt
|
9b8043d3a2
|
greatly simplify automated test setup by reusing initialization code from the command line executable; this also has the benefit of testing that initialization code
|
2017-10-06 17:00:35 -07:00 |
|
Noah Levitt
|
0de10791aa
|
Merge pull request #35 from vbanos/dedup-redundant-code
Remove redundant methods from dedup classes
|
2017-09-29 11:42:47 -07:00 |
|
Vangelis Banos
|
4e7d8fa917
|
Remove deleted `close` method call from test.
|
2017-09-29 06:36:37 +00:00 |
|
Noah Levitt
|
faae23d764
|
allow very long request header lines, to support large warcprox-meta header values
|
2017-09-27 17:29:55 -07:00 |
|
Noah Levitt
|
b23e485898
|
simplify recovery of stats batch in case of exception saving them (not sure what was wrong with summy_merge, but this is simpler)
|
2017-06-22 16:54:04 -07:00 |
|
Noah Levitt
|
c0ee9c6093
|
avoid holding the lock, which makes all warc writer threads block, while doing rethinkdb operations, in RethinkStatsDb
|
2017-06-22 16:17:25 -07:00 |
|
Noah Levitt
|
1500341875
|
use %r instead of calling repr()
|
2017-06-07 16:05:47 -07:00 |
|
Noah Levitt
|
95dfa54968
|
get rid of dbm, switch to sqlite, for easier portability, clarity around threading
|
2017-05-24 13:57:09 -07:00 |
|
Noah Levitt
|
99dd840d20
|
use "ttl" for updated doublethink svc reg api
|
2017-05-23 10:37:39 -07:00 |
|
Noah Levitt
|
aca0b881c6
|
make sure records are written to warc in a predictable order to make tests pass consistently
|
2017-05-19 16:34:27 -07:00 |
|
Noah Levitt
|
ef5dd2e4ae
|
multiple warc writer threads (hacked in with little thought to code organization)
|
2017-05-19 16:10:44 -07:00 |
|
Noah Levitt
|
338e5cd878
|
comment out debug logging thing
|
2017-04-28 11:08:41 -07:00 |
|
Noah Levitt
|
ca7625b18d
|
set via header on request and response, record request via in warc (because it is sent to the remote site), do not record response via in warc (because it is not sent by the remote site)
|
2017-04-28 11:07:33 -07:00 |
|
Noah Levitt
|
47680cc17d
|
let test_choose_a_port_for_me pass when service registry is missing, i.e. when not running with rethinkdb
|
2017-04-17 12:05:39 -07:00 |
|
Noah Levitt
|
3d87ed61be
|
whoops, stop warcprox and join thread in test_choose_a_port_for_me
|
2017-04-17 11:47:22 -07:00 |
|
Noah Levitt
|
1900dfac08
|
test choosing port 0 which means, let the system choose one for me, and fix a bug in service registry reporting of the port
|
2017-04-17 11:45:37 -07:00 |
|
Noah Levitt
|
21a9a26f51
|
fix some obsolete calls
|
2017-04-17 11:00:43 -07:00 |
|
Noah Levitt
|
f17584836e
|
add another field to status api and service registry, "threads", the size of the proxy server thread pool
|
2017-03-30 16:18:50 -07:00 |
|
Noah Levitt
|
35d7ccd12e
|
add seconds_behind to service registry and status api, which is the length of time the next url to be written to warc has been waiting in the queue
|
2017-03-30 15:54:19 -07:00 |
|
Noah Levitt
|
da26b25ac3
|
accept failures from the tor test
|
2017-03-28 12:55:30 -07:00 |
|
Noah Levitt
|
89643b7497
|
make the status api test pass in python 2
|
2017-03-23 10:13:14 -07:00 |
|
Noah Levitt
|
8caae0d7d3
|
new api, http://{warcprox_host}:{port}/status returns status info json
|
2017-03-23 09:56:51 -07:00 |
|
Noah Levitt
|
f1d07ad921
|
use urlcanon library for canonicalization, surtification, scope match rules
|
2017-03-15 09:33:50 -07:00 |
|
Noah Levitt
|
842bfd651c
|
rethinkstuff -> doublethink
|
2017-03-02 15:06:26 -08:00 |
|
Noah Levitt
|
1c7564ee6a
|
really fix tests for python2
|
2017-02-02 10:09:03 -08:00 |
|
Noah Levitt
|
859c93f390
|
comment out unused code that fails in py2
|
2017-02-01 15:42:02 -08:00 |
|
Noah Levitt
|
ddb60876a3
|
WARCPROX_WRITE_RECORD is exempt from method filter
|
2017-02-01 15:30:22 -08:00 |
|
Noah Levitt
|
4b505c524b
|
new flag dedup_ok and warcprox-meta field dedup-ok which can be used to prevent deduplication against particular entries in the rethinkdb big captures table
|
2017-01-13 17:29:05 -08:00 |
|
Noah Levitt
|
de7a23325b
|
a test for alex's method filter
|
2016-11-15 12:42:25 -08:00 |
|
Noah Levitt
|
e5f2c348e2
|
fix dockerized automated tests now that phusion/baseimage is ubuntu xenial
|
2016-11-15 12:09:09 -08:00 |
|
Noah Levitt
|
3b167459e3
|
change tested idns to valid idna2008 now that requests 2.12.0 enforces that (for better or worse, see https://github.com/kennethreitz/requests/issues/3687)
|
2016-11-15 12:08:07 -08:00 |
|
Noah Levitt
|
fa1e8d3af4
|
allow travis-ci failures for python-nightly and also test 3.6-dev (but allow failures);
enable the onion site tor test because apparently travis-ci is allowing me to
install tor now, see https://travis-ci.org/internetarchive/warcprox/jobs/169101744
although https://github.com/travis-ci/apt-package-whitelist/issues/1753 is still open
|
2016-10-19 18:24:25 -07:00 |
|
Noah Levitt
|
719380e612
|
refactor some general mitm proxy stuff into mitmproxy.py
|
2016-10-19 15:32:58 -07:00 |
|
Noah Levitt
|
314be33707
|
new test that reveals connection hang on https urls missing a content-length http response header (not chunked and server leaves connection open) -- reported by Alex Osborne
|
2016-10-19 13:43:44 -07:00 |
|
Noah Levitt
|
46c24833ff
|
emoji idn fails with python 2.7, so test with a BMP unicode character
|
2016-06-29 17:16:50 -05:00 |
|
Noah Levitt
|
33775d360a
|
comment out segfaulting test
|
2016-06-29 16:47:54 -05:00 |
|
Noah Levitt
|
a59871e17b
|
idn support, at least for domain limits (getting a segfault in tests on mac however, let's see what happens on travis-ci)
|
2016-06-29 15:54:40 -05:00 |
|
Noah Levitt
|
c9e403585b
|
switching from host limits to domain limits, which apply in aggregate to the host and subdomains
|
2016-06-29 14:56:14 -05:00 |
|
Noah Levitt
|
2c8b194090
|
really only apply host limits to the host
|
2016-06-28 15:53:29 -05:00 |
|
Noah Levitt
|
04c4b63f03
|
renaming scope rule "host" to "domain" to make it less confusing, since rules apply to subdomains as well
|
2016-06-28 15:35:02 -05:00 |
|
Noah Levitt
|
320df0565e
|
support "soft limits" which result in a different response code (430) than regular (hard) limits (which result in a 420)
|
2016-06-27 16:07:20 -05:00 |
|
Noah Levitt
|
fabd732b7f
|
couple of fixes for host limits
|
2016-06-24 21:58:37 -05:00 |
|
Noah Levitt
|
2fe0c2f25b
|
support for tallying substats of a configured bucket by host, and enforcing host limits using those stats, with tests
|
2016-06-24 20:04:27 -05:00 |
|
Noah Levitt
|
d48e2c462d
|
add a start() method to the two classes that save data to rethinkdb periodically in batches, instead of starting the timer in __init__
|
2016-06-16 00:04:59 +00:00 |
|
Noah Levitt
|
4bb3556709
|
implement enforcement of Warcprox-Meta header block rules; includes automated tests
|
2016-05-10 23:11:47 +00:00 |
|
Noah Levitt
|
6f10e2708d
|
disable tor test to give travis build a chance to pass tests (waiting on https://github.com/travis-ci/apt-package-whitelist/issues/1753)
|
2016-04-06 19:39:28 -07:00 |
|
Noah Levitt
|
2c65ff89fa
|
add license headers
|
2016-04-06 19:37:55 -07:00 |
|