Noah Levitt
be6fe83c56
bump dev version number after merging pull requests
2017-09-28 14:37:30 -07:00
Noah Levitt
2e5f8a733a
Merge pull request #33 from vbanos/fix-unit-tests
...
Add missing packages from setup.py, add tox config.
2017-09-28 14:35:48 -07:00
Vangelis Banos
6fd687f2b6
Add missing "," in deps
2017-09-28 20:37:15 +00:00
Vangelis Banos
51a2178cbd
Remove tox.ini, move warcio to test_requires
2017-09-28 20:35:47 +00:00
Noah Levitt
faae23d764
allow very long request header lines, to support large warcprox-meta header values
2017-09-27 17:29:55 -07:00
Vangelis Banos
b1819c51b9
Add missing packages from setup.py, add tox config.
...
Add missing `requests` and `warcio` packages. They are used in unit tests but
they were not included in `setup.py`.
Add `tox` configuration in order to be able to run unit tests for py27,
py34 and py35 with 1 command.
2017-09-24 10:51:29 +00:00
Noah Levitt
8bfda9f4b3
fix python2 tests
2017-09-20 11:03:36 -07:00
Noah Levitt
1bca9d0324
don't use http.client.HTTPResponse.getheader() to get the content-type header, because it can return a comma-delimited string
2017-09-18 14:45:16 -07:00
Noah Levitt
b89f834ce3
no SIGQUIT on windows, so no SIGQUIT handler
2017-09-07 12:01:51 -07:00
Noah Levitt
3003c46c10
https://github.com/internetarchive/warcprox/pull/32 warrants a version bump
2017-09-07 10:33:21 -07:00
Noah Levitt
c73fdd91f8
Merge pull request #32 from internetarchive/trough
...
hello --plugin, goodbye kafka feed
2017-09-07 10:31:42 -07:00
Noah Levitt
db0f36c745
fix --size option ( https://github.com/internetarchive/warcprox/issues/31 )
2017-09-05 12:43:55 -07:00
Noah Levitt
7e55568851
fix --playback-port option ( https://github.com/internetarchive/warcprox/issues/29 )
2017-09-05 12:20:22 -07:00
Noah Levitt
c0cb59e5af
Merge branch 'master' into trough
...
* master:
hidden argument --rethinkdb-big-table-name
try to fix https://github.com/internetarchive/warcprox/issues/27
2017-08-03 11:22:27 -07:00
Noah Levitt
13ee68ce4a
hidden argument --rethinkdb-big-table-name
2017-07-20 12:53:59 -07:00
Noah Levitt
b1a8fecd9d
try to fix https://github.com/internetarchive/warcprox/issues/27
2017-07-07 14:54:55 -07:00
Noah Levitt
2c95a1f2ee
remove kafka feed code
2017-06-28 13:12:30 -07:00
Noah Levitt
b23e485898
simplify recovery of stats batch in case of exception saving them (not sure what was wrong with summy_merge, but this is simpler)
2017-06-22 16:54:04 -07:00
Noah Levitt
c0ee9c6093
avoid holding the lock, which makes all warc writer threads block, while doing rethinkdb operations, in RethinkStatsDb
2017-06-22 16:17:25 -07:00
Noah Levitt
24082c2e8c
don't wait for queue to be empty to do idle rollovers, because sometimes warcprox can stay busy for a long, long time
2017-06-22 15:04:01 -07:00
Noah Levitt
808950abb4
recover properly from exception updating stats in rethinkdb
2017-06-12 16:51:45 -07:00
Noah Levitt
1500341875
use %r instead of calling repr()
2017-06-07 16:05:47 -07:00
Noah Levitt
2f93cdcad9
use locking to ensure consistency and avoid this kind of test failure https://travis-ci.org/internetarchive/warcprox/jobs/235819316
2017-05-25 17:38:20 +00:00
Noah Levitt
95dfa54968
get rid of dbm, switch to sqlite, for easier portability, clarity around threading
2017-05-24 13:57:09 -07:00
Noah Levitt
99dd840d20
use "ttl" for updated doublethink svc reg api
2017-05-23 10:37:39 -07:00
Noah Levitt
aca0b881c6
make sure records are written to warc in a predictable order to make tests pass consistently
2017-05-19 16:34:27 -07:00
Noah Levitt
ef5dd2e4ae
multiple warc writer threads (hacked in with little thought to code organization)
2017-05-19 16:10:44 -07:00
Noah Levitt
515dd84aed
lock to certauth < 1.2 until we port
2017-05-19 15:44:00 -07:00
Noah Levitt
a3dde3d97f
fix mistake (incorrect interpretation of concurrent.futures.ThreadPoolExecutor internals) that caused unnecessary waits, and unnecessarily long waits, before calling socket.accept()
2017-05-12 14:18:35 -07:00
Noah Levitt
fd770b71bc
revert stuff accidentally committed as part of eea582c6db9ed6d :(
2017-05-11 11:56:01 -07:00
Noah Levitt
621ebb91ea
use request count and payload size to specify length of benchmark run
2017-05-10 18:58:19 +00:00
Noah Levitt
2a0c8c28c9
improvements to run-benchmarks.py, primarily to actually make multiple requests in parallel
2017-05-10 18:01:56 +00:00
Noah Levitt
eea582c6db
rewrite run-benchmarks.py for aiohttp2
2017-05-08 20:56:32 -07:00
Noah Levitt
c87ff90bc1
move more stuff in do_COMMAND inside the try block so that exceptions result in a 500 response
2017-05-05 13:44:46 -07:00
Noah Levitt
c642565ad8
bump up the socket backlog argument to try to stop kernel closing attempted connections on linux
2017-05-05 18:49:56 +00:00
Noah Levitt
b2f08535ae
set method when creating ProxyingRecordingHTTPResponse so that it knows when to close the connection, and HEAD requests don't sit around trying to read more data until socket timeout
2017-05-04 12:54:04 -07:00
Noah Levitt
11e11f4e68
early trace-level logging of the requestline
2017-05-03 18:39:57 -07:00
Noah Levitt
c0e6c219ca
python2 fixes
2017-04-28 11:12:17 -07:00
Noah Levitt
ca7625b18d
set via header on request and response, record request via in warc (because it is sent to the remote site), do not record response via in warc (because it is not sent by the remote site)
2017-04-28 11:07:33 -07:00
Noah Levitt
47680cc17d
let test_choose_a_port_for_me pass when service registry is missing, i.e. when not running with rethinkdb
2017-04-17 12:05:39 -07:00
Noah Levitt
3d87ed61be
whoops, stop warcprox and join thread in test_choose_a_port_for_me
2017-04-17 11:47:22 -07:00
Noah Levitt
1900dfac08
test choosing port 0 which means, let the system choose one for me, and fix a bug in service registry reporting of the port
2017-04-17 11:45:37 -07:00
Noah Levitt
21a9a26f51
fix some obsolete calls
2017-04-17 11:00:43 -07:00
Noah Levitt
e9d6a8fcf4
override mitmproxy.PooledMixIn.get_request to put a cap on the number of open file handles
2017-04-11 16:35:25 -07:00
Noah Levitt
cbefa37fd9
make --queue-size and --max-threads hidden options work
2017-04-11 16:29:57 -07:00
Noah Levitt
f17584836e
add another field to status api and service registry, "threads", the size of the proxy server thread pool
2017-03-30 16:18:50 -07:00
Noah Levitt
35d7ccd12e
add seconds_behind to service registry and status api, which is the length of time the next url to be written to warc has been waiting in the queue
2017-03-30 15:54:19 -07:00
Noah Levitt
da26b25ac3
accept failures from the tor test
2017-03-28 12:55:30 -07:00
Noah Levitt
1c035153de
shut down immediately on disk full error
2017-03-28 12:39:41 -07:00
Noah Levitt
73d934d0a4
turn down kafka log level
2017-03-27 22:42:46 +00:00