304 Commits

Author SHA1 Message Date
Noah Levitt
320df0565e support "soft limits" which result in a different response code (430) than regular (hard) limits (which result in a 420) 2016-06-27 16:07:20 -05:00
Noah Levitt
9df2ce0fbe convert command-line executables to entry_points console_scripts, best practice according to Python Packaging Authority (eases testing, etc) 2016-06-27 14:46:42 -05:00
Noah Levitt
84767af0f6 check if already started/stopped in WarcproxController.{start,shutdown}, fix bugs 2016-06-27 14:36:06 -05:00
Noah Levitt
6410e4c8c7 reorganize WarcproxController.run_until_shutdown, moving parts of it into new start() and shutdown() methods, for easier integration into a separate python program 2016-06-27 14:18:21 -05:00
Noah Levitt
2fe0c2f25b support for tallying substats of a configured bucket by host, and enforcing host limits using those stats, with tests 2016-06-24 20:04:27 -05:00
Noah Levitt
4bb3556709 implement enforcement of Warcprox-Meta header block rules; includes automated tests 2016-05-10 23:11:47 +00:00
Noah Levitt
4fd17be339 started adding some docstrings, and moved some of the more generally man-in-the-middle recording proxy code from warcproxy.py into mitmproxy.py 2016-05-10 01:11:17 -07:00
Noah Levitt
0809c78486 add Strict-Transport-Security to list of http response headers to swallow, to avoid some problems with HSTS when browsing through warcprox (doesn't solve the case of preloaded HSTS though) 2016-04-08 23:26:20 -07:00
Noah Levitt
6f10e2708d disable tor test to give travis build a chance to pass tests (waiting on https://github.com/travis-ci/apt-package-whitelist/issues/1753) 2016-04-06 19:39:28 -07:00
Noah Levitt
2c65ff89fa add license headers 2016-04-06 19:37:55 -07:00
Noah Levitt
6490583dd0 this brozzler branch will be warcprox 2.0, today it's 2.0.dev4 2016-03-18 02:07:29 +00:00
Noah Levitt
42a81d8f8f fix bug where two warc-payload-digest headers were written to revisit records 2016-03-15 06:27:21 +00:00
Noah Levitt
910cd062ee bump version number 2016-03-08 22:55:42 +00:00
Noah Levitt
89f965d1d3 use kafka-python 1.0 recommended api; use kafka capture feed specified in warcprox-meta header, if any 2016-03-03 18:58:52 +00:00
Noah Levitt
ee3ee5d621 call this 1.5.0.dev1 for now 2016-02-25 01:36:36 +00:00
Noah Levitt
00dc9eed84 new option --onion-tor-socks-proxy, host:port of tor socks proxy, used only to connect to .onion sites 2016-01-26 18:47:08 -08:00
Noah Levitt
2ecd2facd9 surt 0.3b2 is in pypi now, no need for devpi 2016-01-26 18:47:08 -08:00
Noah Levitt
95e611a5d0 update stats in RethinkDb asynchronously, since profiling shows this to be a bottleneck in WarcWriterThread (which in turn makes it a bottleneck for the whole app) 2016-01-26 18:47:08 -08:00
Noah Levitt
a41c426b0a giving up on using git revision in version number :( latest issue is when installing a package that calls git to compute a version number, but cwd is some other git project, you get the wrong thing 2016-01-26 18:47:08 -08:00
Noah Levitt
97a30eb319 back to setup.py now that we have devpi 2016-01-26 18:47:08 -08:00
Noah Levitt
c430f81883 some refactoring to prep for big rethinkdb capture table 2016-01-26 18:47:08 -08:00
Noah Levitt
e66dc3a9fb rethinkdb dedup 2016-01-26 18:46:13 -08:00
Noah Levitt
274a2f6b1d refactor warc writing, deduplication for somewhat cleaner separation of concerns 2016-01-26 18:45:36 -08:00
Ilya Kreymer
574f1f3f52 remove certauth.py and use the separate certauth package release 2015-03-30 09:32:10 -07:00
Noah Levitt
016749a822 bump version since api has changed as a result of reorganization 2015-03-18 16:33:07 -07:00
Noah Levitt
5f84b061f3 make it work with python 2.7 again 2015-03-18 16:29:44 -07:00
Noah Levitt
b34edf8fb1 split into multiple files 2014-11-15 03:20:05 -08:00
Noah Levitt
16f21b2e76 https://github.com/internetarchive/warcprox/issues/9 record warcprox version in warcinfo metadata, and add --version command line option 2014-08-08 12:10:45 -07:00
Noah Levitt
b434e33fdd bump version number for updated submission to pypi 2014-08-05 19:04:07 -07:00
Noah Levitt
ccbe3522c5 timestamps in utc! 2014-08-01 16:00:53 -07:00
Kelsey Hawley
ae3a039d95 updated setup.py to use pytest (for compatibility with dump-anydbm tests) 2014-01-17 12:13:39 -08:00
Noah Levitt
3bc4294227 oops, adding missing comma 2013-12-19 17:08:13 -08:00
Noah Levitt
115b7c03ee add some classifiers 2013-12-19 17:03:40 -08:00
Noah Levitt
f07437f64d since we depend on warctools trunk now, update the readme, and update the version number, so we can push latest to pypi 2013-12-19 16:48:28 -08:00
Noah Levitt
e880deddb6 oops, fix dependency_links warctools github url 2013-12-13 06:02:14 +00:00
Noah Levitt
81974bb014 warctools mainline has the good stuff now 2013-12-12 21:28:25 -08:00
Noah Levitt
313bc62bf1 gdbm not in pip, can't be listed as a requirement 2013-12-09 17:45:00 -08:00
Noah Levitt
e9e152ca7d tox (and travis ci?) were hiding the fact that the gdbm dependency was the problem 2013-12-07 00:27:59 -08:00
Noah Levitt
b6774da603 more fiddling trying to get test runs to work with various invocation methods, esp travis 2013-12-06 16:50:02 -08:00
Noah Levitt
9c6c18d274 nose.collector wasn't working 2013-12-06 15:22:29 -08:00
Noah Levitt
2dd9ecb718 not sure why tox wasn't working, but this fixes it 2013-12-04 17:50:55 -08:00
Noah Levitt
dc9fdc3412 tests pass with python2.7 and 3.2! (tox fails though oddly) 2013-12-04 17:25:45 -08:00
Noah Levitt
8ae164f8ca finish switch from README.md to README.rst 2013-11-28 01:28:59 -08:00
Noah Levitt
9c53f1b2d3 spec warctools dependency more precisely 2013-11-28 00:40:30 -08:00
Noah Levitt
0237a00f3f test_require requests>=2.0.1 for https://github.com/kennethreitz/requests/pull/1636 2013-11-20 16:28:34 -08:00
Noah Levitt
25464dee80 test_archive_and_playback_http_url 2013-11-20 12:06:29 -08:00
Noah Levitt
555517ab78 WarcproxController to ease use of warcprox as a module 2013-11-19 17:12:58 -08:00
Noah Levitt
b8ad8abffe working on packaging 2013-11-15 22:35:32 -08:00
Noah Levitt
556e969465 for now warcprox.py is just a command, not a module 2013-10-15 15:57:14 -07:00
Noah Levitt
a950d199d5 progress towards warc writing 2013-10-15 10:54:18 -07:00