73 Commits

Author SHA1 Message Date
Noah Levitt
719380e612 refactor some general mitm proxy stuff into mitmproxy.py 2016-10-19 15:32:58 -07:00
Noah Levitt
15eeaebde5 fix for connection hang on https urls missing a content-length http response header 2016-10-19 13:45:46 -07:00
Noah Levitt
314be33707 new test that reveals connection hang on https urls missing a content-length http response header (not chunked and server leaves connection open) -- reported by Alex Osborne 2016-10-19 13:43:44 -07:00
Noah Levitt
6000237c47 workaround for nasty python/ssl deadlock that has been affecting warcprox, same issue as https://github.com/pyca/cryptography/issues/2911 2016-09-23 15:54:31 +01:00
Noah Levitt
5d44859ba8 keep trying to connect to kafka and don't let connection failure interfere with other warcprox operations 2016-09-07 13:43:01 -07:00
Noah Levitt
504af2fb0f try to avoid ever blocking when sending messages to kafka 2016-09-07 13:01:11 -07:00
Noah Levitt
1ddebbc50e bump up to next dev version number 2016-07-21 19:12:46 -05:00
Noah Levitt
fdd6086d65 version 2.0b1 for upload to pypi 2016-07-21 19:09:35 -05:00
Noah Levitt
a5d6d634d8 enable pypy and pypy3 travis-ci tests, but allow failures 2016-07-11 11:23:53 -05:00
Noah Levitt
00f48d6566 less verbose logging about updating big captures table 2016-07-05 18:45:17 -05:00
Noah Levitt
5eed7061b1 do not require --kafka-capture-feed-topic to make the kafka capture feed work (it can be configured per job or per site) 2016-07-05 11:51:56 -05:00
Noah Levitt
b82d82b5f1 command line utility warcprox-ensure-rethinkdb-tables, creates rethinkdb tables if they don't already exist... warcprox normally creates them on demand at startup, but if multiple instances are starting up at the same time, you can end up with duplicate broken tables, so it's a good idea to use this utility when spinning up a cluster 2016-06-30 15:24:40 -05:00
Noah Levitt
46c24833ff emoji idn fails with python 2.7, so test with a BMP unicode character 2016-06-29 17:16:50 -05:00
Noah Levitt
33775d360a comment out segfaulting test 2016-06-29 16:47:54 -05:00
Noah Levitt
a59871e17b idn support, at least for domain limits (getting a segfault in tests on mac however, let's see what happens on travis-ci) 2016-06-29 15:54:40 -05:00
Noah Levitt
c9e403585b switching from host limits to domain limits, which apply in aggregate to the host and subdomains 2016-06-29 14:56:14 -05:00
Noah Levitt
2c8b194090 really only apply host limits to the host 2016-06-28 15:53:29 -05:00
Noah Levitt
04c4b63f03 renaming scope rule "host" to "domain" to make it less confusing, since rules apply to subdomains as well 2016-06-28 15:35:02 -05:00
Noah Levitt
04c21408d7 fix typo 2016-06-27 23:13:00 +00:00
Noah Levitt
320df0565e support "soft limits" which result in a different response code (430) than regular (hard) limits (which result in a 420) 2016-06-27 16:07:20 -05:00
Noah Levitt
9df2ce0fbe convert command-line executables to entry_points console_scripts, best practice according to Python Packaging Authority (eases testing, etc) 2016-06-27 14:46:42 -05:00
Noah Levitt
84767af0f6 check if already started/stopped in WarcproxController.{start,shutdown}, fix bugs 2016-06-27 14:36:06 -05:00
Noah Levitt
6410e4c8c7 reorganize WarcproxController.run_until_shutdown, moving parts of it into new start() and shutdown() methods, for easier integration into a separate python program 2016-06-27 14:18:21 -05:00
Noah Levitt
2fe0c2f25b support for tallying substats of a configured bucket by host, and enforcing host limits using those stats, with tests 2016-06-24 20:04:27 -05:00
Noah Levitt
4bb3556709 implement enforcement of Warcprox-Meta header block rules; includes automated tests 2016-05-10 23:11:47 +00:00
Noah Levitt
4fd17be339 started adding some docstrings, and moved some of the more generally man-in-the-middle recording proxy code from warcproxy.py into mitmproxy.py 2016-05-10 01:11:17 -07:00
Noah Levitt
0809c78486 add Strict-Transport-Security to list of http response headers to swallow, to avoid some problems with HSTS when browsing through warcprox (doesn't solve the case of preloaded HSTS though) 2016-04-08 23:26:20 -07:00
Noah Levitt
6f10e2708d disable tor test to give travis build a chance to pass tests (waiting on https://github.com/travis-ci/apt-package-whitelist/issues/1753) 2016-04-06 19:39:28 -07:00
Noah Levitt
2c65ff89fa add license headers 2016-04-06 19:37:55 -07:00
Noah Levitt
6490583dd0 this brozzler branch will be warcprox 2.0, today it's 2.0.dev4 2016-03-18 02:07:29 +00:00
Noah Levitt
42a81d8f8f fix bug where two warc-payload-digest headers were written to revisit records 2016-03-15 06:27:21 +00:00
Noah Levitt
910cd062ee bump version number 2016-03-08 22:55:42 +00:00
Noah Levitt
89f965d1d3 use kafka-python 1.0 recommended api; use kafka capture feed specified in warcprox-meta header, if any 2016-03-03 18:58:52 +00:00
Noah Levitt
ee3ee5d621 call this 1.5.0.dev1 for now 2016-02-25 01:36:36 +00:00
Noah Levitt
00dc9eed84 new option --onion-tor-socks-proxy, host:port of tor socks proxy, used only to connect to .onion sites 2016-01-26 18:47:08 -08:00
Noah Levitt
2ecd2facd9 surt 0.3b2 is in pypi now, no need for devpi 2016-01-26 18:47:08 -08:00
Noah Levitt
95e611a5d0 update stats in RethinkDB asynchronously, since profiling shows this to be a bottleneck in WarcWriterThread (which in turn makes it a bottleneck for the whole app) 2016-01-26 18:47:08 -08:00
Noah Levitt
a41c426b0a giving up on using git revision in version number :( latest issue is when installing a package that calls git to compute a version number, but cwd is some other git project, you get the wrong thing 2016-01-26 18:47:08 -08:00
Noah Levitt
97a30eb319 back to setup.py now that we have devpi 2016-01-26 18:47:08 -08:00
Noah Levitt
c430f81883 some refactoring to prep for big rethinkdb capture table 2016-01-26 18:47:08 -08:00
Noah Levitt
e66dc3a9fb rethinkdb dedup 2016-01-26 18:46:13 -08:00
Noah Levitt
274a2f6b1d refactor warc writing, deduplication for somewhat cleaner separation of concerns 2016-01-26 18:45:36 -08:00
Ilya Kreymer
574f1f3f52 remove certauth.py and use the separate certauth package release 2015-03-30 09:32:10 -07:00
Noah Levitt
016749a822 bump version since api has changed as a result of reorganization 2015-03-18 16:33:07 -07:00
Noah Levitt
5f84b061f3 make it work with python 2.7 again 2015-03-18 16:29:44 -07:00
Noah Levitt
b34edf8fb1 split into multiple files 2014-11-15 03:20:05 -08:00
Noah Levitt
16f21b2e76 https://github.com/internetarchive/warcprox/issues/9 record warcprox version in warcinfo metadata, and add --version command line option 2014-08-08 12:10:45 -07:00
Noah Levitt
b434e33fdd bump version number for updated submission to pypi 2014-08-05 19:04:07 -07:00
Noah Levitt
ccbe3522c5 timestamps in utc! 2014-08-01 16:00:53 -07:00
Kelsey Hawley
ae3a039d95 updated setup.py to use pytest (for compatibility with dump-anydbm tests) 2014-01-17 12:13:39 -08:00