Noah Levitt
|
719380e612
|
refactor some general mitm proxy stuff into mitmproxy.py
|
2016-10-19 15:32:58 -07:00 |
|
Noah Levitt
|
15eeaebde5
|
fix for connection hang on https urls missing a content-length http response header
|
2016-10-19 13:45:46 -07:00 |
|
Noah Levitt
|
314be33707
|
new test that reveals connection hang on https urls missing a content-length http response header (not chunked and server leaves connection open) -- reported by Alex Osborne
|
2016-10-19 13:43:44 -07:00 |
|
Noah Levitt
|
6000237c47
|
workaround for nasty python/ssl deadlock that has been affecting warcprox, same issue as https://github.com/pyca/cryptography/issues/2911
|
2016-09-23 15:54:31 +01:00 |
|
Noah Levitt
|
5d44859ba8
|
keep trying to connect to kafka and don't let connection failure interfere with other warcprox operations
|
2016-09-07 13:43:01 -07:00 |
|
Noah Levitt
|
504af2fb0f
|
try to avoid ever blocking when sending messages to kafka
|
2016-09-07 13:01:11 -07:00 |
|
Noah Levitt
|
1ddebbc50e
|
bump up to next dev version number
|
2016-07-21 19:12:46 -05:00 |
|
Noah Levitt
|
fdd6086d65
|
version 2.0b1 for upload to pypi
|
2016-07-21 19:09:35 -05:00 |
|
Noah Levitt
|
a5d6d634d8
|
enable pypy and pypy3 travis-ci tests, but allow failures
|
2016-07-11 11:23:53 -05:00 |
|
Noah Levitt
|
00f48d6566
|
less verbose logging about updating big captures table
|
2016-07-05 18:45:17 -05:00 |
|
Noah Levitt
|
5eed7061b1
|
do not require --kafka-capture-feed-topic to make the kafka capture feed work (it can be configured per job or per site)
|
2016-07-05 11:51:56 -05:00 |
|
Noah Levitt
|
b82d82b5f1
|
command line utility warcprox-ensure-rethinkdb-tables, creates rethinkdb tables if they don't already exist... warcprox normally creates them on demand at startup, but if multiple instances are starting up at the same time, you can end up with duplicate broken tables, so it's a good idea to use this utility when spinning up a cluster
|
2016-06-30 15:24:40 -05:00 |
|
Noah Levitt
|
46c24833ff
|
emoji idn fails with python 2.7, so test with a BMP unicode character
|
2016-06-29 17:16:50 -05:00 |
|
Noah Levitt
|
33775d360a
|
comment out segfaulting test
|
2016-06-29 16:47:54 -05:00 |
|
Noah Levitt
|
a59871e17b
|
idn support, at least for domain limits (getting a segfault in tests on mac however, let's see what happens on travis-ci)
|
2016-06-29 15:54:40 -05:00 |
|
Noah Levitt
|
c9e403585b
|
switching from host limits to domain limits, which apply in aggregate to the host and subdomains
|
2016-06-29 14:56:14 -05:00 |
|
Noah Levitt
|
2c8b194090
|
really only apply host limits to the host
|
2016-06-28 15:53:29 -05:00 |
|
Noah Levitt
|
04c4b63f03
|
renaming scope rule "host" to "domain" to make it less confusing, since rules apply to subdomains as well
|
2016-06-28 15:35:02 -05:00 |
|
Noah Levitt
|
04c21408d7
|
fix typo
|
2016-06-27 23:13:00 +00:00 |
|
Noah Levitt
|
320df0565e
|
support "soft limits" which result in a different response code (430) than regular (hard) limits (which result in a 420)
|
2016-06-27 16:07:20 -05:00 |
|
Noah Levitt
|
9df2ce0fbe
|
convert command-line executables to entry_points console_scripts, best practice according to Python Packaging Authority (eases testing, etc)
|
2016-06-27 14:46:42 -05:00 |
|
Noah Levitt
|
84767af0f6
|
check if already started/stopped in WarcproxController.{start,shutdown}, fix bugs
|
2016-06-27 14:36:06 -05:00 |
|
Noah Levitt
|
6410e4c8c7
|
reorganize WarcproxController.run_until_shutdown, moving parts of it into new start() and shutdown() methods, for easier integration into a separate python program
|
2016-06-27 14:18:21 -05:00 |
|
Noah Levitt
|
2fe0c2f25b
|
support for tallying substats of a configured bucket by host, and enforcing host limits using those stats, with tests
|
2016-06-24 20:04:27 -05:00 |
|
Noah Levitt
|
4bb3556709
|
implement enforcement of Warcprox-Meta header block rules; includes automated tests
|
2016-05-10 23:11:47 +00:00 |
|
Noah Levitt
|
4fd17be339
|
started adding some docstrings, and moved some of the more generally man-in-the-middle recording proxy code from warcproxy.py into mitmproxy.py
|
2016-05-10 01:11:17 -07:00 |
|
Noah Levitt
|
0809c78486
|
add Strict-Transport-Security to list of http response headers to swallow, to avoid some problems with HSTS when browsing through warcprox (doesn't solve the case of preloaded HSTS though)
|
2016-04-08 23:26:20 -07:00 |
|
Noah Levitt
|
6f10e2708d
|
disable tor test to give travis build a chance to pass tests (waiting on https://github.com/travis-ci/apt-package-whitelist/issues/1753)
|
2016-04-06 19:39:28 -07:00 |
|
Noah Levitt
|
2c65ff89fa
|
add license headers
|
2016-04-06 19:37:55 -07:00 |
|
Noah Levitt
|
6490583dd0
|
this brozzler branch will be warcprox 2.0, today it's 2.0.dev4
|
2016-03-18 02:07:29 +00:00 |
|
Noah Levitt
|
42a81d8f8f
|
fix bug where two warc-payload-digest headers were written to revisit records
|
2016-03-15 06:27:21 +00:00 |
|
Noah Levitt
|
910cd062ee
|
bump version number
|
2016-03-08 22:55:42 +00:00 |
|
Noah Levitt
|
89f965d1d3
|
use kafka-python 1.0 recommended api; use kafka capture feed specified in warcprox-meta header, if any
|
2016-03-03 18:58:52 +00:00 |
|
Noah Levitt
|
ee3ee5d621
|
call this 1.5.0.dev1 for now
|
2016-02-25 01:36:36 +00:00 |
|
Noah Levitt
|
00dc9eed84
|
new option --onion-tor-socks-proxy, host:port of tor socks proxy, used only to connect to .onion sites
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
2ecd2facd9
|
surt 0.3b2 is in pypi now, no need for devpi
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
95e611a5d0
|
update stats in RethinkDb asynchronously, since profiling shows this to be a bottleneck in WarcWriterThread (which in turn makes it a bottleneck for the whole app)
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
a41c426b0a
|
giving up on using git revision in version number :( latest issue is when installing a package that calls git to compute a version number, but cwd is some other git project, you get the wrong thing
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
97a30eb319
|
back to setup.py now that we have devpi
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
c430f81883
|
some refactoring to prep for big rethinkdb capture table
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
e66dc3a9fb
|
rethinkdb dedup
|
2016-01-26 18:46:13 -08:00 |
|
Noah Levitt
|
274a2f6b1d
|
refactor warc writing, deduplication for somewhat cleaner separation of concerns
|
2016-01-26 18:45:36 -08:00 |
|
Ilya Kreymer
|
574f1f3f52
|
remove certauth.py and use the separate certauth package release
|
2015-03-30 09:32:10 -07:00 |
|
Noah Levitt
|
016749a822
|
bump version since api has changed as a result of reorganization
|
2015-03-18 16:33:07 -07:00 |
|
Noah Levitt
|
5f84b061f3
|
make it work with python 2.7 again
|
2015-03-18 16:29:44 -07:00 |
|
Noah Levitt
|
b34edf8fb1
|
split into multiple files
|
2014-11-15 03:20:05 -08:00 |
|
Noah Levitt
|
16f21b2e76
|
https://github.com/internetarchive/warcprox/issues/9 record warcprox version in warcinfo metadata, and add --version command line option
|
2014-08-08 12:10:45 -07:00 |
|
Noah Levitt
|
b434e33fdd
|
bump version number for updated submission to pypi
|
2014-08-05 19:04:07 -07:00 |
|
Noah Levitt
|
ccbe3522c5
|
timestamps in utc!
|
2014-08-01 16:00:53 -07:00 |
|
Kelsey Hawley
|
ae3a039d95
|
updated setup.py to use pytest (for compatibility with dump-anydbm tests)
|
2014-01-17 12:13:39 -08:00 |
|