Noah Levitt | 5eed7061b1 | do not require --kafka-capture-feed-topic to make the kafka capture feed work (it can be configured per job or per site) | 2016-07-05 11:51:56 -05:00
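This change implies a simple fallback, sketched below. The sketch assumes the per-job topic arrives as a JSON key in the Warcprox-Meta request header. The key name `capture-feed-topic` and the function `resolve_capture_feed_topic` are hypothetical, not warcprox's confirmed API. Returning `None` means the capture is not published to kafka at all.

```python
import json
from typing import Optional

# Hypothetical Warcprox-Meta key; the real key name may differ.
META_TOPIC_KEY = "capture-feed-topic"


def resolve_capture_feed_topic(warcprox_meta_header: Optional[str],
                               default_topic: Optional[str] = None) -> Optional[str]:
    """Pick the kafka topic for one capture, or None to skip publishing.

    A topic named in the request's Warcprox-Meta header (per job or per site)
    wins; otherwise fall back to the now-optional command-line default.
    """
    if warcprox_meta_header:
        meta = json.loads(warcprox_meta_header)
        topic = meta.get(META_TOPIC_KEY)
        if topic:
            return topic
    return default_topic
```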
Noah Levitt | b82d82b5f1 | command-line utility warcprox-ensure-rethinkdb-tables, which creates rethinkdb tables if they don't already exist; warcprox normally creates them on demand at startup, but if multiple instances start up at the same time you can end up with duplicate, broken tables, so it's a good idea to use this utility when spinning up a cluster | 2016-06-30 15:24:40 -05:00
Noah Levitt | 46c24833ff | emoji idn fails with python 2.7, so test with a BMP unicode character | 2016-06-29 17:16:50 -05:00
Noah Levitt | 33775d360a | comment out segfaulting test | 2016-06-29 16:47:54 -05:00
Noah Levitt | a59871e17b | idn support, at least for domain limits (getting a segfault in tests on mac however, let's see what happens on travis-ci) | 2016-06-29 15:54:40 -05:00
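Before comparing IDN hostnames against domain limits, they can be normalized as sketched below. The sketch assumes Python 3 and its built-in `idna` codec, which implements IDNA 2003 rather than IDNA 2008. `normalize_host` is a hypothetical helper. The test input deliberately uses a BMP character (`ü`), in line with the python 2.7 note on the emoji commit.

```python
def normalize_host(host: str) -> str:
    """Lowercase a hostname and convert any unicode labels to ASCII punycode.

    Uses Python's built-in "idna" codec (IDNA 2003); pure-ASCII labels pass
    through unchanged, so the result can be compared with ASCII domain rules.
    """
    return host.lower().encode("idna").decode("ascii")


print(normalize_host("Bücher.example"))  # xn--bcher-kva.example
```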
Noah Levitt | c9e403585b | switching from host limits to domain limits, which apply in aggregate to the host and subdomains | 2016-06-29 14:56:14 -05:00
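The matching rule described here is sketched below: a limit on a domain counts the bare host and every subdomain together. `host_in_domain` and `tally_by_domain` are hypothetical names. The important part is the label-boundary check, so that `badexample.com` does not count toward `example.com`.

```python
from collections import Counter
from typing import Iterable


def host_in_domain(host: str, domain: str) -> bool:
    """True if host is the domain itself or any subdomain of it."""
    host = host.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    # Require a "." boundary so "badexample.com" is not under "example.com".
    return host == domain or host.endswith("." + domain)


def tally_by_domain(hosts: Iterable[str], domains: Iterable[str]) -> Counter:
    """Count requests per configured domain, aggregating host + subdomains."""
    domains = list(domains)
    counts = Counter()
    for host in hosts:
        for domain in domains:
            if host_in_domain(host, domain):
                counts[domain] += 1
    return counts
```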
Noah Levitt | 2c8b194090 | really only apply host limits to the host | 2016-06-28 15:53:29 -05:00
Noah Levitt | 04c4b63f03 | renaming scope rule "host" to "domain" to make it less confusing, since rules apply to subdomains as well | 2016-06-28 15:35:02 -05:00
Noah Levitt | 04c21408d7 | fix typo | 2016-06-27 23:13:00 +00:00
Noah Levitt | 320df0565e | support "soft limits" which result in a different response code (430) than regular (hard) limits (which result in a 420) | 2016-06-27 16:07:20 -05:00
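A hard/soft limit check might pick the response code as sketched below. The sketch assumes stats and limits are plain mappings from a stats key to a count, and that a limit is reached once the count is greater than or equal to it. The function name, the keys, and the precedence of hard limits over soft ones are illustrative assumptions. Only the 420 and 430 codes come from the commit message.

```python
from typing import Mapping, Optional

HARD_LIMIT_STATUS = 420  # regular (hard) limit reached, per the commit message
SOFT_LIMIT_STATUS = 430  # soft limit reached, per the commit message


def check_limits(stats: Mapping[str, int],
                 limits: Mapping[str, int],
                 soft_limits: Mapping[str, int]) -> Optional[int]:
    """Return the status code to respond with, or None if no limit is reached.

    Hard limits are checked first (an assumption), so a request that trips
    both kinds gets 420.
    """
    for key, limit in limits.items():
        if stats.get(key, 0) >= limit:
            return HARD_LIMIT_STATUS
    for key, limit in soft_limits.items():
        if stats.get(key, 0) >= limit:
            return SOFT_LIMIT_STATUS
    return None
```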
Noah Levitt | 9df2ce0fbe | convert command-line executables to entry_points console_scripts, best practice according to Python Packaging Authority (eases testing, etc) | 2016-06-27 14:46:42 -05:00
Noah Levitt | 84767af0f6 | check if already started/stopped in WarcproxController.{start,shutdown}, fix bugs | 2016-06-27 14:36:06 -05:00
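The already-started/already-stopped check described in this commit is sketched below. It uses a hypothetical `Controller` class rather than warcprox's actual `WarcproxController`. The counters stand in for whatever threads or servers a real controller would start and stop. A lock keeps each check-then-act step atomic if the methods are called from different threads.

```python
import threading


class Controller:
    """Hypothetical controller whose start() and shutdown() are idempotent."""

    def __init__(self):
        self._lock = threading.Lock()
        self._started = False
        self.start_count = 0  # stand-in for "servers/threads started"
        self.stop_count = 0   # stand-in for "servers/threads stopped"

    def start(self):
        with self._lock:
            if self._started:
                return  # already running; a second start() is a no-op
            self._started = True
            self.start_count += 1

    def shutdown(self):
        with self._lock:
            if not self._started:
                return  # never started or already stopped; nothing to do
            self._started = False
            self.stop_count += 1
```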
Noah Levitt | 6410e4c8c7 | reorganize WarcproxController.run_until_shutdown, moving parts of it into new start() and shutdown() methods, for easier integration into a separate python program | 2016-06-27 14:18:21 -05:00
Noah Levitt | 2fe0c2f25b | support for tallying substats of a configured bucket by host, and enforcing host limits using those stats, with tests | 2016-06-24 20:04:27 -05:00
Noah Levitt | 4bb3556709 | implement enforcement of Warcprox-Meta header block rules; includes automated tests | 2016-05-10 23:11:47 +00:00
Noah Levitt | 4fd17be339 | started adding some docstrings, and moved some of the more general man-in-the-middle recording proxy code from warcproxy.py into mitmproxy.py | 2016-05-10 01:11:17 -07:00
Noah Levitt | 0809c78486 | add Strict-Transport-Security to list of http response headers to swallow, to avoid some problems with HSTS when browsing through warcprox (doesn't solve the case of preloaded HSTS though) | 2016-04-08 23:26:20 -07:00
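Header swallowing can be done as sketched below. The sketch assumes response headers are available as a list of (name, value) pairs before they are relayed to the browser. The set contains only the header named in this commit; the real list is assumed to hold other entries too. Matching is case-insensitive because HTTP header names are.

```python
from typing import Iterable, List, Tuple

# Hypothetical swallow list; only the header from this commit is shown.
SWALLOWED_RESPONSE_HEADERS = frozenset({"strict-transport-security"})


def filter_response_headers(
        headers: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop swallowed headers (case-insensitively) before relaying a response."""
    return [(name, value) for name, value in headers
            if name.lower() not in SWALLOWED_RESPONSE_HEADERS]
```

As the commit notes, this cannot help with HSTS-preloaded sites, whose policy is built into the browser and never depends on the header.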
Noah Levitt | 6f10e2708d | disable tor test to give travis build a chance to pass tests (waiting on https://github.com/travis-ci/apt-package-whitelist/issues/1753) | 2016-04-06 19:39:28 -07:00
Noah Levitt | 2c65ff89fa | add license headers | 2016-04-06 19:37:55 -07:00
Noah Levitt | 6490583dd0 | this brozzler branch will be warcprox 2.0, today it's 2.0.dev4 | 2016-03-18 02:07:29 +00:00
Noah Levitt | 42a81d8f8f | fix bug where two warc-payload-digest headers were written to revisit records | 2016-03-15 06:27:21 +00:00
Noah Levitt | 910cd062ee | bump version number | 2016-03-08 22:55:42 +00:00
Noah Levitt | 89f965d1d3 | use kafka-python 1.0 recommended api; use kafka capture feed specified in warcprox-meta header, if any | 2016-03-03 18:58:52 +00:00
Noah Levitt | ee3ee5d621 | call this 1.5.0.dev1 for now | 2016-02-25 01:36:36 +00:00
Noah Levitt | 00dc9eed84 | new option --onion-tor-socks-proxy, host:port of tor socks proxy, used only to connect to .onion sites | 2016-01-26 18:47:08 -08:00
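Routing only `.onion` hosts through the tor SOCKS proxy can be sketched as below. The sketch assumes the option value is a `host:port` string, as the commit describes. `socks_proxy_for` is a hypothetical helper. Every other host is connected to directly.

```python
from typing import Optional, Tuple


def socks_proxy_for(host: str,
                    onion_tor_socks_proxy: Optional[str]) -> Optional[Tuple[str, int]]:
    """Return (proxy_host, proxy_port) for .onion hosts, else None (direct)."""
    if onion_tor_socks_proxy and host.lower().rstrip(".").endswith(".onion"):
        # rpartition splits on the last ":" so "localhost:9050" -> ("localhost", "9050")
        proxy_host, _, port = onion_tor_socks_proxy.rpartition(":")
        return proxy_host, int(port)
    return None
```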
Noah Levitt | 2ecd2facd9 | surt 0.3b2 is in pypi now, no need for devpi | 2016-01-26 18:47:08 -08:00
Noah Levitt | 95e611a5d0 | update stats in RethinkDB asynchronously, since profiling shows this to be a bottleneck in WarcWriterThread (which in turn makes it a bottleneck for the whole app) | 2016-01-26 18:47:08 -08:00
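Moving stats writes off the hot path can be sketched as below. A hypothetical `flush` callback stands in for the RethinkDB write. The writer thread only enqueues. A background thread batches whatever has accumulated, so one slow database round trip covers many captures.

```python
import queue
import threading
from collections import Counter


class AsyncStatsUpdater:
    """Batch stats increments and write them from a background thread."""

    def __init__(self, flush):
        # flush: hypothetical callable taking {bucket: increment}; stands in
        # for the slow database update.
        self._queue = queue.Queue()
        self._flush = flush
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def notify(self, bucket, n=1):
        # Hot path (e.g. the WARC writer): enqueue and return immediately.
        self._queue.put((bucket, n))

    def stop(self):
        # The None sentinel tells the worker to flush what it has and exit.
        self._queue.put(None)
        self._thread.join()

    def _run(self):
        done = False
        while not done:
            batch = Counter()
            item = self._queue.get()  # block until there is work
            while True:
                if item is None:
                    done = True
                    break
                bucket, n = item
                batch[bucket] += n
                try:
                    item = self._queue.get_nowait()  # drain without blocking
                except queue.Empty:
                    break
            if batch:
                self._flush(dict(batch))
```

How many batches `flush` receives depends on thread timing, but the counts across all batches always add up to what was passed to `notify`.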
Noah Levitt | a41c426b0a | giving up on using git revision in version number :( latest issue: when installing a package that calls git to compute its version number while cwd is some other git project, you get the wrong thing | 2016-01-26 18:47:08 -08:00
Noah Levitt | 97a30eb319 | back to setup.py now that we have devpi | 2016-01-26 18:47:08 -08:00
Noah Levitt | c430f81883 | some refactoring to prep for big rethinkdb capture table | 2016-01-26 18:47:08 -08:00
Noah Levitt | e66dc3a9fb | rethinkdb dedup | 2016-01-26 18:46:13 -08:00
Noah Levitt | 274a2f6b1d | refactor warc writing, deduplication for somewhat cleaner separation of concerns | 2016-01-26 18:45:36 -08:00
Ilya Kreymer | 574f1f3f52 | remove certauth.py and use the separate certauth package release | 2015-03-30 09:32:10 -07:00
Noah Levitt | 016749a822 | bump version since api has changed as a result of reorganization | 2015-03-18 16:33:07 -07:00
Noah Levitt | 5f84b061f3 | make it work with python 2.7 again | 2015-03-18 16:29:44 -07:00
Noah Levitt | b34edf8fb1 | split into multiple files | 2014-11-15 03:20:05 -08:00
Noah Levitt | 16f21b2e76 | https://github.com/internetarchive/warcprox/issues/9 record warcprox version in warcinfo metadata, and add --version command line option | 2014-08-08 12:10:45 -07:00
Noah Levitt | b434e33fdd | bump version number for updated submission to pypi | 2014-08-05 19:04:07 -07:00
Noah Levitt | ccbe3522c5 | timestamps in utc! | 2014-08-01 16:00:53 -07:00
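A UTC timestamp in the `YYYY-MM-DDThh:mm:ssZ` form used by the WARC-Date header can be produced as sketched below. The sketch assumes timezone-aware `datetime` values. `warc_date` is a hypothetical helper that refuses naive datetimes rather than guessing their zone.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def warc_date(when: Optional[datetime] = None) -> str:
    """Format `when` (default: now) as a UTC WARC-Date value."""
    if when is None:
        when = datetime.now(timezone.utc)
    elif when.tzinfo is None:
        # A naive datetime is ambiguous; make the caller state its zone.
        raise ValueError("naive datetime: attach a tzinfo first")
    return when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Example: this commit's timestamp, 16:00:53 at UTC-7.
example = datetime(2014, 8, 1, 16, 0, 53, tzinfo=timezone(timedelta(hours=-7)))
print(warc_date(example))  # 2014-08-01T23:00:53Z
```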
Kelsey Hawley | ae3a039d95 | updated setup.py to use pytest (for compatibility with dump-anydbm tests) | 2014-01-17 12:13:39 -08:00
Noah Levitt | 3bc4294227 | oops, adding missing comma | 2013-12-19 17:08:13 -08:00
Noah Levitt | 115b7c03ee | add some classifiers | 2013-12-19 17:03:40 -08:00
Noah Levitt | f07437f64d | since we depend on warctools trunk now, update the readme, and update the version number, so we can push latest to pypi | 2013-12-19 16:48:28 -08:00
Noah Levitt | e880deddb6 | oops, fix dependency_links warctools github url | 2013-12-13 06:02:14 +00:00
Noah Levitt | 81974bb014 | warctools mainline has the good stuff now | 2013-12-12 21:28:25 -08:00
Noah Levitt | 313bc62bf1 | gdbm not in pip, can't be listed as a requirement | 2013-12-09 17:45:00 -08:00
Noah Levitt | e9e152ca7d | tox (and travis ci?) were hiding the fact that the gdbm dependency was the problem | 2013-12-07 00:27:59 -08:00
Noah Levitt | b6774da603 | more fiddling trying to get test runs to work with various invocation methods, esp travis | 2013-12-06 16:50:02 -08:00
Noah Levitt | 9c6c18d274 | nose.collector wasn't working | 2013-12-06 15:22:29 -08:00
Noah Levitt | 2dd9ecb718 | not sure why tox wasn't working, but this fixes it | 2013-12-04 17:50:55 -08:00