207 Commits

Author SHA1 Message Date
Noah Levitt
cc71c331a1 modify response headers from server, always send connection:close to proxy client 2016-01-26 18:47:08 -08:00
Noah Levitt
f000d413a2 quiet stats logging 2016-01-26 18:46:13 -08:00
Noah Levitt
df38cf856d rethinkdb for stats 2016-01-26 18:46:13 -08:00
Noah Levitt
788bc69f47 set up fixtures once for all tests 2016-01-26 18:46:13 -08:00
Noah Levitt
3d90b9c2e9 py.test option --rethinkdb-servers to run tests using rethinkdb 2016-01-26 18:46:13 -08:00
Noah Levitt
e66dc3a9fb rethinkdb dedup 2016-01-26 18:46:13 -08:00
Noah Levitt
0e7a7fdd69 remove unused method; fix exception at shutdown time 2016-01-26 18:46:13 -08:00
Noah Levitt
3073d59303 skip stack trace for normal-ish problems 2016-01-26 18:46:13 -08:00
Noah Levitt
d3df48b97e shorten warc filename template 2016-01-26 18:46:13 -08:00
Noah Levitt
0ce8022ea9 better(?) handling of exceptions raised while proxying urls 2016-01-26 18:46:13 -08:00
Noah Levitt
89e5991f7b move limits to toplevel of warcprox-meta json object 2016-01-26 18:46:13 -08:00
Noah Levitt
a876152026 fix exception, make some tweaks 2016-01-26 18:46:13 -08:00
Noah Levitt
aa36ff2958 include Warcprox-Meta response header with relevant info json, and an informative text/plain body, in "420 Limit reached" response 2016-01-26 18:46:13 -08:00
Noah Levitt
4ce89e6d03 basic limits enforcement is working 2016-01-26 18:46:13 -08:00
Noah Levitt
d37d2d71e3 meant to remove warcprox.py 2016-01-26 18:46:13 -08:00
Noah Levitt
03c0fc848c fix old tests to work with refactored code; new test test_limits() (fails now, limits not implemented) 2016-01-26 18:45:36 -08:00
Noah Levitt
1f864515ce refactor warc writing, deduplication for somewhat cleaner separation of concerns 2016-01-26 18:45:36 -08:00
Noah Levitt
274a2f6b1d refactor warc writing, deduplication for somewhat cleaner separation of concerns 2016-01-26 18:45:36 -08:00
Noah Levitt
10c724637f factor out warc record building into its own class 2016-01-26 18:45:36 -08:00
Noah Levitt
89fab33295 remove old unused, commented out tearDown method 2016-01-26 18:45:36 -08:00
Noah Levitt
d3d23f9878 convert test_warcprox.py to py.test with fixtures 2016-01-26 18:45:36 -08:00
Noah Levitt
d38ab08086 close connection to proxy client after proxying the request, seems to solve hanging connection issue (see comment in code) 2016-01-26 18:45:36 -08:00
Noah Levitt
771383d0a6 refactor proxy handler to use do_* methods for custom http verbs; refactor warc writer thread to use new WarcWriterPool class 2016-01-26 18:45:36 -08:00
Noah Levitt
084bd75ed6 dump thread tracebacks on sigquit, more logging and exception handling tweaks 2016-01-26 18:45:12 -08:00
Noah Levitt
86eab2119a logging and exception handling tweaks 2016-01-26 18:45:12 -08:00
Noah Levitt
eb7de9d3f9 catch exception handling special request (currently that means PUTMETA) 2016-01-26 18:45:12 -08:00
Noah Levitt
f00602b764 some logging tweaks, etc 2016-01-26 18:44:34 -08:00
Noah Levitt
0647c0c76d support for writing to different warcs based on Warcprox-Meta http request header warc-prefix setting 2016-01-26 18:44:16 -08:00
Noah Levitt
403404f590 custom PUTMETA http verb for writing warc metadata records; code borrowed from Ilya's fork https://github.com/ikreymer/warcprox 2016-01-26 18:44:16 -08:00
Noah Levitt
f79e744823 Merge pull request #16 from jcushman/proxy-request: Return recorded_url from _proxy_request. 2016-01-04 21:27:02 -08:00
Jack Cushman
4622a6ca52 Return recorded_url from _proxy_request. 2015-10-23 15:15:45 -04:00
Noah Levitt
67f2ceb717 make sure timestamp17(), which is part of warc name, always returns a 17 digit timestamp (even if millisecond part is <100) 2015-07-17 13:31:04 -07:00
Noah Levitt
8dfcf0401c bump up socket timeout setting on connection to remote server, and send appropriate error 504 on timeout 2015-06-30 17:45:19 -07:00
Noah Levitt
b07f194c63 send requested hostname to remote server if python ssl version supports SNI, fixes ssl handshake error for some servers 2015-06-30 17:38:45 -07:00
Noah Levitt
1abe98c99b Merge pull request #12 from ikreymer/dev.use-certauth-pkg: remove certauth.py and use the separate certauth package release 2015-03-30 17:48:54 -07:00
Ilya Kreymer
c045369dcd change 'get_cert_for_host' -> 'cert_for_host' 2015-03-30 15:46:31 -07:00
Ilya Kreymer
574f1f3f52 remove certauth.py and use the separate certauth package release 2015-03-30 09:32:10 -07:00
Noah Levitt
965853f4ab add payload digest header to revisit records 2015-03-26 15:17:46 -07:00
Noah Levitt
0eb2917e50 update tox and travis config for supported python versions 2.7 and 3.4 2015-03-18 16:36:24 -07:00
Noah Levitt
016749a822 bump version since api has changed as a result of reorganization 2015-03-18 16:33:07 -07:00
Noah Levitt
5f84b061f3 make it work with python 2.7 again 2015-03-18 16:29:44 -07:00
Noah Levitt
1e3dd0b910 swallow request headers that don't make sense to send on to the destination, i.e. most hop-by-hop headers; parse and save Warcprox-Meta header (nothing is done with it yet) 2014-11-20 03:26:42 -08:00
Noah Levitt
a2c25d4242 split into even more source files 2014-11-20 00:04:43 -08:00
Noah Levitt
9b8ffbbb51 separate WarcWriter and WarcWriterThread 2014-11-15 04:47:26 -08:00
Noah Levitt
b34edf8fb1 split into multiple files 2014-11-15 03:20:05 -08:00
Noah Levitt
e8438dc8ad Merge pull request #10 from ikreymer/dev.fix-sync: check if 'sync' method exists before calling it 2014-10-28 13:48:53 -07:00
Ilya Kreymer
a139465512 check if 'sync' method actually exists before calling -- anydbm does not have sync() method 2014-10-28 12:49:02 -07:00
Noah Levitt
a0a5ef2355 fix formatting of Install section again so it looks right on pypi! (test with rst2html) 2014-08-08 12:53:16 -07:00
Noah Levitt
b2479d39f6 fix formatting of Install section so it looks right on github 2014-08-08 12:29:12 -07:00
Noah Levitt
9562338d01 add Install section to readme, update the --help dump 2014-08-08 12:22:33 -07:00