257 Commits

Author SHA1 Message Date
Noah Levitt
aa36ff2958 include Warcprox-Meta response header with relevant info json, and an informative text/plain body, in "420 Limit reached" response 2016-01-26 18:46:13 -08:00
Noah Levitt
4ce89e6d03 basic limits enforcement is working 2016-01-26 18:46:13 -08:00
Noah Levitt
d37d2d71e3 meant to remove warcprox.py 2016-01-26 18:46:13 -08:00
Noah Levitt
03c0fc848c fix old tests to work with refactored code; new test test_limits() (fails now, limits not implemented) 2016-01-26 18:45:36 -08:00
Noah Levitt
1f864515ce refactor warc writing, deduplication for somewhat cleaner separation of concerns 2016-01-26 18:45:36 -08:00
Noah Levitt
274a2f6b1d refactor warc writing, deduplication for somewhat cleaner separation of concerns 2016-01-26 18:45:36 -08:00
Noah Levitt
10c724637f factor out warc record building into its own class 2016-01-26 18:45:36 -08:00
Noah Levitt
89fab33295 remove old unused, commented out tearDown method 2016-01-26 18:45:36 -08:00
Noah Levitt
d3d23f9878 convert test_warcprox.py to py.test with fixtures 2016-01-26 18:45:36 -08:00
Noah Levitt
d38ab08086 close connection to proxy client after proxying the request, seems to solve hanging connection issue (see comment in code) 2016-01-26 18:45:36 -08:00
Noah Levitt
771383d0a6 refactor proxy handler to use do_* methods for custom http verbs; refactor warc writer thread to use new WarcWriterPool class 2016-01-26 18:45:36 -08:00
Noah Levitt
084bd75ed6 dump thread tracebacks on sigquit, more logging and exception handling tweaks 2016-01-26 18:45:12 -08:00
Noah Levitt
86eab2119a logging and exception handling tweaks 2016-01-26 18:45:12 -08:00
Noah Levitt
eb7de9d3f9 catch exception handling special request (currently that means PUTMETA) 2016-01-26 18:45:12 -08:00
Noah Levitt
f00602b764 some logging tweaks, etc 2016-01-26 18:44:34 -08:00
Noah Levitt
0647c0c76d support for writing to different warcs based on Warcprox-Meta http request header warc-prefix setting 2016-01-26 18:44:16 -08:00
Noah Levitt
403404f590 custom PUTMETA http verb for writing warc metadata records; code borrowed from Ilya's fork https://github.com/ikreymer/warcprox 2016-01-26 18:44:16 -08:00
Jack Cushman
4622a6ca52 Return recorded_url from _proxy_request. 2015-10-23 15:15:45 -04:00
Noah Levitt
67f2ceb717 make sure timestamp17(), which is part of warc name, always returns a 17 digit timestamp (even if millisecond part is <100) 2015-07-17 13:31:04 -07:00
Noah Levitt
8dfcf0401c bump up socket timeout setting on connection to remote server, and send appropriate error 504 on timeout 2015-06-30 17:45:19 -07:00
Noah Levitt
b07f194c63 send requested hostname to remote server if python ssl version supports SNI, fixes ssl handshake error for some servers 2015-06-30 17:38:45 -07:00
Ilya Kreymer
c045369dcd change 'get_cert_for_host' -> 'cert_for_host' 2015-03-30 15:46:31 -07:00
Ilya Kreymer
574f1f3f52 remove certauth.py and use the separate certauth package release 2015-03-30 09:32:10 -07:00
Noah Levitt
965853f4ab add payload digest header to revisit records 2015-03-26 15:17:46 -07:00
Noah Levitt
5f84b061f3 make it work with python 2.7 again 2015-03-18 16:29:44 -07:00
Noah Levitt
1e3dd0b910 swallow request headers that don't make sense to send on to the destination, i.e. most hop-by-hop headers; parse and save Warcprox-Meta header (nothing is done with it yet) 2014-11-20 03:26:42 -08:00
Noah Levitt
a2c25d4242 split into even more source files 2014-11-20 00:04:43 -08:00
Noah Levitt
9b8ffbbb51 separate WarcWriter and WarcWriterThread 2014-11-15 04:47:26 -08:00
Noah Levitt
b34edf8fb1 split into multiple files 2014-11-15 03:20:05 -08:00
Ilya Kreymer
a139465512 check if 'sync' method actually exists before calling -- anydbm does not have sync() method 2014-10-28 12:49:02 -07:00
Noah Levitt
16f21b2e76 https://github.com/internetarchive/warcprox/issues/9 record warcprox version in warcinfo metadata, and add --version command line option 2014-08-08 12:10:45 -07:00
Noah Levitt
7b66f27758 since ndbm creates different files on different platforms, glob them all and delete them 2014-08-01 17:40:34 -07:00
Noah Levitt
1cdc013c75 some debugging to try to figure out what the hell is up with tox saying OSError: [Errno 2] No such file or directory: /tmp/tmpnz51j6.db 2014-08-01 17:32:16 -07:00
Noah Levitt
ccbe3522c5 timestamps in utc! 2014-08-01 16:00:53 -07:00
Noah Levitt
e79cdb84cb set x509 cert version correctly fixes problem with firefox 31; set_version(2) really means version 3, because 0 is understood to mean version 1 (wtf) 2014-08-01 12:35:34 -07:00
Jack Cushman
4488c04e5e If gdbm is not available, fall back to anydbm. 2014-01-30 19:07:05 -05:00
Kelsey Hawley
c0fbd61507 changed the way I was retrieving the python version 2014-01-17 16:20:16 -08:00
Kelsey Hawley
a87a5dd972 updated test to directly use the specified py version & access the file path to dump-anydbm directly. Also added some more helpful print error statements 2014-01-17 15:35:25 -08:00
Noah Levitt
f69ec424fb minor cleanup 2014-01-06 17:22:49 -08:00
Kelsey Hawley
b6ea681c2b changed file creation and deletion to use temporaryfile. Still needed to use os to delete the 'extra' files that ndbm & dumbdbm created. Also did not explicitly state the file name in checking the output statements, as now they are random every time. 2014-01-02 18:18:46 -08:00
Kelsey Hawley
1b69aea7ed removed the string splicing and replaced with one clear assert statement based on the script output for each test. simplifies and clarifies the test 2014-01-02 17:05:45 -08:00
Kelsey Hawley
4b0ab0ff72 updated file to PEP 8, as editor was complaining, and tabs are generally bad 2014-01-02 16:29:15 -08:00
Kelsey Hawley
d643be1c8c moved dump-anydbm test file to be in the existing test folder, as proximity to dump-anydbm script is not necessary 2013-12-20 14:01:42 -08:00
Noah Levitt
0cb0f0e448 ensure request headers always use \r\n (some servers barf if not, e.g. http://cleftomaniacsnyu.wix.com) 2013-12-13 19:36:22 -08:00
Noah Levitt
9041fe00e6 use hashlib.algorithms_guaranteed to replace missing hashlib.algorithms in python3 2013-12-12 21:59:43 -08:00
Noah Levitt
2b5ab3b70a shorter CN for CA cert to avoid OpenSSL.crypto.Error: [('asn1 encoding routines', 'ASN1_mbstring_ncopy', 'string too long')] 2013-12-09 17:56:47 -08:00
Noah Levitt
f2b501ca35 python3.3 http.client wants ProxyingRecord.readinto 2013-12-06 17:09:59 -08:00
Noah Levitt
cae9ee6911 fix misnomer 2013-12-04 17:26:13 -08:00
Noah Levitt
dc9fdc3412 tests pass with python2.7 and 3.2! (tox fails though oddly) 2013-12-04 17:25:45 -08:00
Noah Levitt
6fbae16a31 test dedup of same url 2013-11-22 11:20:19 -08:00