317 Commits

Author SHA1 Message Date
Noah Levitt
df38cf856d rethinkdb for stats 2016-01-26 18:46:13 -08:00
Noah Levitt
788bc69f47 set up fixtures once for all tests 2016-01-26 18:46:13 -08:00
Noah Levitt
3d90b9c2e9 py.test option --rethinkdb-servers to run tests using rethinkdb 2016-01-26 18:46:13 -08:00
Noah Levitt
e66dc3a9fb rethinkdb dedup 2016-01-26 18:46:13 -08:00
Noah Levitt
0e7a7fdd69 remove unused method; fix exception at shutdown time 2016-01-26 18:46:13 -08:00
Noah Levitt
3073d59303 skip stack trace for normal-ish problems 2016-01-26 18:46:13 -08:00
Noah Levitt
d3df48b97e shorten warc filename template 2016-01-26 18:46:13 -08:00
Noah Levitt
0ce8022ea9 better(?) handling of exceptions raised while proxying urls 2016-01-26 18:46:13 -08:00
Noah Levitt
89e5991f7b move limits to toplevel of warcprox-meta json object 2016-01-26 18:46:13 -08:00
Noah Levitt
a876152026 fix exception, make some tweaks 2016-01-26 18:46:13 -08:00
Noah Levitt
aa36ff2958 include Warcprox-Meta response header with relevant info json, and an informative text/plain body, in "420 Limit reached" response 2016-01-26 18:46:13 -08:00
Noah Levitt
4ce89e6d03 basic limits enforcement is working 2016-01-26 18:46:13 -08:00
Noah Levitt
d37d2d71e3 meant to remove warcprox.py 2016-01-26 18:46:13 -08:00
Noah Levitt
03c0fc848c fix old tests to work with refactored code; new test test_limits() (fails now, limits not implemented) 2016-01-26 18:45:36 -08:00
Noah Levitt
1f864515ce refactor warc writing, deduplication for somewhat cleaner separation of concerns 2016-01-26 18:45:36 -08:00
Noah Levitt
274a2f6b1d refactor warc writing, deduplication for somewhat cleaner separation of concerns 2016-01-26 18:45:36 -08:00
Noah Levitt
10c724637f factor out warc record building into its own class 2016-01-26 18:45:36 -08:00
Noah Levitt
89fab33295 remove old unused, commented out tearDown method 2016-01-26 18:45:36 -08:00
Noah Levitt
d3d23f9878 convert test_warcprox.py to py.test with fixtures 2016-01-26 18:45:36 -08:00
Noah Levitt
d38ab08086 close connection to proxy client after proxying the request, seems to solve hanging connection issue (see comment in code) 2016-01-26 18:45:36 -08:00
Noah Levitt
771383d0a6 refactor proxy handler to use do_* methods for custom http verbs; refactor warc writer thread to use new WarcWriterPool class 2016-01-26 18:45:36 -08:00
Noah Levitt
084bd75ed6 dump thread tracebacks on sigquit, more logging and exception handling tweaks 2016-01-26 18:45:12 -08:00
Noah Levitt
86eab2119a logging and exception handling tweaks 2016-01-26 18:45:12 -08:00
Noah Levitt
eb7de9d3f9 catch exception handling special request (currently that means PUTMETA) 2016-01-26 18:45:12 -08:00
Noah Levitt
f00602b764 some logging tweaks, etc 2016-01-26 18:44:34 -08:00
Noah Levitt
0647c0c76d support for writing to different warcs based on Warcprox-Meta http request header warc-prefix setting 2016-01-26 18:44:16 -08:00
Noah Levitt
403404f590 custom PUTMETA http verb for writing warc metadata records; code borrowed from Ilya's fork https://github.com/ikreymer/warcprox 2016-01-26 18:44:16 -08:00
Jack Cushman
4622a6ca52 Return recorded_url from _proxy_request. 2015-10-23 15:15:45 -04:00
Noah Levitt
67f2ceb717 make sure timestamp17(), which is part of warc name, always returns a 17 digit timestamp (even if millisecond part is <100) 2015-07-17 13:31:04 -07:00
Noah Levitt
8dfcf0401c bump up socket timeout setting on connection to remote server, and send appropriate error 504 on timeout 2015-06-30 17:45:19 -07:00
Noah Levitt
b07f194c63 send requested hostname to remote server if python ssl version supports SNI, fixes ssl handshake error for some servers 2015-06-30 17:38:45 -07:00
Ilya Kreymer
c045369dcd change 'get_cert_for_host' -> 'cert_for_host' 2015-03-30 15:46:31 -07:00
Ilya Kreymer
574f1f3f52 remove certauth.py and use the separate certauth package release 2015-03-30 09:32:10 -07:00
Noah Levitt
965853f4ab add payload digest header to revisit records 2015-03-26 15:17:46 -07:00
Noah Levitt
5f84b061f3 make it work with python 2.7 again 2015-03-18 16:29:44 -07:00
Noah Levitt
1e3dd0b910 swallow request headers that don't make sense to send on to the destination, i.e. most hop-by-hop headers; parse and save Warcprox-Meta header (nothing is done with it yet) 2014-11-20 03:26:42 -08:00
Noah Levitt
a2c25d4242 split into even more source files 2014-11-20 00:04:43 -08:00
Noah Levitt
9b8ffbbb51 separate WarcWriter and WarcWriterThread 2014-11-15 04:47:26 -08:00
Noah Levitt
b34edf8fb1 split into multiple files 2014-11-15 03:20:05 -08:00
Ilya Kreymer
a139465512 check if 'sync' method actually exists before calling -- anydbm does not have sync() method 2014-10-28 12:49:02 -07:00
Noah Levitt
16f21b2e76 https://github.com/internetarchive/warcprox/issues/9 record warcprox version in warcinfo metadata, and add --version command line option 2014-08-08 12:10:45 -07:00
Noah Levitt
7b66f27758 since ndbm creates different files on different platforms, glob them all and delete them 2014-08-01 17:40:34 -07:00
Noah Levitt
1cdc013c75 some debugging to try to figure out what the hell is up with tox saying OSError: [Errno 2] No such file or directory: /tmp/tmpnz51j6.db 2014-08-01 17:32:16 -07:00
Noah Levitt
ccbe3522c5 timestamps in utc! 2014-08-01 16:00:53 -07:00
Noah Levitt
e79cdb84cb set x509 cert version correctly fixes problem with firefox 31; set_version(2) really means version 3, because 0 is understood to mean version 1 (wtf) 2014-08-01 12:35:34 -07:00
Jack Cushman
4488c04e5e If gdbm is not available, fall back to anydbm. 2014-01-30 19:07:05 -05:00
Kelsey Hawley
c0fbd61507 changed the way I was retrieving the python version 2014-01-17 16:20:16 -08:00
Kelsey Hawley
a87a5dd972 updated test to directly use the specified py version & access the file path to dump-anydbm directly. Also added some more helpful print error statements 2014-01-17 15:35:25 -08:00
Noah Levitt
f69ec424fb minor cleanup 2014-01-06 17:22:49 -08:00
Kelsey Hawley
b6ea681c2b changed file creation and deletion to use temporaryfile. Still needed to use os to delete the 'extra' files that ndbm & dumbdbm created. Also did not explicitly state the file name in checking the output statements, as now they are random every time. 2014-01-02 18:18:46 -08:00