Noah Levitt | aa36ff2958 | include Warcprox-Meta response header with relevant info json, and an informative text/plain body, in "420 Limit reached" response | 2016-01-26 18:46:13 -08:00
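A minimal sketch of what such a response could look like on the wire, assuming it is assembled by hand as raw HTTP/1.1 bytes. The helper `limit_reached_response` and the JSON key `reached-limit` are hypothetical illustrations, not warcprox's actual code or field names.

```python
import json


def limit_reached_response(limit_name, value):
    """Build a raw "420 Limit reached" HTTP response (hypothetical helper).

    The Warcprox-Meta header carries machine-readable JSON, while the body
    is human-readable text/plain.
    """
    # Hypothetical JSON structure; warcprox's real keys may differ.
    # json.dumps emits no newlines and escapes non-ASCII by default,
    # so the value is safe to place in a single header line.
    meta = json.dumps({"reached-limit": {limit_name: value}},
                      separators=(",", ":"))
    body = "request rejected by warcprox: reached limit {}={}\n".format(
        limit_name, value).encode("utf-8")
    head = (
        "HTTP/1.1 420 Limit reached\r\n"
        "Content-Type: text/plain;charset=utf-8\r\n"
        "Connection: close\r\n"
        "Content-Length: {}\r\n"
        "Warcprox-Meta: {}\r\n"
        "\r\n"
    ).format(len(body), meta)
    return head.encode("utf-8") + body
```

Carrying the JSON in a header keeps it easy for a crawler to parse. The text/plain body stays readable for a person inspecting the rejected request.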
Noah Levitt | 4ce89e6d03 | basic limits enforcement is working | 2016-01-26 18:46:13 -08:00
Noah Levitt | d37d2d71e3 | meant to remove warcprox.py | 2016-01-26 18:46:13 -08:00
Noah Levitt | 03c0fc848c | fix old tests to work with refactored code; new test test_limits() (fails now, limits not implemented) | 2016-01-26 18:45:36 -08:00
Noah Levitt | 1f864515ce | refactor warc writing, deduplication for somewhat cleaner separation of concerns | 2016-01-26 18:45:36 -08:00
Noah Levitt | 274a2f6b1d | refactor warc writing, deduplication for somewhat cleaner separation of concerns | 2016-01-26 18:45:36 -08:00
Noah Levitt | 10c724637f | factor out warc record building into its own class | 2016-01-26 18:45:36 -08:00
Noah Levitt | 89fab33295 | remove old unused, commented out tearDown method | 2016-01-26 18:45:36 -08:00
Noah Levitt | d3d23f9878 | convert test_warcprox.py to py.test with fixtures | 2016-01-26 18:45:36 -08:00
Noah Levitt | d38ab08086 | close connection to proxy client after proxying the request, seems to solve hanging connection issue (see comment in code) | 2016-01-26 18:45:36 -08:00
Noah Levitt | 771383d0a6 | refactor proxy handler to use do_* methods for custom http verbs; refactor warc writer thread to use new WarcWriterPool class | 2016-01-26 18:45:36 -08:00
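The standard library's `http.server.BaseHTTPRequestHandler` dispatches each request to a method named `do_` plus the request verb. A custom verb such as PUTMETA therefore needs only a matching method, and unknown verbs get a 501 automatically. The following sketch is hypothetical: `MetaHandler` and `send_request` are illustrative names. The handler only stores the body where a real proxy would write a WARC metadata record.

```python
import http.client
import http.server
import threading


class MetaHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler showing do_<VERB> dispatch for a custom verb."""

    received = []  # (path, body) pairs collected for demonstration

    def do_PUTMETA(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)
        # A real proxy would build and write a WARC metadata record here.
        MetaHandler.received.append((self.path, payload))
        self.send_response(204)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def send_request(port, method, url, payload):
    """Send one request with an arbitrary verb and return the status code."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
    try:
        conn.request(method, url, body=payload)
        resp = conn.getresponse()
        resp.read()
        return resp.status
    finally:
        conn.close()
```

To try it, start `http.server.HTTPServer(("127.0.0.1", 0), MetaHandler)` in a thread with `serve_forever()`. Then call `send_request(port, "PUTMETA", url, body)`. A verb with no `do_` method, such as FOO, gets a 501 response.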
Noah Levitt | 084bd75ed6 | dump thread tracebacks on sigquit, more logging and exception handling tweaks | 2016-01-26 18:45:12 -08:00
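One way to dump every thread's stack on demand, sketched with the standard library only. `sys._current_frames()` maps thread ids to their current frames, and `traceback.format_stack()` renders each one. The function names here are hypothetical. On POSIX systems, the standard library's `faulthandler.register(signal.SIGQUIT)` is a ready-made alternative.

```python
import signal
import sys
import threading
import traceback


def thread_tracebacks():
    """Return a text dump of every live thread's current stack."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append("Thread {} ({}):\n".format(ident, names.get(ident, "?")))
        chunks.append("".join(traceback.format_stack(frame)))
    return "".join(chunks)


def _dump_on_signal(signum, frame):
    sys.stderr.write(thread_tracebacks())


def install_sigquit_handler():
    """Install the dump handler; SIGQUIT exists only on POSIX platforms.

    Must be called from the main thread, as signal.signal() requires.
    """
    if hasattr(signal, "SIGQUIT"):
        signal.signal(signal.SIGQUIT, _dump_on_signal)
        return True
    return False
```

With the handler installed, `kill -QUIT <pid>` prints the stacks to stderr without stopping the process. This is handy for diagnosing a hung proxy.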
Noah Levitt | 86eab2119a | logging and exception handling tweaks | 2016-01-26 18:45:12 -08:00
Noah Levitt | eb7de9d3f9 | catch exception handling special request (currently that means PUTMETA) | 2016-01-26 18:45:12 -08:00
Noah Levitt | f00602b764 | some logging tweaks, etc | 2016-01-26 18:44:34 -08:00
Noah Levitt | 0647c0c76d | support for writing to different warcs based on Warcprox-Meta http request header warc-prefix setting | 2016-01-26 18:44:16 -08:00
Noah Levitt | 403404f590 | custom PUTMETA http verb for writing warc metadata records; code borrowed from Ilya's fork https://github.com/ikreymer/warcprox | 2016-01-26 18:44:16 -08:00
Jack Cushman | 4622a6ca52 | Return recorded_url from _proxy_request. | 2015-10-23 15:15:45 -04:00
Noah Levitt | 67f2ceb717 | make sure timestamp17(), which is part of warc name, always returns a 17 digit timestamp (even if millisecond part is <100) | 2015-07-17 13:31:04 -07:00
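The fix amounts to zero-padding the millisecond field. This minimal sketch assumes the timestamp is 14 digits of UTC date and time followed by 3 digits of milliseconds. The optional `now` argument is an illustration, not necessarily warcprox's signature.

```python
import datetime


def timestamp17(now=None):
    """Return YYYYmmddHHMMSS plus a 3-digit millisecond field (17 digits).

    Formatting the milliseconds with {:03d} pads values below 100
    (e.g. 5 ms becomes "005"), so the length is always 17.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return "{:%Y%m%d%H%M%S}{:03d}".format(now, now.microsecond // 1000)
```

Without the padding, a time 5 ms into the second would yield only 15 digits. WARC file names built from it would then sort and parse inconsistently.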
Noah Levitt | 8dfcf0401c | bump up socket timeout setting on connection to remote server, and send appropriate error 504 on timeout | 2015-06-30 17:45:19 -07:00
Noah Levitt | b07f194c63 | send requested hostname to remote server if python ssl version supports SNI, fixes ssl handshake error for some servers | 2015-06-30 17:38:45 -07:00
Ilya Kreymer | c045369dcd | change 'get_cert_for_host' -> 'cert_for_host' | 2015-03-30 15:46:31 -07:00
Ilya Kreymer | 574f1f3f52 | remove certauth.py and use the seperate certauth package release | 2015-03-30 09:32:10 -07:00
Noah Levitt | 965853f4ab | add payload digest header to revisit records | 2015-03-26 15:17:46 -07:00
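Revisit records point back to an earlier capture by repeating its payload digest. This hedged sketch assumes the common WARC convention of a base32-encoded SHA-1 prefixed with `sha1:`. The helper names and the WARC 1.0 identical-payload-digest profile URI are assumptions, not confirmed details of this commit.

```python
import base64
import hashlib

# Assumed profile URI for revisits deduplicated by payload digest (WARC 1.0).
IDENTICAL_PAYLOAD_PROFILE = (
    "http://netpreserve.org/warc/1.0/revisit/identical-payload-digest")


def payload_digest(payload):
    """Return "sha1:" followed by the base32-encoded SHA-1 of payload."""
    digest = hashlib.sha1(payload).digest()
    return "sha1:" + base64.b32encode(digest).decode("ascii")


def revisit_headers(payload):
    """Hypothetical helper: WARC headers for a revisit record, including the
    WARC-Payload-Digest of the payload that duplicates an earlier capture."""
    return [
        ("WARC-Type", "revisit"),
        ("WARC-Profile", IDENTICAL_PAYLOAD_PROFILE),
        ("WARC-Payload-Digest", payload_digest(payload)),
    ]
```

A 20-byte SHA-1 digest encodes to exactly 32 base32 characters with no padding, so each digest string is 37 characters long.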
Noah Levitt | 5f84b061f3 | make it work with python 2.7 again | 2015-03-18 16:29:44 -07:00
Noah Levitt | 1e3dd0b910 | swallow request headers that don't make sense to send on to the destination, i.e. most hop-by-hop headers; parse and save Warcprox-Meta header (nothing is done with it yet) | 2014-11-20 03:26:42 -08:00
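This sketch of the header handling described in the commit assumes that headers arrive as `(name, value)` pairs and that Warcprox-Meta carries JSON. The hop-by-hop set is based on the list in RFC 2616 section 13.5.1, plus the non-standard but widely sent `Proxy-Connection`. The commit does not say exactly which headers warcprox drops, so treat the set, and the name `filter_request_headers`, as assumptions.

```python
import json

# Assumed hop-by-hop set (based on RFC 2616 13.5.1, plus Proxy-Connection).
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "proxy-connection", "te", "trailer", "transfer-encoding", "upgrade",
}


def filter_request_headers(headers):
    """Split (name, value) pairs into headers to forward upstream and the
    parsed Warcprox-Meta value (None if the header is absent)."""
    # Headers listed in Connection are hop-by-hop for this request too.
    drop = set(HOP_BY_HOP)
    for name, value in headers:
        if name.lower() == "connection":
            drop.update(t.strip().lower() for t in value.split(",") if t.strip())
    forwarded = []
    warcprox_meta = None
    for name, value in headers:
        lname = name.lower()
        if lname == "warcprox-meta":
            warcprox_meta = json.loads(value)  # assumption: value is JSON
        elif lname not in drop:
            forwarded.append((name, value))
    return forwarded, warcprox_meta
```

The Warcprox-Meta header is consumed rather than forwarded, so the destination server never sees proxy-specific instructions.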
Noah Levitt | a2c25d4242 | split into even more source files | 2014-11-20 00:04:43 -08:00
Noah Levitt | 9b8ffbbb51 | separate WarcWriter and WarcWriterThread | 2014-11-15 04:47:26 -08:00
Noah Levitt | b34edf8fb1 | split into multiple files | 2014-11-15 03:20:05 -08:00
Ilya Kreymer | a139465512 | check if 'sync' method actually exists before calling -- anydbm does not have sync() method | 2014-10-28 12:49:02 -07:00
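The defensive pattern is to look the method up before calling it, because some dbm backends expose `sync()` and others do not. Here is a minimal sketch, where `sync_if_supported` is a hypothetical name:

```python
def sync_if_supported(db):
    """Flush a dbm-style database if its backend provides sync().

    Returns True if sync() was called, False if the object has none.
    """
    sync = getattr(db, "sync", None)
    if callable(sync):
        sync()
        return True
    return False
```

Returning a flag lets the caller log when a backend cannot flush, instead of failing with an AttributeError.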
Noah Levitt | 16f21b2e76 | https://github.com/internetarchive/warcprox/issues/9 record warcprox version in warcinfo metadata, and add --version command line option | 2014-08-08 12:10:45 -07:00
Noah Levitt | 7b66f27758 | since ndbm creates different files on different platforms, glob them all and delete them | 2014-08-01 17:40:34 -07:00
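Depending on platform and backend, a dbm database opened as `foo` may live in `foo`, `foo.db`, or `foo.dir` plus `foo.pag` (or `foo.dat`/`foo.dir`/`foo.bak` for the dumb backend). A glob on the base name catches them all. This minimal sketch uses a hypothetical function name. Note that the pattern also matches unrelated files sharing the prefix, such as `foobar`.

```python
import glob
import os


def remove_dbm_files(base_path):
    """Delete every file whose name starts with base_path; return them sorted.

    glob.escape() keeps any wildcard characters in the path literal.
    """
    removed = []
    for path in glob.glob(glob.escape(base_path) + "*"):
        os.remove(path)
        removed.append(path)
    return sorted(removed)
```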
Noah Levitt | 1cdc013c75 | some debugging to try to figure out what the hell is up with tox saying OSError: [Errno 2] No such file or directory: /tmp/tmpnz51j6.db | 2014-08-01 17:32:16 -07:00
Noah Levitt | ccbe3522c5 | timestamps in utc! | 2014-08-01 16:00:53 -07:00
Noah Levitt | e79cdb84cb | set x509 cert version correctly fixes problem with firefox 31; set_version(2) really means version 3, because 0 is understood to mean version 1 (wtf) | 2014-08-01 12:35:34 -07:00
Jack Cushman | 4488c04e5e | If gdbm is not available, fall back to anydbm. | 2014-01-30 19:07:05 -05:00
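Below is a Python 3 rendering of the same fallback, for illustration only. The commit predates Python 3's `dbm` package, where the Python 2 modules `gdbm` and `anydbm` became `dbm.gnu` and `dbm`. `open_dedup_db` is a hypothetical name.

```python
try:
    import dbm.gnu as dbm_impl  # GNU dbm, if this Python was built with it
except ImportError:
    import dbm as dbm_impl  # generic interface: uses whatever backend exists


def open_dedup_db(path):
    """Open (creating if needed) a key-value database at path."""
    return dbm_impl.open(path, "c")
```

Both modules accept the same `open(path, "c")` call and return mapping-like objects with bytes keys and values. Callers therefore do not need to know which backend they got.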
Kelsey Hawley | c0fbd61507 | changed the way I was retrieving the python version | 2014-01-17 16:20:16 -08:00
Kelsey Hawley | a87a5dd972 | updated test to directly use the specified py version & access the file path to dump-anydbm directly. Also added some more helpful print error statements | 2014-01-17 15:35:25 -08:00
Noah Levitt | f69ec424fb | minor cleanup | 2014-01-06 17:22:49 -08:00
Kelsey Hawley | b6ea681c2b | changed file creation and deletion to use temporaryfile. Still needed to use os to delete the 'extra' files that ndbm & dumbdbm created. Also did not explicitly state the file name in checking the output statements, as now they are random everytime. | 2014-01-02 18:18:46 -08:00
Kelsey Hawley | 1b69aea7ed | removed the string splicing and replaced with one clear assert statement based on the script output for each test. simplifies and clarifies the test | 2014-01-02 17:05:45 -08:00
Kelsey Hawley | 4b0ab0ff72 | updated file to PEP 8, as editor was complaining, and tabs are generally bad | 2014-01-02 16:29:15 -08:00
Kelsey Hawley | d643be1c8c | moved dump-anydbm test file to be in the existing test folder, as proximity to dump-anydbm script is not necessary | 2013-12-20 14:01:42 -08:00
Noah Levitt | 0cb0f0e448 | ensure request headers always use \r\n (some servers barf if not, e.g. http://cleftomaniacsnyu.wix.com | 2013-12-13 19:36:22 -08:00
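This sketch normalizes a header block to CRLF line endings, assuming the headers are available as raw bytes. `normalize_header_block` is a hypothetical helper. `bytes.splitlines()` treats `\n`, `\r` and `\r\n` as line boundaries, so mixed endings are handled in one pass.

```python
def normalize_header_block(raw):
    """Return raw header bytes with every line terminated by CRLF,
    followed by the blank line that ends an HTTP header block."""
    lines = raw.splitlines()
    # Drop trailing empty lines; the terminating blank line is re-added below.
    while lines and not lines[-1]:
        lines.pop()
    return b"".join(line + b"\r\n" for line in lines) + b"\r\n"
```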
Noah Levitt | 9041fe00e6 | use hashlib.algorithms_guaranteed to replace missing hashlib.algorithms in python3 | 2013-12-12 21:59:43 -08:00
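A version-tolerant way to get the list of digest algorithms: prefer `hashlib.algorithms_guaranteed` and fall back to the older `hashlib.algorithms` where the newer attribute is missing. This is a minimal sketch; `make_digester` is a hypothetical helper showing one use of the list.

```python
import hashlib

# Prefer algorithms_guaranteed; fall back to the older attribute,
# which Python 3 does not provide.
try:
    HASH_ALGORITHMS = hashlib.algorithms_guaranteed
except AttributeError:
    HASH_ALGORITHMS = hashlib.algorithms


def make_digester(name):
    """Return a new hash object, rejecting algorithms not in the list."""
    if name not in HASH_ALGORITHMS:
        raise ValueError("unsupported digest algorithm: %s" % name)
    return hashlib.new(name)
```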
Noah Levitt | 2b5ab3b70a | shorter CN for CA cert to avoid OpenSSL.crypto.Error: [('asn1 encoding routines', 'ASN1_mbstring_ncopy', 'string too long')] | 2013-12-09 17:56:47 -08:00
Noah Levitt | f2b501ca35 | python3.3 http.client wants ProxyingRecord.readinto | 2013-12-06 17:09:59 -08:00
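Per this commit, Python 3.3's http.client calls `readinto()` on the object it reads the response from. Below is a minimal sketch of a recording wrapper that satisfies this, assuming the goal is to keep a copy of every byte pulled from the remote server. `RecordingReader` is a hypothetical stand-in for ProxyingRecord, whose actual code is not shown here. Subclassing `io.RawIOBase` means `read()` is derived from `readinto()` automatically.

```python
import io


class RecordingReader(io.RawIOBase):
    """Hypothetical raw stream that copies every byte it reads from `raw`."""

    def __init__(self, raw):
        super().__init__()
        self._raw = raw
        self.recorded = io.BytesIO()

    def readable(self):
        return True

    def readinto(self, b):
        # Fill the caller's buffer from the underlying stream and keep a copy.
        data = self._raw.read(len(b))
        n = len(data)
        b[:n] = data
        self.recorded.write(data)
        return n
```

A buffered reader on top may pull more bytes from the source than its caller has consumed so far. The recording therefore reflects what was read from the wire, which is what a WARC response record needs.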
Noah Levitt | cae9ee6911 | fix misnomer | 2013-12-04 17:26:13 -08:00
Noah Levitt | dc9fdc3412 | tests pass with python2.7 and 3.2! (tox fails though oddly) | 2013-12-04 17:25:45 -08:00
Noah Levitt | 6fbae16a31 | test dedup of same url | 2013-11-22 11:20:19 -08:00