Noah Levitt
|
e66dc3a9fb
|
rethinkdb dedup
|
2016-01-26 18:46:13 -08:00 |
|
Noah Levitt
|
0e7a7fdd69
|
remove unused method; fix exception at shutdown time
|
2016-01-26 18:46:13 -08:00 |
|
Noah Levitt
|
3073d59303
|
skip stack trace for normal-ish problems
|
2016-01-26 18:46:13 -08:00 |
|
Noah Levitt
|
d3df48b97e
|
shorten warc filename template
|
2016-01-26 18:46:13 -08:00 |
|
Noah Levitt
|
0ce8022ea9
|
better(?) handling of exceptions raised while proxying urls
|
2016-01-26 18:46:13 -08:00 |
|
Noah Levitt
|
89e5991f7b
|
move limits to toplevel of warcprox-meta json object
|
2016-01-26 18:46:13 -08:00 |
|
Noah Levitt
|
a876152026
|
fix exception, make some tweaks
|
2016-01-26 18:46:13 -08:00 |
|
Noah Levitt
|
aa36ff2958
|
include Warcprox-Meta response header with relevant info json, and an informative text/plain body, in "420 Limit reached" response
|
2016-01-26 18:46:13 -08:00 |
|
Noah Levitt
|
4ce89e6d03
|
basic limits enforcement is working
|
2016-01-26 18:46:13 -08:00 |
|
Noah Levitt
|
d37d2d71e3
|
meant to remove warcprox.py
|
2016-01-26 18:46:13 -08:00 |
|
Noah Levitt
|
03c0fc848c
|
fix old tests to work with refactored code; new test test_limits() (fails now, limits not implemented)
|
2016-01-26 18:45:36 -08:00 |
|
Noah Levitt
|
1f864515ce
|
refactor warc writing, deduplication for somewhat cleaner separation of concerns
|
2016-01-26 18:45:36 -08:00 |
|
Noah Levitt
|
274a2f6b1d
|
refactor warc writing, deduplication for somewhat cleaner separation of concerns
|
2016-01-26 18:45:36 -08:00 |
|
Noah Levitt
|
10c724637f
|
factor out warc record building into its own class
|
2016-01-26 18:45:36 -08:00 |
|
Noah Levitt
|
89fab33295
|
remove old unused, commented out tearDown method
|
2016-01-26 18:45:36 -08:00 |
|
Noah Levitt
|
d3d23f9878
|
convert test_warcprox.py to py.test with fixtures
|
2016-01-26 18:45:36 -08:00 |
|
Noah Levitt
|
d38ab08086
|
close connection to proxy client after proxying the request, seems to solve hanging connection issue (see comment in code)
|
2016-01-26 18:45:36 -08:00 |
|
Noah Levitt
|
771383d0a6
|
refactor proxy handler to use do_* methods for custom http verbs; refactor warc writer thread to use new WarcWriterPool class
|
2016-01-26 18:45:36 -08:00 |
|
Noah Levitt
|
084bd75ed6
|
dump thread tracebacks on sigquit, more logging and exception handling tweaks
|
2016-01-26 18:45:12 -08:00 |
|
Noah Levitt
|
86eab2119a
|
logging and exception handling tweaks
|
2016-01-26 18:45:12 -08:00 |
|
Noah Levitt
|
eb7de9d3f9
|
catch exception handling special request (currently that means PUTMETA)
|
2016-01-26 18:45:12 -08:00 |
|
Noah Levitt
|
f00602b764
|
some logging tweaks, etc
|
2016-01-26 18:44:34 -08:00 |
|
Noah Levitt
|
0647c0c76d
|
support for writing to different warcs based on Warcprox-Meta http request header warc-prefix setting
|
2016-01-26 18:44:16 -08:00 |
|
Noah Levitt
|
403404f590
|
custom PUTMETA http verb for writing warc metadata records; code borrowed from Ilya's fork https://github.com/ikreymer/warcprox
|
2016-01-26 18:44:16 -08:00 |
|
Noah Levitt
|
f79e744823
|
Merge pull request #16 from jcushman/proxy-request
Return recorded_url from _proxy_request.
|
2016-01-04 21:27:02 -08:00 |
|
Jack Cushman
|
4622a6ca52
|
Return recorded_url from _proxy_request.
|
2015-10-23 15:15:45 -04:00 |
|
Noah Levitt
|
67f2ceb717
|
make sure timestamp17(), which is part of warc name, always returns a 17 digit timestamp (even if millisecond part is <100)
|
2015-07-17 13:31:04 -07:00 |
|
Noah Levitt
|
8dfcf0401c
|
bump up socket timeout setting on connection to remote server, and send appropriate error 504 on timeout
|
2015-06-30 17:45:19 -07:00 |
|
Noah Levitt
|
b07f194c63
|
send requested hostname to remote server if python ssl version supports SNI, fixes ssl handshake error for some servers
|
2015-06-30 17:38:45 -07:00 |
|
Noah Levitt
|
1abe98c99b
|
Merge pull request #12 from ikreymer/dev.use-certauth-pkg
remove certauth.py and use the separate certauth package release
|
2015-03-30 17:48:54 -07:00 |
|
Ilya Kreymer
|
c045369dcd
|
change 'get_cert_for_host' -> 'cert_for_host'
|
2015-03-30 15:46:31 -07:00 |
|
Ilya Kreymer
|
574f1f3f52
|
remove certauth.py and use the separate certauth package release
|
2015-03-30 09:32:10 -07:00 |
|
Noah Levitt
|
965853f4ab
|
add payload digest header to revisit records
|
2015-03-26 15:17:46 -07:00 |
|
Noah Levitt
|
0eb2917e50
|
update tox and travis config for supported python versions 2.7 and 3.4
|
2015-03-18 16:36:24 -07:00 |
|
Noah Levitt
|
016749a822
|
bump version since api has changed as a result of reorganization
|
2015-03-18 16:33:07 -07:00 |
|
Noah Levitt
|
5f84b061f3
|
make it work with python 2.7 again
|
2015-03-18 16:29:44 -07:00 |
|
Noah Levitt
|
1e3dd0b910
|
swallow request headers that don't make sense to send on to the destination, i.e. most hop-by-hop headers; parse and save Warcprox-Meta header (nothing is done with it yet)
|
2014-11-20 03:26:42 -08:00 |
|
Noah Levitt
|
a2c25d4242
|
split into even more source files
|
2014-11-20 00:04:43 -08:00 |
|
Noah Levitt
|
9b8ffbbb51
|
separate WarcWriter and WarcWriterThread
|
2014-11-15 04:47:26 -08:00 |
|
Noah Levitt
|
b34edf8fb1
|
split into multiple files
|
2014-11-15 03:20:05 -08:00 |
|
Noah Levitt
|
e8438dc8ad
|
Merge pull request #10 from ikreymer/dev.fix-sync
check if 'sync' method exists before calling it
|
2014-10-28 13:48:53 -07:00 |
|
Ilya Kreymer
|
a139465512
|
check if 'sync' method actually exists before calling -- anydbm does not have sync() method
|
2014-10-28 12:49:02 -07:00 |
|
Noah Levitt
|
a0a5ef2355
|
fix formatting of Install section again so it looks right on pypi! (test with rst2html)
|
2014-08-08 12:53:16 -07:00 |
|
Noah Levitt
|
b2479d39f6
|
fix formatting of Install section so it looks right on github
|
2014-08-08 12:29:12 -07:00 |
|
Noah Levitt
|
9562338d01
|
add Install section to readme, update the --help dump
|
2014-08-08 12:22:33 -07:00 |
|
Noah Levitt
|
16f21b2e76
|
https://github.com/internetarchive/warcprox/issues/9 record warcprox version in warcinfo metadata, and add --version command line option
|
2014-08-08 12:10:45 -07:00 |
|
Noah Levitt
|
b434e33fdd
|
bump version number for updated submission to pypi
|
2014-08-05 19:04:07 -07:00 |
|
Noah Levitt
|
7b66f27758
|
since ndbm creates different files on different platforms, glob them all and delete them
|
2014-08-01 17:40:34 -07:00 |
|
Noah Levitt
|
1cdc013c75
|
some debugging to try to figure out what the hell is up with tox saying OSError: [Errno 2] No such file or directory: /tmp/tmpnz51j6.db
|
2014-08-01 17:32:16 -07:00 |
|
Noah Levitt
|
111c678cee
|
add python3.4 to travis, tox test list; remove apt-get install python3.3-gdbm from travis configuration to fix travis error "Unable to locate package python3.3-gdbm"
|
2014-08-01 16:43:00 -07:00 |
|