Noah Levitt
|
c76d9b88d3
|
test https server, and request handler... next step is to use them for actual tests
|
2013-11-19 18:12:16 -08:00 |
|
Noah Levitt
|
bfd1cf432e
|
warcprox command
|
2013-11-19 17:16:04 -08:00 |
|
Noah Levitt
|
512957e7b8
|
update .gitignore for the stuff i want to ignore
|
2013-11-19 17:15:54 -08:00 |
|
Noah Levitt
|
555517ab78
|
WarcproxController to ease use of warcprox as a module
|
2013-11-19 17:12:58 -08:00 |
|
Noah Levitt
|
b8ad8abffe
|
working on packaging
|
2013-11-15 22:35:32 -08:00 |
|
Noah Levitt
|
5652b322de
|
playback uses warctools streaming api, see https://github.com/internetarchive/warctools/pull/6
|
2013-11-15 03:16:55 -08:00 |
|
Noah Levitt
|
8b8124503a
|
use gdbm instead of anydbm, since gdbm has sync() and hopefully is available everywhere(?)
|
2013-11-05 18:39:51 -08:00 |
|
Noah Levitt
|
41b1db79e5
|
logging tweaks
|
2013-11-01 19:42:37 -07:00 |
|
Noah Levitt
|
b07118159e
|
more updates to readme
|
2013-11-01 19:11:39 -07:00 |
|
Noah Levitt
|
121ecca830
|
support revisit records in playback proxy
|
2013-11-01 19:06:03 -07:00 |
|
Noah Levitt
|
77d33f21a8
|
instant playback partially working
|
2013-11-01 12:42:40 -07:00 |
|
Noah Levitt
|
dab8a956c2
|
more todo list updates
|
2013-10-31 22:47:53 -07:00 |
|
Noah Levitt
|
c4d06b1564
|
log all requests, not just CONNECT
|
2013-10-30 18:16:56 -07:00 |
|
Noah Levitt
|
630779ff0b
|
since aborting the connection is normal behavior in many circumstances for browsers, handle it gracefully, continuing to download and archive the url from the remote server
|
2013-10-30 17:57:59 -07:00 |
|
Noah Levitt
|
534c61a4c1
|
utility for inspecting deduplication database (or any dbm database)
|
2013-10-30 17:54:47 -07:00 |
|
Noah Levitt
|
03fe7179f8
|
-g DIGEST_ALGORITHM, --digest-algorithm DIGEST_ALGORITHM digest algorithm, one of md5, sha1, sha224, sha256, sha384, sha512 (default: sha1)
|
2013-10-30 14:16:30 -07:00 |
|
Noah Levitt
|
e370ec6fe2
|
refactor so that warc records are constructed in the warc writer thread; this way the disk-based dedup lookup, to decide whether to write a revisit record, happens out-of-band; and maybe more importantly, now all dedup db reading and writing happens in a single thread, so we don't have to worry about dbm thread safety; also, dedup info is not saved or looked up for urls with empty payload
|
2013-10-30 13:36:32 -07:00 |
|
Noah Levitt
|
1967b6aabf
|
persistent dedup database using anydbm
|
2013-10-30 00:54:35 -07:00 |
|
Noah Levitt
|
975657c74b
|
basic deduplication on payload digest using in-memory store
|
2013-10-29 18:59:21 -07:00 |
|
Noah Levitt
|
57c21920bd
|
--base32 write SHA1 digests in Base32 instead of hex (default: False)
|
2013-10-28 19:30:02 -07:00 |
|
Noah Levitt
|
1ab5c1f683
|
fix error when --rollover-idle-time not specified
|
2013-10-24 20:20:14 -07:00 |
|
Noah Levitt
|
1e74ce4f64
|
CA specific to host
|
2013-10-22 15:08:41 -07:00 |
|
Noah Levitt
|
bb148cce4c
|
Merge branch 'master' of github.com:nlevitt/warcprox
|
2013-10-21 15:09:05 -07:00 |
|
Noah Levitt
|
85900d05aa
|
shutdown should be faster in this order
|
2013-10-21 12:58:21 -07:00 |
|
Noah Levitt
|
a1d69a9cae
|
todo list thoughts
|
2013-10-19 15:26:13 -07:00 |
|
Noah Levitt
|
ebb9b6d625
|
new option --rollover-idle-time - WARC file rollover idle time threshold in seconds (so that Friday's last open WARC doesn't sit there all weekend waiting for more data) (default: None)
|
2013-10-19 15:25:42 -07:00 |
|
Noah Levitt
|
7367620dae
|
write WARC-IP-Address header on response record
|
2013-10-19 14:36:15 -07:00 |
|
Noah Levitt
|
980ba13d10
|
add todo list
|
2013-10-18 11:14:36 -07:00 |
|
Noah Levitt
|
f7cf10933b
|
include current --help output in readme
|
2013-10-17 18:39:16 -07:00 |
|
Noah Levitt
|
e01691c1f2
|
fix bugs, improve logging of each warc record
|
2013-10-17 18:35:11 -07:00 |
|
Noah Levitt
|
568df5360d
|
some refactoring for clarity and modularity
|
2013-10-17 18:12:33 -07:00 |
|
Noah Levitt
|
a0ff2bc8b2
|
mention dependency on warctools fork
|
2013-10-17 13:03:16 -07:00 |
|
Noah Levitt
|
e6a897412b
|
use tempfile.SpooledTemporaryFile to overflow recorded response to disk
|
2013-10-17 12:58:17 -07:00 |
|
Noah Levitt
|
039f892024
|
--verbose and --quiet
|
2013-10-17 02:51:51 -07:00 |
|
Noah Levitt
|
fc139f1f4e
|
send raw bytes from server response back to proxy client (not unchunked)
|
2013-10-17 02:47:55 -07:00 |
|
Noah Levitt
|
5f90e76ca6
|
shut down cleanly on sigterm
|
2013-10-17 01:58:07 -07:00 |
|
Noah Levitt
|
72f141fec3
|
calculate payload sha1
|
2013-10-16 19:10:04 -07:00 |
|
Noah Levitt
|
9d176a408b
|
working on proof of concept streaming support
|
2013-10-16 18:13:56 -07:00 |
|
Noah Levitt
|
6f12a9e3bf
|
--certs-dir option
|
2013-10-16 15:36:53 -07:00 |
|
Noah Levitt
|
98fced4cd9
|
explain about CA trust in readme
|
2013-10-16 14:50:08 -07:00 |
|
Noah Levitt
|
096cb0a2b6
|
restore CA
|
2013-10-16 14:36:19 -07:00 |
|
Noah Levitt
|
bde9b54cd8
|
simplify readme
|
2013-10-16 12:31:14 -07:00 |
|
Noah Levitt
|
9b394ee860
|
logging fix
|
2013-10-16 12:25:15 -07:00 |
|
Noah Levitt
|
b61b818baa
|
randomize generated cert serial to avoid error from browser
|
2013-10-16 01:05:06 -07:00 |
|
Noah Levitt
|
9140b16a6a
|
write request records
|
2013-10-15 18:37:26 -07:00 |
|
Noah Levitt
|
b3b6406e71
|
warcinfo record
|
2013-10-15 17:51:09 -07:00 |
|
Noah Levitt
|
556e969465
|
for now warcprox.py is just a command, not a module
|
2013-10-15 15:57:14 -07:00 |
|
Noah Levitt
|
b201801bd9
|
rename to warcprox.py
|
2013-10-15 15:52:48 -07:00 |
|
Noah Levitt
|
4367da7bbd
|
write warcs!
|
2013-10-15 15:52:26 -07:00 |
|
Noah Levitt
|
6345845b48
|
argv parsing
|
2013-10-15 14:11:31 -07:00 |
|