mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 08:04:49 +01:00

558 Commits

Author SHA1 Message Date
Ilya Kreymer
6da27789eb live handler: allow live rewrite handler to be specified as one of the collections in pywb
by setting index_paths to '$liveweb'. When used, creates a RewriteHandler instead of a WBHandler
Can also specify 'proxyhostport' to set the live rewrite to go through a proxy

fallback: allow fallback to a different handler (usually live rewrite) by specifying
'redir_fallback' with name of handler. Instead of 404, a not found response will
internally call the fallback handler to get a response
2014-07-20 16:42:00 -07:00
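The `redir_fallback` behavior described above can be pictured as a wrapper that catches a not-found condition and internally calls another named handler instead of returning a 404. This is a minimal sketch that assumes handlers are plain callables raising a hypothetical `NotFound` exception. `FallbackHandler`, `archive_handler` and `live_handler` are illustrative names, not pywb classes.

```python
class NotFound(Exception):
    """Hypothetical 'not found' error raised by a handler."""


class FallbackHandler:
    """Wraps a primary handler; on NotFound, delegates to a named fallback handler."""

    def __init__(self, primary, handlers, redir_fallback=None):
        self.primary = primary
        self.handlers = handlers              # maps handler name -> callable
        self.redir_fallback = redir_fallback  # name of the fallback handler, if any

    def __call__(self, request):
        try:
            return self.primary(request)
        except NotFound:
            if not self.redir_fallback:
                raise  # no fallback configured: surface the not-found error
            # Instead of a 404, internally call the fallback (e.g. live rewrite)
            return self.handlers[self.redir_fallback](request)


def archive_handler(url):
    # Toy archive that contains exactly one URL
    if url != 'http://example.com/':
        raise NotFound(url)
    return 'archived: ' + url


def live_handler(url):
    # Stand-in for a live rewrite handler
    return 'live: ' + url


handler = FallbackHandler(archive_handler, {'live': live_handler}, redir_fallback='live')
```

With this wiring, an archived URL is served from the archive and any other URL falls through to the live handler. Without a `redir_fallback`, the not-found error propagates as before.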
Ilya Kreymer
b785cd6f08 memento: use mp_ modifier to support memento with frame or non-frame replay
change memento test to use frame replay
2014-07-20 15:43:39 -07:00
Ilya Kreymer
96fcaab521 live-rewrite-server: add ability to specify http/https proxy for live fetching
(for example, for use with a recording proxy)
2014-07-19 14:43:28 -07:00
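Routing live fetches through an HTTP/HTTPS proxy, such as a recording proxy, can be sketched with the standard library's `urllib.request.ProxyHandler`. This is a sketch under the assumption that the proxy is given as a `host:port` string, mirroring the `proxyhostport` setting mentioned in the live handler commit. `build_live_opener` is a hypothetical name and not pywb's code.

```python
import urllib.request


def build_live_opener(proxyhostport=None):
    """Return an opener for live fetching, optionally routed via a proxy.

    proxyhostport is assumed to be a 'host:port' string (hypothetical format).
    With no proxy given, urllib's default handlers apply, which includes any
    proxies configured in the environment.
    """
    handlers = []
    if proxyhostport:
        proxy_url = 'http://' + proxyhostport
        # Send both http and https requests through the same proxy
        handlers.append(urllib.request.ProxyHandler({'http': proxy_url,
                                                     'https': proxy_url}))
    return urllib.request.build_opener(*handlers)


opener = build_live_opener('localhost:8080')
```

Calling `opener.open(url)` would then fetch through the proxy. No request is made in the example itself.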
Ilya Kreymer
f80c27ec00 cookie: add test for 'document.cookie' rewriting 2014-07-15 12:57:02 -07:00
Ilya Kreymer
fa52e0126d cookies: support client-side rewriting of document.cookie -> WB_wombat_cookie to rewrite cookie path, if present 2014-07-15 12:52:42 -07:00
Ilya Kreymer
e858b8faae rewrite: better fix for multiple ../ in urls, additional tests 2014-07-14 20:50:45 -07:00
Ilya Kreymer
7032160cf9 rewrite: fix rel url resolution to better handle parent rel path.
Explicitly resolve path when possible, remove only if at root level
2014-07-14 19:13:19 -07:00
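The two relative-URL fixes above deal with multiple `../` segments and with clamping at the site root. One way to get that behavior is to resolve the relative URL against the original page URL before adding the archival prefix. This is a minimal sketch that assumes Python 3.5+, whose `urllib.parse.urljoin` follows RFC 3986 dot-segment removal, and a hypothetical `prefix` such as `/coll/20140714/`. It is not pywb's own URL rewriter.

```python
from urllib.parse import urljoin


def rewrite_rel_url(base_url, rel_url, prefix):
    """Resolve rel_url against the original page URL, then add an archival prefix.

    prefix is a hypothetical replay prefix such as '/coll/20140714/'.
    Extra '../' segments that would climb above the root are dropped by urljoin.
    """
    absolute = urljoin(base_url, rel_url)
    return prefix + absolute
```

Resolving first means the rewriter never has to guess how many `../` segments are meaningful. Anything that would go above `/` simply collapses to the root of the original host.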
Ilya Kreymer
1b1a1f8115 proxy: add 'proxy_coll_select' config which will require a proxy-auth to select a collection for proxy mode.
Otherwise, defaults to first available collection, though proxy-auth can still be sent to specify different collection
2014-07-14 19:12:30 -07:00
Ilya Kreymer
1317b2b10f route selection via proxy auth!
refactor route request parsing to happen in the actual router class instead of in the route
in proxy mode, add support for picking a route via proxy-auth
improve test for 'top' rewriting
2014-07-10 21:54:23 -07:00
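The two proxy-mode commits above describe selecting a collection through proxy auth. When `proxy_coll_select` is not set, the first available collection is the default. A minimal sketch follows. It assumes the collection name arrives as the username of a Basic `Proxy-Authorization` header, and that a `None` result means the caller should answer with 407 Proxy Authentication Required. Both are assumptions for illustration, and `select_collection` is a hypothetical name.

```python
import base64


def select_collection(headers, collections, proxy_coll_select=False):
    """Pick a collection for a proxy-mode request (sketch).

    Assumes the collection name is the username in a Basic Proxy-Authorization
    header. Returns None when proxy auth is required but missing or unknown.
    """
    auth = headers.get('Proxy-Authorization', '')
    if auth.startswith('Basic '):
        try:
            decoded = base64.b64decode(auth[len('Basic '):]).decode('utf-8')
        except (ValueError, UnicodeDecodeError):
            decoded = ''  # malformed credentials are treated as absent
        coll = decoded.split(':', 1)[0]
        if coll in collections:
            return coll
    if proxy_coll_select:
        return None  # caller should request proxy auth (e.g. a 407 response)
    return collections[0]  # default: first available collection
```

Keeping the decision in one router-level function matches the commit's move of request parsing into the router rather than the individual route.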
Ilya Kreymer
daffc7ff5d header rewrite: pass through 'content-range' header 2014-07-07 17:02:44 -07:00
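Passing `Content-Range` through unchanged suggests a header rewriter where a small set of headers keeps its original name and the rest are renamed. This is a sketch that assumes a hypothetical passthrough set and an assumed `X-Archive-Orig-` prefix for renamed headers. Neither is taken from pywb's actual configuration.

```python
# Hypothetical policy: headers passed through with their original names
PASSTHROUGH = {'content-type', 'content-range', 'accept-ranges'}
# Assumed prefix for all other original headers
ORIG_PREFIX = 'X-Archive-Orig-'


def rewrite_headers(headers):
    """Return a new list of (name, value) pairs with non-passthrough headers renamed."""
    out = []
    for name, value in headers:
        if name.lower() in PASSTHROUGH:
            out.append((name, value))  # kept so clients can use it (e.g. ranges)
        else:
            out.append((ORIG_PREFIX + name, value))
    return out
```

Passing `Content-Range` through lets a client match partial (206) responses to the byte ranges it asked for.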
Ilya Kreymer
02326a2b12 bump dev version to 0.4.8 2014-07-07 17:02:28 -07:00
Ilya Kreymer
7694bf0678 update README.rst for master 0.4.7 2014-07-01 16:22:38 -07:00
Ilya Kreymer
46b16c61d5 update changelist, version to 0.4.7 2014-07-01 16:15:25 -07:00
Ilya Kreymer
2a2240a23a fix 'bad.cdx' sorting order 2014-07-01 15:36:13 -07:00
Ilya Kreymer
1a42331e69 Merge branch 'develop' into binary-parse 2014-07-01 10:00:05 -07:00
Ilya Kreymer
1980b66127 warc indexing: in include_all mode, pass 'warcinfo' records to writer, giving it the option to handle or ignore them 2014-07-01 09:59:16 -07:00
Ilya Kreymer
57a38dedce Merge branch 'develop' into binary-parse 2014-06-28 11:53:50 -07:00
Ilya Kreymer
377ea33bc8 tests: add test for wombat top 2014-06-28 11:53:23 -07:00
Ilya Kreymer
b0f7fdbed8 regexrewrite: fix rewrite for 'top' 2014-06-28 11:50:11 -07:00
Ilya Kreymer
f2bfc96002 Merge branch 'develop' into binary-parse 2014-06-28 11:04:43 -07:00
Ilya Kreymer
83b69e8447 indexing: don't include records of type 'application/warc-fields' unless all records are being included 2014-06-28 11:03:44 -07:00
Ilya Kreymer
70b7e29b36 pass raw bytes to htmlparser, assuming ascii-compatibility
(todo: add tests for non-ascii compatible encodings)
improved rendering of certain pages, needs more testing

lxml: remove lxml and complexity associated with having the parser,
as it's too unpredictable for older HTML and does its own decoding.
2014-06-27 19:03:06 -07:00
Ilya Kreymer
dd9f138bab disable decoding, by default, of content for html parser 2014-06-27 16:53:33 -07:00
Ilya Kreymer
fb07775d38 tests: add 'bad.cdx' for testing cdx lines with missing original for revisit,
missing/non-existent warc
2014-06-25 12:32:57 -07:00
Ilya Kreymer
913a1e9f31 warc: simplify recordloader a bit more, only response and request records
get parsed as http (excluding dns: and whois: uris)
All others have a '-' status and no headers parsing
tests: add test for zero-length revisits
2014-06-25 12:11:26 -07:00
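The record loader rule stated above can be written as a small predicate. Only `response` and `request` records are parsed as HTTP, and `dns:` and `whois:` URIs are excluded. Every other record gets a `-` status. This is a sketch of that stated rule only. `parse_as_http` and `cdx_status` are hypothetical names, not pywb functions.

```python
def parse_as_http(record_type, target_uri):
    """True if a WARC record's payload should be parsed as an HTTP message."""
    if record_type not in ('response', 'request'):
        return False
    # dns: and whois: records are not HTTP even when they are 'response' records
    return not target_uri.startswith(('dns:', 'whois:'))


def cdx_status(record_type, target_uri, http_status):
    """Status to index: the HTTP status if parsed as HTTP, otherwise '-'."""
    return http_status if parse_as_http(record_type, target_uri) else '-'
```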
Ilya Kreymer
6761f5697f indexing: refactor cdxindexer interface to better allow custom writers
record loader: skip whois: and dns: records, better skipping of arc headers
(todo: need more unit tests)
2014-06-24 17:08:10 -07:00
Ilya Kreymer
3965fad4dd cdx indexing: add support for 9-field cdx output,
request merge: store referer if available, check for record id matching
2014-06-19 16:51:23 -07:00
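For the 9-field CDX output mentioned above, a writer only needs to emit fields in a fixed order. This is a sketch that assumes the conventional 9-field legend ` CDX N b a m s k r V g`: urlkey, timestamp, original URL, MIME type, status, digest, redirect, offset, filename. Missing values are written as `-`. `format_cdx9` is a hypothetical helper, not pywb's indexer.

```python
# Assumed legend line for 9-field CDX files
CDX9_HEADER = ' CDX N b a m s k r V g'
# Field order corresponding to the legend above
CDX9_FIELDS = ('urlkey', 'timestamp', 'original', 'mimetype', 'statuscode',
               'digest', 'redirect', 'offset', 'filename')


def format_cdx9(entry):
    """Format one index entry (a dict) as a space-separated 9-field CDX line."""
    return ' '.join(str(entry.get(field, '-')) for field in CDX9_FIELDS)
```

The 11-field variant would add the record length and metadata columns. Keeping field order in one tuple makes it easy to switch output formats per writer.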
Ilya Kreymer
694b97e67f archive indexing: Refactor, split into ArchiveIterator generic iteration and cdx-indexer,
which writes out CDX specifically
recordloader: always load request, limit stream before headers are loaded
2014-06-19 13:37:42 -07:00
Ilya Kreymer
de65b68edc rules: additions to rules for FB 2014-06-18 16:45:54 -07:00
Ilya Kreymer
22a2da6e0c rewrite: for WB_wombat_top rewriting, select next-to-top instead of self 2014-06-16 19:42:15 -07:00
Ilya Kreymer
e1c1d23a9f framed replay: improved url update support, ensure update url is actually
the url of the frame (ignore ajax requests)
2014-06-16 18:46:01 -07:00
Ilya Kreymer
ac3efec4bc update develop to 0.4.6
improved regex for top -> WB_wombat_top rewriting
2014-06-16 15:57:22 -07:00
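The `top` -> `WB_wombat_top` rewriting mentioned in these commits has to avoid false matches such as `stop`, `topbar` or `style.top`. The regex below is a hypothetical illustration, not pywb's actual rule. It rewrites a bare `top` identifier and `window.top`, and leaves other property accesses alone.

```python
import re

# Hypothetical pattern: 'top' preceded by 'window.', or not preceded by a word
# character, '$' or '.', and not followed by a word character or '$'.
TOP_RE = re.compile(r'(?:(?<=window\.)|(?<![\w$.]))top(?![\w$])')


def rewrite_top(js):
    """Replace references to the top window with WB_wombat_top."""
    return TOP_RE.sub('WB_wombat_top', js)
```

A regex like this still rewrites object keys such as `{top: 0}` and occurrences inside string literals, which is one reason such rules tend to need repeated refinement.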
Ilya Kreymer
f26b0ddbe4 update setup.py version 2014-06-15 12:35:20 -07:00
Ilya Kreymer
987a9ee58f update README for master 2014-06-15 12:34:14 -07:00
Ilya Kreymer
c4e3f25f9a Merge branch 'develop' for 0.4.5 release 2014-06-15 12:32:47 -07:00
Ilya Kreymer
4767ab0fdd Update CHANGES.rst to 4.5 2014-06-15 12:09:10 -07:00
Ilya Kreymer
88d3e94b36 fixes for pep8, name fixes 2014-06-15 11:57:48 -07:00
Ilya Kreymer
073f1e142e test_config: test lxml parser still 2014-06-14 21:33:08 -07:00
Ilya Kreymer
80e80e97d3 replay: support 'framed_replay' option in config for both replay and live rewrite
split replay view into BaseContentView and ReplayView
refactor RewriteLiveHandler into RewriteLiveView
add additional tests for framed and non-framed mode
default to framed replay!
2014-06-14 18:26:19 -07:00
Ilya Kreymer
d21f8079ca cookie rewrite: remove max-age, add test 2014-06-14 10:04:31 -07:00
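Server-side `Set-Cookie` rewriting along the lines of this commit can be sketched with the standard library's `http.cookies.SimpleCookie`. The sketch drops `Max-Age` and prefixes the `Path`, similar to the path rewriting described for `document.cookie` in fa52e0126d. This assumes a hypothetical replay `prefix` given without a trailing slash. `rewrite_set_cookie` is an illustrative name, not pywb's cookie rewriter.

```python
from http.cookies import SimpleCookie


def rewrite_set_cookie(header_value, prefix):
    """Rewrite one Set-Cookie header value for replay (sketch).

    prefix is a hypothetical replay prefix without a trailing slash, e.g. '/coll'.
    Returns a list of rewritten cookie strings.
    """
    cookie = SimpleCookie()
    cookie.load(header_value)
    results = []
    for morsel in cookie.values():
        if morsel['path']:
            # Scope the cookie under the replay prefix instead of the site root
            morsel['path'] = prefix + morsel['path']
        # Morsel.OutputString() skips attributes whose value is ''
        morsel['max-age'] = ''
        results.append(morsel.OutputString())
    return results
```

Dropping `Max-Age` keeps an archived cookie's original lifetime from governing how long it persists during replay.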
Ilya Kreymer
ceeb25a899 rewrite: fix unit tests, add extra closed check for 2.6 (not sure why it's needed now) 2014-06-14 01:02:00 -07:00
Ilya Kreymer
028e274b22 rewrite tests: improve POST test, only add header if not empty 2014-06-14 00:18:35 -07:00
Ilya Kreymer
d7516f4cd7 rewrite: fix <base> rewriting, urlrewriter replacement
turn off lxml rewriter by default
2014-06-13 16:44:37 -07:00
Ilya Kreymer
0d3f663ef1 rewrite: disable refer-redirect in case of POST, handle request w/o redirect
(can't use 307 because of FF)
2014-06-13 16:23:11 -07:00
Ilya Kreymer
dfef05a74d rewrite: live rewrite: switch to including all headers rather than a whitelist for proxying 2014-06-13 16:22:18 -07:00
Ilya Kreymer
41e1809039 update wombat.js (support for write override, fill in WB_wombat_location on new iframe)
disable 307 redirects as FF always displays modal confirmation for these, even for same host
2014-06-11 20:12:05 -07:00
Ilya Kreymer
bdafe0938d remove accidental debug commits 2014-06-11 12:44:49 -07:00
Ilya Kreymer
14ed6c5898 remove accidental changes 2014-06-11 12:42:44 -07:00
Ilya Kreymer
0c9d88f032 POST replay: treat POST form data same as get query, no '&&&' marker
additional testing POST
2014-06-11 11:17:06 -07:00
Ilya Kreymer
e2349a74e2 replay: better POST support via post query append!
record_loader can optionally parse 'request' records
archiveindexer has -a flag to write all records ('request' included),
-p flag to append post query
post-test.warc.gz and cdx
POST redirects using 307
2014-06-10 19:21:46 -07:00
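The POST replay commits above append form data to the URL query so that a POST can be looked up like a GET. The later commit joins the data with a plain `&`, with no `&&&` marker. This is a minimal sketch that assumes only `application/x-www-form-urlencoded` bodies, already decoded to `str`, are appended. `append_post_query` is a hypothetical name.

```python
def append_post_query(url, post_body, content_type):
    """Append form-encoded POST data to a URL's query string (sketch).

    Only application/x-www-form-urlencoded bodies are appended in this sketch;
    anything else leaves the URL unchanged.
    """
    if not post_body or not content_type.startswith('application/x-www-form-urlencoded'):
        return url
    # Treat the form data like extra GET parameters, joined with a plain '&'
    sep = '&' if '?' in url else '?'
    return url + sep + post_body
```

Because the result looks like an ordinary GET URL, the same index lookup and matching logic can serve both GET and POST captures.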