mirror of https://github.com/webrecorder/pywb.git synced 2025-03-24 06:59:52 +01:00

321 Commits

Author SHA1 Message Date
Ilya Kreymer
e858b8faae rewrite: better fix for multiple ../ in urls, additional tests 2014-07-14 20:50:45 -07:00
Ilya Kreymer
7032160cf9 rewrite: fix rel url resolution to better handle parent rel path.
Explicitly resolve path when possible, remove only if at root level
2014-07-14 19:13:19 -07:00
Ilya Kreymer
1317b2b10f route selection via proxy auth!
refactor route request parsing to happen in the actual router class instead of in the route
in proxy mode, add support for picking a route via proxy-auth
improve test for 'top' rewriting
2014-07-10 21:54:23 -07:00
Ilya Kreymer
daffc7ff5d header rewrite: pass through 'content-range' header 2014-07-07 17:02:44 -07:00
Ilya Kreymer
57a38dedce Merge branch 'develop' into binary-parse 2014-06-28 11:53:50 -07:00
Ilya Kreymer
377ea33bc8 tests: add test for wombat top 2014-06-28 11:53:23 -07:00
Ilya Kreymer
b0f7fdbed8 regexrewrite: fix rewrite for 'top' 2014-06-28 11:50:11 -07:00
Ilya Kreymer
70b7e29b36 pass raw bytes to htmlparser, assuming ascii-compatibility
(todo: add tests for non-ascii compatible encodings)
improved rendering of certain pages, needs more testing

lxml: remove lxml and complexity associated with having the parser,
as it's too unpredictable for older html, does its own decoding.
2014-06-27 19:03:06 -07:00
Ilya Kreymer
dd9f138bab disable decoding, by default, of content for html parser 2014-06-27 16:53:33 -07:00
Ilya Kreymer
ac3efec4bc update develop to 0.4.6
improved regex for top -> WB_wombat_top rewriting
2014-06-16 15:57:22 -07:00
Ilya Kreymer
d21f8079ca cookie rewrite: remove max-age, add test 2014-06-14 10:04:31 -07:00
Ilya Kreymer
ceeb25a899 rewrite: fix unit tests, add extra closed check for 2.6 (not sure why it's needed now) 2014-06-14 01:02:00 -07:00
Ilya Kreymer
028e274b22 rewrite tests: improve POST test, only add header if not empty 2014-06-14 00:18:35 -07:00
Ilya Kreymer
d7516f4cd7 rewrite: fix <base> rewriting, urlrewriter replacement
turn off lxml rewriter by default
2014-06-13 16:44:37 -07:00
Ilya Kreymer
dfef05a74d rewrite: live rewrite: switch to including all headers rather than a whitelist for proxying 2014-06-13 16:22:18 -07:00
Ilya Kreymer
e2349a74e2 replay: better POST support via post query append!
record_loader can optionally parse 'request' records
archiveindexer has -a flag to write all records ('request' included),
-p flag to append post query
post-test.warc.gz and cdx
POST redirects using 307
2014-06-10 19:21:46 -07:00
Ilya Kreymer
52040127b3 update wombat.js to latest
rewrite live: add another rewrite live header,
use 307 for archival referer based redirects
2014-05-30 11:03:22 -07:00
Ilya Kreymer
9b732def93 cookie_rewriting: if domain is specified, apply cookie to coll root
rather than rewritten path.. needed in order for subdomain cookies to be
detected properly
2014-05-18 21:51:07 -07:00
Ilya Kreymer
1d674d97d8 pep8 pass! 2014-05-16 22:44:26 -07:00
Ilya Kreymer
923421d637 rewrite_content: add a few tests for cs_, js_, remove redundant except 2014-05-16 22:43:53 -07:00
Ilya Kreymer
5285723ccf cookie_rewriter: catch CookieError and ignore erroring cookies 2014-05-15 22:37:08 -07:00
Ilya Kreymer
1d8c68b745 rewrite: only translate non-empty header values 2014-05-13 17:42:55 -07:00
Ilya Kreymer
871cc26fa4 rewrite: add optional cookie_rewriter, created by urlrewriter and called from header_rewriter
cookie_rewriter works correctly with a concatenated set-cookie list, returns a list of rewritten 'set-cookie' headers
rewrite_live: add proxying of Host, Origin, additional headers
split header rewriter tests into test_header_rewriter, add test_cookie_rewriter
bump version to 0.4.0!
2014-05-13 17:07:41 -07:00
Ilya Kreymer
46449ac188 rewrite: pass wburl mod to rewriter, so that css/js rewriting
rules may override default content-type (in cases where it is incorrect)
allows for rule based customization (to be added later)
2014-05-05 22:12:45 -07:00
Ilya Kreymer
d2795dfdaa minor cleanup:
wburl: add is_url_query() check
views: add kwargs to J2HtmlCapturesView for better extensibility
query_handler: simplify make_cdx_response() arguments
2014-05-01 11:58:34 -07:00
Ilya Kreymer
9cf5327e88 bufferedreader cleanup:
* BufferedReader defaults to no decompression
* DecompressingBufferedReader defaults to gzip decomp
* ChunkedDataReader defaults to no gzip decomp, but decomp
can be set later via set_decomp().
This allows chunked responses to be de-chunked but not decompressed
(eg for non-text responses)
2014-04-28 20:15:31 -07:00
Ilya Kreymer
53ad67eb9c rewrite: disable one 'top' rewriting rule (should move to separate mixin)
views: add urlsplit jinja2 filter
2014-04-27 01:04:20 -07:00
Ilya Kreymer
09653cf77e rewrite: more nuanced 'top' rewriting, fix wombat frame mode detection 2014-04-26 18:43:25 -07:00
Ilya Kreymer
53f0cb540f url rewriter: add optional 'full prefix', check and don't rewrite urls
if starting with prefix or full prefix
wbrequest: if no scheme present (shouldn't happen with wsgi) default to http
2014-04-24 10:44:08 -07:00
Ilya Kreymer
2ad41e2b94 rewrite: rewrite data-* attributes if they look like links (http, https, //) 2014-04-22 16:32:36 -07:00
Ilya Kreymer
e1e55ac061 minor tweaks: rewrite 'crossorigin' -> '_crossorigin' param to disable
crossorigin as it may interfere with loading rewritten content, add
tests for html and lxml parsers
add server_cls as optional param to QueryHandler.init_from_config()
for easier customization
views: don't create template if empty template file specified
2014-04-19 12:04:43 -07:00
Ilya Kreymer
23bb5bd175 rewrite: wombat update 2.0! Using Object.defineProperty() to better
override .href and .hash properties when possible.
.href returns original url, but on assignment rewrites before redirecting
.hash proxies to location.hash
Also added:
- window.top -> window.WB_wombat_top
- document.referrer -> document.WB_wombat_referrer
- <source> html tag rewriting
2014-04-18 19:30:48 -07:00
Ilya Kreymer
e011da43f2 live rewrite: use custom REL_REFERER field, don't override HTTP_REFERER
if REL_REFERER not set, don't send any referrer
2014-04-15 16:44:02 -07:00
Ilya Kreymer
85593696fa remove rfc3987 validation, was rejecting valid urls
add extract_referer_wburl_str() to extract WbUrl str, if any,
from the referrer. Use that for live_rewrite_handler to override
default referrer
2014-04-15 16:38:53 -07:00
Ilya Kreymer
d8c9a803f6 add support for optional proxies (verify set to false for now) 2014-04-13 17:50:26 -07:00
Ilya Kreymer
7636c9d3f7 fix: when reading response, only readline() if previous read()
was non-empty
2014-04-09 16:44:45 -07:00
Ilya Kreymer
bfc2e63793 live rewriter: integrate handler with rewrite_live.py module,
clean up css, add unit and integration tests
clean up cli server now known as 'live-rewrite-server', which performs live rewrite using
iframe paradigm
2014-04-09 15:49:55 -07:00
Ilya Kreymer
19f2df4717 refactor:
- move is_identity(), is_embed() to wburl from wbrequest
- add is_mainpage() predicate
- add create_template() to each J2TemplateView to create itself
- add HeadInsertView to create a reusable head insert for
RewriteContent
- add 'mp_' as modifier for frames mode to be used as possible
  modifier with HTMLRewriter
2014-04-09 15:49:55 -07:00
Ilya Kreymer
2a318527df lxml: use lxml's parse interface instead of feed interface to allow
lxml to handle decoding unicode data, better address #36
2014-04-07 17:13:43 -07:00
Ilya Kreymer
d6006acdc3 rewrite: when using lxml parser, just pass raw stream to lxml
without decoding. lxml parser expects to have raw bytes and will determine
encoding on its own. then serve back as utf-8 if no encoding specified.
should address #36
2014-04-06 09:47:34 -07:00
Ilya Kreymer
3aa4a4da7a rewrite: ensure lxml parser closes gracefully on no input 2014-04-03 13:00:22 -07:00
Ilya Kreymer
5dd586cf07 refactor: simplify rewrite_content and replay_views, remove
redundant code.. everything goes through rewrite_content(),
is sanitized (for transfer encoding) if needed
additional testing for decode_buff
fix failed_files bug in resolvingloader, add tests
2014-04-03 12:44:00 -07:00
Ilya Kreymer
91184426b7 test coverage pass:
refactor and cleanup to improve coverage for corner cases
2014-04-02 13:16:54 -07:00
Ilya Kreymer
da0623fbbb lxml: ensure lxml support is optional: if not available,
use_lxml_parser() will return false and doctests/pytest collection
won't test the lxml parser
2014-03-26 14:05:02 -07:00
Ilya Kreymer
2a605652c6 add memento timemap support (for archival mode only)
add timemap Link headers to timegate and memento responses
timemap accessible via /timemap/*/ path
2014-03-24 14:00:06 -07:00
Ilya Kreymer
9654c22bed rewrite: add doctype rewriting, more tests on various markup edge cases 2014-03-23 23:46:49 -07:00
Ilya Kreymer
ac0bf5a415 refactor: IndexReader -> QueryHandler, move query output support
to QueryHandler. allow for multiple query views in QueryHandler
2014-03-23 12:44:28 -07:00
Ilya Kreymer
53590537e0 Merge develop and lxml 2014-03-18 17:14:27 -07:00
Ilya Kreymer
a6b4ae4c47 chardet optimization: using chardet feed() approach to avoid passing in entire buffer 2014-03-17 20:53:42 -07:00
Ilya Kreymer
d1ad9b5e69 refactor: cleanup HTMLRewriter/LXMLHTMLRewriter close path,
single close in base class delegating to _internal_close()
Also, HTMLRewriter auto-terminates <script> and <style> tags
for consistency with lxml
2014-03-17 20:50:35 -07:00