mirror of https://github.com/webrecorder/pywb.git synced 2025-03-29 00:52:29 +01:00

20 Commits

Author SHA1 Message Date
Ilya Kreymer
e4bcef1c8b rewrite: default HTMLParser entityref and charref are treated as plain
data for HTMLRewriter, since they are never rewritten, and to avoid
semicolon ambiguity, since no way to determine if there is a ; or not
at end. Addresses #43
2014-11-04 12:14:00 -08:00
Ilya Kreymer
4f9310fe4d rewrite: add support for js rewriting ';http:\\/' urls
add 'parse_comments' rule options for parsing comment contents via regex
banner: simplify banner insertion check, only insert for top frame, and check
for canon_url matching current href at top before redirecting to top
replace em_ -> mp_ as default embedded mod
2014-08-05 01:47:52 -07:00
Ilya Kreymer
6e6688beb3 rewrite/testing: add additional test for live rewrite post, invalid post
htmlrewrite: annotate untestable sections (unimplemented, 2.6 only exceptions)
2014-08-04 22:51:43 -07:00
Ilya Kreymer
dd9f138bab disable decoding, by default, of content for html parser 2014-06-27 16:53:33 -07:00
Ilya Kreymer
d7516f4cd7 rewrite: fix <base> rewriting, urlrewriter replacement
turn off lxml rewriter by default
2014-06-13 16:44:37 -07:00
Ilya Kreymer
1d674d97d8 pep8 pass! 2014-05-16 22:44:26 -07:00
Ilya Kreymer
2ad41e2b94 rewrite: rewrite data-* attributes if they look like links (http, https, //) 2014-04-22 16:32:36 -07:00
Ilya Kreymer
e1e55ac061 minor tweaks: rewrite 'crossorigin' -> '_crossorigin' param to disable
crossorigin as it may interfere with loading rewritten content, add
tests for html and lxml parsers
add server_cls as optional param to QueryHandler.init_from_config()
for easier customization
views: don't create template if empty template file specified
2014-04-19 12:04:43 -07:00
Ilya Kreymer
23bb5bd175 rewrite: wombat update 2.0! Using Object.defineProperty() to better
override .href and .hash properties when possible.
.href returns original url, but on assignment rewrites before redirecting
.hash proxies to location.hash
Also added:
- window.top -> window.WB_wombat_top
- document.referrer -> document.WB_wombat_referrer
- <source> html tag rewriting
2014-04-18 19:30:48 -07:00
Ilya Kreymer
bfc2e63793 live rewriter: integrate handler with rewrite_live.py module,
clean up css, add unit and integration tests
clean up cli server now known as 'live-rewrite-server', which performs live rewrite using
iframe paradigm
2014-04-09 15:49:55 -07:00
Ilya Kreymer
19f2df4717 refactor:
- move is_identity(), is_embed() to wburl from wbrequest
- add is_mainpage() predicate
- add create_template() to each J2TemplateView to create itself
- add HeadInsertView to create a reusable head insert for
RewriteContent
- add 'mp_' as modifier for frames mode to be used as possible
  modifier with HTMLRewriter
2014-04-09 15:49:55 -07:00
Ilya Kreymer
d1ad9b5e69 refactor: cleanup HTMLRewriter/LXMLHTMLRewriter close path,
single close in base class delegating to _internal_close()
Also, HTMLRewriter auto-terminates <script> and <style> tags
for consistency with lxml
2014-03-17 20:50:35 -07:00
Ilya Kreymer
f35e82a4d5 ensure final output from close() is encoded!
add config option to 'use_lxml_parser' if available, if not,
will default to regular parser
testing on travis with lxml (not adding to dep yet)
2014-03-17 13:19:51 -07:00
Ilya Kreymer
1404177c6f fixes for unicode (doctests)
remove explicit </html> since lxml does not parse past the </html>
tag and adds one anyway (not ideal but only workaround for html after closing tag)
2014-03-17 11:55:45 -07:00
Ilya Kreymer
bd10c6c2d2 first pass -- lxml parser! 2014-03-16 23:12:04 -07:00
Ilya Kreymer
a69d565af5 make pywb.rewrite package pep8-compatible
move doctests to test subdir
2014-03-14 16:44:23 -07:00
Ilya Kreymer
584d826f05 rewrite: fix html rewriting, if forcing end </script>, </style>,
don't actually output to preserve original
wombat: copy over all Location settings
wburl: convert :/ -> :// if 2nd slash missing, only check for <scheme>:/
and ignore subsequent slashes
2014-03-08 15:10:35 -08:00
Ilya Kreymer
3718e1d21b rewrite fixes: html_rewriter do not unescape attrs!
rules: don't rewrite past end of block or line
2014-03-06 02:29:52 -08:00
Ilya Kreymer
cc22448cc5 fixes for 2.6 and pypy 2014-03-04 19:11:17 -08:00
Ilya Kreymer
5345459298 pywb 0.2!
move to distinct packages: pywb.utils, pywb.cdx, pywb.warc, pywb.util, pywb.rewrite!
each package will have its own README and tests
shared sample_data and install
2014-02-17 10:01:09 -08:00
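
The wburl note in commit 584d826f05 ("convert :/ -> :// if 2nd slash missing, only check for <scheme>:/") can be illustrated with a minimal sketch. This is not pywb's actual code: the helper name fix_missing_slash and the regular expression are assumptions made for illustration, and the sketch covers only one reading of the message, namely that a leading scheme followed by a single slash gets its second slash restored while the rest of the URL is left alone.

```python
import re

# Hypothetical pattern (not taken from pywb): a URL scheme at the very start
# of the string, followed by exactly one slash.
_SCHEME_SINGLE_SLASH = re.compile(r'^([A-Za-z][A-Za-z0-9+.-]*):/(?!/)')


def fix_missing_slash(url):
    """Return url with '<scheme>:/' expanded to '<scheme>://'.

    Only the leading scheme is checked; any later ':/' sequences in the
    path or query are left untouched.
    """
    return _SCHEME_SINGLE_SLASH.sub(r'\1://', url, count=1)
```

With this sketch, a URL whose second slash was collapsed by a proxy or browser is restored, while an already well-formed URL or a scheme-less string passes through unchanged.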