mirror of https://github.com/webrecorder/pywb.git synced 2025-03-24 23:19:52 +01:00

12 Commits

Author SHA1 Message Date
Ilya Kreymer
4f9310fe4d rewrite: add support for js rewriting ';http:\\/' urls
add 'parse_comments' rule options for parsing comment contents via regex
banner: simplify banner insertion check, only insert for top frame, and check
for canon_url matching current href at top before redirecting to top
replace em_ -> mp_ as default embedded mod
2014-08-05 01:47:52 -07:00
Ilya Kreymer
de65b68edc rules: additions to rules for FB 2014-06-18 16:45:54 -07:00
Ilya Kreymer
e2349a74e2 replay: better POST support via post query append!
record_loader can optionally parse 'request' records
archiveindexer has -a flag to write all records ('request' included),
-p flag to append post query
post-test.warc.gz and cdx
POST redirects using 307
2014-06-10 19:21:46 -07:00
Ilya Kreymer
7d236af7d7 cdx: fix creation and add test for non-surt cdx (pywb-nonsurt/ test)
archiveindexer: -u option to generate non-surt cdx
tests: full test coverage for cdxdomainspecific (fuzzy and custom canon)
2014-05-16 21:16:50 -07:00
Ilya Kreymer
6eef0afb86 add new custom rewriting rule (flickr) 2014-04-20 21:40:27 -07:00
Ilya Kreymer
2c74ea9f23 fuzzy match: make filter string optionally overridable
setup.py: unset PYWB_CONFIG_ENV
2014-03-27 21:43:30 -07:00
Ilya Kreymer
68878fa72a update domain-specific rules to make flickr replay work better! 2014-03-08 15:53:52 -08:00
Ilya Kreymer
4e71a0b772 better rules.yaml fix 2014-03-06 02:51:54 -08:00
Ilya Kreymer
3718e1d21b rewrite fixes: html_rewriter do not unescape attrs!
rules: don't rewrite past end of block or line
2014-03-06 02:29:52 -08:00
Ilya Kreymer
1e3ef6ec5c cdx: add basic test for CustomUrlCanonicalizer for now
(will likely refactor this configuration)
2014-02-28 09:40:51 -08:00
Ilya Kreymer
22f1f78fca cdx: clean up filters, add '~' modifier for contains
rules: fix regex to be lazy not greedy, turn off unneeded custom
canonicalizer (need tests for custom canon)
cleanup fuzzy match query
fix data package in setup.py
2014-02-27 18:22:10 +00:00
Ilya Kreymer
5a41f59f39 new unified config system, via rules.yaml!
contains configs for cdx canon, fuzzy matching and rewriting!
rewriting: ability to add custom regexs per domain
also, ability to toggle js rewriting and custom rewriting file
(default is wombat.js)
2014-02-26 18:02:01 -08:00