mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 00:03:28 +01:00

1901 Commits

Author SHA1 Message Date
Ilya Kreymer
c1be7d4da5 rewrite system refactor:
- rewriter interface accepts RewriteInfo instance
- add StreamingRewriter adapter, which wraps html and regex rewriters to support rewriting streaming text through the general rewriter interface
- add RewriteDASH, RewriteHLS as (non-streaming) rewriters. Need to read contents into buffer (for now)
- add RewriteAMF experimental AMF rewriter
- general rewriting system in BaseContentRewriter, default rewriters configured in DefaultRewriter
- tests: disable banner-only test, as banner-only mode is not currently supported (for now)
2017-05-22 18:52:17 -07:00
Ilya Kreymer
db9d0ae41a new rewriting system!
- new header rewriter
- new extensible content rewriter in urlrewrite.rewriter!
2017-05-22 18:52:17 -07:00
Ilya Kreymer
685804919a aggregator improvements:
- support for 'WARC-Provenance' header added to response
- aggregator supports source collection: if 'name:coll', coll parsed out and stored in 'param.<name>.src_coll' field,
available for use in remote index, included in provenance
- remoteindexsource: support interpolating '{src_coll}' in api_url and replay_url to allow handling src_coll
- recorder: CollectionFilter supports dict of prefixes to filter regexes, and catch-all '*' prefix
- recorder: provenance written to paired request record
- rename: ProxyIndexSource -> UpstreamIndexSource to avoid confusion with actual proxy
- autoapp: register_source() supports adding source classes at beginning of list
2017-05-21 05:37:58 +00:00
Ilya Kreymer
331320b17a aggregator improvements:
- support for 'WARC-Provenance' header added to response
- aggregator supports source collection: if 'name:coll', coll parsed out and stored in 'param.<name>.src_coll' field,
available for use in remote index, included in provenance
- remoteindexsource: support interpolating '{src_coll}' in api_url and replay_url to allow handling src_coll
- recorder: CollectionFilter supports dict of prefixes to filter regexes, and catch-all '*' prefix
- recorder: provenance written to paired request record
- rename: ProxyIndexSource -> UpstreamIndexSource to avoid confusion with actual proxy
- autoapp: register_source() supports adding source classes at beginning of list
2017-05-20 09:50:26 -07:00
Ilya Kreymer
d8f035642b fuzzymatching: add new ext based rule. fuzzy match if url has an ext except those on the 'not_ext' list (#218) 2017-05-19 10:53:09 -07:00
Ilya Kreymer
f0f274c0c9 wb_frame: allow "load" event to pushState() instead of replaceState() if window.pushStateOnLoad.
This is necessary to have working history when running in electron, which does not combine
iframe history into the top-frame history
2017-05-16 17:18:37 -07:00
Ilya Kreymer
d6cfb7cd2d wb_frame/wb.js: don't call push_state() if already on the current state,
eg. if two load events received for different readyState
add document.readyState to load event
2017-05-15 22:26:52 -07:00
Ilya Kreymer
762f669d13 rules: fuzzy match update:
- ignore all query args for flash files
- ignore cb= param for all urls
2017-05-12 08:55:03 -07:00
Ilya Kreymer
94262546d5 integration tests: add fixture to run all relevant tests in framed and non-framed mode
rename test_framed_inverse -> test_memento, remove unneeded test config
2017-05-03 20:05:07 -07:00
Ilya Kreymer
296b4ed94d client-side rewrite: remove WB_wombat_ from any id/class= in document.write() 2017-05-03 15:31:06 -07:00
Ilya Kreymer
7434cb619e config: ensure 'framed_replay' config is loaded again (default to true)
config template overrides: check config for overrides for all templates again
fixes #216
2017-05-02 10:05:11 -07:00
Ilya Kreymer
3fea5288b2 tests: fix memento not found test to use different timegate (webenact) 2017-05-01 21:51:59 -07:00
Ilya Kreymer
147c3217dd update to warcio==1.3
recorder: use ArcWarcRecordLoader() for parsing response record
multifilewarcwriter: ensure digest is computed before trying to lookup revisits
2017-05-01 21:50:39 -07:00
Ilya Kreymer
58f39f0558 setup: update to warcio==1.2
add ensure_http_headers=True when reading WARC records
tests: fix pytest warnings, use webtest.TestApp instead of TestApp
2017-04-29 13:47:54 -07:00
Ilya Kreymer
14af9287dc warc loading tests: use custom __repr__ to match results after latest warcio change (for now) 2017-04-28 15:56:58 -07:00
Ilya Kreymer
74e64e701d py27 fix: add to_native_str() for new url, header usage 2017-04-28 14:40:42 -07:00
Ilya Kreymer
40f4b6bd94 urlrewrite cleanup:
frontendapp: pass properly decoded url from router
rewriterapp: read upstream cdx from Webagg-Cdx header
cleanup unused code
2017-04-28 12:37:24 -07:00
Ilya Kreymer
46e2d27e54 webagg improvements:
- add _get_referrer() access to index source, can pass to loader via cdx['set_referrer']
- make MementoIndexSource more extensible
- move WAYBACK_ORIG_SUFFIX into BaseIndexSource for extensibility
- fix RemoteIndexSource 'closest' not being set, update template to use 'closest' instead of 'timestamp'
- update remote index tests to use 'closest' instead of 'timestamp'
- loader: set referrer via cdx['set_referrer']
- loader: pass cdx to downstream via Webagg-Cdx header
- utils: ParamFormatter also looks for unprefixed key in params
2017-04-28 12:32:45 -07:00
Ilya Kreymer
082487ab3c support per-collection assets again:
- wb-manager added metadata now loaded dynamically, cached, for search and index pages (#196)
- metadata updated w/o restart (#87)
- per-collection template overrides and per-template static file support
tests: test_auto_colls.py fully ported to new system
(per-collection config.yaml no longer supported)
2017-04-26 12:18:36 -07:00
Ilya Kreymer
52dc46fe6a remove obsolete code and tests!
disable test_auto_colls for now until fully supported in new system
2017-04-25 19:39:19 -07:00
Ilya Kreymer
24c968640d fuzzymatcher: better fix for mime-type matching if no mime 2017-04-25 14:48:09 -07:00
Ilya Kreymer
b3bc7765a1 fuzzymatcher fix: don't assume 'mime' is always present 2017-04-25 14:42:49 -07:00
Ilya Kreymer
d32c6d492b tests: disable webagg output tests until they can be stabilized 2017-04-24 16:34:53 -07:00
Ilya Kreymer
478600716d urllib3: use version from requests
coverage: use gevent concurrency
2017-04-24 16:32:23 -07:00
Ilya Kreymer
7ceeb32531 proxy support: update for wsgiprox==1.2, transfer-encoding/buffering support now part of wsgiprox
frame insert: set 'iframe_url' to full rewritten url, or in proxy mode, original url with scheme matching current scheme
2017-04-24 15:08:42 -07:00
Ilya Kreymer
15a7b15d44 proxy mode support via rewriterapp!
- check for 'wsgiprox.fixed_host' and use that as host_prefix if set
- don't include Connection/Proxy-Connection headers in upstream request
- ensure proxy response has length or is chunk-encoded
2017-04-22 18:17:41 -07:00
Ilya Kreymer
e060ea7b56 frontendapp: encapsulate, don't extend rewriterapp
rewriterapp: add 'Content-Location' if fuzzy match, or if using memento
tests: fix test to check for Content-Location for fuzzy match instead of redirect
2017-04-21 15:37:21 -07:00
Ilya Kreymer
4b055c9394 client-rewrite: support proper srcset= attr rewriting 2017-04-21 12:31:56 -07:00
Ilya Kreymer
45869eab42 server-side rewrite: experiment with JSONP rewriter, running on all json content #213
(previous json-rewriting defaulted to none)
2017-04-19 15:42:13 -07:00
Ilya Kreymer
3dd6c442ed client-side rewrite: unrewrite accessing Attr object value/nodeValue for href, src, poster attributes 2017-04-18 11:40:28 -07:00
Ilya Kreymer
8849eb494e client-side: init postMessage override on iframe access 2017-04-17 13:39:41 -07:00
Ilya Kreymer
0c833eb27e client-side rewrite fixes:
- rewrite-blob: more generic removal of postMessage override for worker scripts
- rewrite-style: wrap decodeURIComponent in exception handling
2017-04-15 23:37:07 -07:00
Ilya Kreymer
bc50b908b7 html rewrite: fix <base> tag rewriting
ensure 'rebased' urlrewriter is set to absolute url
tests: add test to verify <base> rewriting, relative and absolute
2017-04-15 12:32:16 -07:00
Ilya Kreymer
79a35bcf9c options: add check for 'enable_memento' option before adding memento headers
pass options to frontend app
2017-04-15 08:32:20 -07:00
Ilya Kreymer
bae9a09671 client-side Date override: override 'constructor' property so 'new Date().constructor == Date' 2017-04-14 09:21:29 -07:00
Ilya Kreymer
f593b5f80f trailing slash fix: add trailing slash, preserving query, if no slash present after hostname (#211) 2017-04-04 18:10:49 -07:00
Ilya Kreymer
7ca5795976 ensure trailing slash: redirect to ensure a host-only url has a trailing slash, eg. /live/http://example.com -> /live/http://example.com/ 2017-04-04 15:41:03 -07:00
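
The two trailing-slash commits above describe normalizing a host-only URL so that it ends in a slash while keeping any query string. The following is a minimal sketch of that idea using only the standard library; the function name `ensure_trailing_slash` is hypothetical and this is not the code from the commits themselves.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_trailing_slash(url):
    """Add '/' as the path when the url has a host but no path.

    Hypothetical helper for illustration only; the query string and
    fragment are left untouched.
    """
    parts = urlsplit(url)
    if parts.netloc and not parts.path:
        # Host-only url: insert the missing slash after the hostname.
        parts = parts._replace(path='/')
    return urlunsplit(parts)
```

A frontend could compare the result with the requested URL and issue a redirect only when the two differ, which would match the `/live/http://example.com -> /live/http://example.com/` behavior mentioned in the commit message.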
Ilya Kreymer
26662f7df3 setup: generate current git_hash into autogenerated 'pywb.git_hash' file, add to .gitignore 2017-03-28 10:31:43 -07:00
Ilya Kreymer
69af57dedf js regex rewrite: fix ternary op rewrite, remove commented-out regexes, add a few more tests 2017-03-21 11:50:40 -07:00
Ilya Kreymer
15ad56c024 rewrite dash: support for using custom rewriting function (for FB)
rewrite_fb_dash() added for rewriting dash xml, embedded in js, embedded in html
todo: refactor to make more general support for custom rewriting functions
regex_rewriter: add ':' to exclude from rewrite again
2017-03-21 11:18:53 -07:00
Ilya Kreymer
a20480b9ab wombat rewrite: rewrite href="data:text/css" using rewrite_style()
rewrite_style fix: replace all 'WB_wombat_' in text, not just the first one
2017-03-21 11:17:15 -07:00
Ilya Kreymer
55def50de7 rewriterapp: readd range: only convert to 206 if response is 200 2017-03-21 18:13:34 +00:00
Ilya Kreymer
5671017e8f rewrite: add rewrite_dash.py for DASH and HLS rewriting 2017-03-20 15:15:00 -07:00
Ilya Kreymer
a82cfc1ab2 rewriter: add rewrite_dash for rewriting DASH and HLS manifests!
rewriter: refactor to use mixins to extend base rewriter (todo: more refactoring)
fuzzy-matcher: support for additional 'match_filters' to filter fuzzy results via optional regexes by mime type,
eg. allow more lenient fuzzy matching on DASH manifests than other resources (for now)
fuzzy-matching: add WebAgg-Fuzzy-Match response header if response is fuzzy matched, redirect to exact match in rewriterapp
2017-03-20 14:41:12 -07:00
Ilya Kreymer
22edb2f14b frontendapp: fix error response return 2017-03-18 16:52:13 -07:00
Ilya Kreymer
0937c2b58f recorder tests: fix revisit/skip tests by switching from httpbin.org/get to httpbin/user-agent,
as /get now inserts a random request id and no longer returns any duplicates
2017-03-18 10:34:28 -07:00
Ilya Kreymer
037fca5b78 tests: fix rewrite test for srcset 2017-03-15 11:43:40 -07:00
Ilya Kreymer
c421b1c5ea html rewriter: srcset rewrite: don't add extra space 2017-03-15 11:15:20 -07:00
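
The srcset commits above concern rewriting each URL in a `srcset` attribute without introducing extra whitespace. The sketch below shows one way that could work, assuming candidates are separated by commas and that URLs contain no commas themselves; `rewrite_srcset` and the `rewrite_url` callback are hypothetical names, not the html rewriter's actual interface.

```python
def rewrite_srcset(value, rewrite_url):
    """Rewrite every url in a srcset value, keeping its descriptor.

    Hypothetical sketch: splits on ',' and assumes no url contains a
    comma. `rewrite_url` is any callable mapping a url to its
    rewritten form.
    """
    candidates = []
    for candidate in value.split(','):
        parts = candidate.split(None, 1)  # url, then optional descriptor
        if not parts:
            continue  # skip empty entries such as a trailing comma
        rewritten = [rewrite_url(parts[0])]
        if len(parts) == 2:
            rewritten.append(parts[1].strip())
        # Join with a single space so no extra whitespace is added.
        candidates.append(' '.join(rewritten))
    return ', '.join(candidates)
```

Splitting on whitespace with a limit of one keeps descriptors such as `2x` or `480w` attached to their URL while collapsing any surrounding padding.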
Ilya Kreymer
1344907032 wombat fixes: message listener fixes for multiple listeners
- don't reject multiple listeners
- create new WrappedListener() obj for each listener
- extract_orig() add current scheme if url starts with '//'
2017-03-15 11:14:04 -07:00
Ilya Kreymer
93f26452e5 wombat fixes:
- add service worker rewrite
- add documentURI rewrite
- allow history change from "about:blank"
2017-03-14 18:28:18 -07:00