mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 16:14:48 +01:00

1940 Commits

Author SHA1 Message Date
Ilya Kreymer
3fea5288b2 tests: fix memento not found test to use different timegate (webenact) 2017-05-01 21:51:59 -07:00
Ilya Kreymer
147c3217dd update to warcio==1.3
recorder: use ArcWarcRecordLoader() for parsing response record
multifilewarcwriter: ensure digest is computed before trying to lookup revisits
2017-05-01 21:50:39 -07:00
Ilya Kreymer
58f39f0558 setup: update to warcio==1.2
add ensure_http_headers=True when reading WARC records
tests: fix pytest warnings, use webtest.TestApp instead of TestApp
2017-04-29 13:47:54 -07:00
Ilya Kreymer
14af9287dc warc loading tests: use custom __repr__ to match results after latest warcio change (for now) 2017-04-28 15:56:58 -07:00
Ilya Kreymer
74e64e701d py27 fix: add to_native_str() for new url, header usage 2017-04-28 14:40:42 -07:00
Ilya Kreymer
40f4b6bd94 urlrewrite cleanup:
frontendapp: pass properly decoded url from router
rewriterapp: read upstream cdx from Webagg-Cdx header
cleanup unused code
2017-04-28 12:37:24 -07:00
Ilya Kreymer
46e2d27e54 webagg improvements:
- add _get_referrer() access to index source, can pass to loader via cdx['set_referrer']
- make MementoIndexSource more extensible
- move WAYBACK_ORIG_SUFFIX into BaseIndexSource for extensibility
- fix RemoteIndexSource 'closest' not being set, update template to use 'closest' instead of 'timestamp'
- update remote index tests to use 'closest' instead of 'timestamp'
- loader: set referrer via cdx['set_referrer']
- loader: pass cdx to downstream via Webagg-Cdx header
- utils: ParamFormatter also looks for unprefixed key in params
2017-04-28 12:32:45 -07:00
Ilya Kreymer
082487ab3c support per-collection assets again:
- wb-manager added metadata now loaded dynamically, cached, for search and index pages (#196)
- metadata updated w/o restart (#87)
- per-collection template overrides and per-template static file support
tests: test_auto_colls.py fully ported to new system
(per-collection config.yaml no longer supported)
2017-04-26 12:18:36 -07:00
Ilya Kreymer
52dc46fe6a remove obsolete code and tests!
disable test_auto_colls for now until fully supported in new system
2017-04-25 19:39:19 -07:00
Ilya Kreymer
24c968640d fuzzymatcher: better fix for mime-type matching if no mime 2017-04-25 14:48:09 -07:00
Ilya Kreymer
b3bc7765a1 fuzzymatcher fix: don't assume 'mime' is always present 2017-04-25 14:42:49 -07:00
Ilya Kreymer
d32c6d492b tests: disable webagg output tests until they can be stabilized 2017-04-24 16:34:53 -07:00
Ilya Kreymer
478600716d urllib3: use version from requests
coverage: use gevent concurrency
2017-04-24 16:32:23 -07:00
Ilya Kreymer
7ceeb32531 proxy support: update for wsgiprox==1.2, transfer-encoding/buffering support now part of wsgiprox
frame insert: set 'iframe_url' to full rewritten url, or in proxy mode, original url with scheme matching current scheme
2017-04-24 15:08:42 -07:00
Ilya Kreymer
15a7b15d44 proxy mode support via rewriterapp!
- check for 'wsgiprox.fixed_host' and use that as host_prefix if set
- don't include Connection/Proxy-Connection headers in upstream request
- ensure proxy response has length or is chunk-encoded
2017-04-22 18:17:41 -07:00
Ilya Kreymer
e060ea7b56 frontendapp: encapsulate, don't extend rewriterapp
rewriterapp: add 'Content-Location' if fuzzy match, or if using memento
tests: fix test to check for Content-Location for fuzzy match instead of redirect
2017-04-21 15:37:21 -07:00
Ilya Kreymer
4b055c9394 client-rewrite: support proper srcset= attr rewriting 2017-04-21 12:31:56 -07:00
Ilya Kreymer
45869eab42 server-side rewrite: experiment with JSONP rewriter, running on all json content #213
(previous json-rewriting defaulted to none)
2017-04-19 15:42:13 -07:00
Ilya Kreymer
3dd6c442ed client-side rewrite: unrewrite accessing Attr object value/nodeValue for href, src, poster attributes 2017-04-18 11:40:28 -07:00
Ilya Kreymer
8849eb494e client-side: init postMessage override on iframe access 2017-04-17 13:39:41 -07:00
Ilya Kreymer
0c833eb27e client-side rewrite fixes:
- rewrite-blob: more generic removal of postMessage override for worker scripts
- rewrite-style: wrap decodeURIComponent in exception handling
2017-04-15 23:37:07 -07:00
Ilya Kreymer
bc50b908b7 html rewrite: fix <base> tag rewriting
ensure 'rebased' urlrewriter is set to absolute url
tests: add test to verify <base> rewriting, relative and absolute
2017-04-15 12:32:16 -07:00
Ilya Kreymer
79a35bcf9c options: add check for 'enable_memento' option before adding memento headers
pass options to frontend app
2017-04-15 08:32:20 -07:00
Ilya Kreymer
bae9a09671 client-side Date override: override 'constructor' property so 'new Date().constructor == Date' 2017-04-14 09:21:29 -07:00
Ilya Kreymer
f593b5f80f trailing slash fix: add trailing slash, preserving query, if no slash present after hostname (#211) 2017-04-04 18:10:49 -07:00
Ilya Kreymer
7ca5795976 ensure trailing slash: redirect to ensure a host-only url has a trailing slash, eg. /live/http://example.com -> /live/http://example.com/ 2017-04-04 15:41:03 -07:00
Ilya Kreymer
26662f7df3 setup: generate current git_hash into autogenerated 'pywb.git_hash' file, add to .gitignore 2017-03-28 10:31:43 -07:00
Ilya Kreymer
69af57dedf js regex rewrite: fix ternary op rewrite, remove commented-out regexes, add a few more tests 2017-03-21 11:50:40 -07:00
Ilya Kreymer
15ad56c024 rewrite dash: support for using custom rewriting function (for FB)
rewrite_fb_dash() added for rewriting dash xml, embedded in js, embedded in html
todo: refactor to make more general support for custom rewriting functions
regex_rewriter: add ':' to exclude from rewrite again
2017-03-21 11:18:53 -07:00
Ilya Kreymer
a20480b9ab wombat rewrite: rewrite href="data:text/css" using rewrite_style()
rewrite_style fix: replace all 'WB_wombat_' in text, not just the first one
2017-03-21 11:17:15 -07:00
Ilya Kreymer
55def50de7 rewriterapp: readd range: only convert to 206 if response is 200 2017-03-21 18:13:34 +00:00
Ilya Kreymer
5671017e8f rewrite: add rewrite_dash.py for DASH and HLS rewriting 2017-03-20 15:15:00 -07:00
Ilya Kreymer
a82cfc1ab2 rewriter: add rewrite_dash for rewriting DASH and HLS manifests!
rewriter: refactor to use mixins to extend base rewriter (todo: more refactoring)
fuzzy-matcher: support for additional 'match_filters' to filter fuzzy results via optional regexes by mime type,
eg. allow more lenient fuzzy matching on DASH manifests than other resources (for now)
fuzzy-matching: add WebAgg-Fuzzy-Match response header if response is fuzzy matched, redirect to exact match in rewriterapp
2017-03-20 14:41:12 -07:00
Ilya Kreymer
22edb2f14b frontendapp: fix error response return 2017-03-18 16:52:13 -07:00
Ilya Kreymer
0937c2b58f recorder tests: fix revisit/skip tests by switching from httpbin.org/get to httpbin/user-agent,
as /get now inserts a random request id and no longer returns any duplicates
2017-03-18 10:34:28 -07:00
Ilya Kreymer
037fca5b78 tests: fix rewrite test for srcset 2017-03-15 11:43:40 -07:00
Ilya Kreymer
c421b1c5ea html rewriter: srcset rewrite: don't add extra space 2017-03-15 11:15:20 -07:00
Ilya Kreymer
1344907032 wombat fixes: message listener fixes for multiple listeners
- don't reject multiple listeners
- create new WrappedListener() obj for each listener
- extract_orig() add current scheme if url starts with '//'
2017-03-15 11:14:04 -07:00
Ilya Kreymer
93f26452e5 wombat fixes:
- add service worker rewrite
- add documentURI rewrite
- allow history change from "about:blank"
2017-03-14 18:28:18 -07:00
Ilya Kreymer
20e49c7391 karma fixes: avoid accessing undef var 2017-03-14 12:28:13 -07:00
Ilya Kreymer
8ddf43684f karma: add stack trace 2017-03-14 12:14:04 -07:00
Ilya Kreymer
09a0779abb fix karma test for wombat change 2017-03-14 11:59:28 -07:00
Ilya Kreymer
a76dbefec2 regex rewrite: loosen rules for top & location rewrite, add tests
.WB_wombat_location and .WB_wombat_top overrides should help with less strict rewriting
2017-03-14 11:44:15 -07:00
Ilya Kreymer
0f0c20a03a fuzzy matching: new, clean fuzzy matcher implementation for webagg
rules: default rule: fuzzy match urls ignoring prefix match (needs more testing)
tests: update tests for new broad fuzzy match rule
2017-03-14 11:44:15 -07:00
Ilya Kreymer
e0878f0f67 wombat: reinit paths if inited via new window creation/iframe to reflect correct url!
refactor wombat into single _WBWombat object
2017-03-14 11:44:09 -07:00
Ilya Kreymer
8fe2c1b5bd apps & cli: remove old apps, keep:
- webagg-server
- wayback
- live-rewrite-server
support adding custom settings to AutoApp
support for --live flag that automatically adds live-web source at '/live'
tests: disable cdx_server tests as old cdx_server removed
2017-03-12 12:21:54 -07:00
Ilya Kreymer
ac84dcc2e3 setup: cleanup deps: remove urllib3 (installed by requests), add werkzeug to core deps 2017-03-12 12:21:23 -07:00
Ilya Kreymer
57eba8fcde client side rewrite: add override for window.frames access 2017-03-12 09:47:29 -07:00
Ilya Kreymer
cab1c43473 live: switch live-rewrite-server to new arch, remove old live_rewrite_server.py 2017-03-10 14:15:02 -08:00
Ilya Kreymer
544df71302 setup: use latest webtest again
tests: use geventwebserver for LiveServerTests instead of separate process
2017-03-10 11:19:27 -08:00