mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 08:04:49 +01:00

1708 Commits

Author SHA1 Message Date
Ilya Kreymer
a82cfc1ab2 rewriter: add rewrite_dash for rewriting DASH and HLS manifests!
rewriter: refactor to use mixins to extend base rewriter (todo: more refactoring)
fuzzy-matcher: support for additional 'match_filters' to filter fuzzy results via optional regexes by mime type,
e.g. allow more lenient fuzzy matching on DASH manifests than on other resources (for now)
fuzzy-matching: add WebAgg-Fuzzy-Match response header if response is fuzzy matched, redirect to exact match in rewriterapp
2017-03-20 14:41:12 -07:00
Ilya Kreymer
22edb2f14b frontendapp: fix error response return 2017-03-18 16:52:13 -07:00
Ilya Kreymer
0937c2b58f recorder tests: fix revisit/skip tests by switching from httpbin.org/get to httpbin/user-agent,
as /get now inserts a random request id and no longer returns any duplicates
2017-03-18 10:34:28 -07:00
Ilya Kreymer
037fca5b78 tests: fix rewrite test for srcset 2017-03-15 11:43:40 -07:00
Ilya Kreymer
c421b1c5ea html rewriter: srcset rewrite: don't add extra space 2017-03-15 11:15:20 -07:00
Ilya Kreymer
1344907032 wombat fixes: message listener fixes for multiple listeners
- don't reject multiple listeners
- create new WrappedListener() obj for each listener
- extract_orig() add current scheme if url starts with '//'
2017-03-15 11:14:04 -07:00
Ilya Kreymer
93f26452e5 wombat fixes:
- add service worker rewrite
- add documentURI rewrite
- allow history change from "about:blank"
2017-03-14 18:28:18 -07:00
Ilya Kreymer
20e49c7391 karma fixes: avoid accessing undef var 2017-03-14 12:28:13 -07:00
Ilya Kreymer
8ddf43684f karma: add stack trace 2017-03-14 12:14:04 -07:00
Ilya Kreymer
09a0779abb fix karma test for wombat change 2017-03-14 11:59:28 -07:00
Ilya Kreymer
a76dbefec2 regex rewrite: loosen rules for top & location rewrite, add tests
.WB_wombat_location and .WB_wombat_top overrides should help with less strict rewriting
2017-03-14 11:44:15 -07:00
Ilya Kreymer
0f0c20a03a fuzzy matching: new, clean fuzzy matcher implementation for webagg
rules: default rule: fuzzy match urls ignoring prefix match (needs more testing)
tests: update tests for new broad fuzzy match rule
2017-03-14 11:44:15 -07:00
Ilya Kreymer
e0878f0f67 wombat: reinit paths if inited via new window creation/iframe to reflect correct url!
refactor wombat into single _WBWombat object
2017-03-14 11:44:09 -07:00
Ilya Kreymer
8fe2c1b5bd apps & cli: remove old apps, keep:
- webagg-server
- wayback
- live-rewrite-server
support adding custom settings to AutoApp
support for --live flag that automatically adds live-web source at '/live'
tests: disable cdx_server tests as old cdx_server removed
2017-03-12 12:21:54 -07:00
Ilya Kreymer
ac84dcc2e3 setup: cleanup deps: remove urllib3 (installed by requests), add werkzeug to core deps 2017-03-12 12:21:23 -07:00
Ilya Kreymer
57eba8fcde client side rewrite: add override for window.frames access 2017-03-12 09:47:29 -07:00
Ilya Kreymer
cab1c43473 live: switch live-rewrite-server to new arch, remove old live_rewrite_server.py 2017-03-10 14:15:02 -08:00
Ilya Kreymer
544df71302 setup: use latest webtest again
tests: use geventwebserver for LiveServerTests instead of separate process
2017-03-10 11:19:27 -08:00
Ilya Kreymer
baa248c502 responseloader: for py2, look at the original header line only 2017-03-10 11:16:05 -08:00
Ilya Kreymer
d04f8fc2e3 recorder: cookie filter:
- update ExcludeSpecificHeaders() to be passed directly as a filter to warcio
- add ExcludeHttpOnlyCookiesHeader() to exclude only Set-Cookie if HttpOnly is present
remove unused code
2017-03-10 10:07:13 -08:00
Ilya Kreymer
7a8fed2681 update to warcio 1.1
archiveindexer: explicitly consume content for each record
2017-03-10 10:05:39 -08:00
Ilya Kreymer
af7bbfd6e1 build: update gevent, support py3.6 2017-03-09 11:59:54 -08:00
Ilya Kreymer
d4321792b7 tests: convert test_inputreq to use werkzeug (same as the app), remove bottle from test dependencies 2017-03-08 23:09:19 -08:00
Ilya Kreymer
e86e3e6d32 build process: simplify build process by moving essential deps to requirements.txt, and extras to extra_requirements.txt
setup.py just loads from requirements.txt
Dockerfile pip installs requirements, then extra requirements for improved caching
travis runs setup install, then installs extra requirements
2017-03-08 17:05:29 -08:00
Ilya Kreymer
738fc0e427 Merge pull request #209 from ikreymer/warcio-split
Warcio split
2017-03-08 16:35:08 -08:00
Ilya Kreymer
98c0475806 test: fix test to use closest='now' for live test 2017-03-08 12:50:51 -08:00
Ilya Kreymer
a2ffbde2f6 dockerfile: add portalocker
rewriterapp: don't add memento headers for ajax responses to avoid replay issues
2017-03-08 12:30:20 -08:00
Ilya Kreymer
0784e4e5aa spin-off warcio!
update imports to point to warcio
warcio rename fixes:
- ArcWarcRecord.stream -> raw_stream
- ArcWarcRecord.status_headers -> http_headers
- ArchiveLoadFailed single param init
2017-03-07 10:58:00 -08:00
Ilya Kreymer
4a94699a65 warc refactor: ArchiveLoadFailed no longer derived from WbException
catch separately, set status to 503 Archive Not Available explicitly
2017-03-01 18:02:35 -08:00
Ilya Kreymer
2b3fde028f refactor: split LimitReader into limitreader.py 2017-03-01 15:13:32 -08:00
Ilya Kreymer
b7285b1a77 refactor: split off BlockLoader support into BlockArcWarcRecordLoader, plain ArcWarcRecordLoader only includes parse_record_stream(), no load()
use BlockArcWarcRecordLoader() only when needed for replay
2017-03-01 14:57:44 -08:00
Ilya Kreymer
1213466afb warc & recorder refactor: split BaseWARCWriter from MultiWARCWriter, move to warc/warcwriter.py, recorder/multifilewarcwriter.py
split indexing functionality from base warc iterator, move to archiveindexer.py
2017-03-01 14:18:44 -08:00
Ilya Kreymer
3faa55906a warcwriter: attempt to separate warc writing semantics from the recorder
use StatusAndHeaders instead of requests CaseInsensitiveDict for consistency
refactor writer api: create_warc_record() for creating new record
copy_warc_record() for copying a full record from a stream
add writer tests, separate from recorder
2017-03-01 12:50:32 -08:00
Ilya Kreymer
c66d251a90 warc: make ArchiveIterator an actual iterator
warc indexing test: add test for reading warc with interspersed empty gzip records, ensure they are ignored
2017-03-01 12:48:06 -08:00
Ilya Kreymer
114ef2a637 autoapp: add OSError for py2.7 2017-02-27 22:13:59 -08:00
Ilya Kreymer
a4b770d34e new-pywb refactor!
frontendapp compatibility
- add support for separate not found page for 404s (not_found.html)
- support for exception handling with error template (error.html)
- support for home page (index.html)
- add memento headers for replay
- add referrer fallback check
- tests: port integration tests for front-end replay, cdx server
- not included: proxy mode, exact redirect mode, non-framed replay
- move unused tests to tests_disabled
- cli: add optional werkzeug profiler with --profile flag
2017-02-27 19:07:51 -08:00
Ilya Kreymer
0dbc803422 webagg: default 'memento+' config to /timemap/link/ instead of /timemap/*/ for greater compatibility 2017-02-26 12:18:31 -08:00
Ilya Kreymer
91e45be75d rewriteinputreq: ensure path is not blank, default to '/' 2017-02-24 15:00:37 -08:00
Ilya Kreymer
070ecca7af redis multi-key index source optimization: support optional 'member_key_templ' for retrieving members instead of using scan_iter()
add scan_keys() which checks member key if provided, otherwise falls back to scan_iter()
2017-02-21 11:25:18 -08:00
Ilya Kreymer
35c298011a rewriterapp: buffer upstream source while rewriting! 2017-02-19 20:49:51 -08:00
Ilya Kreymer
2a7da54be9 Merge branch 'master' into new-pywb 2017-02-17 18:49:01 -08:00
Ilya Kreymer
31bf7a47f1 new-wayback cli script, using new FrontEndApp (rewriting) + AutoConfigApp (config-driven aggregator)
support for dynamic collections: check all .cdxj files in /<coll>/indexes/*.cdxj when accessing /<coll>
support for fixed routes: specified in config.yaml as per https://github.com/ikreymer/pywb/wiki/Distributed-Archive-Config
werkzeug routing in FrontEndApp: default query, replay, search pages working
route listing: /_coll_info.json for listing fixed + dynamic routes
autoindexing enabled: WARCs added to the archives directory are indexed into the .cdxj index
Addresses #196
2017-02-17 18:04:07 -08:00
Ilya Kreymer
60f3c0a213 setup: further update trove classifiers, add python 2 and 3, switch to production stable, closes #208 2017-02-16 15:12:22 -08:00
Ilya Kreymer
77960c1311 setup: bound webtest for 2.6 support 2017-02-16 14:32:14 -08:00
Ilya Kreymer
7f95396be0 setup: use jinja2<2.9 for now 2017-02-16 11:29:30 -08:00
Ilya Kreymer
14e1dbb268 update CHANGELIST for 0.33.1 2017-02-16 11:09:31 -08:00
Ilya Kreymer
4f6fa3ffd8 client-rewrite: disable eval() override for now, needs more testing 2017-02-16 11:04:42 -08:00
Ilya Kreymer
58b141bd53 add python 3 classifiers (#208) 2017-02-16 11:02:53 -08:00
Ilya Kreymer
531422fc1b client-side rewrite improvements:
- add overrides for document.URL, xhr.responseURL, function for general single property override
- postMessage: add overrides for additional MessageEvent properties, target, srcElement, path, eventPhase
- postMessage: avoid duplicate event listeners registered
- check for duplicate postMessage override inits
2017-02-15 17:03:15 -08:00
Ilya Kreymer
a5bc932e0c memento agg test: fix test to reflect change from link->* 2017-02-06 21:17:48 -05:00