mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 08:04:49 +01:00

176 Commits

Author SHA1 Message Date
Ilya Kreymer
97182b71b7 refactor:
- merge pywb.urlrewrite -> pywb.rewrite, remove obsolete stuff (rewrite_content.py, rewrite_live.py, dsrules.py)
- move wbrequestresponse -> pywb.apps
- move pywb.webapp.handlers -> pywb.apps.static_handler
- remove pywb.webapp, pywb.framework packages
- disable old header_rewriter, content_rewriter tests
- finish renaming from previous warcserver refactor
- all other tests passing!
2017-05-23 19:08:29 -07:00
Ilya Kreymer
d8b67319e1 rewrite refactoring:
- rewrite headers after content to ensure content-length/content-encoding rewritten if content modified
- header rewriter: remove proxyrewriter, set default rule to 'prefix' or 'keep' if url rewriting or not
- set is_content_rw if record.content_stream(), assume content is modified
- add BufferedRewriter as base for dash, hls, amf rewriting which processes the full stream
- should_rw_content() determines if should attempt content rewriting
- support banner-only insert mode: added HTMLInsertOnlyRewriter, enable if no custom JS rules
- test: enable banner-only test mode
2017-05-22 18:52:17 -07:00
Ilya Kreymer
c1be7d4da5 rewrite system refactor:
- rewriter interface accepts RewriteInfo instance
- add StreamingRewriter adapter, which wraps html, regex rewriters to support rewriting streaming text from general rewriter interface
- add RewriteDASH, RewriteHLS as (non-streaming) rewriters. Need to read contents into buffer (for now)
- add RewriteAMF experimental AMF rewriter
- general rewriting system in BaseContentRewriter, default rewriters configured in DefaultRewriter
- tests: disable banner-only test, as banner-only mode is not currently supported (for now)
2017-05-22 18:52:17 -07:00
Ilya Kreymer
94262546d5 integration tests: add fixture to run all relevant tests in framed and non-framed mode
rename test_framed_inverse -> test_memento, remove unneeded test config
2017-05-03 20:05:07 -07:00
Ilya Kreymer
58f39f0558 setup: update to warcio==1.2
add ensure_http_headers=True when reading WARC records
tests: fix pytest warnings, use webtest.TestApp instead of TestApp
2017-04-29 13:47:54 -07:00
Ilya Kreymer
082487ab3c support per-collection assets again:
- wb-manager added metadata now loaded dynamically, cached, for search and index pages (#196)
- metadata updated w/o restart (#87)
- per-collection template overrides and per-template static file support
tests: test_auto_colls.py fully ported to new system
(per-collection config.yaml no longer supported)
2017-04-26 12:18:36 -07:00
Ilya Kreymer
52dc46fe6a remove obsolete code and tests!
disable test_auto_colls for now until fully supported in new system
2017-04-25 19:39:19 -07:00
Ilya Kreymer
e060ea7b56 frontendapp: encapsulate, don't extend rewriterapp
rewriterapp: add 'Content-Location' if fuzzy match, or if using memento
tests: fix test to check for Content-Location for fuzzy match instead of redirect
2017-04-21 15:37:21 -07:00
Ilya Kreymer
f593b5f80f trailing slash fix: add trailing slash, preserving query, if no slash present after hostname (#211) 2017-04-04 18:10:49 -07:00
Ilya Kreymer
a82cfc1ab2 rewriter: add rewrite_dash for rewriting DASH and HLS manifests!
rewriter: refactor to use mixins to extend base rewriter (todo: more refactoring)
fuzzy-matcher: support for additional 'match_filters' to filter fuzzy results via optional regexes by mime type,
eg. allow more lenient fuzzy matching on DASH manifests than other resources (for now)
fuzzy-matching: add WebAgg-Fuzzy-Match response header if response is fuzzy matched, redirect to exact match in rewriterapp
2017-03-20 14:41:12 -07:00
Ilya Kreymer
0f0c20a03a fuzzy matching: new, clean fuzzy matcher implementation for webagg
rules: default rule: fuzzy match urls ignoring prefix match (needs more testing)
tests: update tests for new broad fuzzy match rule
2017-03-14 11:44:15 -07:00
Ilya Kreymer
8fe2c1b5bd apps & cli: remove old apps, keep:
- webagg-server
- wayback
- live-rewrite-server
support adding custom settings to AutoApp
support for --live flag that automatically adds live-web source at '/live'
tests: disable cdx_server tests as old cdx_server removed
2017-03-12 12:21:54 -07:00
Ilya Kreymer
a4b770d34e new-pywb refactor!
frontendapp compatibility
- add support for separate not found page for 404s (not_found.html)
- support for exception handling with error template (error.html)
- support for home page (index.html)
- add memento headers for replay
- add referrer fallback check
- tests: port integration tests for front-end replay, cdx server
- not included: proxy mode, exact redirect mode, non-framed replay
- move unused tests to tests_disabled
- cli: add optional werkzeug profiler with --profile flag
2017-02-27 19:07:51 -08:00
Ilya Kreymer
50a3353da3 wsgi server: default to gevent-based wsgi server for all cmd line server apps, add -s command for specifying server #201
cli: add 'webagg-server' cli command for running new webagg system
tests: fix cli test for gevent server
2016-12-09 16:46:33 -08:00
Ilya Kreymer
ab77c1b6d9 refactor autoindex: switch to gevent-based simple polling, as watchdog doesn't work with gevent #200 2016-11-11 10:31:48 -08:00
Ilya Kreymer
66ca8d8b26 http block loader: raise exception for 4xx, 5xx responses
tests: add tests for limitreader posting, fix charset for frame test
2016-07-31 12:56:00 -04:00
Ilya Kreymer
c8c0cecda3 rewrite improvements: if content-type is text/plain but mod is js_ or cs_, treat as js or css (#31)
header rewriter: ensure removed content-length and content-encoding are added back if no rewriting performed on response body
2016-07-27 21:34:58 -04:00
Ilya Kreymer
658303caad rewrite headers: undo not rewriting x- headers, needs more research and exclusions (eg. x-frame-options) 2016-04-26 13:11:08 -07:00
Ilya Kreymer
4a60e15577 cookie rewrite improvements: #177
- don't remove max-age and expires if in 'live' rewrite mode (flag set on urlrewriter)
- remove secure only if replay prefix is not https
- fix expires UTC->GMT as cookie parsing chokes on UTC
- other rewriting: don't append rewrite prefix to x- headers
tests: add more cookie rewriting tests
2016-04-26 09:45:23 -07:00
Ilya Kreymer
c5a166f601 tests: use httpbin.org instead of example.com/ for range-request test 2016-03-26 22:28:04 -04:00
Ilya Kreymer
3f734e1c98 tests: remove 3.2, fix auto_index test assert 2016-03-10 13:07:57 -08:00
Ilya Kreymer
1d5b23413f proxy: ensure proxy cert download sets content length
proxy options: 'use_default_coll' must specify exact default coll
(otherwise a random coll is chosen, as ordering is not defined)
travis: add py3.4, py3.5!
2016-02-23 18:09:09 -08:00
Ilya Kreymer
3a584a1ec3 py3: all tests pass, at last!
but not yet py2... need to resolve encoding in rewriting issues
2016-02-23 13:26:53 -08:00
Ilya Kreymer
1e54f8c8fa proxy: add tests for proxy-mode 'Pywb-Rewrite-Prefix' header, which adds optional prefix to proxy mode rewrites; ensures such rewrites are always absolute to include the prefix 2015-12-29 16:10:23 -08:00
Ilya Kreymer
a25096968a proxy: ip resolver: show 500 error if incorrect coll preconfigured for ip-based settings (todo: make it configurable?) 2015-12-29 14:53:50 -08:00
Ilya Kreymer
381f350917 proxy: switching not available for ip resolver either
tests: update tests for auth and ip resolver to check that proxy magic is not set
2015-12-12 22:59:32 -08:00
Ilya Kreymer
7a0680fb35 memento: for not found timemap query, return empty timemap, instead of html query error page, closes #158 2015-11-30 09:40:07 -08:00
Ilya Kreymer
d98c1f6cf7 memento/api: add a new /collinfo.json end-point, enabled with 'enable_coll_info' config setting, which returns
the value of the collinfo.json template. Default template returns an entry for each handler route,
including the route path (id), title (name) and memento timegate and timemap paths, to be used with
an aggregator. Using a custom 'info_json' template can specify a different collinfo template, alternative to #69 (local aggregation)
Closes #146
2015-11-04 15:36:44 -08:00
Ilya Kreymer
bd2b5181a0 tests: add new tests for redis-based cache, #145 2015-10-30 13:18:58 -07:00
Ilya Kreymer
3132bfa7f4 cache: add a simple RedisCache implementation (alongside local and uwsgi)
proxy_ip_resolver: add option to use RedisCache if redis_cache_key set in config
proxy_ip_resolver: add 'delete' option to delete ip from cache, closes #145
2015-10-30 13:15:07 -07:00
Ilya Kreymer
eeb35ea3b4 proxy: add ProxyRouter wrapper to check for content-length and, if missing, perform full buffering (http1.0) or chunked encoding (http1.1) (separate from replay view buffering)
add tests for buffering and chunked encoding, fixes #143, also tests no banner url-rewrite only proxy related to #142
2015-10-25 18:02:51 -07:00
Ilya Kreymer
979fcaeda3 tests: fix mock YoutubeDLWrapper after refactor, #141 2015-10-23 12:19:15 -07:00
Ilya Kreymer
39e824cb3a live rewrite proxy: decouple having http/https proxy from recording,
move youtubedl wrapper calls, metadata add calls to live rewrite proxy class for easier extension
closes #141 also improves #136
2015-10-23 11:57:12 -07:00
Ilya Kreymer
c7224ecceb tests: use proxy str directly (improve test cov) 2015-10-23 11:54:16 -07:00
Ilya Kreymer
4ba4521b56 tests: use random port instead of 8080 for cli test to avoid conflicts with running services 2015-10-23 11:53:28 -07:00
Ilya Kreymer
e37636de84 cdxindexer: if latest ujson (with forward slash not-escaping) is available, use that when indexing, closes #140
tests: update indexer CDXJ tests to be order-independent
travis: install ujson for testing
2015-10-22 17:46:05 -07:00
Ilya Kreymer
e249f300e3 tests refactor! init pywb once per module, instead of once per test
refactor common init pattern to server_mock for now (can add fixtures also)
2015-10-14 20:34:46 -07:00
Ilya Kreymer
b612c584de tests: test fixes for windows 2015-10-13 21:36:27 -07:00
Ilya Kreymer
6f7bd8c291 proxy resolvers: add tests for ip-based resolver
cache: default cache returns empty instead of raise KeyError on invalid key, to be consistent with uwsgi
2015-10-11 17:46:12 -07:00
Ilya Kreymer
31912b3bf7 proxy: update tests for new use_banner, use_client_rewrite options, #107 2015-09-09 13:22:32 -07:00
Ilya Kreymer
e1a9334a54 tests: update test to match cdx-convert 2015-08-25 23:06:00 +03:00
Ilya Kreymer
63c6efc851 autocolls test: patch wsgiref not waitress as it is default 2015-07-31 09:26:48 -07:00
Ilya Kreymer
f2a2c86552 tests: proxy check to ensure content-length header is always present in proxy mode 2015-07-30 11:06:44 -07:00
Ilya Kreymer
c2f99d6cfd replay/memento: always include 'Content-Location' in no-redir mode replay (not just for memento timegate), #122 2015-07-19 00:11:25 -07:00
Ilya Kreymer
66f5ad62b3 memento: when redir_to_exact is false, don't redirect latest replay/timegate to current timestamp, but return directly latest capture.
when memento enabled, the timegate now follows memento pattern 2.2 (http://tools.ietf.org/html/rfc7089#section-4.2.2)
also return content-location instead of location, update memento no-redirect tests to match new behavior. closes #122
2015-07-18 23:30:31 -07:00
Ilya Kreymer
080587516b youtube-dl tests: use mock youtube-dl info for tests 2015-06-27 20:46:55 -07:00
Ilya Kreymer
f0359877f0 youtube-dl: remove from dependency, installation is optional. Return 404 if attempting live
proxy of videos and youtube-dl is not available (the only use case).
HTTPParser wrapping logic no longer needed in latest versions
Modify tests to only run if youtube-dl is installed in cases where it is not available #118
2015-06-27 16:11:59 -07:00
Ilya Kreymer
06fcc89de6 readers: support 'content-encoding: deflate' using different zlib decompression options
support default and alt settings for attempting to decompress deflate stream
tests: add tests with httpbin.org/deflate Fixes #115
2015-06-24 13:11:33 -07:00
Ilya Kreymer
7bf8b97cb0 tests: add tests for root collection access, and also a custom handler passed to pywb_init
(a simple redirect handler)
2015-04-17 11:48:50 -07:00
Ilya Kreymer
307809bbe9 live-rewrite-server: switch to 'inverse' frame mode by default,
switch from /rewrite/ to /live/ path, update tests
2015-04-13 13:00:06 -07:00