- rewrite headers after content to ensure content-length/content-encoding rewritten if content modified
- header rewriter: remove proxyrewriter, set default rule to 'prefix' or 'keep' if url rewriting or not
- set is_content_rw if record.content_stream(), assume content is modified
- add BufferedRewriter as base for dash, hls, amf rewriting which processes the full stream
- should_rw_content() determines if should attempt content rewriting
- support banner-only insert mode: added HTMLInsertOnlyRewriter, enable if no custom JS rules
- test: enable banner-only test mode
- rewriter interface accepts RewriteInfo instance
- add StreamingRewriter adapter wraps html, regex rewriters to support rewriting streaming text from general rewriter interface
- add RewriteDASH, RewriteHLS as (non-streaming) rewriters. Need to read contents into buffer (for now)
- add RewriteAMF experimental AMF rewriter
- general rewriting system in BaseContentRewriter, default rewriters configured in DefaultRewriter
- tests: disable banner-only test as not currently support banner only (for now)
- wb-manager added metadata now loaded dynamically, cached, for search and index pages (#196)
- metadata updated w/o restart (#87)
- per-collection template overrides and per-template static file support
tests: test_auto_colls.py fully ported to new system
(per-collection config.yaml no longer supported)
rewriterapp: add 'Content-Location' if fuzzy match, or if using memento
tests: fix test to check for Content-Location for fuzzy match instead of redirect
rewriter: refactor to use mixins to extend base rewriter (todo: more refactoring)
fuzzy-matcher: support for additional 'match_filters' to filter fuzzy results via optional regexes by mime type,
eg. allow more lenient fuzzy matching on DASH manifests than other resources (for now)
fuzzy-matching: add WebAgg-Fuzzy-Match response header if response is fuzzy matched, redirect to exact match in rewriterapp
frontendapp compatibility
- add support for separate not found page for 404s (not_found.html)
- support for exception handling with error template (error.html)
- support for home page (index.html)
- add memento headers for replay
- add referrer fallback check
- tests: port integration tests for front-end replay, cdx server
- not included: proxy mode, exact redirect mode, non-framed replay
- move unused tests to tests_disabled
- cli: add optional werkzeug profiler with --profile flag
the value fo collinfo.json template. Default template returns an entry for each handler route,
including the route path (id), title (name) and memento timegate and timemap paths, to be used with
an aggregator. Using a custom 'info_json' template can specify a different collinfo template, alternative to #69 (local aggregation)
Closes#146
cdx field renaming: canonical cdx field name changes
statuscode -> status
mimetype -> mime
original -> url
old names still accept for query/filtering, however, cdx json will use new names
ensures consistency between .cdxj field names and names used by cdx server json output
collections manager now creates .cdxj by default
bump version to 0.9.0b2!
cdx obj: allow alt field names to be used (eg. mime, mimetype, m)
(status/statuscode/s) in querying and reading cdx
cdx minimal: (#75) now implies cdxj to avoid more formats
minimal includes digest always and mime when warc/revisit
tests for cdxj loading
indexing optimization: reuse same entry obj for records of same type
frames: add request_ts to wbinfo and use that as the timestamp in the top-frame. for exact replay, request_ts == timestamp
for latest replay / no timestamp / memento timegate, redirect to current time instead of time of last capture, while serving
last capture.
timeutils: add timestamp_now() function to return timestamp of current datetime
Add extra tests for this mode
Tracked via #72
add distinct to_iri() and to_uri() functions in WbUrl
internal representation is always as ascii uri
for rewriting, defaults to iri representation unless
'rewrite_ascii_only_urls' is set to true per collection
add wbrequest.get_url() to get url as either iri or uri to be passed
to templates
instead of default error.html
'not_found_html' settable in the config per collection, as per #65
for not found index query, still use query.html but add condition to check for 0 results
add more query and replay not found
remove unused conditional (for search_view -- always exists)
add vi_ modifier:
-on record, gets video_info from youtube-dl, sends to proxy,
if any, via PUTMETA to create metadata record
-on playback, fetches special metadata record with video info and
returns to client as json
-vidrw script: fetches video info, if any, and attempts to replace
iframe and embed tags (so far) which are videos
wombat: export extract_url function, fix spaces and use object instance
semantics
canonical, no-modifier archival url (instead of mp_).
When using frames, the page redirects to a 'tf_' page, which then uses
replaceHistory() to change url back to canonical form.
memento: support for framed replay, include memento headers in top frame
bump version to 0.6.2
will run (this was cumbersome to maintain and not really useful)
ReferRedirect just checks that the current request host header, if present, matches that of the referrer
and checks that the coll and script name match.
* removed proxy_pac as it was also unneeded/unused and required use of the hostpaths
* added test for invalid CONNECT usage (405 response)
remove BaseContentView, move top-frame functionality to SearchPageWbUrlHandler
remove RewriteLiveView, fold functionality into the handler
move default mod setting into RewriteContent
cleanup rewrite_content/fetch_request api to take a full wb_url
add content-length to responses whenever possible (WbResponse) and static files
bump version to 0.5.2
remove BaseContentView, move top-frame functionality to SearchPageWbUrlHandler
remove RewriteLiveView, fold functionality into the handler
move default mod setting into RewriteContent
cleanup rewrite_content/fetch_request api to take a full wb_url
add content-length to responses whenever possible (WbResponse) and static files
bump version to 0.5.2
refactor poute request parsing to happen in the actual router class instead of in the route
in proxy mode, add support for picking a route via proxy-auth
improve test for 'top' rewriting
split replay view into BaseContentView and ReplayView
refactor RewriteLiveHandler into RewriteLiveView
add additional tests for framed and non-framed mode
default to framed replay!