- frontendapp.py: restored the pulling out of collection route creation into its own function
- rewriterapp.py: reformatted file and added documentation
utils:
- geventserver.py: added documentation
- wbexception.py: updated documentation
- split out the ui customization documentation into its own file ui-customization.rst
- added initial documentation covering the new template setup to the ui-customization.rst
- formatted them according to project
- query.js: ensured correct timestamp to date function is used
templates:
- head_insert.html: is_framed check is now a boolean rather than a string; corrected redirect check
tests:
- test_html_rewriter.py: added missing rewrite modifier test checking i.style containing a background image html encoded
warcserver:
- added missing quote_plus import and cleaned up imports
- enabled with 'transclusions: 2' (default) config option
- legacy flash-supporting transclusions script (still working) available via 'transclusions: 1' or enable_flash_video_rewrite option
- add transclusions.js with support for poster image
- legacy vidrw: don't add undefined url as source
- localization: wrap text in not_found.html to be translatable
- add base.html template with head, header, footer optional customizations
- refactor all top-level templates to extend base.html, except frame_insert.html
- localization: add placeholder support for jinja2 localization extension, '{% trans %}' and _('') tags, placeholder null localization
- refactor new query UI to support localization
- update some text to match localized versions used in ukwa-pywb, update test
- xmlqueryindexsource: fix typo, improve tests to be more clear with url encoding
- exceptions: move UpstreamException and AppNotFound to wbexceptions
- docker: ensure sample_archive is added to Dockerfile still
- yaml: use python Loader to support custom interpolation of env vars
- content rewrite: ensure custom exceptions passed up to frontendapp
The OpenWayback reference implementation of this API relies on doubly-escaped queries. This change should bring this implementation into line with OutbackCDX and OWB's original API.
fuzzy match limit: add 'fuzzy_search_limit' option to default_filters in rules.yaml
default fuzzy matching search limit to 100 results to avoid timeouts for large result sets that don't have any matches
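The effect of the search limit can be sketched as follows. This is a minimal illustration and not pywb's actual code: `fuzzy_candidates`, `index_lines`, and `matches` are hypothetical names, and only the default of 100 comes from the entry above.

```python
from itertools import islice

# Default taken from the changelog entry above.
DEFAULT_FUZZY_SEARCH_LIMIT = 100


def fuzzy_candidates(index_lines, matches, limit=DEFAULT_FUZZY_SEARCH_LIMIT):
    """Scan at most `limit` index entries and return the ones that match.

    Hypothetical helper: `index_lines` is any iterable of index entries and
    `matches` is a predicate. Bounding the scan keeps a fuzzy lookup with no
    matches from walking an arbitrarily large result set.
    """
    return [line for line in islice(index_lines, limit) if matches(line)]
```

Because `islice` stops after `limit` items, a generator over a very large index is never consumed past that point.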
- ensure timemap returns full url-m
- warcserver supports 'memento_format' param which, if present, specifies the full format to use for memento links in timemap
- memento tests: timemap tests include full url-m, test both framed and frameless timemap responses
- fix timemap in 'redirect-to-exact' mode (ensure timegate redirect condition applies only to top-frame)
- tests: add additional timemap tests, with and without exact redirect
- optimize 'wb-manager acl match' command to not load entire file before matching
- acl match <coll_or_file>: if 'coll_or_file' exists as file, use it, don't check if auto-collection exists
- don't parse json on every aclj line until key prefix matches, resulting in speed boost!
- convert aclj to dict (via cdxobject) only when match is found (disable aggregator source tracking)
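The matching optimization above can be sketched roughly as below. It is an illustration under stated assumptions, not the wb-manager implementation: `match_rule` is a hypothetical name, and the line layout (`<surt key> <placeholder> <json>`) is assumed from the CDXJ-like format.

```python
import json


def match_rule(lines, url_key):
    """Return (rule_key, rule_dict) for the first rule key that prefixes url_key.

    `lines` may be an open file object, so the file is read lazily rather
    than loaded up front. In a reverse-sorted file the first prefix match
    is also the longest one.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Cheap string split: the key is everything before the first space.
        rule_key, _, rest = line.partition(" ")
        if not url_key.startswith(rule_key):
            continue  # non-matching lines never reach the JSON parser
        # Skip the second field (assumed placeholder) and parse JSON only now.
        _, _, json_part = rest.partition(" ")
        return rule_key, json.loads(json_part)
    return None
```

Only the single matching line pays the cost of `json.loads`, which is where the speed-up comes from on large rule files.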
- support memento timegate on top-frame (when no timestamp is provided)
- treat top-frame no-timestamp url as canonical timegate
- tests: update tests, add memento redirect mode tests for timegate, timegate with accept-dt header
- ensure lxml-enabled parsing in XmlQueryIndexSource works by passing the raw bytestring instead of unicode text to the parser
- tests: add lxml and non-lxml parsing tests to test_xmlquery_indexsource.py, add lxml to test install
- misc fixes: fix typo in banner.html, update gevent api to support latest gevent
- store original wsgi SCRIPT_NAME (before collection path is pushed)
- add 'static_prefix' jinja env global which defaults to original prefix + /static/
- update existing templates to use '{{ static_prefix }}' instead of '{{ host_prefix }}/{{ static_path }}'
- set 'pywb.host_prefix' via rewriterapp, set 'static_prefix' to absolute url if available (to support proxy mode)
- add AppPageNotFound() exception to differentiate app-level not found path from replay content not found
- add custom error messages for collection not found and static file not found
- tests: add tests for collection not found and static file not found errors
- fix proxy mode when 'redirect_to_exact=True' is set in config, don't redirect in proxy mode
- more general prefer support, moved to content_rewriter to support preference<->mod mappings
- add 'banner-only' preference mapped to bn_ modifier
- proxy mode: allow 'raw' and 'banner-only' preferences
- proxy mode: 'Prefer: rewritten' forced to 'banner-only', served with 'Preference-Applied: banner-only'
- tests: test proxy with prefer header, 'redirect_to_exact=True', add 'banner-only' to Prefer header tests in rewriting mode
- support Prefer on top-frame url in framed mode, Prefer check runs before custom response
- update Prefer test fixtures to test framed vs frameless and no-mod vs mp_ modifier, all combinations
- 'enable_prefer: true' in config can be used to enable experimental Memento Prefer behavior
- Prefer header supports both redirect and non-redirect style negotiation, extending existing Memento patterns
- Prefer header can be applied both on memento and timegate endpoints
- for redirect style negotiation, Prefer results in a redirect to final memento (if needed), both on Timegate and URL-M (Memento Pattern 2.3)
- for non-redirect style negotiation (Memento Pattern 2.2), Prefer header affects content being served and changes the Content-Location to the canonical representation
- Vary: Prefer and Preference-Applied headers always added to URL-M and Timegate responses
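The preference-to-modifier mapping described above might look roughly like the sketch below. Only 'banner-only' -> 'bn_' and the proxy-mode override are taken from these notes; the 'raw' -> 'id_' and 'rewritten' -> 'mp_' entries, and the name `resolve_preference`, are assumptions for illustration. The sketch covers only the case where a recognized preference is sent.

```python
# 'banner-only' -> 'bn_' is from the notes above; the other entries are assumptions.
PREFERENCE_TO_MOD = {
    "raw": "id_",        # assumption
    "banner-only": "bn_",
    "rewritten": "mp_",  # assumption
}


def resolve_preference(prefer_header, proxy_mode=False):
    """Return (modifier, response_headers) for a Prefer value, or None.

    Hypothetical helper, not the content_rewriter API.
    """
    if not prefer_header:
        return None
    preference = prefer_header.strip().lower()
    if preference not in PREFERENCE_TO_MOD:
        return None
    # In proxy mode, 'rewritten' is forced to 'banner-only'.
    if proxy_mode and preference == "rewritten":
        preference = "banner-only"
    headers = {"Vary": "Prefer", "Preference-Applied": preference}
    return PREFERENCE_TO_MOD[preference], headers
```

For example, `resolve_preference("rewritten", proxy_mode=True)` yields the 'bn_' modifier together with 'Preference-Applied: banner-only'.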
- use WbException throughout, only catch HTTPException from werkzeug routing
- only apply refer redirect check for 404 not found errors
- xmlquery index: log unexpected exceptions, treat missing element as not found
- support an auto-collection as target, where the acl file is added automatically in ./collections/<coll>/acl/access-rules.aclj,
or an explicitly specified .aclj file for more custom configs
- support adding urls and surts, determine if url is already a surt, otherwise canonicalize
acl commands include:
- acl add <target_file_or_coll> <url_or_surt> <access> -- add (or replace) rule for url/surt with access level <access>
- acl remove <target_file_or_coll> <url_or_surt> -- remove url/surt from target
- acl list <target_file_or_coll> -- list all rules for target
- acl validate <target_file_or_coll> -- ensure sort order is correct, otherwise fix and save
- acl match <target_file_or_coll> <url> -- find matching rule, if any, in target for specified url, or print no match/default rule
- acl importtxt <target_file_or_coll> <filename> -- bulk import of 'excludes.txt' style rules, one url-per-line and add to target
- 'acl_paths' config accepts a list of files or directories, or a single file or directory string
- tests_acl: test collection with acl list, single file, dir
- .aclj files contain access controls in reverse sorted, CDXJ-like format
- ./sample_archive/acl contains sample acl files
- directory and single-file acl sources (extend directory aggregator and file index source)
- tests for longest-prefix acl match
- tests for acl applied to collection
- pywb.utils.merge -- merge(..., reverse=True) support for py2.7 (backported from py3.5)
- acl types:
* allow - all allowed
* block - allowed in index (as blocked) but content not allowed, served as 451
* exclude - removed from index and content, served as 404
- warcserver: AccessChecker inited if 'acl_paths' specified in custom collections
- exceptions:
* clean up wbexception, subclasses provide the status code, message loaded automatically
* warcserver handles AccessException with json response (now with 451 status)
* pass status to template to allow custom handling
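The three acl types and their status codes listed above can be summarized in a short sketch. The entry shape (a dict with an 'access' key), the helper names, and the fallback to 'allow' when no access value is present are assumptions, not the AccessChecker API.

```python
# Status served instead of content, per the acl types listed above.
DENIED_STATUS = {"block": 451, "exclude": 404}


def filter_index(entries):
    """Drop 'exclude' entries from index results.

    'block' entries stay in the index (shown as blocked). Treating a
    missing 'access' value as 'allow' is an assumption of this sketch.
    """
    return [e for e in entries if e.get("access", "allow") != "exclude"]


def content_error_status(access):
    """Return the HTTP status to serve in place of content, or None if allowed."""
    return DENIED_STATUS.get(access)
```

Passing the resulting status on to the error template is what allows a 451 page to be rendered differently from a 404 page.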
- 'ba_' - for <base> rewriting
- 'je_' - 'javascript-embed' default for client-side rewriting in wombat
better modifiers for css rewriting (server and client):
- 'ce_' - 'css-embed' for any url() embeds in CSS
- 'cs_' - for css stylesheet @import rewriting/other .css
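As a quick reference, the modifiers above can be collected in a lookup table, with a small helper that splits a timestamp-plus-modifier path segment. The descriptions are from the entries above; the segment layout (digits followed by an optional two-letter modifier ending in '_') and the name `split_timestamp_mod` are assumptions for illustration.

```python
import re

# Descriptions taken from the modifier entries above.
MODIFIERS = {
    "ba_": "<base> rewriting",
    "je_": "javascript-embed, default for client-side rewriting in wombat",
    "ce_": "css-embed, for any url() embeds in CSS",
    "cs_": "css stylesheet @import rewriting / other .css",
}

# Assumed layout: optional digits, then an optional modifier such as 'ce_'.
_TS_MOD = re.compile(r"^(\d*)([a-z]{2}_)?$")


def split_timestamp_mod(segment):
    """Split e.g. '20190101000000ce_' into ('20190101000000', 'ce_').

    Returns None if the segment does not fit the assumed layout.
    """
    match = _TS_MOD.match(segment)
    if not match:
        return None
    return match.group(1), match.group(2) or ""
```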