Use JSON values in query string for JSON request bodies (#893)

This commit also adds a more complicated JSON test case, which is also in warcio.js, to ensure parity. Numbers are treated like JavaScript's Number.prototype.toString(): the decimal part is dropped from floats that represent whole numbers.
requirements: Adjust installation of Py3AMF module. (#920)
2025-03-15 00:03:28 +01:00 · 2024-11-13 14:07:35 -08:00 · 2024-11-07 12:09:35 -05:00 · 2024-04-26 10:32:43 +02:00 · 2024-04-26 10:26:56 +02:00 · 2024-04-26 10:21:03 +02:00
67 changed files with 1053 additions and 271 deletions
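The number handling described in the first commit message can be sketched in Python as follows. This is a minimal illustration, not the code from the commit: the helper name `js_number_str` and the sample JSON body are hypothetical, and only the whole-number float case is covered (JavaScript's exponent notation for very large or very small values is not reproduced).

```python
import json


def js_number_str(value):
    """Stringify a JSON number, dropping '.0' from whole-number floats.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, float) and value.is_integer():
        # 1.0 -> '1', which is what JavaScript prints for the same number
        return str(int(value))
    return str(value)


# Sample JSON request body (made up for this sketch)
body = json.loads('{"count": 1.0, "ratio": 2.5, "id": 3}')
query_values = {key: js_number_str(val) for key, val in body.items()}
```

Here `query_values` holds the string form each number would take in the query string, so that a Python client and a JavaScript client produce the same text for `1.0`.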
--- a/.github/workflows/ci.yaml
+++ b/.github/workflows/ci.yaml
@ -8,7 +8,7 @@ jobs:
    strategy:
      max-parallel: 3
      matrix:
-        python-version: ['3.7', '3.8', '3.9', '3.10']
+        python-version: ['3.7', '3.8', '3.9', '3.10', '3.11']
    steps:
      - name: checkout
--- a/.gitignore
+++ b/.gitignore
@ -53,3 +53,7 @@ git_hash.py
 # Sphinx documentation
 docs/_build/*
 # virtualenvs
 env/
 venv/
--- a/CHANGES.rst
+++ b/CHANGES.rst
@ -1,3 +1,19 @@
 pywb 2.7.3 changelist
 ~~~~~~~~~~~~~~~~~~~~~
 * issue_792 catch warcio exception by @oskarhek in https://github.com/webrecorder/pywb/pull/793
 * Add ui.logo_home_url as config.yaml option by @tw4l in https://github.com/webrecorder/pywb/pull/791
 * [#795] Show error when adding duplicate warc file by @kuechensofa in https://github.com/webrecorder/pywb/pull/797
 * Make search page more intuitive by @krakan in https://github.com/webrecorder/pywb/pull/794
 * Modify search template buttons by @tw4l in https://github.com/webrecorder/pywb/pull/801
 * [#804] Use default_locale when lang not set in the request by @krakan in https://github.com/webrecorder/pywb/pull/805
 * feat: regex substitution on surt rules match by @mijho in https://github.com/webrecorder/pywb/pull/780
 * Bump minimatch from 3.0.4 to 3.1.2 in /pywb/vueui by @dependabot in https://github.com/webrecorder/pywb/pull/777
 * Bump decode-uri-component from 0.2.0 to 0.2.2 in /pywb/vueui by @dependabot in https://github.com/webrecorder/pywb/pull/786
 * rules: add 'debugNoBatch' rewrite for fb and insta by @ikreymer in https://github.com/webrecorder/pywb/pull/806
 * Vue main order by @tw4l in https://github.com/webrecorder/pywb/pull/809
 * wombat: bump to 3.4.4 https://github.com/webrecorder/pywb/pull/808
 pywb 2.7.2 changelist
 ~~~~~~~~~~~~~~~~~~~~~
@ -1165,7 +1181,7 @@ pywb 0.9.6 changelist
 pywb 0.9.5 changelist
 ~~~~~~~~~~~~~~~~~~~~~
-* s3 loading: support ``s3://`` scheme in block loader, allowing for loading index and archive files from s3. ``boto`` library must be installed seperately
+* s3 loading: support ``s3://`` scheme in block loader, allowing for loading index and archive files from s3. ``boto`` library must be installed separately
  via ``pip install boto``. Attempt default boto auth path, and if that fails, attempt anonymous s3 connection.
 * Wombat/Client-Side Rewrite Customizations: New ``rewrite_opts.client`` settings from ``config.yaml`` are passed directly to wombat as json. 
@ -1261,7 +1277,7 @@ pywb 0.9.1 changelist
 * cdx server query: add support for ``url=*.host`` and ``url=host/*`` as shortcuts for ``matchType=domain`` and ``matchType=prefix``
-* zipnum cdx cluster: support loading index shared from prefix path instead of seperate location file.
+* zipnum cdx cluster: support loading index shared from prefix path instead of separate location file.
  The ``shard_index_loc`` config property may contain match and replace properties.
  Regex replacement is then used to obtain path prefix from the shard prefix path.
@ -1627,7 +1643,7 @@ pywb 0.4.7 changelist
 * Rewrite: Parsing of html as raw bytes instead of decode/encode, detection still needed for non-ascii compatible encoding.
-* Indexing: Refactoring of cdx-indexer using a seperate 'archive record iterator' and pluggable cdx writer classes. Groundwork for creating custom indexers.
+* Indexing: Refactoring of cdx-indexer using a separate 'archive record iterator' and pluggable cdx writer classes. Groundwork for creating custom indexers.
 * Indexing: Support for 9 field cdx formats with -9 flag.
--- a/README.rst
+++ b/README.rst
@ -1,4 +1,4 @@
-Webrecorder pywb 2.7
+Webrecorder pywb 2.8
 ====================
 .. image:: https://raw.githubusercontent.com/webrecorder/pywb/main/pywb/static/pywb-logo.png
@ -13,7 +13,7 @@ Web Archiving Tools for All
 `View the full pywb documentation <https://pywb.readthedocs.org>`_
-**pywb** is a Python (2 and 3) web archiving toolkit for replaying web archives large and small as accurately as possible.
+**pywb** is a Python 3 web archiving toolkit for replaying web archives large and small as accurately as possible.
 The toolkit now also includes new features for creating high-fidelity web archives.
 This toolset forms the foundation of Webrecorder project, but also provides a generic web archiving toolkit
@ -60,9 +60,7 @@ Installation for Deployment
 To install pywb for usage, you can use:
-```shell
+``pip install pywb``
 pip install pywb
 ```
 Note: depending on your Python installation, you may have to use `pip3` instead of `pip`.
@ -70,9 +68,7 @@ Note: depending on your Python installation, you may have to use `pip3` instead
 Installation from local copy
 ----------------------------
-```shell
+``git clone https://github.com/webrecorder/pywb``
 git clone https://github.com/webrecorder/pywb
 ```
 To install from a locally cloned copy, install with ``pip install -e .`` or ``python setup.py install``.
--- a/build-vue-ui.sh
+++ b/build-vue-ui.sh
@ -3,4 +3,5 @@
 CURR_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
 cd $CURR_DIR/pywb/vueui/
 yarn install
 yarn run build
--- a/config.yaml
+++ b/config.yaml
@ -6,9 +6,11 @@ debug: true
 # Uncomment to set banner colors and logo
 # ui:
  # logo: path/relative/from/static/logo.png
  # logo_home_url: https://example.com
  # navbar_background_hex: 0c49b0
  # navbar_color_hex: fff
  # navbar_light_buttons: true
  # disable_printing: true
 collections:
    all: $all
--- a/docs/manual/access-control.rst
+++ b/docs/manual/access-control.rst
@ -105,6 +105,12 @@ Given these rules, a user would:
 * but would receive an 'access blocked' error message when viewing ``http://httpbin.org/`` (block)
 * would receive a 404 not found error when viewing ``http://httpbin.org/anything`` (exclude)
 To match any possible URL in an .aclj file, set ``*,`` as the leading SURT, for example::
  *, - {"access": "allow"}
 Lines starting with ``*,`` should generally be at the end of the file, respecting the reverse alphabetical order.
 Access Types: allow, block, exclude, allow_ignore_embargo
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -149,6 +155,10 @@ To make this work, pywb must be running behind an Apache or Nginx system that is
 For example, this header may be set based on IP range, or based on password authentication.
 To allow a user access to all URLs, overriding more specific rules and the ``default_access`` configuration setting, use the ``*,`` SURT::
  *, - {"access": "allow", "user": "staff"}
 Further examples of how to set this header will be provided in the deployments section.
 **Note: Do not use the user-based rules without configuring proper authentication on an Apache or Nginx frontend to set or remove this header, otherwise the 'X-Pywb-ACL-User' can easily be faked.**
--- a/docs/manual/apps.rst
+++ b/docs/manual/apps.rst
@ -46,6 +46,7 @@ It can be used to:
 * Create a new collection --  ``wb-manager init <coll>``
 * Add WARCs to collection -- ``wb-manager add <coll> <warc>``
 * Unpack WACZs to add their WARCs and indices to collection -- ``wb-manager add --unpack-wacz <coll> <wacz>``
 * Add override templates
 * Add and remove metadata to a collections ``metadata.yaml``
 * List all collections
--- a/docs/manual/usage.rst
+++ b/docs/manual/usage.rst
@ -95,8 +95,8 @@ add the WARC to a new collection and start pywb:
      docker pull webrecorder/pywb
      docker run -e INIT_COLLECTION=my-web-archive -v /pywb-data:/webarchive \
-         -v /path/to:/source webrecorder/pywb wb-manager add default /path/to/my_warc.warc.gz
+         -v /path/to:/source webrecorder/pywb wb-manager add my-web-archive /source/my_warc.warc.gz
-      docker run -p 8080:8080 -v /pywb-data/:/webarchive wayback
+      docker run -p 8080:8080 -v /pywb-data/:/webarchive webrecorder/pywb wayback
 This example is equivalent to the non-Docker example above.
@ -114,6 +114,8 @@ Using Existing Web Archive Collections
 Existing archives of WARCs/ARCs files can be used with pywb with minimal amount of setup. By using ``wb-manager add``,
 WARC/ARC files will automatically be placed in the collection archive directory and indexed.
 In pywb 2.8.0 and later, preliminary support for WACZ files is also added with ``wb-manager add --unpack-wacz``. This will unpack the provided WACZ file, adding its WARCs and indices to the collection.
 By default ``wb-manager``, places new collections in ``collections/<coll name>`` subdirectory in the current working directory. To specify a different root directory, the ``wb-manager -d <dir>``. Other options can be set in the config file.
 If you have a large number of existing CDX index files, pywb will be able to read them as well after running through a simple conversion process.
@ -154,20 +156,20 @@ To enable auto-indexing, run with ``wayback -a`` or ``wayback -a --auto-interval
 Creating a Web Archive
 ----------------------
-Using Webrecorder
+Using ArchiveWeb.page
-^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^
-If you do not have a web archive to test, one easy way to create one is to use `Webrecorder <https://webrecorder.io>`_
+If you do not have a web archive to test, one easy way to create one is to use the `ArchiveWeb.page <https://archiveweb.page>`_ browser extension for Chrome and other Chromium-based browsers such as Brave Browser. ArchiveWeb.page records pages visited during an archiving session in the browser, and provides means of both replaying and downloading the archived items created.
-After recording, you can click **Stop** and then click `Download Collection` to receive a WARC (`.warc.gz`) file.
+Follow the instructions in `How To Create Web Archives with ArchiveWeb.page <https://archiveweb.page/en/usage/>`_. After recording, press **Stop** and then `download your collection <https://archiveweb.page/en/download/>`_ to receive a WARC (`.warc.gz`) file. If you choose to download your collection in the WACZ format, the WARC files can be found inside the zipped WACZ in the ``archive/`` directory.
-You can then use this with work with pywb.
+You can then use your WARCs to work with pywb.
 Using pywb Recorder
 ^^^^^^^^^^^^^^^^^^^
-The core recording functionality in Webrecorder is also part of :mod:`pywb`. If you want to create a WARC locally, this can be
+Recording functionality is also part of :mod:`pywb`. If you want to create a WARC locally, this can be
 done by directly recording into your pywb collection:
 1. Create a collection: ``wb-manager init my-web-archive`` (if you haven't already created a web archive collection)
@ -180,6 +182,14 @@ In this configuration, the indexing happens every 10 seconds.. After 10 seconds,
 ``http://localhost:8080/my-web-archive/http://example.com/``
 Using Browsertrix
 ^^^^^^^^^^^^^^^^^
 For a more automated browser-based web archiving experience, `Browsertrix <https://browsertrix.com/>`_ provides a web interface for configuring, scheduling, running, reviewing, and curating crawls of web content. Crawl activity is shown in a live screencast of the browsers used for crawling and all web archives created in Browsertrix can be easily downloaded from the application in the WACZ format.
 `Browsertrix Crawler <https://crawler.docs.browsertrix.com/>`_, which provides the underlying crawling functionality of Browsertrix, can also be run standalone in a Docker container on your local computer.
 HTTP/S Proxy Mode Access
 ------------------------
--- a/docs/manual/vue-ui.rst
+++ b/docs/manual/vue-ui.rst
@ -53,6 +53,36 @@ For example, to use the file ``./static/my-logo.png`` as the logo, set:
    logo: my-logo.png
 Logo URL
 ^^^^^^^^
 It is possible to configure the logo to link to any URL by setting ``ui.logo_home_url`` in ``config.yml`` to the URL of your choice.
 If omitted, the logo will not link to any page.
 For example, to have the logo redirect to ``https://example.com/web-archive-landing-page``, set:
 .. code:: yaml
  ui:
    logo_home_url: https://example.com/web-archive-landing-page
 Printing
 ^^^^^^^^
 As of pywb 2.8, the replay header includes a print button that prints the contents of the replay iframe.
 This button can be disabled by setting ``ui.disable_printing`` in ``config.yaml`` to any value.
 For example:
 .. code:: yaml
  ui:
    disable_printing: true
 Banner Colors
 ^^^^^^^^^^^^^
--- a/pywb/apps/frontendapp.py
+++ b/pywb/apps/frontendapp.py
@ -1,7 +1,7 @@
 from gevent.monkey import patch_all; patch_all()
 from werkzeug.routing import Map, Rule, RequestRedirect, Submount
-from werkzeug.wsgi import pop_path_info
+from wsgiref.util import shift_path_info
 from six.moves.urllib.parse import urljoin, parse_qsl
 from six import iteritems
 from warcio.utils import to_native_str
@ -108,6 +108,7 @@ class FrontEndApp(object):
        self.templates_dir = config.get('templates_dir', 'templates')
        self.static_dir = config.get('static_dir', 'static')
        self.static_prefix = config.get('static_prefix', 'static')
        self.default_locale = config.get('default_locale', '')
        metadata_templ = os.path.join(self.warcserver.root_dir, '{coll}', 'metadata.yaml')
        self.metadata_cache = MetadataCache(metadata_templ)
@ -433,7 +434,11 @@ class FrontEndApp(object):
            cdx_url += 'limit=' + str(self.query_limit)
        try:
-            res = requests.get(cdx_url, stream=True)
+            headers = {}
+            for key in environ.keys():
+                if key.startswith("HTTP_X_"):
+                    headers[key[5:].replace("_", "-")] = environ[key]
+            res = requests.get(cdx_url, stream=True, headers=headers)
            status_line = '{} {}'.format(res.status_code, res.reason)
            content_type = res.headers.get('Content-Type')
@ -553,9 +558,9 @@ class FrontEndApp(object):
            return
        if coll != '$root':
-            pop_path_info(environ)
+            shift_path_info(environ)
            if record:
-                pop_path_info(environ)
+                shift_path_info(environ)
        paths = [self.warcserver.root_dir]
@ -598,7 +603,7 @@ class FrontEndApp(object):
        and message.
        :param dict environ: The WSGI environment dictionary for the request
-        :param str err_type: The identifier for type of error that occured
+        :param str err_type: The identifier for type of error that occurred
        :param str url: The url of the archived page that was requested
        """
        raise AppPageNotFound(err_type, url)
@ -664,8 +669,12 @@ class FrontEndApp(object):
            lang = args.pop('lang', '')
            if lang:
-                pop_path_info(environ)
+                shift_path_info(environ)
+            if lang:
+                environ['pywb_lang'] = lang
+            elif self.default_locale:
+                environ['pywb_lang'] = self.default_locale
            response = endpoint(environ, **args)
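The header-forwarding loop added to the CDX request in the hunk above can be read in isolation as the sketch below. The function name `forward_x_headers` and the sample environ are hypothetical; only the key transformation mirrors the diff. WSGI exposes request headers as `HTTP_`-prefixed, upper-cased keys with dashes turned into underscores, so stripping the first five characters and restoring the dashes recovers a usable header name.

```python
def forward_x_headers(environ):
    """Collect X-* request headers from a WSGI environ dict.

    Hypothetical helper; mirrors the loop in the diff, not pywb's API.
    """
    headers = {}
    for key, value in environ.items():
        if key.startswith('HTTP_X_'):
            # 'HTTP_X_PYWB_ACL_USER' -> 'X-PYWB-ACL-USER'
            headers[key[5:].replace('_', '-')] = value
    return headers


# Made-up environ for illustration
sample_environ = {
    'HTTP_X_PYWB_ACL_USER': 'staff',
    'HTTP_ACCEPT': '*/*',
    'PATH_INFO': '/coll/cdx',
}
forwarded = forward_x_headers(sample_environ)
```

Header names are case-insensitive in HTTP, so the upper-cased result is equivalent to `X-Pywb-ACL-User` on the receiving side.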
--- a/pywb/apps/rewriterapp.py
+++ b/pywb/apps/rewriterapp.py
@ -64,7 +64,7 @@ class RewriterApp(object):
        if not jinja_env:
            jinja_env = JinjaEnv(globals={'static_path': 'static'},
-                                 extensions=['jinja2.ext.i18n', 'jinja2.ext.with_'])
+                                 extensions=['jinja2.ext.i18n'])
            jinja_env.jinja_env.install_null_translations()
        self.jinja_env = jinja_env
--- a/pywb/indexer/cdxindexer.py
+++ b/pywb/indexer/cdxindexer.py
@ -1,5 +1,9 @@
 import logging
 import os
 import sys
 import traceback
 import warcio
 # Use ujson if available
 try:
@ -298,8 +302,11 @@ def write_multi_cdx_index(output, inputs, **options):
                with open(fullpath, 'rb') as infile:
                    entry_iter = record_iter(infile)
-                    for entry in entry_iter:
+                    try:
-                        writer.write(entry, filename)
+                        for entry in entry_iter:
+                            writer.write(entry, filename)
+                    except warcio.exceptions.ArchiveLoadFailed:
+                        logging.error('Error while indexing file %s, %s',filename,traceback.format_exc())
        return writer
@ -377,7 +384,7 @@ url timestamp { ... }
    output_help = """
 Output file or directory.
- If directory, each input file is written to a seperate output file
+- If directory, each input file is written to a separate output file
  with a .cdx extension
 - If output is '-', output is written to stdout
 """
--- a/pywb/manager/aclmanager.py
+++ b/pywb/manager/aclmanager.py
@ -102,11 +102,11 @@ class ACLManager(CollectionsManager):
        except IOError as io:
            if must_exist:
-                print('Error Occured: ' + str(io))
+                print('Error Occurred: ' + str(io))
            return False
        except Exception as e:
-            print('Error Occured: ' + str(e))
+            print('Error Occurred: ' + str(e))
            return False
    def save_acl(self, r=None):
--- a/pywb/manager/manager.py
+++ b/pywb/manager/manager.py
@ -5,12 +5,16 @@ import logging
 import heapq
 import yaml
 import re
 import gzip
 import six
 import pathlib
 from distutils.util import strtobool
 from pkg_resources import resource_string, get_distribution
 from argparse import ArgumentParser, RawTextHelpFormatter
 from tempfile import mkdtemp, TemporaryDirectory
 from zipfile import ZipFile
 from pywb.utils.loaders import load_yaml_config
 from warcio.timeutils import timestamp20_now
@ -47,6 +51,9 @@ directory structure expected by pywb
    COLLS_DIR = 'collections'
    WARC_RX = re.compile(r'.*\.w?arc(\.gz)?$')
    WACZ_RX = re.compile(r'.*\.wacz$')
    def __init__(self, coll_name, colls_dir=None, must_exist=True):
        colls_dir = colls_dir or self.COLLS_DIR
        self.default_config = load_yaml_config(DEFAULT_CONFIG)
@ -115,19 +122,142 @@ directory structure expected by pywb
                   'To create a new collection, run\n\n{1} init {0}')
            raise IOError(msg.format(self.coll_name, sys.argv[0]))
-    def add_warcs(self, warcs):
+    def add_archives(self, archives, unpack_wacz=False):
        if not os.path.isdir(self.archive_dir):
            raise IOError('Directory {0} does not exist'.
                          format(self.archive_dir))
-        full_paths = []
+        invalid_archives = []
-        for filename in warcs:
+        warc_paths = []
-            filename = os.path.abspath(filename)
+        for archive in archives:
-            shutil.copy2(filename, self.archive_dir)
+            if self.WARC_RX.match(archive):
-            full_paths.append(os.path.join(self.archive_dir, filename))
+                full_path = self._add_warc(archive)
-            logging.info('Copied ' + filename + ' to ' + self.archive_dir)
+                if full_path:
                    warc_paths.append(full_path)
            elif self.WACZ_RX.match(archive):
                if unpack_wacz:
                    self._add_wacz_unpacked(archive)
                else:
                    raise NotImplementedError('Adding waczs without unpacking is not yet implemented. Use '
                                              '\'--unpack-wacz\' flag to add the wacz\'s content.')
            else:
                invalid_archives.append(archive)
-        self._index_merge_warcs(full_paths, self.DEF_INDEX_FILE)
+        self._index_merge_warcs(warc_paths, self.DEF_INDEX_FILE)
        if invalid_archives:
            logging.warning(f'Invalid archives weren\'t added: {", ".join(invalid_archives)}')
    def _rename_warc(self, warc_basename):
        dupe_idx = 1
        ext = ''.join(pathlib.Path(warc_basename).suffixes)
        pre_ext_name = warc_basename.split(ext)[0]
        while True:
            new_basename = f'{pre_ext_name}-{dupe_idx}{ext}'
            if not os.path.exists(os.path.join(self.archive_dir, new_basename)):
                break
            dupe_idx += 1
        return new_basename
    def _add_warc(self, warc):
        warc_source = os.path.abspath(warc)
        source_dir, warc_basename = os.path.split(warc_source)
        # don't overwrite existing warcs with duplicate names
        if os.path.exists(os.path.join(self.archive_dir, warc_basename)):
            warc_basename = self._rename_warc(warc_basename)
            logging.info(f'Warc {os.path.basename(warc)} already exists - renamed to {warc_basename}.')
        warc_dest = os.path.join(self.archive_dir, warc_basename)
        shutil.copy2(warc_source, warc_dest)
        logging.info(f'Copied {warc} to {self.archive_dir} as {warc_basename}')
        return warc_dest
    def _add_wacz_unpacked(self, wacz):
        wacz = os.path.abspath(wacz)
        temp_dir = mkdtemp()
        warc_regex = re.compile(r'.+\.warc(\.gz)?$')
        cdx_regex = re.compile(r'.+\.cdx(\.gz)?$')
        with ZipFile(wacz, 'r') as wacz_zip_file:
            archive_members = wacz_zip_file.namelist()
            warc_files = [file for file in archive_members if warc_regex.match(file)]
            if not warc_files:
                logging.warning(f'WACZ {wacz} does not contain any warc files.')
                return
            # extract warc files
            for warc_file in warc_files:
                wacz_zip_file.extract(warc_file, temp_dir)
            cdx_files = [file for file in archive_members if cdx_regex.match(file)]
            if not cdx_files:
                logging.warning(f'WACZ {wacz} does not contain any indices.')
                return
            for cdx_file in cdx_files:
                wacz_zip_file.extract(cdx_file, temp_dir)
        # copy extracted warc files to collections archive dir, use wacz filename as filename with added index if
        # multiple warc files exist
        warc_filename_mapping = {}
        full_paths = []
        for idx, extracted_warc_file in enumerate(warc_files):
            _, warc_ext = os.path.splitext(extracted_warc_file)
            if warc_ext == '.gz':
                warc_ext = '.warc.gz'
            warc_filename = os.path.basename(wacz)
            warc_filename, _ = os.path.splitext(warc_filename)
            warc_filename = f'{warc_filename}-{idx}{warc_ext}'
            warc_destination_path = os.path.join(self.archive_dir, warc_filename)
            if os.path.exists(warc_destination_path):
                warc_filename = self._rename_warc(warc_filename)
                logging.info(f'Warc {warc_destination_path} already exists - renamed to {warc_filename}.')
                warc_destination_path = os.path.join(self.archive_dir, warc_filename)
            warc_filename_mapping[os.path.basename(extracted_warc_file)] = warc_filename
            shutil.copy2(os.path.join(temp_dir, extracted_warc_file), warc_destination_path)
            full_paths.append(warc_destination_path)
        # rewrite filenames in wacz indices and merge them with collection index file
        for cdx_file in cdx_files:
            self._add_wacz_index(os.path.join(self.indexes_dir, self.DEF_INDEX_FILE), os.path.join(temp_dir, cdx_file),
                                 warc_filename_mapping)
        # delete temporary files
        shutil.rmtree(temp_dir)
    def _add_wacz_index(self, collection_index_path, wacz_index_path, filename_mapping):
        from pywb.warcserver.index.cdxobject import CDXObject
        # rewrite wacz index to temporary index file
        tempdir = TemporaryDirectory()
        wacz_index_name = os.path.basename(wacz_index_path)
        rewritten_index_path = os.path.join(tempdir.name, wacz_index_name)
        with open(rewritten_index_path, 'w') as rewritten_index:
            if wacz_index_path.endswith('.gz'):
                wacz_index = gzip.open(wacz_index_path, 'rb')
            else:
                wacz_index = open(wacz_index_path, 'rb')
            for line in wacz_index:
                cdx_object = CDXObject(cdxline=line)
                if cdx_object['filename'] in filename_mapping:
                    cdx_object['filename'] = filename_mapping[cdx_object['filename']]
                rewritten_index.write(cdx_object.to_cdxj())
        if not os.path.isfile(collection_index_path):
            shutil.move(rewritten_index_path, collection_index_path)
            return
        temp_coll_index_path = collection_index_path + '.tmp.' + timestamp20_now()
        self._merge_indices(collection_index_path, rewritten_index_path, temp_coll_index_path)
        shutil.move(temp_coll_index_path, collection_index_path)
        tempdir.cleanup()
    def reindex(self):
        cdx_file = os.path.join(self.indexes_dir, self.DEF_INDEX_FILE)
@ -180,20 +310,24 @@ directory structure expected by pywb
        merged_file = temp_file + '.merged'
-        last_line = None
+        self._merge_indices(cdx_file, temp_file, merged_file)
-        with open(cdx_file, 'rb') as orig_index:
-            with open(temp_file, 'rb') as new_index:
-                with open(merged_file, 'w+b') as merged:
-                    for line in heapq.merge(orig_index, new_index):
-                        if last_line != line:
-                            merged.write(line)
-                            last_line = line
        shutil.move(merged_file, cdx_file)
        #os.rename(merged_file, cdx_file)
        os.remove(temp_file)
+    @staticmethod
+    def _merge_indices(index1, index2, dest):
+        last_line = None
+        with open(index1, 'rb') as index1_f:
+            with open(index2, 'rb') as index2_f:
+                with open(dest, 'wb') as dest_f:
+                    for line in heapq.merge(index1_f, index2_f):
+                        if last_line != line:
+                            dest_f.write(line)
+                            last_line = line
    def set_metadata(self, namevalue_pairs):
        metadata_yaml = os.path.join(self.curr_coll_dir, 'metadata.yaml')
        metadata = None
@ -373,16 +507,23 @@ Create manage file based web archive collections
    listcmd = subparsers.add_parser('list', help=list_help)
    listcmd.set_defaults(func=do_list)
-    # Add Warcs
+    # Add Warcs or Waczs
    def do_add(r):
        m = CollectionsManager(r.coll_name)
-        m.add_warcs(r.files)
+        m.add_archives(r.files, r.unpack_wacz)
-    addwarc_help = 'Copy ARCS/WARCS to collection directory and reindex'
+    add_archives_help = 'Copy ARCs/WARCs to collection directory and reindex'
-    addwarc = subparsers.add_parser('add', help=addwarc_help)
+    add_unpack_wacz_help = 'Copy WARCs from WACZ to collection directory and reindex'
-    addwarc.add_argument('coll_name')
+    add_archives = subparsers.add_parser('add', help=add_archives_help)
-    addwarc.add_argument('files', nargs='+')
+    add_archives.add_argument(
-    addwarc.set_defaults(func=do_add)
+        '--unpack-wacz',
+        dest='unpack_wacz',
+        action='store_true',
+        help=add_unpack_wacz_help
+    )
+    add_archives.add_argument('coll_name')
+    add_archives.add_argument('files', nargs='+')
+    add_archives.set_defaults(func=do_add)
    # Reindex All
    def do_reindex(r):
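The `_merge_indices` helper introduced in this file combines two sorted index files and skips consecutive duplicate lines. A minimal in-memory sketch of the same technique follows; the function name `merge_sorted_lines` and the sample CDXJ-like lines are hypothetical, and lists of bytes stand in for the open files used in the diff. `heapq.merge` assumes both inputs are already sorted, which is why the duplicate check only needs to compare against the previous line.

```python
import heapq


def merge_sorted_lines(lines_a, lines_b):
    """Merge two sorted line sequences, dropping adjacent duplicates.

    Hypothetical stand-in for the file-based helper in the diff.
    """
    merged = []
    last_line = None
    for line in heapq.merge(lines_a, lines_b):
        if line != last_line:
            merged.append(line)
            last_line = line
    return merged


# Made-up index lines, already sorted within each list
existing_index = [b'com,example)/ 2020 {}\n', b'org,iana)/ 2021 {}\n']
new_index = [b'com,example)/ 2020 {}\n', b'net,foo)/ 2019 {}\n']
combined = merge_sorted_lines(existing_index, new_index)
```

Because the merge is lazy, the file-based version in the diff never has to hold either index fully in memory.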
--- a/pywb/rewrite/html_rewriter.py
+++ b/pywb/rewrite/html_rewriter.py
@ -268,7 +268,7 @@ class HTMLRewriterMixin(StreamingRewriter):
        unesc_value = self.try_unescape(value)
        rewritten_value = self.url_rewriter.rewrite(unesc_value, mod, force_abs)
-        # if no rewriting has occured, ensure we return original, not reencoded value
+        # if no rewriting has occurred, ensure we return original, not reencoded value
        if rewritten_value == value:
            return orig_value
@ -668,7 +668,7 @@ class HTMLRewriter(HTMLRewriterMixin, HTMLParser):
        if self.parse_comments:
            #data = self._rewrite_script(data)
-            # Rewrite with seperate HTMLRewriter
+            # Rewrite with separate HTMLRewriter
            comment_rewriter = HTMLRewriter(self.url_rewriter,
                                            defmod=self.defmod)
--- a/pywb/rewrite/regex_rewriters.py
+++ b/pywb/rewrite/regex_rewriters.py
@ -124,9 +124,7 @@ if (!self.__WB_pmw) {{ self.__WB_pmw = function(obj) {{ this.__WB_source = obj;
            (r'(?<![$.])\s*\blocation\b\s*[=]\s*(?![=])', self.add_suffix(check_loc), 0),
            # rewriting 'return this'
            (r'\breturn\s+this\b\s*(?![.$])', self.replace_str(this_rw), 0),
-            # rewriting 'this.' special properties access on new line, with ; prepended
+            # rewriting 'this.' special properties access
            (r'\n\s*this\b(?=(?:\.(?:{0})\b))'.format(prop_str), self.replace_str(';' + this_rw), 0),
            # rewriting 'this.' special properties access, not on new line (no ;)
            (r'(?<![$.])\s*this\b(?=(?:\.(?:{0})\b))'.format(prop_str), self.replace_str(this_rw), 0),
            # rewrite '= this' or ', this'
            (r'(?<=[=,])\s*this\b\s*(?![:.$])', self.replace_str(this_rw), 0),
--- a/pywb/rewrite/templateview.py
+++ b/pywb/rewrite/templateview.py
@ -5,7 +5,7 @@ from pywb.utils.loaders import load
 from six.moves.urllib.parse import urlsplit, quote
-from jinja2 import Environment, TemplateNotFound, contextfunction, select_autoescape
+from jinja2 import Environment, TemplateNotFound, pass_context, select_autoescape
 from jinja2 import FileSystemLoader, PackageLoader, ChoiceLoader
 from webassets.ext.jinja2 import AssetsExtension
@ -139,7 +139,7 @@ class JinjaEnv(object):
            return loc_map.get(loc)
        def override_func(jinja_env, name):
-            @contextfunction
+            @pass_context
            def get_override(context, text):
                translate = get_translate(context)
                if not translate:
@ -158,7 +158,7 @@ class JinjaEnv(object):
        # Special _Q() function to return %-encoded text, necessary for use
        # with text in banner
-        @contextfunction
+        @pass_context
        def quote_gettext(context, text):
            translate = get_translate(context)
            if not translate:
@ -171,14 +171,14 @@ class JinjaEnv(object):
        self.jinja_env.globals['_Q'] = quote_gettext
        self.jinja_env.globals['default_locale'] = default_locale
-        @contextfunction
+        @pass_context
        def switch_locale(context, locale):
            environ = context.get('env')
            curr_loc = environ.get('pywb_lang', '')
            request_uri = environ.get('REQUEST_URI', environ.get('PATH_INFO'))
-            if curr_loc:
+            if curr_loc and request_uri.startswith('/' + curr_loc + '/'):
                return request_uri.replace(curr_loc, locale, 1)
            app_prefix = environ.get('pywb.app_prefix', '')
@ -188,7 +188,7 @@ class JinjaEnv(object):
            return app_prefix + '/' + locale + request_uri
-        @contextfunction
+        @pass_context
        def get_locale_prefixes(context):
            environ = context.get('env')
            locale_prefixes = {}
@ -196,11 +196,11 @@ class JinjaEnv(object):
            orig_prefix = environ.get('pywb.app_prefix', '')
            coll = environ.get('SCRIPT_NAME', '')
-            if orig_prefix:
+            if orig_prefix and coll.startswith(orig_prefix):
                coll = coll[len(orig_prefix):]
            curr_loc = environ.get('pywb_lang', '')
-            if curr_loc:
+            if curr_loc and coll.startswith('/' + curr_loc):
                coll = coll[len(curr_loc) + 1:]
            for locale in loc_map.keys():
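The `startswith` guard added to `switch_locale` in this file can be illustrated on its own. The sketch below is a simplified assumption, not the pywb function: the name `swap_locale_prefix` is hypothetical, the app-prefix fallback is omitted, and the sample URIs are made up. It shows one case where the guard matters, namely when `pywb_lang` was set from `default_locale` and the request URI carries no locale prefix.

```python
def swap_locale_prefix(request_uri, curr_loc, locale):
    """Replace a leading '/<curr_loc>/' segment with '/<locale>/'.

    Returns None when the URI does not start with the current locale,
    leaving the fallback path to the caller (omitted in this sketch).
    """
    if curr_loc and request_uri.startswith('/' + curr_loc + '/'):
        return request_uri.replace(curr_loc, locale, 1)
    return None


# Locale is part of the path: the prefix is swapped
prefixed = swap_locale_prefix('/en/coll/http://example.com/', 'en', 'fr')

# Locale came from a default, not the path: no swap is attempted
unprefixed = swap_locale_prefix('/coll/http://example.com/en/', 'en', 'fr')

# Without the guard, the first 'en' anywhere in the URI is rewritten,
# which here alters the archived URL instead of a locale prefix
unguarded = '/coll/http://example.com/en/'.replace('en', 'fr', 1)
```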
--- a/pywb/rewrite/test/test_regex_rewriters.py
+++ b/pywb/rewrite/test/test_regex_rewriters.py
@ -143,7 +143,7 @@ r"""
 'var foo = _____WB$wombat$check$this$function_____(this).location'
 >>> _test_js_obj_proxy('A = B\nthis.location = "foo"')
-'A = B\n;_____WB$wombat$check$this$function_____(this).location = "foo"'
+'A = B\n_____WB$wombat$check$this$function_____(this).location = "foo"'
 >>> _test_js_obj_proxy('var foo = this.location2')
 'var foo = this.location2'
--- a/pywb/rules.yaml
+++ b/pywb/rules.yaml
@ -110,7 +110,7 @@ rules:
      fuzzy_lookup:
        match: '("(?:cursor|cursorindex)":["\d\w]+)'
-        find_all: true
+        re_type: findall
    - url_prefix: 'com,facebook)/ajax/pagelet/generic.php/profiletimeline'
      fuzzy_lookup: 'com,facebook\)/.*[?&](__adt=[^&]+).*[&]data=(?:.*?(?:[&]|(profile_id|pagelet_token)[^,]+))'
@ -175,7 +175,7 @@ rules:
      fuzzy_lookup:
        match: '("q[\d]+":|after:\\"[^"]+)'
-        find_all: true
+        re_type: findall
    - url_prefix: 'com,facebook)/pages_reaction_units/more'
@ -196,6 +196,9 @@ rules:
              group: 1
              function: 'pywb.rewrite.rewrite_dash:rewrite_fb_dash'
            - match: '"debugNoBatching\s?":(?:false|0)'
              replace: '"debugNoBatching":true'
        parse_comments: true
    - url_prefix: 'com,facebook'
@ -227,6 +230,9 @@ rules:
            - match: '"is_dash_eligible":true'
              replace: '"is_dash_eligible":false'
            - match: '"debugNoBatching\s?":(?:false|0)'
              replace: '"debugNoBatching":true'
      fuzzy_lookup: '()'
@ -538,6 +544,12 @@ rules:
      rewrite:
        js_rewrite_location: urls
    - url_prefix: 'com,example)/matched'
      fuzzy_lookup:
        re_type: sub
        match: 'matched'
        replace: 'replaced'
    # all domain rules -- fallback to this dataset
    #=================================================================
    # Applies to all urls -- should be last
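Review note: the rules above replace the boolean `find_all` with a `re_type` key and add a `sub` example. The sketch below shows how the three regex modes differ in plain Python. It assumes that `re_type` selects between `re.search`, `re.findall` and `re.sub`; the function `apply_rule` and the `search` default are illustrative assumptions, not pywb's actual fuzzy matcher.

```python
import re


def apply_rule(text, match, re_type='search', replace=''):
    """Hypothetical dispatcher over the regex modes named by re_type."""
    pattern = re.compile(match)
    if re_type == 'findall':
        # Every non-overlapping match (group contents if the pattern has one group).
        return pattern.findall(text)
    if re_type == 'sub':
        # Substitute each match with the replacement string.
        return pattern.sub(replace, text)
    # Assumed default: first match only, or None.
    found = pattern.search(text)
    return found.group(0) if found else None


if __name__ == '__main__':
    print(apply_rule('com,example)/matched', 'matched', 're_type' and 'sub', 'replaced'))
    print(apply_rule('{"cursor":"abc123","cursorindex":5}',
                     r'("(?:cursor|cursorindex)":["\d\w]+)', 'findall'))
```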
--- a/pywb/static/calendar-icon.png
+++ b/pywb/static/calendar-icon.png
--- a/pywb/static/query.js
+++ b/pywb/static/query.js
@ -956,11 +956,11 @@ RenderCalendar.prototype.niceDateRange = function() {
  var from = this.queryInfo.searchParams.from;
  var to = this.queryInfo.searchParams.to;
  if (from && to) {
-    return 'From ' + from + ' to ' + to;
+    return [text.from, from, text.until, to].join(' ');
  } else if (from) {
-    return 'From ' + from + ' until ' + 'present';
+    return [text.from, from, text.until, text.present].join(' ');
  }
-  return 'From earliest until ' + to;
+  return [text.from, text.earliest, text.until, to].join(' ');
 };
 /**
--- a/pywb/static/search.js
+++ b/pywb/static/search.js
@ -14,17 +14,34 @@ var elemIds = {
  },
  dateTime: {
    from: 'dt-from',
    fromTime: 'ts-from',
    fromBad: 'dt-from-bad',
    to: 'dt-to',
    toTime: 'ts-to',
    toBad: 'dt-to-bad'
  },
  match: 'match-type-select',
  url: 'search-url',
  form: 'search-form',
  resultsNewWindow: 'open-results-new-window',
-  advancedOptions: 'advanced-options'
+  advancedOptions: 'advanced-options',
  resetSearchForm: 'reset-search-form',
 };
 function resetSearchForm(event) {
  for (const field of [
    elemIds.url,
    elemIds.match,
    elemIds.dateTime.from,
    elemIds.dateTime.fromTime,
    elemIds.dateTime.to,
    elemIds.dateTime.toTime,
  ]) {
    document.getElementById(field).value = '';
  }
  clearFilters(event);
 }
 function makeCheckDateRangeChecker(dtInputId, dtBadNotice) {
  var dtInput = document.getElementById(dtInputId);
  dtInput.onblur = function() {
@ -138,11 +155,13 @@ function performQuery(url) {
  }
  var fromT = document.getElementById(elemIds.dateTime.from).value;
  if (fromT) {
-    query.push('from=' + fromT.trim());
+    fromT += document.getElementById(elemIds.dateTime.fromTime).value;
    query.push('from=' + fromT.replace(/[^0-9]/g, ''));
  }
  var toT = document.getElementById(elemIds.dateTime.to).value;
  if (toT) {
-    query.push('to=' + toT.trim());
+    toT += document.getElementById(elemIds.dateTime.toTime).value;
    query.push('to=' + toT.replace(/[^0-9]/g, ''));
  }
  var builtQuery = query.join('&');
  if (document.getElementById(elemIds.resultsNewWindow).checked) {
@ -188,6 +207,7 @@ $(document).ready(function() {
    elemIds.dateTime.to,
    document.getElementById(elemIds.dateTime.toBad)
  );
  document.getElementById(elemIds.resetSearchForm).onclick = resetSearchForm;
  document.getElementById(elemIds.filtering.add).onclick = addFilter;
  document.getElementById(elemIds.filtering.clear).onclick = clearFilters;
  var searchURLInput = document.getElementById(elemIds.url);
@ -195,9 +215,6 @@ $(document).ready(function() {
  form.addEventListener('submit', function(event) {
    submitForm(event, form, searchURLInput);
  });
  document.getElementById(elemIds.advancedOptions).onclick = function() {
    validateFields(form);
  }
  var filteringExpression = document.getElementById(elemIds.filtering.expression);
  filteringExpression.addEventListener("keypress", function(event) {
    if (event.key === "Enter") {
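Review note: `performQuery` now appends the time field to the date field and strips every non-digit before sending `from=` and `to=`. The same transformation is sketched in Python below for illustration. The function name `build_timestamp` is hypothetical; the input shapes (`yyyy-mm-dd` and `hh:mm`) follow the placeholders and examples in the templates.

```python
import re


def build_timestamp(date_value, time_value=''):
    """Concatenate date and optional time, keeping digits only."""
    # Mirrors: fromT += <time value>; fromT.replace(/[^0-9]/g, '')
    return re.sub(r'[^0-9]', '', date_value + time_value)


if __name__ == '__main__':
    print(build_timestamp('2022-02-02', '09:00'))
    print(build_timestamp('2022-02-02'))
```

An empty time field leaves a date-only timestamp.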
--- a/pywb/static/timeline-icon.png
+++ b/pywb/static/timeline-icon.png
--- a/pywb/static/vue/vueui.js
+++ b/pywb/static/vue/vueui.js
--- a/pywb/static/wombat.js
+++ b/pywb/static/wombat.js
--- a/pywb/static/wombatProxyMode.js
+++ b/pywb/static/wombatProxyMode.js
@ -1,6 +1,6 @@
 /*
 Wombat.js client-side rewriting engine for web archive replay
-Copyright (C) 2014-2020 Webrecorder Software, Rhizome, and Contributors. Released under the GNU Affero General Public License.
+Copyright (C) 2014-2024 Webrecorder Software, Rhizome, and Contributors. Released under the GNU Affero General Public License.
 This file is part of wombat.js, see https://github.com/webrecorder/wombat.js for the full source
 Wombat.js is part of the Webrecorder project (https://github.com/webrecorder)
--- a/pywb/static/wombatWorkers.js
+++ b/pywb/static/wombatWorkers.js
@ -1,6 +1,6 @@
 /*
 Wombat.js client-side rewriting engine for web archive replay
-Copyright (C) 2014-2020 Webrecorder Software, Rhizome, and Contributors. Released under the GNU Affero General Public License.
+Copyright (C) 2014-2024 Webrecorder Software, Rhizome, and Contributors. Released under the GNU Affero General Public License.
 This file is part of wombat.js, see https://github.com/webrecorder/wombat.js for the full source
 Wombat.js is part of the Webrecorder project (https://github.com/webrecorder)
--- a/pywb/static/zoom-out-icon-333316.png
+++ b/pywb/static/zoom-out-icon-333316.png
--- a/pywb/templates/error.html
+++ b/pywb/templates/error.html
@ -3,7 +3,7 @@
 {% block body %}
 <div class="container text-danger error">
    <div class="row justify-content-center">
-        <h2 class="display-2">Pywb Error</h2>
+        <h2 class="display-2">{{ _('Pywb Error') }}</h2>
    </div>
    <div class="row">
        <div class="col-12 text-center">
--- a/pywb/templates/frame_insert.html
+++ b/pywb/templates/frame_insert.html
@ -25,8 +25,21 @@ html, body
 <div id="app" style="width: 100%; height: 200px"></div>
 <script>
-    VueUI.main("{{ static_prefix }}", "{{ url }}", "{{ wb_prefix }}", "{{ timestamp }}", "{{ ui.logo }}", "{{ ui.navbar_background_hex | default('f8f9fa') }}", "{{ ui.navbar_color_hex | default('212529') }}", "{{ ui.navbar_light_buttons }}", "{{ env.pywb_lang | default('en') }}",
+  VueUI.main({
-      allLocales, i18nStrings);
+      staticPrefix: "{{ static_prefix }}",
      url: "{{ url }}",
      prefix: "{{ wb_prefix }}",
      timestamp: "{{ timestamp }}",
      logoUrl: "{{ ui.logo }}",
      navbarBackground: "{{ ui.navbar_background_hex | default('f8f9fa') }}",
      navbarColor: "{{ ui.navbar_color_hex | default('212529') }}",
      navbarLightButtons: "{{ ui.navbar_light_buttons }}",
      logoHomeUrl: "{{ ui.logo_home_url }}",
      disablePrinting: "{{ ui.disable_printing }}",
      allLocales: allLocales
    },
    "{{ env.pywb_lang | default('en') }}",
    i18nStrings);
 </script>
 <div id="wb_iframe_div">
--- a/pywb/templates/instructions.html
+++ b/pywb/templates/instructions.html
@ -0,0 +1,216 @@
 <div class="modal fade" id="searchInstructions" tabindex="-1" role="dialog" aria-labelledby="searchInstructionsTitle" aria-hidden="true">
    <div class="modal-dialog modal-lg" role="document">
        <div class="modal-content">
            <div class="modal-header">
                <h6 class="modal-title text-muted" id="searchInstructionsTitle">{{ _("Search instructions") }}</h6>
                <button type="button" class="close" data-dismiss="modal" aria-label="{{ _('Close') }}">
                    <span aria-hidden="true">&times;</span>
                </button>
            </div>
            <div class="modal-body">
                <h5>{{ _("URL") }}</h5>
                <table class="table table-hover table-condensed">
                    <tr>
                        <td>
                            <p>
                                {%trans%}A URL consists of several parts:{%endtrans%}
                                {%trans%}<code>protocol</code>://<code>host</code>:<code>port</code>/<code>path</code>?<code>query</code>{%endtrans%}
                            </p>
                            <p>
                                {%trans%}The <code>protocol://</code> prefix is ignored when searching as it's not part of the searchable data.{%endtrans%}
                            </p>
                            <p>
                                {%trans%}A leading <kbd>www.</kbd> in the <code>host</code> will also be ignored for the same reason.{%endtrans%}
                            </p>
                            <p>
                                {%trans%}The <code>host</code> contains one or more parts separated by periods (<kbd>.</kbd>).{%endtrans%}
                                {%trans%}The part before the first period is called the <code>hostname</code>.{%endtrans%}
                                {%trans%}The part after the last period is the <code>top level domain</code>.{%endtrans%}
                                 {%trans%}Every part added to the left of the top level domain is a <code>sub-domain</code>.{%endtrans%}
                                 {%trans%}For example, <code>x.y.z</code> is a <code>sub-domain</code> of <code>y.z</code>{%endtrans%}
                                 {%trans%}which in turn is a <code>sub-domain</code> of the <code>top level domain</code> <code>z</code>.{%endtrans%}
                            </p>
                            <p>
                                {%trans%}See <em>Match Type</em> below for interpretations of the search string.{%endtrans%}
                            </p>
                        </td>
                    </tr>
                </table>
                <h5>{{ _("Results Display") }}</h5>
                <table class="table table-hover table-condensed">
                    <tr>
                        <td>
                            <p>
                                {%trans%}For the <em>Default</em> search mode, the results are shown in a calendar view unless a filter is also added.{%endtrans%}
                                {%trans%}For all other cases the results will be displayed in a list.{%endtrans%}
                            </p>
                        </td>
                    </tr>
                </table>
                <h5>{{ _("Search Options") }}</h5>
                <h6>{{ _("Match Type") }}</h6>
                <p> {{ _("There are four different search modes:") }}</p>
                <table class="table table-hover table-condensed">
                    <tr>
                        <td><em>{{ _("Default") }}</em></td>
                        <td>
                            <p>
                                {%trans%}In the default mode the exact URL (minus the ignored prefixes mentioned above) is searched for.{%endtrans%}
                                {%trans%}If one leading or trailing wildcard asterisk (<kbd>*</kbd>) is added, see <em>Prefix</em> and <em>Domain</em> below.{%endtrans%}
                            </p>
                            <p class="text-muted">
                                {%trans%}Any other asterisks will be considered literal parts of the search string.{%endtrans%}
                                {%trans%}Hence, adding both a leading and a trailing wildcard asterisk is not possible.{%endtrans%}
                            </p>
                            {%trans%}Example:{%endtrans%}
                            <p class="ml-5 text-lowercase">
                                <em>{{ _("URL") }}: <strong>https://http.cat/206</strong></em> &amp; <em>{{ _("Match Type") }}: <strong>{{ _("Default") }}</strong></em>
                                <span class="float-right">
                                    <button onclick="fillForm('search-url=https://http.cat/206&match-type-select=');" class="btn btn-outline-info" role="button" aria-label="{{ _('Fill') }}">{{ _('Fill') }}</button>
                                    <button onclick="fillForm('search-url=https://http.cat/206&match-type-select=', true);" class="btn btn-outline-primary" role="button" aria-label="{{ _('Search') }}">{{ _('Search') }}</button>
                                </span>
                            </p>
                        </td>
                    </tr>
                    <tr>
                        <td><em>{{ _("Prefix") }}</em></td>
                        <td>
                            <p>
                                 {%trans%}This will return all URLs that begin with the given string.{%endtrans%}
                                {%trans%}It returns the same results as <em>Default</em> with a trailing wildcard asterisk.{%endtrans%}
                            </p>
                            {%trans%}Examples:{%endtrans%}
                            <p class="ml-5 text-lowercase">
                                <em>{{ _("URL") }}: <strong>https://http.cat/2</strong></em> &amp; <em>{{ _("Match Type") }}: <strong>{{ _("Prefix") }}</strong></em>
                                <span class="float-right">
                                    <button onclick="fillForm('search-url=https://http.cat/2&match-type-select=prefix');" class="btn btn-outline-info" role="button" aria-label="{{ _('Fill') }}">{{ _('Fill') }}</button>
                                    <button onclick="fillForm('search-url=https://http.cat/2&match-type-select=prefix', true);" class="btn btn-outline-primary" role="button" aria-label="{{ _('Search') }}">{{ _('Search') }}</button>
                                </span>
                            </p>
                            <p class="ml-5 text-lowercase">
                                <em>{{ _("URL") }}: <strong>https://http.cat/2*</strong></em> &amp; <em>{{ _("Match Type") }}: <strong>{{ _("Default") }}</strong></em>
                                <span class="float-right">
                                    <button onclick="fillForm('search-url=https://http.cat/2*&match-type-select=');" class="btn btn-outline-info" role="button" aria-label="{{ _('Fill') }}">{{ _('Fill') }}</button>
                                    <button onclick="fillForm('search-url=https://http.cat/2*&match-type-select=', true);" class="btn btn-outline-primary" role="button" aria-label="{{ _('Search') }}">{{ _('Search') }}</button>
                                </span>
                            </p>
                        </td>
                    </tr>
                    <tr>
                        <td><em>{{ _("Host") }}</em></td>
                        <td>
                            <p>
                                 {%trans%}This will ignore any path and query parts of the URL and return all URLs with the specified <code>host</code> part.{%endtrans%}
                            </p>
                            {%trans%}Example:{%endtrans%}
                            <p class="ml-5 text-lowercase">
                                <em>{{ _("URL") }}: <strong>https://http.cat/</strong></em> &amp; <em>{{ _("Match Type") }}: <strong>{{ _("Host") }}</strong></em>
                                <span class="float-right">
                                    <button onclick="fillForm('search-url=https://http.cat/&match-type-select=host');" class="btn btn-outline-info" role="button" aria-label="{{ _('Fill') }}">{{ _('Fill') }}</button>
                                    <button onclick="fillForm('search-url=https://http.cat/&match-type-select=host', true);" class="btn btn-outline-primary" role="button" aria-label="{{ _('Search') }}">{{ _('Search') }}</button>
                                </span>
                            </p>
                        </td>
                    </tr>
                    <tr>
                        <td><em>{{ _("Domain") }}</em></td>
                        <td>
                            <p>
                                {%trans%}This is similar to the previous but doesn't require the whole <code>host</code>.{%endtrans%}
                                {%trans%}It returns the same results as <em>Default</em> with a leading wildcard asterisk and a period (i.e. <kbd>*.</kbd>).{%endtrans%}
                                {%trans%}The leading wildcard matches zero or more <code>sub-domains</code> as well as zero or one <code>hostname</code>.{%endtrans%}
                            </p>
                            {%trans%}Examples:{%endtrans%}
                            <p class="ml-5 text-lowercase">
                                <em>{{ _("URL") }}: <strong>cat/</strong></em> &amp; <em>{{ _("Match Type") }}: <strong>{{ _("Domain") }}</strong></em>
                                <span class="float-right">
                                    <button onclick="fillForm('search-url=cat/&match-type-select=domain');" class="btn btn-outline-info" role="button" aria-label="{{ _('Fill') }}">{{ _('Fill') }}</button>
                                    <button onclick="fillForm('search-url=cat/&match-type-select=domain', true);" class="btn btn-outline-primary" role="button" aria-label="{{ _('Search') }}">{{ _('Search') }}</button>
                                </span>
                            </p>
                            <p class="ml-5 text-lowercase">
                                <em>{{ _("URL") }}: <strong>*.cat/</strong></em> &amp; <em>{{ _("Match Type") }}: <strong>{{ _("Default") }}</strong></em>
                                <span class="float-right">
                                    <button onclick="fillForm('search-url=*.cat/&match-type-select=');" class="btn btn-outline-info" role="button" aria-label="{{ _('Fill') }}">{{ _('Fill') }}</button>
                                    <button onclick="fillForm('search-url=*.cat/&match-type-select=', true);" class="btn btn-outline-primary" role="button" aria-label="{{ _('Search') }}">{{ _('Search') }}</button>
                                </span>
                            </p>
                        </td>
                    </tr>
                </table>
                <h6>{{ _("Date/Time Range") }}</h6>
                <table class="table table-hover table-condensed">
                    <tr>
                        <td>
                            <p>
                                {%trans%}One may specify a start and/or an end timestamp to further restrict the search - both are inclusive.{%endtrans%}
                                {%trans%}The timestamps consist of a date and an optional time of day.{%endtrans%}
                                 {%trans%}The layout of these input fields depends on which browser is used.{%endtrans%}
                            </p>
                            {%trans%}Example:{%endtrans%}
                            <p class="ml-5 text-lowercase">
                                <em>{{ _("URL") }}: <strong>https://http.cat/2</strong></em> &amp; <em>{{ _("Match Type") }}: <strong>{{ _("Prefix") }}</strong></em> &amp; <em>{{ _("From") }}: <strong>2022-02-02 09:00</strong></em>
                                <span class="float-right">
                                    <button onclick="fillForm('search-url=https://http.cat/2&match-type-select=prefix&dt-from=2022-02-02&ts-from=09:00');" class="btn btn-outline-info" role="button" aria-label="{{ _('Fill') }}">{{ _('Fill') }}</button>
                                    <button onclick="fillForm('search-url=https://http.cat/2&match-type-select=prefix&dt-from=2022-02-02&ts-from=09:00', true);" class="btn btn-outline-primary" role="button" aria-label="{{ _('Search') }}">{{ _('Search') }}</button>
                                </span>
                            </p>
                        </td>
                    </tr>
                </table>
                <h6>{{ _("Filtering") }}</h6>
                <table class="table table-hover table-condensed">
                    <tr>
                        <td>
                            <p>
                                 {%trans%}Finally, one may add extra filters for Mime Type, Status and URL.{%endtrans%}
                                {%trans%}For each filter one needs to specify one of the three attributes, one of a set of relations and a string.{%endtrans%}
                                {%trans%}If more than one filter is added, they will all be applied to the list of results.{%endtrans%}
                            </p>
                            <p class="text-muted">{%trans%}Remember to actually add the filter before submitting the search.{%endtrans%}</p>
                            {%trans%}Example:{%endtrans%}
                            <p class="ml-5 text-lowercase">
                                 <em>{{ _("URL") }}: <strong>https://http.cat/2</strong></em> &amp; <em>{{ _("Match Type") }}: <strong>{{ _("Prefix") }}</strong></em> &amp; <em>{{ _("Filtering") }}: <strong>{{ _("HTTP Status") }} {{ _("Is Not") }} "301"</strong></em>
                                <span class="float-right">
                                    <button onclick="fillForm('search-url=https://http.cat/2&match-type-select=prefix&filter-by=status&filter-modifier==!=&filter-expression=301');" class="btn btn-outline-info" role="button" aria-label="{{ _('Fill') }}">{{ _('Fill') }}</button>
                                    <button onclick="fillForm('search-url=https://http.cat/2&match-type-select=prefix&filter-by=status&filter-modifier==!=&filter-expression=301', true);" class="btn btn-outline-primary" role="button" aria-label="{{ _('Search') }}">{{ _('Search') }}</button>
                                </span>
                            </p>
                        </td>
                    </tr>
                </table>
            </div>
        </div>
    </div>
 </div>
 <script>
  function fillForm(query, search = false) {
    $('#searchInstructions').modal('hide');
    $('#advancedOptions').collapse('show');
    for (const item of query.split('&')) {
      var pair = item.split('=');
      var field = document.getElementById(pair[0]);
      if (field) field.value = pair.slice(1).join('=');
      if (pair[0] == "filter-expression") addFilter(event);
    }
    if (search) $('#search-button').click();
  }
 </script>
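Review note: `fillForm` splits each `field=value` item on every `=` and then rejoins everything after the first piece, which is what keeps a value such as `=!=` intact. Below is a Python sketch of that parsing step only; `parse_fill_query` is a hypothetical name, and the DOM updates and the `addFilter` call are left out.

```python
def parse_fill_query(query):
    """Return (field_id, value) pairs the way fillForm derives them."""
    fields = []
    for item in query.split('&'):
        pair = item.split('=')
        # Equivalent of pair.slice(1).join('='): the value may contain '='.
        fields.append((pair[0], '='.join(pair[1:])))
    return fields


if __name__ == '__main__':
    for field_id, value in parse_fill_query(
            'filter-by=status&filter-modifier==!=&filter-expression=301'):
        print(field_id, repr(value))
```

Splitting on only the first `=` would give the same result; the rejoin is simply how the template script expresses it.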
--- a/pywb/templates/query.html
+++ b/pywb/templates/query.html
@ -69,6 +69,10 @@
        'host': "{{ _('host') }}",
        'domain': "{{ _('domain') }}",
      },
      from: "{{ _('From') }}",
      until: "{{ _('until') }}",
      present: "{{ _('present') }}",
      earliest: "{{ _('earliest') }}",
    };
  var filterMods = {
@ -90,8 +94,21 @@
 <div id="app" style="width: 100%; height: 100%"></div>
 <script>
-  VueUI.main("{{ static_prefix }}", "{{ url }}", "{{ prefix }}", undefined, "{{ ui.logo }}", "{{ ui.navbar_background_hex | default('f8f9fa') }}", "{{ ui.navbar_color_hex | default('212529') }}", "{{ ui.navbar_light_buttons }}", "{{ env.pywb_lang | default('en') }}",
+  VueUI.main({
-      allLocales, i18nStrings);
+      staticPrefix: "{{ static_prefix }}",
      url: "{{ url }}",
      prefix: "{{ prefix }}",
      timestamp: undefined,
      logoUrl: "{{ ui.logo }}",
      navbarBackground: "{{ ui.navbar_background_hex | default('f8f9fa') }}",
      navbarColor: "{{ ui.navbar_color_hex | default('212529') }}",
      navbarLightButtons: "{{ ui.navbar_light_buttons }}",
      logoHomeUrl: "{{ ui.logo_home_url }}",
      disablePrinting: "{{ ui.disable_printing }}",
      allLocales: allLocales
    },
    "{{ env.pywb_lang | default('en') }}",
    i18nStrings);
 </script>
 {% endif %}
--- a/pywb/templates/search.html
+++ b/pywb/templates/search.html
@ -31,15 +31,20 @@
    <form class="needs-validation" id="search-form" novalidate>
        <div class="form-row">
            <div class="col-12">
-                <label for="search-url" class="lead" aria-label="Search For Col">
+                <label for="search-url" class="lead" aria-label="{{ _('Search Collection') }}">
                    {% set coll_title = metadata.title if metadata and metadata.title else coll %}
                    {% autoescape false %}
                    {% trans %}Search the {{ coll_title }} collection by url:{% endtrans %}
                    {% endautoescape %}
                </label>
-                <input aria-label="url" aria-required="true" class="form-control form-control-lg" id="search-url"
+                <a tabindex="0" class="btn btn-sm float-right btn-light" role="button" data-toggle="modal" data-target="#searchInstructions">{{ _('Help') }}</a>
            </div>
        </div>
        <div class="form-row">
            <div class="col-12">
                <input aria-label="{{ _('URL') }}" aria-required="true" class="form-control form-control-lg" id="search-url"
                       name="search" placeholder="{{ _('Enter a URL to search for') }}"
-                       title="{{ _('Enter a URL to search for') }}" type="search" required/>
+                       title="{{ _('Enter a URL to search for') }}" type="search" required autofocus />
                <div class="invalid-feedback">
                    {% trans %}Please enter a URL{% endtrans %}
                </div>
@ -53,23 +58,26 @@
                </div>
            </div>
            <div class="col-7">
-                <button type="submit" class="btn btn-outline-primary float-right" role="button" aria-label="Search">
+                <button type="submit" id="search-button" class="btn btn-primary float-right" role="button" aria-label="{{ _('Search') }}">
                    {% trans %}Search{% endtrans %}
                </button>
-                <button class="btn btn-outline-info float-right mr-3" type="button" role="button"
+                <button class="btn btn-outline-secondary float-right mr-3" type="button" role="button"
                        data-toggle="collapse" data-target="#advancedOptions" id="advanced-options"
-                        aria-expanded="false" aria-controls="advancedOptions" aria-label="Advanced Search Options">
+                        aria-expanded="false" aria-controls="advancedOptions" aria-label="{{ _('Search Options') }}">
-                    {{ _('Advanced Search Options') }}
+                    {{ _('Search Options') }}
                </button>
                <button id="reset-search-form" class="btn btn-outline-danger float-right mr-3" type="button" role="button" aria-label="{{ _('Reset Options') }}">
                    {{ _('Reset') }}
                </button>
            </div>
        </div>
        <div class="collapse mt-3" id="advancedOptions">
            <div class="form-group form-row">
-                <label for="match-type-select" class="col-sm-2 col-form-label" aria-label="Match Type">
+                <label for="match-type-select" class="col-sm-2 col-form-label" aria-label="{{ _('Match Type') }}">
                    {{ _('Match Type:') }}
                </label>
                <select id="match-type-select" class="form-control form-control col-sm-6">
-                    <option value=""></option>
+                    <option value="">{% trans %}Default{% endtrans %}</option>
                    <option value="prefix">{% trans %}Prefix{% endtrans %}</option>
                    <option value="host">{% trans %}Host{% endtrans %}</option>
                    <option value="domain">{% trans %}Domain{% endtrans %}</option>
@ -77,57 +85,43 @@
            </div>
            <p style="cursor: help;">
               <span data-toggle="tooltip" data-placement="right"
-                     title="Restricts the results to the given date/time range (inclusive)">
+                     title="{{ _('Restricts the results to the given date/time range (inclusive)') }}">
                   {{ _('Date/Time Range') }}
                </span>
            </p>
            <div class="form-row">
                <div class="col-6">
-                    <label class="sr-only" for="dt-from" aria-label="Date/Time Range From">{% trans %}From:{% endtrans %}</label>
+                    <label class="sr-only" for="dt-from" aria-label="{{ _('Date/Time Range From') }}">{% trans %}From:{% endtrans %}</label>
                    <div class="input-group">
                        <div class="input-group-prepend">
                            <div class="input-group-text">{% trans %}From:{% endtrans %}</div>
                        </div>
-                        <input id="dt-from" type="number" name="date-range-from" class="form-control"
+                        <input id="dt-from" type="date" placeholder="yyyy-mm-dd" name="date-range-from" class="form-control">
-                               pattern="^\d{4,14}$">
+                        <input id="ts-from" type="time" placeholder="hh:mm:ss" name="date-range-from-ts" class="form-control">
                        <div class="invalid-feedback" id="dt-from-bad">
                            {% trans %}Please enter a valid <b>From</b> timestamp. Timestamps may be 4 <= ts <=14 digits{% endtrans %}
                        </div>
                    </div>
                </div>
                <div class="col-6">
-                    <label class="sr-only" for="dt-to" aria-label="Date/Time Range To">{% trans %}To:{% endtrans %}</label>
+                    <label class="sr-only" for="dt-to" aria-label="{{ _('Date/Time Range To') }}">{% trans %}To:{% endtrans %}</label>
                    <div class="input-group">
                        <div class="input-group-prepend">
                            <div class="input-group-text">{% trans %}To:{% endtrans %}</div>
                        </div>
-                        <input id="dt-to" type="number" name="date-range-to" class="form-control" pattern="^\d{4,14}$">
+                        <input id="dt-to" type="date" placeholder="yyyy-mm-dd" name="date-range-to" class="form-control">
+                        <input id="ts-to" type="time" placeholder="hh:mm:ss" name="date-range-to-ts" class="form-control">
                         <div class="invalid-feedback" id="dt-to-bad">
                            {% trans %}Please enter a valid <b>To</b> timestamp. Timestamps may be 4 <= ts <=14 digits{% endtrans %}
                        </div>
                    </div>
                </div>
            </div>
            <div class="form-group mt-3">
                <div class="form-row">
-                    <div class="col-6">
+                    <div class="col-12">
                        <p>{% trans %}Filtering{% endtrans %}</p>
                    </div>
                    <div class="col-6">
                        <button id="clear-filters" class="btn btn-outline-warning float-right" type="button">
                            {% trans %}Clear Filters{% endtrans %}
                        </button>
                        <button id="add-filter" class="btn btn-outline-secondary float-right mr-2" type="button">
                            {% trans %}Add Filter{% endtrans %}
                        </button>
                    </div>
                </div>
                <div class="form-row">
                    <div class="col-6">
                        <div class="row pb-1">
                            <label for="filter-by" class="col-form-label col-3">{% trans %}By:{% endtrans %}</label>
                            <select id="filter-by" class="form-control col-7">
                                <option value="" selected></option>
                                <option value="mime">{% trans %}Mime Type{% endtrans %}</option>
                                <option value="status">{% trans %}Status{% endtrans %}</option>
                                <option value="url">{% trans %}URL{% endtrans %}</option>
@ -144,17 +138,24 @@
                                <option value="=!~">{% trans %}Does Not Begin With{% endtrans %}</option>
                            </select>
                        </div>
-                        <div class="row">
+                        <div class="row pb-1">
                            <label for="filter-expression" class="col-form-label col-3">{% trans %}Expr:{% endtrans %}</label>
                            <input type="text" id="filter-expression" class="form-control col-7"
                                   placeholder="{% trans %}Enter an expression to filter by{% endtrans %}"
                            >
                        </div>
                            <button id="add-filter" class="btn btn-outline-secondary mt-2" type="button">
                                {% trans %}Add Filter{% endtrans %}
                            </button>
                    </div>
                    <div class="col-6">
                        <ul id="filter-list" class="filter-list">
                            <li id="filtering-nothing">{% trans %}No Filter{% endtrans %}</li>
                        </ul>
                        <button id="clear-filters" class="btn btn-outline-danger float-right mr-2" type="button">
                            {% trans %}Clear Filters{% endtrans %}
                        </button>
                    </div>
                </div>
            </div>
@ -192,4 +193,5 @@
        </div>
    </div>
 {% endif %}
 {% include "instructions.html" %}
 {% endblock %}
--- a/pywb/templates/vue_loc.html
+++ b/pywb/templates/vue_loc.html
@ -49,6 +49,7 @@
        "Hide calendar":"{{ _Q('Hide calendar') }}",
        "Previous capture":"{{ _Q('Previous capture') }}",
        "Next capture":"{{ _Q('Next capture') }}",
        "Print":"{{ _Q('Print') }}",
        "Select language":"{{ _Q('Select language') }}",
        "View capture on {date}":"{{ _Q('View capture on {date}') }}",
        "{count} capture":"{{ _Q('{count} capture') }}",
--- a/pywb/utils/binsearch.py
+++ b/pywb/utils/binsearch.py
@ -150,7 +150,7 @@ def iter_exact(reader, key, token=b' '):
    """
    Create an iterator which iterates over lines where the first field matches
    the 'key', equivalent to token + sep prefix.
-    Default field termin_ator/seperator is ' '
+    Default field terminator/separator is ' '
    """
    return iter_prefix(reader, key + token)
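Note on the docstring above: an exact match on the first field is the same as a prefix match on `key + token`. The sketch below illustrates that on an in-memory sorted list. `iter_prefix_sketch` and `iter_exact_sketch` are hypothetical stand-ins, not the pywb functions, which operate on a seekable reader.

```python
from bisect import bisect_left


def iter_prefix_sketch(lines, prefix):
    """Yield lines from a sorted list that start with `prefix` (hypothetical helper)."""
    start = bisect_left(lines, prefix)
    for line in lines[start:]:
        if not line.startswith(prefix):
            break
        yield line


def iter_exact_sketch(lines, key, token=b' '):
    """Exact match on the first field, expressed as a prefix match on key + token."""
    return iter_prefix_sketch(lines, key + token)


# Made-up index lines: first field is the key, then a space separator.
index = sorted([
    b'com,example)/ 2014 a',
    b'com,example)/ 2015 b',
    b'com,example)/about 2014 c',
])

exact = list(iter_exact_sketch(index, b'com,example)/'))
by_prefix = list(iter_prefix_sketch(index, b'com,example)/'))
```

Appending the separator is what stops `com,example)/about` from matching a lookup for `com,example)/`.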
--- a/pywb/version.py
+++ b/pywb/version.py
@ -1,4 +1,4 @@
-__version__ = '2.7.2'
+__version__ = '2.8.3'
 if __name__ == '__main__':
    print(__version__)
--- a/pywb/vueui/src/App.vue
+++ b/pywb/vueui/src/App.vue
@ -4,9 +4,12 @@
    <nav
      class="navbar navbar-light navbar-expand-lg fixed-top top-navbar justify-content-center"
      :style="navbarStyle">
-      <a class="navbar-brand flex-grow-1 my-1" href="/">
+      <a class="navbar-brand flex-grow-1 my-1" :href="config.logoHomeUrl" v-if="config.logoHomeUrl">
        <img :src="config.logoImg" id="logo-img" alt="_('pywb logo')">
      </a>
      <div class="navbar-brand flex-grow-1 my-1" v-else>
        <img :src="config.logoImg" id="logo-img" alt="_('pywb logo')">
      </div>
      <div class="flex-grow-1 d-flex" id="searchdiv">
        <form
          class="form-inline my-2 my-md-0 mx-lg-auto"
@ -74,6 +77,17 @@
              <i class="far fa-chart-bar"></i>
            </button>
          </li>
          <li class="nav-item">
            <button
              class="btn btn-sm"
              :class="{'btn-outline-light': lightButtons, 'btn-outline-dark': !lightButtons}"
              :aria-pressed="printReplayFrame"
              @click="printReplayFrame"
              v-if="printingEnabled && hasReplayFrame()"
              :title="_('Print')">
              <i class="fas fa-print"></i>
            </button>
          </li>
          <li class="nav-item dropdown" v-if="localesAreSet">
            <button
              class="btn btn-sm dropdown-toggle"
@ -213,6 +227,9 @@ export default {
    lightButtons() {
      return !!this.config.navbarLightButtons;
    },
    printingEnabled() {
      return !this.config.disablePrinting;
    },
    previousSnapshot() {
      if (!this.currentSnapshotIndex) {
        return null;
@ -303,6 +320,14 @@ export default {
      this.showTimelineView = !this.showTimelineView;
      window.localStorage.setItem("showTimelineView", this.showTimelineView ? "1" : "0");
    },
    hasReplayFrame() {
      return !! window.frames.replay_iframe;
    },
    printReplayFrame() {
      window.frames.replay_iframe.contentWindow.focus();
      window.frames.replay_iframe.contentWindow.print();
      return false;
    },
    setData(/** @type {PywbData} data */ data) {
      // data-set will usually happen at App INIT (from parent caller)
--- a/pywb/vueui/src/components/Timeline.vue
+++ b/pywb/vueui/src/components/Timeline.vue
@ -39,7 +39,7 @@
                             @keyup.enter="changePeriod(histoPeriod, $event)"
                             @mouseover="setTooltipPeriod(histoPeriod, $event)"
                             @mouseout="setTooltipPeriod(null, $event)"
-                             tabindex="0"
+                             :tabindex="histoPeriod.snapshotCount > 0 ? 0 : -1"
                        >
                        </div>
                    </div>
@ -49,7 +49,6 @@
                         @keyup.enter="changePeriod(histoPeriod, $event)"
                         @mouseover="setTooltipPeriod(subPeriod, $event)"
                         @mouseout="setTooltipPeriod(null, $event)"
                         tabindex="0"
                    >
                        <div class="label">
                          {{subPeriod.getReadableId()}}
--- a/pywb/vueui/src/components/TimelineBreadcrumbs.vue
+++ b/pywb/vueui/src/components/TimelineBreadcrumbs.vue
@ -8,7 +8,7 @@
                    @keyup.enter="changePeriod(parents[0])"
                    :title="getPeriodZoomOutText(parents[0])"
                    tabindex="1">
-                  <img src="/static/zoom-out-icon-333316.png" /> {{parents[0].getReadableId(true)}}
+                  <i class="fa fa-search-minus"></i> {{parents[0].getReadableId(true)}}
                </span>
            </span>
            &gt;
--- a/pywb/vueui/src/i18n.js
+++ b/pywb/vueui/src/i18n.js
@ -32,7 +32,7 @@ export class PywbI18N {
  getMonth(id, type='long') {
    return decodeURIComponent(this.config[PywbI18N.monthIdPrefix[id]+'_'+type]);
  }
-  // can get long (default) or short day string or intial
+  // can get long (default) or short day string or initial
  // PywbI18N expects to receive day's initials like:
  // config.mon_short, config.tue_long, ...., config.<mmm>_short, config.<mmm>_long
  getWeekDay(id, type='long') {
--- a/pywb/vueui/src/index.js
+++ b/pywb/vueui/src/index.js
@ -7,39 +7,44 @@ import Vue from "vue/dist/vue.esm.browser";
 // ===========================================================================
-export function main(staticPrefix, url, prefix, timestamp, logoUrl, navbarBackground, navbarColor, navbarLightButtons, locale, allLocales, i18nStrings) {
+export function main(config, locale, i18nStrings) {
  PywbI18N.init(locale, i18nStrings);
-  new CDXLoader(staticPrefix, url, prefix, timestamp, logoUrl, navbarBackground, navbarColor, navbarLightButtons, allLocales);
+  new CDXLoader(config);
 }
 // ===========================================================================
 class CDXLoader {
-  constructor(staticPrefix, url, prefix, timestamp, logoUrl, navbarBackground, navbarColor, navbarLightButtons, allLocales) {
+  constructor(config) {
    this.loadingSpinner = null;
    this.loaded = false;
    this.opts = {};
-    this.prefix = prefix;
+    this.url = config.url;
-    this.staticPrefix = staticPrefix;
+    this.prefix = config.prefix;
-    this.logoUrl = logoUrl;
+    this.staticPrefix = config.staticPrefix;
-    this.navbarBackground = navbarBackground;
+    this.logoUrl = config.logoUrl;
-    this.navbarColor = navbarColor;
+    this.logoHomeUrl = config.logoHomeUrl;
-    this.navbarLightButtons = navbarLightButtons;
+    this.navbarBackground = config.navbarBackground;
-    this.timestamp = timestamp;
+    this.navbarColor = config.navbarColor;
    this.navbarLightButtons = config.navbarLightButtons;
    this.disablePrinting = config.disablePrinting;
-    this.isReplay = (timestamp !== undefined);
+    this.timestamp = config.timestamp;
    this.isReplay = (config.timestamp !== undefined);
    setTimeout(() => {
      if (!this.loaded) {
-        this.loadingSpinner = new LoadingSpinner({text: PywbI18N.instance?.getText('Loading...'), isSmall: !!timestamp}); // bootstrap loading-spinner EARLY ON
+        this.loadingSpinner = new LoadingSpinner({text: PywbI18N.instance?.getText('Loading...'), isSmall: !!this.timestamp}); // bootstrap loading-spinner EARLY ON
        this.loadingSpinner.setOn();
      }
    }, 500);
    if (this.isReplay) {
-      window.WBBanner = new VueBannerWrapper(this, url, timestamp);
+      window.WBBanner = new VueBannerWrapper(this, this.url, this.timestamp);
    }
    let queryURL;
    let url;
    // query form *?=url...
    if (window.location.href.indexOf("*?") > 0) {
@ -47,23 +52,24 @@ class CDXLoader {
      url = new URL(queryURL).searchParams.get("url");
    // otherwise, traditional calendar form /*/<url>
-    } else if (url) {
+    } else if (this.url) {
      url = this.url;
      const params = new URLSearchParams();
      params.set("url", url);
      params.set("output", "json");
-      queryURL = prefix + "cdx?" + params.toString();
+      queryURL = this.prefix + "cdx?" + params.toString();
    // otherwise, an error since no URL
    } else {
      throw new Error("No query URL specified");
    }
-    const logoImg = this.staticPrefix + "/" + (this.logoUrl ? this.logoUrl : "pywb-logo-sm.png");
+    config.logoImg = this.staticPrefix + "/" + (!!this.logoUrl ? this.logoUrl : "pywb-logo-sm.png");
-    this.app = this.initApp({logoImg, navbarBackground, navbarColor, navbarLightButtons, url, allLocales, timestamp});
+    this.app = this.initApp(config);
    this.loadCDX(queryURL).then((cdxList) => {
-      this.setAppData(cdxList, url, this.timestamp);
+      this.setAppData(cdxList, url, config.timestamp);
    });
  }
--- a/pywb/vueui/yarn.lock
+++ b/pywb/vueui/yarn.lock
@ -386,7 +386,7 @@ color-name@~1.1.4:
 concat-map@0.0.1:
  version "0.0.1"
  resolved "https://registry.yarnpkg.com/concat-map/-/concat-map-0.0.1.tgz#d8a96bd77fd68df7793a73036a3ba0d5405d477b"
-  integrity sha1-2Klr13/Wjfd5OnMDajug1UBdR3s=
+  integrity sha512-/Srv4dswyQNBfohGpz9o6Yb3Gz3SrUDqBH5rTuhGR7ahtlbYKnVxw2bCFMRljaA7EXHaXZ8wsHdodFvbkhKmqg==
 consolidate@^0.15.1:
  version "0.15.1"
@ -469,9 +469,9 @@ debug@~3.1.0:
    ms "2.0.0"
 decode-uri-component@^0.2.0:
-  version "0.2.0"
+  version "0.2.2"
-  resolved "https://registry.yarnpkg.com/decode-uri-component/-/decode-uri-component-0.2.0.tgz#eb3913333458775cb84cd1a1fae062106bb87545"
+  resolved "https://registry.yarnpkg.com/decode-uri-component/-/decode-uri-component-0.2.2.tgz#e69dbe25d37941171dd540e024c444cd5188e1e9"
-  integrity sha1-6zkTMzRYd1y4TNGh+uBiEGu4dUU=
+  integrity sha512-FqUYQ+8o158GyGTrMFJms9qh3CqTKvAqgqsTnkLI8sKu0028orqBhxNMFkFen0zGyg6epACD32pjVk58ngIErQ==
 deep-is@^0.1.3:
  version "0.1.3"
@ -1103,9 +1103,9 @@ mime@^1.4.1:
  integrity sha512-x0Vn8spI+wuJ1O6S7gnbaQg8Pxh4NNHb7KSINmEWKiPE4RKOplvijn+NkmYmmRgP68mc70j2EbeTFRsrswaQeg==
 minimatch@^3.0.4:
-  version "3.0.4"
+  version "3.1.2"
-  resolved "https://registry.yarnpkg.com/minimatch/-/minimatch-3.0.4.tgz#5166e286457f03306064be5497e8dbb0c3d32083"
+  resolved "https://registry.yarnpkg.com/minimatch/-/minimatch-3.1.2.tgz#19cd194bfd3e428f049a70817c038d89ab4be35b"
-  integrity sha512-yJHVQEhyqPLUTgt9B83PXu6W3rx4MvvHvSUvToogpwoGDOUQ+yDrR0HRot+yOCdCO7u4hX3pWft6kWBBcqh0UA==
+  integrity sha512-J7p63hRiAjw1NDEww1W7i37+ByIrOWO5XQQAzZ3VOcL0PNybwpfmV/N05zFAzwQ9USyEcX6t3UO+K5aqBQOIHw==
  dependencies:
    brace-expansion "^1.1.7"
--- a/pywb/warcserver/access_checker.py
+++ b/pywb/warcserver/access_checker.py
@ -260,6 +260,10 @@ class AccessChecker(object):
            if key.startswith(acl_key):
                acl_obj = CDXObject(acl)
            # Check for "*," in ACL, which matches any URL
            if acl_key == b"*,":
                acl_obj = CDXObject(acl)
            if acl_obj:
                user = acl_obj.get('user')
                if user == acl_user:
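Note on the `*,` wildcard above: the sketch below is a simplified illustration of how a wildcard ACL key combined with a per-user rule could resolve, assuming each `.aclj` line has the form `<key> - <json>`. `find_access` is a hypothetical helper and does not reproduce the lookup order or fallback logic of `AccessChecker`.

```python
import json


def find_access(urlkey, acl_lines, acl_user=None, default='block'):
    """Return the access value of the first matching rule (hypothetical, simplified)."""
    for line in acl_lines:
        acl_key, _, payload = line.partition(' - ')
        rule = json.loads(payload)
        # "*," matches any URL key; otherwise fall back to a prefix match.
        matches = acl_key == '*,' or urlkey.startswith(acl_key)
        rule_user = rule.get('user')
        # Assumption: a rule without a user applies to every caller.
        if matches and (rule_user is None or rule_user == acl_user):
            return rule['access']
    return default


acl = ['*, - {"access": "allow", "user": "staff"}']

as_staff = find_access('com,example)/', acl, acl_user='staff')
anonymous = find_access('com,example)/', acl)
```

With a default of `block`, only the `staff` user is allowed, which is the behavior the `pywb-wildcard-surt` test collection in this diff checks for.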
--- a/pywb/warcserver/index/fuzzymatcher.py
+++ b/pywb/warcserver/index/fuzzymatcher.py
@ -15,7 +15,7 @@ from collections import namedtuple
 # ============================================================================
 FuzzyRule = namedtuple('FuzzyRule',
                       'url_prefix, regex, replace_after, filter_str, ' +
-                       'match_type, find_all')
+                       'match_type, re_type')
 # ============================================================================
@ -23,6 +23,7 @@ class FuzzyMatcher(object):
    DEFAULT_FILTER = ['urlkey:{0}']
    DEFAULT_MATCH_TYPE = 'prefix'
    DEFAULT_REPLACE_AFTER = '?'
    DEFAULT_RE_TYPE = 'search'
    FUZZY_SKIP_PARAMS = ('alt_url', 'reverse', 'closest', 'end_key',
                         'url', 'matchType', 'filter')
@ -58,16 +59,16 @@ class FuzzyMatcher(object):
            replace_after = self.DEFAULT_REPLACE_AFTER
            filter_str = self.DEFAULT_FILTER
            match_type = self.DEFAULT_MATCH_TYPE
-            find_all = False
+            re_type = self.DEFAULT_RE_TYPE
        else:
            regex = self.make_regex(config.get('match'))
            replace_after = config.get('replace', self.DEFAULT_REPLACE_AFTER)
            filter_str = config.get('filter', self.DEFAULT_FILTER)
            match_type = config.get('type', self.DEFAULT_MATCH_TYPE)
-            find_all = config.get('find_all', False)
+            re_type = config.get('re_type', self.DEFAULT_RE_TYPE)
-        return FuzzyRule(url_prefix, regex, replace_after, filter_str, match_type, find_all)
+        return FuzzyRule(url_prefix, regex, replace_after, filter_str, match_type, re_type)
    def get_fuzzy_match(self, urlkey, url, params):
        filters = set()
@ -78,9 +79,12 @@ class FuzzyMatcher(object):
                continue
            groups = None
-            if rule.find_all:
+            if rule.re_type == 'findall':
                groups = rule.regex.findall(urlkey)
-            else:
+            elif rule.re_type == 'sub':
                matched_rule = rule
                break
            elif rule.re_type == 'search':
                m = rule.regex.search(urlkey)
                groups = m and m.groups()
@ -102,7 +106,7 @@ class FuzzyMatcher(object):
        no_filters = (not filters or filters == {'urlkey:'}) and (matched_rule.replace_after == '?')
        inx = url.find(matched_rule.replace_after)
-        if inx > 0:
+        if inx > 0 and matched_rule.re_type != 'sub':
            length = inx + len(matched_rule.replace_after)
            # don't include trailing '?' for default filter
            if no_filters:
@ -111,13 +115,17 @@ class FuzzyMatcher(object):
                if url[length - 1] == '/':
                    length -= 1
            url = url[:length]
-        elif not no_filters:
+        elif not no_filters and matched_rule.re_type != 'sub':
            url += matched_rule.replace_after[0]
        if matched_rule.match_type == 'domain':
            host = urlsplit(url).netloc
            url = host.split('.', 1)[1]
        if matched_rule.re_type == 'sub':
            filters = {'urlkey:'}
            url = re.sub(matched_rule.regex, matched_rule.replace_after, url)
        fuzzy_params = {'url': url,
                        'matchType': matched_rule.match_type,
                        'filter': filters,
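Note on `re_type`: the three modes differ in what the regex is applied to and what comes back. The sketch below is a minimal illustration. `apply_rule` and the sample patterns are hypothetical and are not part of `FuzzyMatcher`.

```python
import re


def apply_rule(regex, re_type, replace_after, urlkey, url):
    """Return (groups, url) for one rule; hypothetical helper for illustration."""
    if re_type == 'findall':
        # Every match in the url key becomes a filter group.
        return regex.findall(urlkey), url
    elif re_type == 'sub':
        # No groups: the URL itself is rewritten by substitution.
        return None, regex.sub(replace_after, url)
    else:
        # 'search' (the default): groups of the first match only.
        m = regex.search(urlkey)
        return (m and m.groups()), url


id_rule = re.compile(r'[?&](id=\d+)')
urlkey = 'com,example)/?id=1&x=2&id=3'

all_groups, _ = apply_rule(id_rule, 'findall', '?', urlkey, '')
first_groups, _ = apply_rule(id_rule, 'search', '?', urlkey, '')
_, rewritten = apply_rule(re.compile(r'matched'), 'sub', 'replaced',
                          '', 'https://example.com/matched')
```

In `sub` mode `replace_after` doubles as the substitution string, which is why the diff skips the usual truncate-after-`replace_after` step for such rules.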
--- a/pywb/warcserver/index/test/test_fuzzymatcher.py
+++ b/pywb/warcserver/index/test/test_fuzzymatcher.py
@ -234,3 +234,10 @@ class TestFuzzy(object):
        params = self.get_params(url, actual_url, mime='application/x-shockwave-flash')
        cdx_iter, errs = self.fuzzy(self.source, params)
        assert list(cdx_iter) == []
    def test_fuzzy_sub_replacement(self):
        url = 'https://example.com/matched'
        actual_url = 'https://example.com/replaced'
        params = self.get_params(url, actual_url)
        cdx_iter, errs = self.fuzzy(self.source, params)
        assert list(cdx_iter) == self.get_expected(actual_url)
--- a/pywb/warcserver/inputrequest.py
+++ b/pywb/warcserver/inputrequest.py
@ -11,6 +11,7 @@ from io import BytesIO
 import base64
 import cgi
 import json
 import math
 import sys
@ -328,7 +329,22 @@ class MethodQueryCanonicalizer(object):
                    _parser(v, name)
            elif name:
-                data[get_key(name)] = str(json_obj)
+                if isinstance(json_obj, bool) and json_obj:
                    data[get_key(name)] = "true"
                elif isinstance(json_obj, bool):
                    data[get_key(name)] = "false"
                elif json_obj is None:
                    data[get_key(name)] = "null"
                elif isinstance(json_obj, float):
                    # Treat floats like JavaScript's Number.prototype.toString(),
                    # drop decimal if float represents a whole number.
                    fraction, _ = math.modf(json_obj)
                    if fraction == 0.0:
                        data[get_key(name)] = str(int(json_obj))
                    else:
                        data[get_key(name)] = str(json_obj)
                else:
                    data[get_key(name)] = str(json_obj)
        _parser(json.loads(string))
        return urlencode(data)
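Note on the scalar handling above: Python's `str()` renders JSON scalars as `True`, `None` and `44.0`, while JavaScript renders them as `true`, `null` and `44`. The sketch below isolates that conversion. `json_scalar_to_str` is a hypothetical helper, and the `math.isfinite` guard is an added assumption that is not in the diff.

```python
import json
import math


def json_scalar_to_str(value):
    """Render a decoded JSON scalar the way JavaScript would (hypothetical helper)."""
    # bool must be checked before any numeric handling: bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if value is None:
        return "null"
    if isinstance(value, float) and math.isfinite(value):
        # Drop the decimal part when the float represents a whole number.
        fraction, _ = math.modf(value)
        if fraction == 0.0:
            return str(int(value))
    return str(value)


decoded = json.loads('[44.0, 35.7, true, false, null, 3, "e"]')
rendered = [json_scalar_to_str(v) for v in decoded]
```

These values line up with the `id=44&float=35.7&values=true` expectations in the JSON test case added to `test_inputreq.py`.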
--- a/pywb/warcserver/test/test_inputreq.py
+++ b/pywb/warcserver/test/test_inputreq.py
@ -39,7 +39,7 @@ class InputReqApp(object):
 #=============================================================================
 class TestInputReq(object):
-    def setup(self):
+    def setup_method(self):
        self.app = InputReqApp()
        self.testapp = webtest.TestApp(self.app)
@ -82,44 +82,49 @@ Foo: Bar\r\n\
 class TestPostQueryExtract(object):
    @classmethod
    def setup_class(cls):
-        cls.post_data = b'foo=bar&dir=%2Fbaz'
+        cls.post_data = b'foo=bar&dir=%2Fbaz&do=true&re=false&re=null'
        cls.binary_post_data = b'\x816l`L\xa04P\x0e\xe0r\x02\xb5\x89\x19\x00fP\xdb\x0e\xb0\x02,'
    def test_post_extract_1(self):
        mq = MethodQueryCanonicalizer('POST', 'application/x-www-form-urlencoded',
                                len(self.post_data), BytesIO(self.post_data))
-        assert mq.append_query('http://example.com/') == 'http://example.com/?__wb_method=POST&foo=bar&dir=/baz'
+        assert mq.append_query('http://example.com/') == 'http://example.com/?__wb_method=POST&foo=bar&dir=/baz&do=true&re=false&re=null'
-        assert mq.append_query('http://example.com/?123=ABC') == 'http://example.com/?123=ABC&__wb_method=POST&foo=bar&dir=/baz'
+        assert mq.append_query('http://example.com/?123=ABC') == 'http://example.com/?123=ABC&__wb_method=POST&foo=bar&dir=/baz&do=true&re=false&re=null'
    def test_post_extract_json(self):
-        post_data = b'{"a": "b", "c": {"a": 2}, "d": "e"}'
+        post_data = b'{"a": "b", "c": {"a": 2}, "d": "e", "f": true, "g": [false, null]}'
        mq = MethodQueryCanonicalizer('POST', 'application/json',
                                len(post_data), BytesIO(post_data))
-        assert mq.append_query('http://example.com/') == 'http://example.com/?__wb_method=POST&a=b&a.2_=2&d=e'
+        assert mq.append_query('http://example.com/') == 'http://example.com/?__wb_method=POST&a=b&a.2_=2&d=e&f=true&g=false&g.2_=null'
        post_data = b'{"type": "event", "id": 44.0, "float": 35.7, "values": [true, false, null], "source": {"type": "component", "id": "a+b&c= d", "values": [3, 4]}}'
        mq = MethodQueryCanonicalizer('POST', 'application/json',
                                len(post_data), BytesIO(post_data))
        assert mq.append_query('http://example.com/events') == 'http://example.com/events?__wb_method=POST&type=event&id=44&float=35.7&values=true&values.2_=false&values.3_=null&type.2_=component&id.2_=a%2Bb%26c%3D+d&values.4_=3&values.5_=4'
    def test_put_extract_method(self):
        mq = MethodQueryCanonicalizer('PUT', 'application/x-www-form-urlencoded',
                                len(self.post_data), BytesIO(self.post_data))
-        assert mq.append_query('http://example.com/') == 'http://example.com/?__wb_method=PUT&foo=bar&dir=/baz'
+        assert mq.append_query('http://example.com/') == 'http://example.com/?__wb_method=PUT&foo=bar&dir=/baz&do=true&re=false&re=null'
    def test_post_extract_non_form_data_1(self):
        mq = MethodQueryCanonicalizer('POST', 'application/octet-stream',
                                len(self.post_data), BytesIO(self.post_data))
        #base64 encoded data
-        assert mq.append_query('http://example.com/') == 'http://example.com/?__wb_method=POST&__wb_post_data=Zm9vPWJhciZkaXI9JTJGYmF6'
+        assert mq.append_query('http://example.com/') == 'http://example.com/?__wb_method=POST&__wb_post_data=Zm9vPWJhciZkaXI9JTJGYmF6JmRvPXRydWUmcmU9ZmFsc2UmcmU9bnVsbA=='
    def test_post_extract_non_form_data_2(self):
        mq = MethodQueryCanonicalizer('POST', 'text/plain',
                                len(self.post_data), BytesIO(self.post_data))
        #base64 encoded data
-        assert mq.append_query('http://example.com/pathbar?id=123') == 'http://example.com/pathbar?id=123&__wb_method=POST&__wb_post_data=Zm9vPWJhciZkaXI9JTJGYmF6'
+        assert mq.append_query('http://example.com/pathbar?id=123') == 'http://example.com/pathbar?id=123&__wb_method=POST&__wb_post_data=Zm9vPWJhciZkaXI9JTJGYmF6JmRvPXRydWUmcmU9ZmFsc2UmcmU9bnVsbA=='
    def test_post_extract_length_invalid_ignore(self):
        mq = MethodQueryCanonicalizer('POST', 'application/x-www-form-urlencoded',
@ -136,13 +141,13 @@ class TestPostQueryExtract(object):
        mq = MethodQueryCanonicalizer('POST', 'application/x-www-form-urlencoded',
                                len(self.post_data) - 4, BytesIO(self.post_data))
-        assert mq.append_query('http://example.com/') == 'http://example.com/?__wb_method=POST&foo=bar&dir=%2'
+        assert mq.append_query('http://example.com/') == 'http://example.com/?__wb_method=POST&foo=bar&dir=/baz&do=true&re=false&re='
    def test_post_extract_length_too_long(self):
        mq = MethodQueryCanonicalizer('POST', 'application/x-www-form-urlencoded',
                                len(self.post_data) + 4, BytesIO(self.post_data))
-        assert mq.append_query('http://example.com/') == 'http://example.com/?__wb_method=POST&foo=bar&dir=/baz'
+        assert mq.append_query('http://example.com/') == 'http://example.com/?__wb_method=POST&foo=bar&dir=/baz&do=true&re=false&re=null'
    def test_post_extract_malformed_form_data(self):
        mq = MethodQueryCanonicalizer('POST', 'application/x-www-form-urlencoded',
@ -155,7 +160,7 @@ class TestPostQueryExtract(object):
        mq = MethodQueryCanonicalizer('POST', 'multipart/form-data',
                                len(self.post_data), BytesIO(self.post_data))
-        assert mq.append_query('http://example.com/') == 'http://example.com/?__wb_method=POST&__wb_post_data=Zm9vPWJhciZkaXI9JTJGYmF6'
+        assert mq.append_query('http://example.com/') == 'http://example.com/?__wb_method=POST&__wb_post_data=Zm9vPWJhciZkaXI9JTJGYmF6JmRvPXRydWUmcmU9ZmFsc2UmcmU9bnVsbA=='
    def test_options(self):
--- a/pywb/warcserver/test/test_upstream.py
+++ b/pywb/warcserver/test/test_upstream.py
@ -18,7 +18,7 @@ from .testutils import LiveServerTests, HttpBinLiveTests, BaseTestClass
 class TestUpstream(LiveServerTests, HttpBinLiveTests, BaseTestClass):
-    def setup(self):
+    def setup_method(self):
        app = BaseWarcServer()
        base_url = 'http://localhost:{0}'.format(self.server.port)
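Note on the `setup` to `setup_method` renames: the change is presumably needed because newer pytest releases no longer run nose-style `setup` methods on test classes, while `setup_method` is the pytest-native per-test hook. A minimal sketch, with a hypothetical test class:

```python
class TestExample(object):
    def setup_method(self):
        # pytest calls this before each test method in the class.
        self.items = []

    def test_append(self):
        self.items.append(1)
        assert self.items == [1]


# Outside pytest, the same sequence can be driven by hand:
case = TestExample()
case.setup_method()
case.test_append()
```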
--- a/requirements.txt
+++ b/requirements.txt
@ -1,19 +1,21 @@
 six
 warcio>=1.7.1
 requests
-redis<3.0
+redis==2.10.6
-jinja2<3.0.0
+jinja2>=3.1.2
 surt>=0.3.1
 brotlipy
 pyyaml
-werkzeug
+werkzeug==2.2.3
 webencodings
-gevent==21.12.0
+gevent==22.10.2
 greenlet>=2.0.2,<3.0
 webassets==2.0
 portalocker
 wsgiprox>=1.5.1
 fakeredis<1.0
 tldextract
 python-dateutil
-markupsafe<2.1.0
+markupsafe>=2.1.1
 ua_parser
 py3AMF
--- a/sample_archive/access/allow_all.aclj
+++ b/sample_archive/access/allow_all.aclj
@ -0,0 +1 @@
 *, - {"access": "allow", "user": "staff"}
--- a/sample_archive/access/pywb.aclj
+++ b/sample_archive/access/pywb.aclj
@ -5,6 +5,8 @@ org,iana)/_css/2013.1/fonts/opensans-semibold.ttf - {"access": "allow"}
 org,iana)/_css - {"access": "exclude"}
 org,iana)/### - {"access": "allow"}
 org,iana)/ - {"access": "exclude"}
 com,example)/?example=3 - {"access": "block", "user": "staff"}
 com,example)/?example=3 - {"access": "exclude", "user": "staff2"}
 org,example)/?example=1 - {"access": "block"}
 com,example)/?example=2 - {"access": "allow_ignore_embargo"}
 com,example)/?example=1 - {"access": "allow_ignore_embargo", "user": "staff2"}
--- a/sample_archive/cdxj/example.cdx.gz
+++ b/sample_archive/cdxj/example.cdx.gz
--- a/sample_archive/waczs/invalid_example_1.wacz
+++ b/sample_archive/waczs/invalid_example_1.wacz
--- a/sample_archive/waczs/valid_example_1.wacz
+++ b/sample_archive/waczs/valid_example_1.wacz
--- a/setup.py
+++ b/setup.py
@ -62,10 +62,6 @@ def generate_git_hash_py(pkg, filename='git_hash.py'):
 def load_requirements(filename):
    with open(filename, 'rt') as fh:
        requirements = fh.read().rstrip().split('\n')
    if sys.version_info > (3, 0):
        requirements.append("py3AMF")
    else:
        requirements.append("pyAMF")
    return requirements
@ -113,6 +109,7 @@ setup(
            "translate_toolkit"
        ],
    },
    python_requires='>=3.7,<3.12',
    tests_require=load_requirements("test_requirements.txt"),
    cmdclass={'test': PyTest},
    test_suite='',
@ -131,16 +128,12 @@ setup(
        'Environment :: Web Environment',
        'License :: OSI Approved :: GNU General Public License (GPL)',
        'License :: OSI Approved :: GNU General Public License v3 (GPLv3)',
        'Programming Language :: Python :: 2',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.3',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: 3.5',
        'Programming Language :: Python :: 3.6',
        'Programming Language :: Python :: 3.7',
        'Programming Language :: Python :: 3.8',
        'Programming Language :: Python :: 3.9',
        'Programming Language :: Python :: 3.10',
        'Programming Language :: Python :: 3.11',
        'Topic :: Internet :: Proxy Servers',
        'Topic :: Internet :: WWW/HTTP',
        'Topic :: Internet :: WWW/HTTP :: WSGI',
--- a/test_requirements.txt
+++ b/test_requirements.txt
@ -3,7 +3,6 @@ WebTest
 pytest-cov
 mock
 urllib3
 httpbin==0.5.0
 flask<2.0
 ujson
 lxml
 httpbin>=0.10.2
--- a/tests/config_test_access.yaml
+++ b/tests/config_test_access.yaml
@ -62,6 +62,13 @@ collections:
        acl_paths:
            - ./sample_archive/access/pywb.aclj
    pywb-wildcard-surt:
        index_paths: ./sample_archive/cdx/
        archive_paths: ./sample_archive/warcs/
        default_access: block
        acl_paths:
            - ./sample_archive/access/allow_all.aclj
--- a/tests/test_acl.py
+++ b/tests/test_acl.py
@ -41,12 +41,23 @@ class TestACLApp(BaseConfigTest):
        assert 'Access Blocked' in resp.text
    def test_allow_via_acl_header(self):
-        resp = self.query('http://www.iana.org/about/')
+        resp = self.testapp.get('/pywb/cdx?url=http://www.iana.org/about/', headers={"X-Pywb-Acl-User": "staff"})
        assert len(resp.text.splitlines()) == 1
        resp = self.testapp.get('/pywb/mp_/http://www.iana.org/about/', headers={"X-Pywb-Acl-User": "staff"}, status=200)
    def test_block_via_acl_header(self):
        resp = self.testapp.get('/pywb/cdx?url=http://example.com/?example=3', headers={"X-Pywb-Acl-User": "staff"})
        assert len(resp.text.splitlines()) > 0
        resp = self.testapp.get('/pywb/mp_/http://example.com/?example=3', headers={"X-Pywb-Acl-User": "staff"}, status=451)
    def test_exclude_via_acl_header(self):
        resp = self.testapp.get('/pywb/cdx?url=http://example.com/?example=3', headers={"X-Pywb-Acl-User": "staff2"})
        assert len(resp.text.splitlines()) == 0
        resp = self.testapp.get('/pywb/mp_/http://example.com/?example=3', headers={"X-Pywb-Acl-User": "staff2"}, status=404)
    def test_allowed_more_specific(self):
        resp = self.query('http://www.iana.org/_css/2013.1/fonts/opensans-semibold.ttf')
@ -85,5 +96,9 @@ class TestACLApp(BaseConfigTest):
        assert '"http://httpbin.org/anything/resource.json"' in resp.text
    def test_allow_all_acl_user_specific(self):
        resp = self.testapp.get('/pywb-wildcard-surt/mp_/http://example.com/', status=451)
        assert 'Access Blocked' in resp.text
        resp = self.testapp.get('/pywb-wildcard-surt/mp_/http://example.com/', headers={"X-Pywb-Acl-User": "staff"}, status=200)
--- a/tests/test_auto_colls.py
+++ b/tests/test_auto_colls.py
@ -537,7 +537,7 @@ class TestManagedColls(CollsDirMixin, BaseConfigTest):
            main(['template', 'foo', '--remove', 'query_html'])
    def test_err_no_such_coll(self):
-        """ Test error adding warc to non-existant collection
+        """ Test error adding warc to non-existent collection
        """
        warc1 = self._get_sample_warc('example.warc.gz')
--- a/tests/test_embargo.py
+++ b/tests/test_embargo.py
@ -46,8 +46,12 @@ class TestEmbargoApp(BaseConfigTest):
    def test_embargo_ignore_acl_with_header_only(self):
        # ignore embargo with custom header only
        headers = {"X-Pywb-ACL-User": "staff2"}
        resp = self.testapp.get('/pywb-embargo-acl/20140126201054mp_/http://example.com/?example=1', status=200, headers=headers)
        resp = self.testapp.get('/pywb-embargo-acl/cdx?url=http://example.com/?example=1', headers=headers)
        assert len(resp.text.splitlines()) > 0
        resp = self.testapp.get('/pywb-embargo-acl/20140126201054mp_/http://example.com/?example=1', status=200, headers=headers)
        resp = self.testapp.get('/pywb-embargo-acl/cdx?url=http://example.com/?example=1')
        assert len(resp.text.splitlines()) == 0
        resp = self.testapp.get('/pywb-embargo-acl/20140126201054mp_/http://example.com/?example=1', status=404)
--- a/tests/test_force_https.py
+++ b/tests/test_force_https.py
@ -56,6 +56,6 @@ class TestForceHttpsRoot(BaseConfigTest):
        resp = self.get('/20140128051539{0}/http://www.iana.org/domains/example', fmod,
                        headers={'X-Forwarded-Proto': 'https'})
-        assert resp.headers['Location'] == 'https://localhost:80/20140128051539{0}/http://www.iana.org/domains/reserved'.format(fmod)
+        assert resp.headers['Location'] == 'https://localhost:80/20140128051539{0}/http://www.iana.org/help/example-domains'.format(fmod)
--- a/tests/test_integration.py
+++ b/tests/test_integration.py
@ -400,7 +400,7 @@ class TestWbIntegration(BaseConfigTest):
        assert resp.status_int == 200
        assert resp.headers['Content-Location'].endswith('/pywb/20140126200928{0}/http://www.iana.org/domains/root/db'.format(fmod))
-    def test_not_existant_warc_other_capture(self, fmod):
+    def test_not_existent_warc_other_capture(self, fmod):
        resp = self.get('/pywb/20140703030321{0}/http://example.com/?example=2', fmod)
        assert resp.status_int == 200
        assert resp.headers['Content-Location'].endswith('/pywb/20140603030341{0}/http://example.com?example=2'.format(fmod))
@ -410,7 +410,7 @@ class TestWbIntegration(BaseConfigTest):
        assert resp.status_int == 200
        assert resp.headers['Content-Location'].endswith('/pywb/20140603030341{0}/http://example.com?example=2'.format(fmod))
-    def test_not_existant_warc_no_other(self, fmod):
+    def test_not_existent_warc_no_other(self, fmod):
        resp = self.get('/pywb/20140703030321{0}/http://example.com/?example=3', fmod, status=503)
        assert resp.status_int == 503
--- a/tests/test_live_rewriter.py
+++ b/tests/test_live_rewriter.py
@ -91,25 +91,28 @@ class TestLiveRewriter(HttpBinLiveTests, BaseConfigTest):
        resp = self.head('/live/{0}httpbin.org/get?foo=bar', fmod_sl)
        assert resp.status_int == 200
-    @pytest.mark.skipif(sys.version_info < (3,0), reason='does not respond in 2.7')
-    def test_live_bad_content_length(self, fmod_sl):
-        resp = self.get('/live/{0}httpbin.org/response-headers?content-length=149,149', fmod_sl, status=200)
-        assert resp.headers['Content-Length'] == '149'
-        resp = self.get('/live/{0}httpbin.org/response-headers?Content-Length=xyz', fmod_sl, status=200)
-        assert resp.headers['Content-Length'] == '90'
-    @pytest.mark.skipif(sys.version_info < (3,0), reason='does not respond in 2.7')
-    def test_live_bad_content_length_with_range(self, fmod_sl):
-        resp = self.get('/live/{0}httpbin.org/response-headers?content-length=149,149', fmod_sl,
-                        headers={'Range': 'bytes=0-'}, status=206)
-        assert resp.headers['Content-Length'] == '149'
-        assert resp.headers['Content-Range'] == 'bytes 0-148/149'
-        resp = self.get('/live/{0}httpbin.org/response-headers?Content-Length=xyz', fmod_sl,
-                        headers={'Range': 'bytes=0-'}, status=206)
-        assert resp.headers['Content-Length'] == '90'
-        assert resp.headers['Content-Range'] == 'bytes 0-89/90'
+    # Following tests are temporarily commented out because latest version of PSF httpbin
+    # now returns 400 if content-length header isn't parsable as an int
+    # @pytest.mark.skipif(sys.version_info < (3,0), reason='does not respond in 2.7')
+    # def test_live_bad_content_length(self, fmod_sl):
+    #     resp = self.get('/live/{0}httpbin.org/response-headers?content-length=149,149', fmod_sl, status=200)
+    #     assert resp.headers['Content-Length'] == '149'
+    #     resp = self.get('/live/{0}httpbin.org/response-headers?Content-Length=xyz', fmod_sl, status=200)
+    #     assert resp.headers['Content-Length'] == '90'
+    # @pytest.mark.skipif(sys.version_info < (3,0), reason='does not respond in 2.7')
+    # def test_live_bad_content_length_with_range(self, fmod_sl):
+    #     resp = self.get('/live/{0}httpbin.org/response-headers?content-length=149,149', fmod_sl,
+    #                     headers={'Range': 'bytes=0-'}, status=206)
+    #     assert resp.headers['Content-Length'] == '149'
+    #     assert resp.headers['Content-Range'] == 'bytes 0-148/149'
+    #     resp = self.get('/live/{0}httpbin.org/response-headers?Content-Length=xyz', fmod_sl,
+    #                     headers={'Range': 'bytes=0-'}, status=206)
+    #     assert resp.headers['Content-Length'] == '90'
+    #     assert resp.headers['Content-Range'] == 'bytes 0-89/90'
    def test_custom_unicode_header(self, fmod_sl):
        value = u'⛄'
    def test_custom_unicode_header(self, fmod_sl):
        value = u'⛄'
--- a/tests/test_manager.py
+++ b/tests/test_manager.py
@ -0,0 +1,135 @@
 import os
 import pytest
 from pywb.manager.manager import CollectionsManager
 VALID_WACZ_PATH = 'sample_archive/waczs/valid_example_1.wacz'
 INVALID_WACZ_PATH = 'sample_archive/waczs/invalid_example_1.wacz'
 TEST_COLLECTION_NAME = 'test-col'
 class TestManager:
    def test_add_valid_wacz_unpacked(self, tmp_path):
        """Test if adding a valid wacz file to a collection succeeds"""
        manager = self.get_test_collections_manager(tmp_path)
        manager._add_wacz_unpacked(VALID_WACZ_PATH)
        assert 'valid_example_1-0.warc' in os.listdir(manager.archive_dir)
        assert manager.DEF_INDEX_FILE in os.listdir(manager.indexes_dir)
        with open(os.path.join(manager.indexes_dir, manager.DEF_INDEX_FILE), 'r') as f:
            assert '"filename": "valid_example_1-0.warc"' in f.read()
    def test_add_valid_wacz_unpacked_dupe_name(self, tmp_path):
        """Test if warc that already exists is renamed with -index suffix"""
        manager = self.get_test_collections_manager(tmp_path)
        manager._add_wacz_unpacked(VALID_WACZ_PATH)
        # Add it again to see if there are name conflicts
        manager._add_wacz_unpacked(VALID_WACZ_PATH)
        assert 'valid_example_1-0.warc' in os.listdir(manager.archive_dir)
        assert 'valid_example_1-0-1.warc' in os.listdir(manager.archive_dir)
        assert manager.DEF_INDEX_FILE in os.listdir(manager.indexes_dir)
        with open(os.path.join(manager.indexes_dir, manager.DEF_INDEX_FILE), 'r') as f:
            data = f.read()
            assert '"filename": "valid_example_1-0.warc"' in data
            assert '"filename": "valid_example_1-0-1.warc"' in data
    def test_add_invalid_wacz_unpacked(self, tmp_path, caplog):
        """Test if adding an invalid wacz file to a collection fails"""
        manager = self.get_test_collections_manager(tmp_path)
        manager._add_wacz_unpacked(INVALID_WACZ_PATH)
        assert 'invalid_example_1-0.warc' not in os.listdir(manager.archive_dir)
        assert 'sample_archive/waczs/invalid_example_1.wacz does not contain any warc files.' in caplog.text
        index_path = os.path.join(manager.indexes_dir, manager.DEF_INDEX_FILE)
        if os.path.exists(index_path):
            with open(index_path, 'r') as f:
                assert '"filename": "invalid_example_1-0.warc"' not in f.read()
    def test_add_valid_archives_unpack_wacz(self, tmp_path):
        manager = self.get_test_collections_manager(tmp_path)
        archives = ['sample_archive/warcs/example.arc', 'sample_archive/warcs/example.arc.gz',
                    'sample_archive/warcs/example.warc', 'sample_archive/warcs/example.warc.gz',
                    'sample_archive/waczs/valid_example_1.wacz']
        manager.add_archives(archives, unpack_wacz=True)
        with open(os.path.join(manager.indexes_dir, manager.DEF_INDEX_FILE), 'r') as f:
            index_text = f.read()
        for archive in archives:
            archive = os.path.basename(archive)
            if archive.endswith('wacz'):
                archive = 'valid_example_1-0.warc'
            assert archive in os.listdir(manager.archive_dir)
            assert archive in index_text
    def test_add_valid_archives_dupe_name(self, tmp_path):
        manager = self.get_test_collections_manager(tmp_path)
        warc_filename = 'sample_archive/warcs/example.warc.gz'
        manager.add_archives([warc_filename, warc_filename])
        with open(os.path.join(manager.indexes_dir, manager.DEF_INDEX_FILE), 'r') as f:
            index_text = f.read()
        expected_archives = ('example.warc.gz', 'example-1.warc.gz')
        for archive in expected_archives:
            assert archive in os.listdir(manager.archive_dir)
            assert archive in index_text
    def test_add_valid_archives_dont_unpack_wacz(self, tmp_path):
        manager = self.get_test_collections_manager(tmp_path)
        archives = ['sample_archive/warcs/example.arc', 'sample_archive/warcs/example.arc.gz',
                    'sample_archive/warcs/example.warc', 'sample_archive/warcs/example.warc.gz',
                    'sample_archive/waczs/valid_example_1.wacz']
        with pytest.raises(NotImplementedError):
            manager.add_archives(archives, unpack_wacz=False)
    def test_add_invalid_archives_unpack_wacz(self, tmp_path, caplog):
        manager = self.get_test_collections_manager(tmp_path)
        manager.add_archives(['sample_archive/warcs/example.warc', 'sample_archive/text_content/sample.html'],
                             unpack_wacz=True)
        assert 'sample.html' not in os.listdir(manager.archive_dir)
        assert 'example.warc' in os.listdir(manager.archive_dir)
        assert "Invalid archives weren't added: sample_archive/text_content/sample.html" in caplog.messages
    def test_merge_wacz_index(self, tmp_path):
        manager = self.get_test_collections_manager(tmp_path)
        manager._add_wacz_index(os.path.join(manager.indexes_dir, manager.DEF_INDEX_FILE),
                                'sample_archive/cdxj/example.cdxj',
                                {'example.warc.gz': 'rewritten.warc.gz'})
        with open(os.path.join(manager.indexes_dir, manager.DEF_INDEX_FILE), 'r') as f:
            index_content = f.read()
            index_content = index_content.strip()
        assert 'example.warc.gz' not in index_content
        assert 'rewritten.warc.gz' in index_content
        # check that collection index is sorted
        index_lines = index_content.split('\n')
        assert sorted(index_lines) == index_lines
    def test_merge_wacz_index_gzip(self, tmp_path):
        manager = self.get_test_collections_manager(tmp_path)
        manager._add_wacz_index(os.path.join(manager.indexes_dir, manager.DEF_INDEX_FILE),
                                'sample_archive/cdxj/example.cdx.gz',
                                {'example-collection.warc': 'rewritten.warc'})
        with open(os.path.join(manager.indexes_dir, manager.DEF_INDEX_FILE), 'r') as f:
            index_content = f.read()
            index_content = index_content.strip()
        assert 'example-collection.warc' not in index_content
        assert 'rewritten.warc' in index_content
        # check that collection index is sorted
        index_lines = index_content.split('\n')
        assert sorted(index_lines) == index_lines
    @staticmethod
    def get_test_collections_manager(collections_path):
        manager = CollectionsManager(TEST_COLLECTION_NAME, colls_dir=collections_path, must_exist=False)
        manager.add_collection()
        return manager
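The duplicate-name tests above expect a second copy of `example.warc.gz` to land as `example-1.warc.gz`, and a second copy of `valid_example_1-0.warc` as `valid_example_1-0-1.warc`. A minimal sketch of that renaming rule follows. It is an illustration only, not the code in `pywb.manager.manager`: the helper names `split_archive_ext` and `dedupe_name` and the `KNOWN_EXTS` tuple are hypothetical.

```python
import os

# Hypothetical list of multi-part archive extensions (an assumption for this sketch).
KNOWN_EXTS = ('.warc.gz', '.arc.gz', '.warc', '.arc')


def split_archive_ext(filename):
    """Split a filename into (stem, extension), keeping '.warc.gz' together."""
    for ext in KNOWN_EXTS:
        if filename.endswith(ext):
            return filename[:-len(ext)], ext
    return os.path.splitext(filename)


def dedupe_name(filename, existing):
    """Return filename unchanged if unused, else append -1, -2, ... before the extension."""
    if filename not in existing:
        return filename
    stem, ext = split_archive_ext(filename)
    index = 1
    while True:
        candidate = '{0}-{1}{2}'.format(stem, index, ext)
        if candidate not in existing:
            return candidate
        index += 1
```

For example, `dedupe_name('example.warc.gz', {'example.warc.gz'})` yields `'example-1.warc.gz'`, matching the name asserted in `test_add_valid_archives_dupe_name`.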
--- a/tox.ini
+++ b/tox.ini
@ -4,23 +4,24 @@ testpaths =
    tests
 [tox]
-envlist = py36, py37, py38, py39, py310
+envlist = py37, py38, py39, py310, py311
 [gh-actions]
 python =
    3.6: py36
    3.7: py37
    3.8: py38
    3.9: py39
    3.10: py310
+    3.11: py311
 [testenv]
 setenv = PYWB_NO_VERIFY_SSL = 1
 passenv = *
 deps =
    -rtest_requirements.txt
    -rrequirements.txt
    -rextra_requirements.txt
 commands =
-    py.test --cov-config .coveragerc --cov pywb -v --doctest-modules ./pywb/ tests/
+    pytest --cov-config .coveragerc --cov pywb -v --doctest-modules ./pywb/ tests/
--- a/2
+++ b/2
@ -1 +1 @@
-Subproject commit 04ca325f3a59e7efc8ad0fa5abe25ec1bc9d9620
+Subproject commit 20596ca1e66928cae6f309af781f961aa112ca7f
Author	SHA1	Message	Date
Tessa Walsh	7b0f8b5860	Use JSON values in query string for JSON request bodies (#893 ) This commit also adds a more complicated JSON test case that is also in warcio.js to ensure parity. Treat numbers like JavaScript's Number.prototype.toString() by dropping decimal from floats if they represent whole number.	2024-11-13 14:07:35 -08:00
Hellseher	b44c93bf6e	requirements: Adjust installation of Py3AMF module. (#920 ) Move Py3AMF from setup.py load_requirements to requirements.txt --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2024-11-07 12:09:35 -05:00
Tessa Walsh	97fffe3a34	Once more, now 2.8.3	2024-04-26 10:32:43 +02:00
Tessa Walsh	6205646b9b	Bump version to 2.8.2	2024-04-26 10:26:56 +02:00
Tessa Walsh	23891be2f1	Bump version	2024-04-26 10:21:03 +02:00
Ed Summers	b190dddee9	Pin redis for fakeredis (#904 ) It looks like `poetry install` will install the latest version of redis (v5.0.4) instead of what pip installs (v2.10.6). Unfortunately this means that the old version of fakeredis that is pinned in the requirements.txt will not work properly. Fixes #903	2024-04-26 04:03:27 -04:00
Tessa Walsh	b9f1609df9	Handle WARC filename conflicts with wb-manager add (#902 ) Append -index to end of filename prior to extension until there is no conflict Also makes sure this behavior is documented in tests	2024-04-24 08:09:02 -04:00
Tessa Walsh	e89924bd39	Rename --uncompress-wacz to --unpack-wacz and add docs (#901 ) Also adds help text for wb-manager add --unpack-wacz option in CLI	2024-04-24 05:02:26 -04:00
Tessa Walsh	b4c91c6633	Bump version in README	2024-04-23 17:27:09 -04:00
Tessa Walsh	1e2665af13	Change version to 2.8.0	2024-04-23 23:26:06 +02:00
Tessa Walsh	fee14d7fe8	Use fontawesome icon for timeline zoom out, remove unused static files (#895 ) * Replace zoomout image in timeline with fontawesome icon * Remove unused icons from static directory	2024-04-17 00:47:58 -04:00
Tessa Walsh	5712945991	Update usage docs section on creating web archives (#899 ) Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>	2024-04-15 10:22:39 -04:00
Ilya Kreymer	2fd6190b72	update wombat to latest (3.7.3) (#896 )	2024-04-10 14:45:13 -04:00
Alex Osborne	791a8d1033	rewrite: stop prepending semicolon to `this.` special property access (#850 ) (#888 ) The prepended semicolon breaks code (such as jQuery) that looks like: foo = foo ? foo : this.location; I think the reason we started inserting the semicolon was because in situations like: x = 1 + 2 this.location = "foo" we used to rewrite to: x = 1 + 2 (this && this._WB_wombat_obj_proxy || this).location = "foo" which the browser would interpret as a bogus function call like `2(this && ... )`. But nowadays prepending the semicolon should be unnecessary as we currently rewrite to: x = 2 + 3 _____WB$wombat$check$this$function_____(this).location = "foo" which will trigger JavaScript's automatic semicolon insertion rules like the original code does.	2024-04-09 09:37:55 -07:00
Tessa Walsh	86ee3bd752	Allow ACLJs to use *, SURT wildcard to match all URLs (#882 ) Also adds tests and documentation	2024-04-03 17:11:58 -04:00
Tessa Walsh	d1e1636ae3	Improve keyboard accessibility of Vue timeline (#889 ) Co-authored by Lee Davey <Lee.Davey@bl.uk>	2024-04-03 17:02:55 -04:00
Ed Summers	b4955cca66	Upgrade dependencies (#839 ) - Update and pin dependencies to specific versions that support Python 3.7-3.11 - Replace deprecated werkzeug.pop_path_info with wsgiref.shift_path_info - Use the latest httpbin from psf/httpbin - Remove unused flask test dependency - Drop Python 2 and Python <3.7 support - Ensure greenlet 2 is used for now, as psf/httpbin doesn't yet work with greenlet 3 --------- Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>	2024-04-02 17:16:50 -04:00
kuechensofa	f40e7ef18c	Sort index when adding wacz archives (#820 )	2023-11-23 12:10:52 -05:00
Florian Zimmermeister	6b4f9b323e	Fix code sample syntax in README (#864 )	2023-11-23 11:02:10 -05:00
Ivan Jelenić	7879dd0222	Fixes get_locale_prefixes() wrong paths (#874 ) If default_locale was set, and a web page was visited that doesn't have a language code in the path in the URL, the URL path parts returned by get_locale_prefixes() were wrong (e.g. /hrst/ instead of /hr/test/).	2023-11-23 10:59:06 -05:00
Ivan Jelenić	013746c10a	Fixes environ paths when default_locale set (#873 ) If the default_locale was set and the URL path didn't contain a language code, it was behaving as if there was a language code in the URL. In that case, it was moving part of the PATH_INFO to SCRIPT_NAME, but as there wasn't any language code in the URL, it moved something else. This fixes that.	2023-11-23 10:56:26 -05:00
Ivan Jelenić	79140441df	Fixes switch_locale not adding locale if missing from URL (#871 ) If the two letter language code was missing in the URI, switch_locale(locale) didn't add it (it worked fine if it was present). That means that it produced the same URL for all locales, each missing the two letter language code in the URL.	2023-11-23 10:50:56 -05:00
Ivan Jelenić	af92a9726e	Sets "Pywb Error" string as translatable in error template (#868 )	2023-11-23 10:33:38 -05:00
Tessa Walsh	83b2113be2	Add config.yaml UI option to disable printing from replay banner (#815 ) * Add UI option to disable printing * Initialize VueUI.main with config dict	2023-03-27 10:23:37 -04:00
Tessa Walsh	ed36830dc5	Pass env vars to tox (#823 ) This enables us to skip youtube-dl tests in GitHub Actions by ensuring that the "CI" env var is passed to tox.	2023-03-26 16:12:13 -04:00
aponb	81b6a57dfb	Update usage.rst Docker examples (#816 )	2023-02-20 10:10:08 -05:00
Jonas Linde	5c427b9ff2	[#715 ] Forward custom headers for cdx queries (#813 ) In particular the X-Pywb-ACL-User header must be forwarded in order for it to be able to control CDX-queries	2023-02-15 17:05:21 -05:00
kuechensofa	454486bf75	[#799 ] wb-manager: Add wacz archives to collection with --uncompress-wacz (#800 ) Add WACZ support for `wb-manager add` by unpacking WACZ files with --uncompress-wacz. A future commit will add pywb support for WACZ files without requiring them to be unpacked.	2023-02-15 17:00:38 -05:00
Tessa Walsh	b8693307d1	Bump version to 2.8.0-dev	2023-02-15 15:38:10 -05:00
Jonas Linde	98be48d6e4	Add a button to print the replay frame (#814 )	2023-02-15 15:36:30 -05:00
Sara Tavares	c441d83435	chore(typos): fix typos across codebase (#811 ) Co-authored-by: stavares843 <stavares843@users.noreply.github.com>	2023-02-15 13:04:20 -05:00
Ilya Kreymer	4a3e7ddff7	update CHANGES for 2.7.3	2023-02-02 16:22:24 -08:00
Ilya Kreymer	02288db81c	bump wombat to 3.4.4 (#808 )	2023-02-02 16:21:02 -08:00
Ilya Kreymer	4fc2b451d7	templates: fix typo	2023-02-02 15:10:18 -08:00
Tessa Walsh	c8e78fd7c1	Add yarn install to Vue build script	2023-02-02 16:24:01 -05:00
Tessa Walsh	d44d640b93	Set logoHomeUrl to last option in Vue.main This ensures that any locally modified templates won't break when upgrading to pywb 2.7.3.	2023-02-02 16:24:01 -05:00
Ilya Kreymer	03f9708d8d	CHANGELIST: Update changelist for 2.7.3	2023-02-01 17:44:50 -08:00
Ilya Kreymer	406fad95c2	rules: add 'debugNoBatch' rewrite for fb and insta (#806 )	2023-02-01 10:45:22 -08:00
dependabot[bot]	d207c76bae	Bump decode-uri-component from 0.2.0 to 0.2.2 in /pywb/vueui (#786 ) Bumps [decode-uri-component](https://github.com/SamVerschueren/decode-uri-component) from 0.2.0 to 0.2.2. - [Release notes](https://github.com/SamVerschueren/decode-uri-component/releases) - [Commits](https://github.com/SamVerschueren/decode-uri-component/compare/v0.2.0...v0.2.2) --- updated-dependencies: - dependency-name: decode-uri-component dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-01-31 18:49:25 -08:00
dependabot[bot]	131732d238	Bump minimatch from 3.0.4 to 3.1.2 in /pywb/vueui (#777 ) Bumps [minimatch](https://github.com/isaacs/minimatch) from 3.0.4 to 3.1.2. - [Release notes](https://github.com/isaacs/minimatch/releases) - [Commits](https://github.com/isaacs/minimatch/compare/v3.0.4...v3.1.2) --- updated-dependencies: - dependency-name: minimatch dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-01-31 18:49:13 -08:00
Mark Johnson	59d9beac05	feat: regex substitution on surt rules match (#780 ) Substitution functionality already exists on a global level for matched rules, but this causes issues when rule sets conflict in the desired outcome. This change enables setting regex substitution at the rule level to avoid these conflicts.	2023-01-31 18:48:19 -08:00
Tessa Walsh	0758e81b62	Add default_locale fix to 2.7.3 changelog	2023-01-31 15:02:48 -05:00
Jonas Linde	d392a8d908	[#804 ] Use default_locale when lang not set in the request (#805 )	2023-01-31 13:47:50 -05:00
Tessa Walsh	9bc8a2e1ef	Modify search template buttons (#801 ) * Modify search page button colors * Rename Clear Options to Reset and reset URL as well * Add search improvements to CHANGES	2023-01-25 13:33:18 -05:00
Jonas Linde	43e5c8bac0	Make search page more intuitive (#794 ) * Add date-range feedback for i18n * Make search page more intuitive and add help text * Add a clear-options button to the search page	2023-01-23 17:17:40 -05:00
Tessa Walsh	cdab280669	Add 2.7.3 changes	2023-01-19 11:30:42 -05:00
kuechensofa	e6ec8b4aeb	[#795 ] wb-manager: Show error when adding duplicate warc files (#797 )	2023-01-19 11:26:56 -05:00
Tessa Walsh	1790fd006a	Bump version to 2.7.3	2023-01-05 17:34:17 -05:00
Tessa Walsh	3d0673e32a	Add ui.logo_home_url as config.yaml option (#790 ) * Add ui.logo_home_url as config option * Add ui.logo_home_url to docs	2023-01-05 17:33:00 -05:00
oskarhek	3050fd2b2b	issue_792 catch warcio exception (#793 )	2023-01-05 17:15:49 -05:00
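The message for commit 7b0f8b5860 says numbers are treated like JavaScript's Number.prototype.toString() by dropping the decimal from floats that represent a whole number. A rough sketch of that rule is below. The function name `js_like_number_str` is hypothetical and this is not the code from the commit. It also does not reproduce JavaScript's exponent notation for very large or very small values.

```python
def js_like_number_str(value):
    """Format a number for a query string, dropping '.0' from whole floats.

    Hypothetical helper for illustration; only the whole-float case is
    aligned with JavaScript's output, everything else falls back to str().
    """
    if isinstance(value, float) and value.is_integer():
        # 2.0 becomes '2', as (2.0).toString() would give in JavaScript
        return str(int(value))
    return str(value)
```

Under this sketch, a JSON body value of `2.0` would contribute `2` to the query string, while `2.5` stays `2.5`.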