.. _rewriter:

Rewriter
========

pywb includes sophisticated server-side and client-side rewriting systems. These include a rules-based configuration for domain- and content-specific rewriting rules, fuzzy index matching for replay, and a thorough client-side JS rewriting system.

As of pywb 2.3.0, the client-side rewriting system exists in a separate module, `wombat <https://github.com/webrecorder/wombat>`_.


URL Rewriting
-------------

URL rewriting is a key aspect of correctly replaying archived pages.
It is applied to HTML, CSS, and HTTP headers, because these are loaded directly by the browser.
pywb avoids rewriting URLs inside JavaScript on the server and leaves that to the client.

(No URL rewriting is performed when running in :ref:`https-proxy` mode.)

Most of the rewriting performed is **url-rewriting**, which changes the original URLs to point to the pywb server instead of the live web. Typically, the rewriting converts:

``<url>`` -> ``<pywb host>/<coll>/<timestamp><type modifier>/<url>``

For example, ``http://example.com/`` might be rewritten as ``http://localhost:8080/my-coll/2017mp_/http://example.com/``.

The rewritten URL prefixes the actual URL with the pywb host, the collection, the requested datetime (timestamp), and the type modifier. The result is an 'archival URL', which contains the original URL plus additional information about the archive and timestamp.

.. _urlrewrite_type_mod:

URL Rewrite Type Modifier
~~~~~~~~~~~~~~~~~~~~~~~~~

The type modifier follows the timestamp and specifies the format of the resource to be loaded. Currently, pywb supports the following modifiers:

Identity Modifier (``id_``)
"""""""""""""""""""""""""""

When this modifier is used, e.g. ``/my-coll/id_/http://example.com/``, no content rewriting is performed on the response, and the original, un-rewritten content is returned.
This is useful for HTML or other text resources that are normally rewritten when using the default (``mp_``) modifier.
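
The archival URL layout described above, ``<pywb host>/<coll>/<timestamp><type modifier>/<url>``, can be sketched in Python. This is a minimal illustration only. The helpers ``to_archival_url`` and ``parse_archival_path`` are hypothetical and are not part of pywb's API. The parser assumes a numeric timestamp and two-letter type modifiers such as ``mp_`` or ``id_``.

```python
import re
from typing import NamedTuple, Optional


class ArchivalPath(NamedTuple):
    coll: str
    timestamp: str
    modifier: Optional[str]
    url: str


def to_archival_url(host: str, coll: str, url: str,
                    timestamp: str = "", modifier: str = "") -> str:
    """Build <pywb host>/<coll>/<timestamp><type modifier>/<url>.

    Hypothetical helper for illustration; not a pywb function.
    """
    parts = [host.rstrip("/"), coll]
    # The modifier is appended directly to the timestamp. If both are
    # empty, that path segment is omitted entirely.
    if timestamp or modifier:
        parts.append(timestamp + modifier)
    parts.append(url)
    return "/".join(parts)


# Assumption: numeric timestamp, two-letter modifiers (mp_, id_, js_, cs_, ...).
# The timestamp/modifier segment is optional as a whole.
_ARCHIVAL_PATH = re.compile(
    r"^/(?P<coll>[^/]+)/(?:(?P<ts>\d*)(?P<mod>[a-z]{2}_)?/)?(?P<url>.+)$"
)


def parse_archival_path(path: str) -> Optional[ArchivalPath]:
    """Split an archival URL path into its parts (hypothetical helper)."""
    m = _ARCHIVAL_PATH.match(path)
    if m is None:
        return None
    return ArchivalPath(m["coll"], m["ts"] or "", m["mod"], m["url"])


print(to_archival_url("http://localhost:8080", "my-coll",
                      "http://example.com/", "2017", "mp_"))
# http://localhost:8080/my-coll/2017mp_/http://example.com/

print(parse_archival_path("/my-coll/id_/http://example.com/"))
# ArchivalPath(coll='my-coll', timestamp='', modifier='id_', url='http://example.com/')
```

The sketch treats the timestamp and the modifier as independently optional, so ``/my-coll/id_/...``, ``/my-coll/2017/...`` and ``/my-coll/...`` all parse. It does not attempt to replicate pywb's own URL handling.
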

Note that certain HTTP headers (hop-by-hop or cookie-related) may still be prefixed with ``X-Orig-Archive-``, because they may affect the transmission. For this reason, the original headers are not guaranteed.

No Modifier
"""""""""""

The 'canonical' replay URL is the one without a modifier. It represents the URL that a user will see and enter into the browser.

The behavior of the canonical (no modifier) archival URL differs only when framed replay is used (see :ref:`framed_vs_frameless`):

* With framed replay, this URL serves the top-level frame.
* With frameless replay, this URL serves the content and is equivalent to the ``mp_`` modifier.

Main Page Modifier (``mp_``)
""""""""""""""""""""""""""""

This modifier indicates 'main page' content replay, generally of HTML pages.
Because pywb also performs content-type detection, this modifier can be used for any resource being loaded for replay, and pywb will generally render it correctly.
Binary resources can also be rendered with this modifier.

JS and CSS Hint Modifiers (``js_`` and ``cs_``)
"""""""""""""""""""""""""""""""""""""""""""""""

These modifiers 'hint' to pywb that a certain resource is being treated as a JS or CSS file.
This only makes a difference when there is ambiguity.
For example, if a resource has type ``text/html`` but is loaded in a ``