diff --git a/docs/manual/configuring.rst b/docs/manual/configuring.rst
index 062025f7..fc983e5e 100644
--- a/docs/manual/configuring.rst
+++ b/docs/manual/configuring.rst
@@ -318,9 +318,17 @@ Note: When a root collection is set, no other collections are currently accessib
 Recording Mode
 --------------
 
-A new recording mode can be enabled for any automatically managed collection by adding a ``recorder`` block in
-the root of ``config.yaml``.
-The mode can be configured with the following options::
+Recording mode enables pywb to support recording into any automatically managed collection, using
+the ``/<coll>/record/`` path. Accessing this path will result in pywb writing new WARCs directly into
+the collection ``<coll>``.
+
+To enable recording from the live web, simply run ``wayback --record``.
+
+To further customize recording mode, add the ``recorder`` block to the root of ``config.yaml``.
+
+The command-line option is equivalent to adding ``recorder: live``.
+
+The full set of configurable options (with their default settings) is as follows::
 
   recorder:
      source_coll: live
@@ -329,9 +337,7 @@ The mode can be configured with the following options::
      filename_template: my-warc-{timestamp}-{hostname}-{random}.warc.gz
 
 
-This will enable the ``/record/`` access point under every managed collection, writing new WARCs directly into each collection.
 The required ``source_coll`` setting specifies the source collection from which to load content that will be recorded.
-
 Most likely this will be the :ref:`live-web` collection, which should also be defined.
 However, it could be any other collection, allowing for "extraction" from other collections or remote web archives.
 Both the request and response are recorded into the WARC file, and most standard HTTP verbs should be recordable.
diff --git a/docs/manual/rewriter.rst b/docs/manual/rewriter.rst
index da428fe0..944f8d2a 100644
--- a/docs/manual/rewriter.rst
+++ b/docs/manual/rewriter.rst
@@ -1,4 +1,42 @@
 Rewriter
 ========
 
+pywb includes sophisticated server-side and client-side rewriting systems, including a rules-based
+configuration for domain-specific and content-specific rewriting rules, fuzzy index matching for replay,
+and a thorough client-side JS rewriting system.
+
+
+URL Rewriting
+-------------
+
+Most of the rewriting performed is **URL rewriting**: changing the original URLs to point to
+the pywb server instead of the live web. For example, a URL to ``http://example.com/`` might be
+rewritten as ``http://localhost:8080/my-coll/2017mp_/http://example.com/``.
+
+URL rewriting is applied to HTML, CSS files, and HTTP headers, as these are loaded directly by the browser.
+pywb avoids URL rewriting in JavaScript, to allow that to be handled by the client.
+
+(No URL rewriting is performed when running in :ref:`https-proxy` mode.)
+
+
+Configuring Rewriters
+---------------------
+
+pywb provides customizable rewriting based on content type. The available types are configured
+in the :py:mod:`pywb.rewriter.default_rewriter` module, which specifies rewriter classes per known type
+and a mapping of content types to rewriters.
+
+
+HTML Rewriting
+~~~~~~~~~~~~~~
+
+An HTML parser is used to rewrite HTML attributes and elements. Most rewriting is applied to URL
+attributes to add the URL rewriting prefix. CSS and JS embedded in HTML are rewritten using the CSS and JS
+rewriters.
+
+CSS Rewriting
+~~~~~~~~~~~~~
+
+The CSS rewriter rewrites any urls found in CSS files or ``