mirror of https://github.com/webrecorder/pywb.git synced 2025-03-14 15:53:28 +01:00
pywb/docs/manual/owb-pywb-terms.rst
Ilya Kreymer 9e09bcd2a7
Docs Update: OpenWayback -> pywb Transition Guide (#588)
* docs work on OpenWayback -> pywb transition, part 1

* docs: add config change examples, exclusions and deploy recommendations

* update with path index example

* update terms with collection info

* docs update:
- add zipnum examples to owb-to-pywb config transition
- add working docker compose examples for nginx subdirectory, apache subdirectory and outback cdx deployment in ./sample-deploy
- update usage and owb-to-pywb deployment docs with updated subdiretory deployment info + sample-deploy links

* tweak exclusion info, deploy title

* add missing filee uwsgi_subdir.ini

* Docs: fix typos and clarifications from review (thanks @ldko!)

Co-authored-by: Lauren Ko <lauren.ko@unt.edu>

* docs: explain that existing cdx can be added to outbackcdx, explain reindexing is optional

* docs: elaborate on docker-compose examples

* minor tweaks

* update to latest wombat 3.0.2
* update CHANGES.rst

* bump version to 2.5.0 for release

Co-authored-by: Lauren Ko <lauren.ko@unt.edu>
2020-12-04 18:40:58 -08:00


OpenWayback vs pywb Terms
=========================

pywb and OpenWayback use slightly different terms to describe their configuration options, as explained below.
Some of the differences are:

- The ``wayback.xml`` config file in OpenWayback is replaced with the ``config.yaml`` YAML file in pywb.
- The terms ``Access Point`` and ``Wayback Collection`` are both replaced with ``Collection`` in pywb. A collection's configuration defines a unique path (the access point) and the data that is accessed at that path.
- The ``Resource Store`` in OpenWayback is known in pywb as the archive paths, configured under ``archive_paths``.
- The ``Resource Index`` in OpenWayback is known in pywb as the index paths, configured under ``index_paths``.
- The ``Exclusions`` in OpenWayback are replaced with the more general :ref:`access-control`.

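
As a rough illustration of the term mapping above, the following sketch annotates a single pywb collection entry with the OpenWayback concept that each key roughly corresponds to. The collection name ``wayback`` and all paths are placeholders. The ``acl_paths`` entry assumes pywb's per-collection access control option and uses a hypothetical rules file; see :ref:`access-control` for the exact rule format.

.. code:: yaml

    collections:
      # Collection name: replaces the OpenWayback Access Point and
      # Wayback Collection; served under the /wayback/ path
      wayback:
        # OpenWayback Resource Index -> index_paths (placeholder path)
        index_paths: /archive/cdx/
        # OpenWayback Resource Store -> archive_paths (placeholder path)
        archive_paths: /archive/storage/warcs/
        # OpenWayback Exclusions -> access control rules
        # (assumed option; hypothetical rules file)
        acl_paths:
          - /archive/acl/exclusions.aclj
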
Pywb Collection Basics
----------------------

A pywb collection consists of at least three parts: the collection name, the ``index_paths`` (where to read the index), and the ``archive_paths`` (where to read the WARC files).
The collection is accessed by its name, so there is no separate access point to configure.

Collections are configured in ``config.yaml`` under the ``collections`` key.
For example, a basic collection definition can be specified as follows:

.. code:: yaml

    collections:
      wayback:
        index_paths: /archive/cdx/
        archive_paths: /archive/storage/warcs/

pywb also supports a convention-based directory structure. Collections created in this structure are detected automatically
and do not need to be listed in ``config.yaml``. This structure is designed for smaller collections whose data is all stored locally in a subdirectory.
See :ref:`dir_structure` for the default pywb directory structure.

However, when importing existing collections from OpenWayback, it is probably easier to specify the existing paths explicitly, as shown above.
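
For example, if an existing OpenWayback Resource Store spans several directories, they can likely be referenced in place rather than moved. The sketch below assumes that ``archive_paths`` accepts a list of locations as well as a single path; the directory names are hypothetical.

.. code:: yaml

    collections:
      wayback:
        # existing OpenWayback CDX directory, reused as-is (hypothetical path)
        index_paths: /archive/cdx/
        # several existing WARC directories (hypothetical paths),
        # assuming a list is accepted for archive_paths
        archive_paths:
          - /archive/storage/warcs/
          - /archive/storage/warcs-2019/
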