mirror of https://github.com/webrecorder/pywb.git (synced 2025-03-15 00:03:28 +01:00)
Update usage docs section on creating web archives (#899)
Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
parent 2fd6190b72
commit 5712945991
@@ -154,20 +154,20 @@ To enable auto-indexing, run with ``wayback -a`` or ``wayback -a --auto-interval
 Creating a Web Archive
 ----------------------
 
-Using Webrecorder
-^^^^^^^^^^^^^^^^^
+Using ArchiveWeb.page
+^^^^^^^^^^^^^^^^^^^^^
 
-If you do not have a web archive to test, one easy way to create one is to use `Webrecorder <https://webrecorder.io>`_
+If you do not have a web archive to test, one easy way to create one is to use the `ArchiveWeb.page <https://archiveweb.page>`_ browser extension for Chrome and other Chromium-based browsers such as Brave Browser. ArchiveWeb.page records pages visited during an archiving session in the browser, and provides means of both replaying and downloading the archived items created.
 
-After recording, you can click **Stop** and then click `Download Collection` to receive a WARC (`.warc.gz`) file.
+Follow the instructions in `How To Create Web Archives with ArchiveWeb.page <https://archiveweb.page/en/usage/>`_. After recording, press **Stop** and then `download your collection <https://archiveweb.page/en/download/>`_ to receive a WARC (`.warc.gz`) file. If you choose to download your collection in the WACZ format, the WARC files can be found inside the zipped WACZ in the ``archive/`` directory.
 
-You can then use this with work with pywb.
+You can then use your WARCs to work with pywb.
 
 
 Using pywb Recorder
 ^^^^^^^^^^^^^^^^^^^
 
-The core recording functionality in Webrecorder is also part of :mod:`pywb`. If you want to create a WARC locally, this can be
+Recording functionality is also part of :mod:`pywb`. If you want to create a WARC locally, this can be
 done by directly recording into your pywb collection:
 
 1. Create a collection: ``wb-manager init my-web-archive`` (if you haven't already created a web archive collection)
@@ -180,6 +180,14 @@ In this configuration, the indexing happens every 10 seconds.. After 10 seconds,
 ``http://localhost:8080/my-web-archive/http://example.com/``
 
 
+Using Browsertrix
+^^^^^^^^^^^^^^^^^
+
+For a more automated browser-based web archiving experience, `Browsertrix <https://browsertrix.com/>`_ provides a web interface for configuring, scheduling, running, reviewing, and curating crawls of web content. Crawl activity is shown in a live screencast of the browsers used for crawling and all web archives created in Browsertrix can be easily downloaded from the application in the WACZ format.
+
+`Browsertrix Crawler <https://crawler.docs.browsertrix.com/>`_, which provides the underlying crawling functionality of Browsertrix, can also be run standalone in a Docker container on your local computer.
+
+
 HTTP/S Proxy Mode Access
 ------------------------
 
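
The updated text notes that a WACZ download is a zip file with its WARC files under ``archive/``. As a rough illustration only (not part of this commit or of pywb), the sketch below uses Python's standard ``zipfile`` module to copy those WARCs out of a WACZ. The function name ``extract_warcs`` and the paths in the usage comment are hypothetical, and it assumes the WARCs end in ``.warc`` or ``.warc.gz``.

```python
import zipfile
from pathlib import Path


def extract_warcs(wacz_path, dest_dir):
    """Copy every WARC stored under archive/ in a WACZ file into dest_dir.

    Hypothetical helper for illustration; returns the list of written paths.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    # A WACZ is a zip file, so the standard zipfile module can read it.
    with zipfile.ZipFile(wacz_path) as wacz:
        for name in wacz.namelist():
            # Keep only WARC files located in the archive/ directory.
            if name.startswith("archive/") and name.endswith((".warc", ".warc.gz")):
                # Drop the archive/ prefix so files land directly in dest_dir.
                target = dest / Path(name).name
                target.write_bytes(wacz.read(name))
                extracted.append(target)
    return extracted


# Example usage (hypothetical file names):
# extract_warcs("my-collection.wacz", "warcs")
```

The extracted files could then be added to a collection in the usual way, for example with ``wb-manager add my-web-archive warcs/*.warc.gz``.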