mirror of
https://github.com/webrecorder/pywb.git
synced 2025-03-24 06:59:52 +01:00
More README tweaks
parent
e2623ed149
commit
4cfeb6d958
48
README.rst
@ -11,7 +11,7 @@ PyWb 0.9.0 Beta
|
|||||||
pywb is a python implementation of web archival replay tools, sometimes also known as 'Wayback Machine'.
|
pywb is a python implementation of web archival replay tools, sometimes also known as 'Wayback Machine'.
|
||||||
|
|
||||||
pywb allows high-quality replay (browsing) of archived web data stored in standardized `ARC <http://en.wikipedia.org/wiki/ARC_(file_format)>`_ and `WARC <http://en.wikipedia.org/wiki/Web_ARChive>`_,
|
pywb allows high-quality replay (browsing) of archived web data stored in standardized `ARC <http://en.wikipedia.org/wiki/ARC_(file_format)>`_ and `WARC <http://en.wikipedia.org/wiki/Web_ARChive>`_,
|
||||||
and it can also server as a rewriting proxy to live web content.
|
and it can also serve as a customizable rewriting proxy to live web content.
|
||||||
|
|
||||||
The replay system is designed to accurately replay complex dynamic sites, including `video and audio content <https://github.com/ikreymer/pywb/wiki/Video-Replay-and-Recording>`_ and sites
|
The replay system is designed to accurately replay complex dynamic sites, including `video and audio content <https://github.com/ikreymer/pywb/wiki/Video-Replay-and-Recording>`_ and sites
|
||||||
with complex JavaScript.
|
with complex JavaScript.
|
||||||
@ -32,31 +32,32 @@ A new utility, ``wb-manager`` performs the most common collection management tas
|
|||||||
Archive a Web Page
|
Archive a Web Page
|
||||||
""""""""""""""""""
|
""""""""""""""""""
|
||||||
|
|
||||||
If you do not have any web archive files, you can create easiely record one from any page by using the free
|
If you do not have any web archive files (WARCs), you can easily create one from any page by using the free
|
||||||
https://webrecorder.io/ service (also powered by pywb).
|
https://webrecorder.io/ service
|
||||||
|
|
||||||
For example, you may visit https://webrecorder.io/record/http://example.com, then (after a few seconds),
|
For example, you may visit https://webrecorder.io/record/http://example.com, then (after a few seconds),
|
||||||
click "Download -> Web Archive (WARC)" to get the WARC file (.warc.gz)
|
click *Download -> Web Archive (WARC)* to get the WARC file (.warc.gz)
|
||||||
|
|
||||||
|
|
||||||
Create a new Collection
|
Create a new Collection
|
||||||
"""""""""""""""""""""""
|
"""""""""""""""""""""""
|
||||||
If you have an existing WARC/ARC file(s), you can set up a quick collection as follows, including installing
|
|
||||||
|
Once you have an existing WARC/ARC file(s), you can set up a quick collection as follows, including installing
|
||||||
pywb:
|
pywb:
|
||||||
|
|
||||||
```
|
::
|
||||||
pip install pywb==0.9.0b2
|
|
||||||
wb-manager init my_coll
|
|
||||||
wb-manager add my_coll <path/to/warc>
|
|
||||||
wayback
|
|
||||||
```
|
|
||||||
|
|
||||||
Point your browser to ``http://localhost:8080/my_coll/<url>/`` where ``<url>`` is a url in your WARC/ARC file.
|
pip install pywb==0.9.0b2
|
||||||
|
wb-manager init my_coll
|
||||||
|
wb-manager add my_coll <path/to/warc>
|
||||||
|
wayback
|
||||||
|
|
||||||
(If you just recorded ``http://example.com/``, you should be able to view ``http://localhost:8080/my_coll/http://example.com/``)
|
|
||||||
|
Point your browser to ``http://localhost:8080/my_coll/<url>/`` where ``<url>`` is a url in your WARC/ARC file. (If you just recorded ``http://example.com/``, you should be able to view ``http://localhost:8080/my_coll/http://example.com/``)
|
||||||
|
|
||||||
If all worked well, you should see replay of ``<url>``. Congrats, you are now running your own web archive!
|
If all worked well, you should see replay of ``<url>``. Congrats, you are now running your own web archive!
|
||||||
|
|
||||||
|
|
||||||
A more `detailed tutorial is available on the wiki <https://github.com/ikreymer/pywb/wiki/Auto-Configuration-and-Wayback-Collections-Manager>`_.
|
A more `detailed tutorial is available on the wiki <https://github.com/ikreymer/pywb/wiki/Auto-Configuration-and-Wayback-Collections-Manager>`_.
|
||||||
|
|
||||||
|
|
||||||
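Note on the hunk above: the replay address is the collection name followed by the archived URL, appended to the server address. A minimal Python sketch of that pattern, assuming the default ``localhost:8080`` quoted in the README text. ``replay_url`` is a hypothetical helper written for illustration and is not part of pywb:

```python
def replay_url(collection, url, host="localhost", port=8080):
    """Build the address a local pywb instance is expected to replay a page from.

    Hypothetical helper (not a pywb API). It only mirrors the
    http://localhost:8080/my_coll/<url> pattern quoted in the README.
    """
    return "http://{0}:{1}/{2}/{3}".format(host, port, collection, url)


print(replay_url("my_coll", "http://example.com/"))
```

The host and port defaults are assumptions taken from the README example; adjust them if ``wayback`` is started with different settings.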
@ -176,16 +177,18 @@ For more info, see `Proxy Mode Usage <https://github.com/ikreymer/pywb/wiki/Pywb
|
|||||||
The `pywb-proxy-demo <https://github.com/ikreymer/pywb-proxy-demo>`_ project also contains a working configuration of proxy mode deployment.
|
The `pywb-proxy-demo <https://github.com/ikreymer/pywb-proxy-demo>`_ project also contains a working configuration of proxy mode deployment.
|
||||||
|
|
||||||
|
|
||||||
Running with WSGI
|
Running with any WSGI Container
|
||||||
"""""""""""""""""
|
"""""""""""""""""""""""""""""""
|
||||||
|
|
||||||
The command-line ``wayback`` utility starts pywb using the waitress WSGI server by default. It is sufficient for basic usage and testing.
|
The command-line ``wayback`` utility starts pywb using the `waitress <>`_ server. This should be sufficient for basic usage and testing.
|
||||||
|
|
||||||
However, pywb can be configured to run with any standard WSGI container/server, using ``application`` in ``pywb.apps.wayback`` module as the entry point.
|
However, since pywb conforms to the Python `WSGI <http://wsgi.readthedocs.org/en/latest/>`_ specification, it can be run with any standard WSGI container/server
|
||||||
|
and can be embedded in larger applications.
|
||||||
|
|
||||||
The `uWSGI <https://uwsgi-docs.readthedocs.org/en/latest/>`_ is recommended for most production deployments.
|
When running with a different container, specify ``pywb.apps.wayback`` as the WSGI application module.
|
||||||
|
|
||||||
The ``uwsgi.ini and ``run-uwsgi.sh`` scripts in this repo provides examples of running pywb with uWSGI.
|
For production deployments, `uWSGI <https://uwsgi-docs.readthedocs.org/en/latest/>`_ with gevent is the recommended container and the ``uwsgi.ini`` and ``run-uwsgi.sh``
|
||||||
|
scripts in this repo provide examples of running pywb with uWSGI.
|
||||||
|
|
||||||
|
|
||||||
Custom UI and User Metadata
|
Custom UI and User Metadata
|
||||||
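Note on the WSGI hunk above: the removed text names ``application`` in the ``pywb.apps.wayback`` module as the entry point. A rough sketch of how a container might resolve such a module name to a WSGI callable and serve it, using only the Python standard library. ``load_wsgi_app`` and ``serve`` are hypothetical helpers, and the ``application`` attribute name is taken from the old README wording rather than verified against the pywb source:

```python
import importlib
from wsgiref.simple_server import make_server


def load_wsgi_app(module_name, attr="application"):
    """Import a module and return its WSGI callable.

    Hypothetical helper. The default attribute name "application" follows
    the README wording and is an assumption, not a verified pywb API.
    """
    module = importlib.import_module(module_name)
    app = getattr(module, attr)
    if not callable(app):
        raise TypeError("{0}.{1} is not callable".format(module_name, attr))
    return app


def serve(module_name, host="localhost", port=8080):
    """Serve the loaded app with the stdlib reference server (testing only)."""
    app = load_wsgi_app(module_name)
    make_server(host, port, app).serve_forever()
```

With pywb installed, calling ``serve("pywb.apps.wayback")`` from a directory containing a collection would be one way to try this. The stdlib server is single-threaded, so this sketch is no substitute for uWSGI in production.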
@ -209,13 +212,14 @@ and `UI Customization <https://github.com/ikreymer/pywb/wiki/UI-Customization>`_
|
|||||||
Automatic Indexing
|
Automatic Indexing
|
||||||
""""""""""""""""""
|
""""""""""""""""""
|
||||||
|
|
||||||
pywb now also includes a new (still experimental) automatic indexing of any web archive files (WARC or ARC).
|
pywb now also includes support for automatic indexing of any web archive files (WARC or ARC).
|
||||||
Whenever a WARC or ARC file is added or changed, pywb will update the internal index automatically and make the archived content
|
|
||||||
|
Whenever a WARC/ARC file is added or changed, pywb will update the internal index automatically and make the archived content
|
||||||
instantly available for replay, without manual intervention or restart. (Of course, indexing will take some time if adding
|
instantly available for replay, without manual intervention or restart. (Of course, indexing will take some time if adding
|
||||||
many gigabytes of data all at once, but is quite useful for smaller archive updates).
|
many gigabytes of data all at once, but is quite useful for smaller archive updates).
|
||||||
|
|
||||||
To enable auto-indexing, you can run the `wayback -a` when running command line, or run
|
To enable auto-indexing, you can run the `wayback -a` when running command line, or run
|
||||||
`wb-manager autoindex <path/to/coll>` seperately.
|
`wb-manager autoindex <path/to/coll>` as a separate program.
|
||||||
|
|
||||||
|
|
||||||
About Wayback Machine
|
About Wayback Machine
|
||||||
|