PyWb 0.1 Beta
==============

[![Build Status](https://travis-ci.org/ikreymer/pywb.png?branch=master)](https://travis-ci.org/ikreymer/pywb)

pywb is a Python implementation of the Wayback Machine software.

Some goals are to:
* Provide the best possible playback of archival web content (usually in WARC or ARC files)
* Be highly customizable in rewriting content to provide the best possible playback experience
* Provide a pluggable, optional UI
* Be easy to deploy and hack

The Wayback Machine usually serves archival content in the following form:

`http://<host>/<collection>/<timestamp>/<original url>`

For example, the [Internet Archive Wayback Machine][1] has URLs of the form:

`http://web.archive.org/web/20131015120316/http://archive.org/`

A listing of archived content, often in calendar form, is available when a `*` is used instead of a timestamp.

pywb uses this interface as a starting point.
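As an illustration of the URL form above, here is a minimal Python 3 sketch that splits a replay URL into its parts. The helper names `parse_replay_url` and `is_listing` are hypothetical and are not part of pywb; pywb's own URL handling may well differ.

```python
from urllib.parse import urlsplit


def parse_replay_url(url):
    """Split a Wayback-style URL into (host, collection, timestamp, original_url).

    Hypothetical helper for illustration; any #fragment is dropped.
    """
    parts = urlsplit(url)
    # The path looks like /<collection>/<timestamp>/<original url>
    collection, timestamp, original = parts.path.lstrip("/").split("/", 2)
    # A query string, if present, belongs to the original URL
    if parts.query:
        original += "?" + parts.query
    return parts.netloc, collection, timestamp, original


def is_listing(timestamp):
    """A `*` in place of the timestamp asks for a listing of captures."""
    return timestamp == "*"


host, collection, timestamp, original = parse_replay_url(
    "http://web.archive.org/web/20131015120316/http://archive.org/"
)
print(host, collection, timestamp, original)
```

Splitting with a limit of two keeps the slashes inside the original URL intact, which is the main thing to get right when handling this format.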
### Requirements

pywb currently works best with Python 2.7.x.
It should run in a standard WSGI container, although it is currently
tested primarily with uWSGI 1.9 and 2.0.

Support for Python 3 is planned.
### Installation

pywb comes with sample archived content, which is also used
for unit testing the app.

The data can be found in `sample_archive` and contains
`warc` and `cdx` files. The sample archive contains
recent captures from `http://example.com` and `http://iana.org`.

To start pywb with the sample data:

- Clone this repo
- Install with `python setup.py install`
- Start it with `run.sh`
- Point your browser to `localhost:8080/pywb/example.com` or `localhost:8080/pywb/iana.org`
  to see pywb rendering the sample archive data
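Once the server is running, the sample collection can also be checked from a script rather than a browser. The following is a minimal Python 3 sketch using only the standard library; `sample_url` and `fetch_status` are hypothetical helpers, and the host, port, and collection name are simply the ones from the steps above.

```python
import urllib.request


def sample_url(site, host="localhost:8080", collection="pywb"):
    """Build the URL for one of the sample sites (defaults match the steps above)."""
    return "http://{}/{}/{}".format(host, collection, site)


def fetch_status(url, timeout=5):
    """Return the HTTP status code for url.

    Raises urllib.error.URLError if the server cannot be reached.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status
```

With pywb started via `run.sh`, calling `fetch_status(sample_url("example.com"))` returns the status code of the response; if nothing is listening on the port, `urlopen` raises an error instead.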
### Sample Setup

pywb is currently configurable via YAML.

The simplest [config.yaml](config.yaml) is roughly as follows:
``` yaml
routes:
    - name: pywb

      index_paths:
          - ./sample_archive/cdx/

      archive_paths:
          - ./sample_archive/warcs/

      head_insert_html_template: ./ui/head_insert.html

      calendar_html_template: ./ui/query.html

hostpaths: ['http://localhost:8080/']
```
(Refer to the [full version of config.yaml](config.yaml) for additional documentation.)

The `PYWB_CONFIG` environment variable can be used to set a different config file.
The `PYWB_CONFIG_MODULE` environment variable can be used to set a different init module.
See `run.sh` for more details.

[1]: https://archive.org/web/
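The environment-variable override described above can be sketched as follows. This is an illustration only: the helper `resolve_settings` is hypothetical, and the fallback file name `config.yaml` is an assumption based on the sample config; `run.sh` remains the reference for how pywb actually wires these up.

```python
import os

# Assumed fallback; the sample config in this README is named config.yaml
DEFAULT_CONFIG = "config.yaml"


def resolve_settings(environ=None):
    """Return (config_file, init_module) chosen via environment variables."""
    environ = os.environ if environ is None else environ
    config_file = environ.get("PYWB_CONFIG", DEFAULT_CONFIG)
    # No default module is assumed here; None means the variable was not set
    init_module = environ.get("PYWB_CONFIG_MODULE")
    return config_file, init_module


print(resolve_settings({"PYWB_CONFIG": "./my_config.yaml"}))
```

Passing the environment in as a dictionary keeps the lookup easy to test without touching the real process environment.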