mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 00:03:28 +01:00

update subpackage READMEs

Ilya Kreymer 2014-02-18 18:13:44 -08:00
parent a09dec4b3e
commit 7c1ac10d6f
4 changed files with 35 additions and 41 deletions


@ -1,30 +1,20 @@
## PyWb CDX v0.2
[![Build Status](https://travis-ci.org/ikreymer/pywb_cdx.png?branch=master)](https://travis-ci.org/ikreymer/pywb_cdx)
### pywb.cdx package
This package contains the CDX processing suite of the pywb wayback tool suite.
The CDX Server loads, filters and transforms cdx from multiple sources in response
to a given query.
### Installation and Tests
`pip install -r requirements` -- to install
`python run-tests.py` -- to run all tests
### Sample App
#### Sample App
A very simple reference WSGI app is included.
Run: `python -m pywb_cdx.wsgi_cdxserver` to start the app, keyboard interrupt to stop.
Run: `python -m pywb.cdx.wsgi_cdxserver` to start the app, keyboard interrupt to stop.
The default [config.yaml](pywb_cdx/config.yaml) points to the sample data directory
and uses port 8080
### CDX Server API Reference
#### CDX Server API Reference
The goal is to provide compatibility with this feature set and more:
https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server


@ -1,6 +1,4 @@
## PyWb Rewrite v0.2
[![Build Status](https://travis-ci.org/ikreymer/pywb_rewrite.png?branch=master)](https://travis-ci.org/ikreymer/pywb_rewrite)
### pywb.rewrite
This package includes the content rewriting component of the pywb wayback tool suite.
@ -11,23 +9,19 @@ An additional domain-specific rewriting is planned, especially for JS, to allow f
replay of difficult pages.
### Command-Line Rewriter
#### Command-Line Rewriter
To enable easier testing of rewriting, this package includes a command-line rewriter
which will fetch a live url and apply the registered rewriting rules to that url:
After installing with:
`pip install -r requirements.txt`
Run:
`python ./pywb_rewrite/rewrite_live.py http://example.com`
`python ./pywb.rewrite/rewrite_live.py http://example.com`
To specify custom timestamp and prefix:
```
python ./pywb_rewrite/rewrite_live.py http://example.com /mycoll/20141026000102/http://mysite.example.com/path.html
python ./pywb.rewrite/rewrite_live.py http://example.com /mycoll/20141026000102/http://mysite.example.com/path.html
```
This will print to stdout the content of `http://example.com` with all urls rewritten relative to
@ -37,11 +31,12 @@ Headers are also rewritten, for further details, consult the `get_rewritten` fun
[pywb_rewrite/rewrite_live.py](pywb_rewrite/rewrite_live.py)
### Tests
#### Tests
Rewriting doctests as well as live rewriting tests (subject to change) are provided.
To run full test suite: `python run-tests.py`
pywb.rewrite is part of a full test suite that can be executed via
`python run-tests.py`


@ -1,16 +1,17 @@
## PyWb Utils v0.2 ##
### pywb.utils
[![Build Status](https://travis-ci.org/ikreymer/pywb_utils.png?branch=master)](https://travis-ci.org/ikreymer/pywb_utils)
This is a standalone module containing a variety of utils used by the pywb wayback tool suite.
`python run-tests.py` will run all tests
This package contains a variety of utils used by the pywb wayback tool suite.
#### Modules
[binsearch.py](pywb_utils/binsearch.py) -- Binary search implementation over text files
* [binsearch.py](pywb.utils/binsearch.py) -- Binary search implementation over text files
[loaders.py](pywb_utils/loaders.py) -- Loading abstraction for http, local file system, as well as buffered and seekable file readers
* [loaders.py](pywb.utils/loaders.py) -- Loading abstraction for loading via http or local file system.
[timeutils.py](pywb_utils/timeutils.py) -- Utility functions for converting between standard datetime formats and 14-digit timestamps
* [bufferedreaders.py](pywb.utils/bufferedreaders.py) -- Buffering wrappers for file-like objects that also provide gzip decompression and
de-chunking facilities.
* [statusandheaders.py](pywb.utils/statusandheaders.py) -- Representation of the http status line + headers, and parsing them out from a stream
* [timeutils.py](pywb.utils/timeutils.py) -- Utility functions for converting between standard datetime formats and 14-digit timestamps
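
As an illustration of the 14-digit timestamp format mentioned for timeutils.py, here is a minimal sketch using only the Python standard library. The helper names `to_ts14` and `from_ts14` are hypothetical and are not taken from pywb; the sketch assumes the timestamp layout is `YYYYMMDDhhmmss`, as in the `20141026000102` value used elsewhere in these READMEs.

```python
from datetime import datetime

# Assumed layout of a 14-digit timestamp: YYYYMMDDhhmmss
TS14_FORMAT = "%Y%m%d%H%M%S"


def to_ts14(dt):
    """Format a datetime as a 14-digit timestamp string (hypothetical helper)."""
    return dt.strftime(TS14_FORMAT)


def from_ts14(ts):
    """Parse a 14-digit timestamp string back into a datetime (hypothetical helper)."""
    return datetime.strptime(ts, TS14_FORMAT)


if __name__ == "__main__":
    print(to_ts14(datetime(2014, 10, 26, 0, 1, 2)))
    print(from_ts14("20141026000102"))
```

The actual functions in timeutils.py may differ in name and may handle padding or partial timestamps, which this sketch does not attempt.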


@ -1,6 +1,4 @@
## PyWb Warc v0.2
[![Build Status](https://travis-ci.org/ikreymer/pywb_warc.png?branch=master)](https://travis-ci.org/ikreymer/pywb_warc)
### pywb.warc
This is the WARC/ARC record loading component of pywb wayback tool suite.
@ -16,7 +14,17 @@ This package provides the following facilities:
### Tests
This package will include a test suite for different WARC and ARC loading formats.
This package includes a test suite for loading a variety of WARC and ARC records.
To run: `python run-tests.py`
Tests so far:
* Compressed WARC, ARC Records
* Uncompressed ARC Records
* Compressed WARC created by wget 1.14
* Same Url revisit record resolving
TODO:
* Different url revisit record resolving (TODO)
* File type detection (no .warc, .arc extensions)
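
For the file type detection item, one possible approach is to inspect the first bytes of a file instead of relying on its extension. The sketch below is an assumption about how this could be done, not pywb's implementation; `sniff_format` is a hypothetical name. It relies on three known signatures: gzip streams begin with the bytes `1f 8b`, WARC records begin with `WARC/`, and ARC files begin with a `filedesc://` version block.

```python
GZIP_MAGIC = b"\x1f\x8b"        # first two bytes of any gzip member
WARC_PREFIX = b"WARC/"          # start of a WARC record version line
ARC_PREFIX = b"filedesc://"     # start of an ARC file's version block


def sniff_format(head):
    """Guess the container format from the leading bytes of a file.

    Hypothetical helper: returns "gzip", "warc", "arc" or "unknown".
    A "gzip" result means the content must be decompressed and sniffed
    again to tell a compressed WARC from a compressed ARC.
    """
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(WARC_PREFIX):
        return "warc"
    if head.startswith(ARC_PREFIX):
        return "arc"
    return "unknown"


if __name__ == "__main__":
    print(sniff_format(b"WARC/1.0\r\nWARC-Type: response\r\n"))
    print(sniff_format(b"\x1f\x8b\x08\x00"))
```

Reading only a small prefix of the file keeps the check cheap, and it works the same for local files and streams that cannot be rewound far.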