mirror of https://github.com/webrecorder/pywb.git synced 2025-03-24 06:59:52 +01:00

update subpackage READMEs

Ilya Kreymer 2014-02-18 18:13:44 -08:00
parent a09dec4b3e
commit 7c1ac10d6f
4 changed files with 35 additions and 41 deletions

@@ -1,30 +1,20 @@
-## PyWb CDX v0.2
+### pywb.cdx package
-[![Build Status](https://travis-ci.org/ikreymer/pywb_cdx.png?branch=master)](https://travis-ci.org/ikreymer/pywb_cdx)
 This package contains the CDX processing suite of the pywb wayback tool suite.
 The CDX Server loads, filters and transforms cdx from multiple sources in response
 to a given query.
-### Installation and Tests
-`pip install -r requirements` -- to install
-`python run-tests.py` -- to run all tests
-### Sample App
+#### Sample App
 A very simple reference WSGI app is included.
-Run: `python -m pywb_cdx.wsgi_cdxserver` to start the app, keyboard interrupt to stop.
+Run: `python -m pywb.cdx.wsgi_cdxserver` to start the app, keyboard interrupt to stop.
 The default [config.yaml](pywb_cdx/config.yaml) points to the sample data directory
 and uses port 8080
-### CDX Server API Reference
+#### CDX Server API Reference
 Goal is to provide compatiblity with this feature set and more:
 https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server
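
Note on the README text in the hunk above: "loads, filters and transforms cdx from multiple sources" can be pictured with a minimal sketch. This is not the pywb.cdx implementation. It assumes each source yields space-separated CDX lines that are already sorted and whose first field is the URL key; `query_cdx` and the sample lines are hypothetical names and data made up for illustration.

```python
import heapq


def query_cdx(sources, url_key, limit=None):
    """Yield lines whose first field equals url_key, in sorted order.

    sources: iterables of CDX lines, each assumed to be sorted already.
    """
    matched = 0
    # heapq.merge lazily interleaves sorted inputs into one sorted stream.
    for line in heapq.merge(*sources):
        key = line.split(" ", 1)[0]
        if key < url_key:
            continue
        if key > url_key:
            break  # input is sorted, so no later line can match
        yield line
        matched += 1
        if limit is not None and matched >= limit:
            break


# Two made-up, pre-sorted sources (fields abbreviated for the example).
source_a = [
    "com,example)/ 20130101000000 http://example.com/ text/html 200",
    "com,example)/about 20130505000000 http://example.com/about text/html 200",
]
source_b = [
    "com,example)/ 20140218000000 http://example.com/ text/html 200",
    "org,example)/ 20140218000000 http://example.org/ text/html 200",
]

for cdx_line in query_cdx([source_a, source_b], "com,example)/"):
    print(cdx_line)
```

Merging lazily keeps memory use flat regardless of how large the individual sources are, which is why the sketch never loads a full source into a list.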

@@ -1,6 +1,4 @@
-## PyWb Rewrite v0.2
+### pywb.rewrite
-[![Build Status](https://travis-ci.org/ikreymer/pywb_rewrite.png?branch=master)](https://travis-ci.org/ikreymer/pywb_rewrite)
 This package includes the content rewriting component of the pywb wayback tool suite.
@@ -11,23 +9,19 @@ An additional domain-specific rewritin is planned, especially for JS, to allow f
 replay of difficult pages.
-### Command-Line Rewriter
+#### Command-Line Rewriter
 To enable easier testing of rewriting, this package includes a command-line rewriter
 which will fetch a live url and apply the registered rewriting rules to that url:
-After installing with:
-`pip install -r requirements.txt`
 Run:
-`python ./pywb_rewrite/rewrite_live.py http://example.com`
+`python ./pywb.rewrite/rewrite_live.py http://example.com`
 To specify custom timestamp and prefix:
 ```
-python ./pywb_rewrite/rewrite_live.py http://example.com /mycoll/20141026000102/http://mysite.example.com/path.html
+python ./pywb.rewrite/rewrite_live.py http://example.com /mycoll/20141026000102/http://mysite.example.com/path.html
 ```
 This will print to stdout the content of `http://example.com` with all urls rewritten relative to
@@ -37,11 +31,12 @@ Headers are also rewritten, for further details, consult the `get_rewritten` fun
 [pywb_rewrite/rewrite_live.py](pywb_rewrite/rewrite_live.py)
-### Tests
+#### Tests
 Rewriting doctests as well as live rewriting tests (subject to change) are provided.
-To run full test suite: `python run-tests.py`
+pywb.rewrite is part of a full test suite that can be executed via
+`python run-tests.py`
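
Note on the command-line rewriter described in this README: the idea of rewriting urls "relative to" a prefix such as `/mycoll/20141026000102/http://mysite.example.com/path.html` can be sketched as below. This is an illustration of the general wayback-style scheme, not the code in `rewrite_live.py`; `to_timestamp14` and `rewrite_url` are hypothetical helpers, and the sketch assumes plain http(s) links only (no `data:` or `javascript:` urls).

```python
from datetime import datetime
from urllib.parse import urljoin


def to_timestamp14(when):
    """Format a datetime as a 14-digit YYYYMMDDhhmmss timestamp."""
    return when.strftime("%Y%m%d%H%M%S")


def rewrite_url(url, page_url, prefix, timestamp):
    """Resolve url against the page it appeared on, then nest the
    absolute result under prefix + timestamp."""
    absolute = urljoin(page_url, url)
    return prefix + timestamp + "/" + absolute


ts = to_timestamp14(datetime(2014, 10, 26, 0, 1, 2))
print(rewrite_url("path.html", "http://mysite.example.com/", "/mycoll/", ts))
```

Resolving to an absolute url first means relative and absolute links end up in the same rewritten form, so a replay server only has to parse one pattern.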

@@ -1,16 +1,17 @@
-## PyWb Utils v0.2 ##
+### pywb.utils
-[![Build Status](https://travis-ci.org/ikreymer/pywb_utils.png?branch=master)](https://travis-ci.org/ikreymer/pywb_utils)
-This is a standalone module contains a variety of utils used by pywb wayback tool suite.
-`python run-tests.py` will run all tests
+This package contains a utils used by pywb wayback tool suite.
 #### Modules
-[binsearch.py](pywb_utils/binsearch.py) -- Binary search implementation over text files
-[loaders.py](pywb_utils/loaders.py) -- Loading abstraction for http, local file system, as well as buffered and seekable file readers
-[timeutils.py](pywb_utils/timeutils.py) -- Utility functions for converting between standard datetime formats 14-digit timestamp
+* [binsearch.py](pywb.utils/binsearch.py) -- Binary search implementation over text files
+* [loaders.py](pywb.utils/loaders.py) -- Loading abstraction for loading via http or local file system.
+* [bufferedreaders.py](pywb.utils/bufferedreaders.py) -- Buffering wrappers for file-like object, also provide gzip decompression and
+de-chunking facilities.
+* [statusandheaders.py](pywb.utils/statusandheaders.py) -- Represent http status line + headers and parsing them out from a stream
+* [timeutils.py](pywb.utils/timeutils.py) -- Utility functions for converting between standard datetime formats 14-digit timestamp
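
Note on the binsearch.py entry in the module list: "binary search over text files" generally means seeking by byte offset in a sorted file and re-aligning to line boundaries. The sketch below shows one way to do that. It is not the code in binsearch.py; `find_first_at_or_after` and its helper are hypothetical names, and it assumes a seekable file opened in binary mode with `\n`-terminated lines sorted bytewise.

```python
import io


def _line_at_or_after(f, offset):
    """Return the first full line starting at or after offset, or None at EOF."""
    if offset > 0:
        # Back up one byte: if that byte is a newline, a line starts exactly
        # at offset; otherwise readline() discards the partial line.
        f.seek(offset - 1)
        f.readline()
    else:
        f.seek(0)
    line = f.readline()
    return line.rstrip(b"\n") if line else None


def find_first_at_or_after(f, key):
    """Return the first line >= key in a sorted binary file, or None."""
    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell()
    # Search for the smallest offset whose following line is >= key.
    while lo < hi:
        mid = (lo + hi) // 2
        line = _line_at_or_after(f, mid)
        if line is not None and line < key:
            lo = mid + 1
        else:
            hi = mid
    return _line_at_or_after(f, lo)


sample = io.BytesIO(b"apple\nbanana\ncherry\n")
print(find_first_at_or_after(sample, b"b"))
```

Searching over byte offsets rather than line numbers avoids building a line index, so a lookup reads only a logarithmic number of lines even in a very large file.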

@@ -1,6 +1,4 @@
-## PyWb Warc v0.2
+### pywb.warc
-[![Build Status](https://travis-ci.org/ikreymer/pywb_warc.png?branch=master)](https://travis-ci.org/ikreymer/pywb_warc)
 This is the WARC/ARC record loading component of pywb wayback tool suite.
@@ -16,7 +14,17 @@ This package provides the following facilities:
 ### Tests
-This package will include a test suite for different WARC and ARC loading formats.
+This package will includes a test suite for loading a variety of WARC and ARC records.
-To run: `python run-tests.py`
+Tests so far:
+* Compressed WARC, ARC Records
+* Uncompressed ARC Records
+* Compressed WARC created by wget 1.14
+* Same Url revisit record resolving
+TODO:
+* Different url revisit record resolving (TODO)
+* File type detection (no .warc, .arc extensions)
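
Note on the "File type detection" TODO item: one possible approach, sketched here under stated assumptions and not taken from pywb.warc, is to sniff the leading bytes instead of trusting the extension. It relies on three commonly documented facts: gzip streams start with the bytes `1f 8b`, WARC records start with a `WARC/` version line, and ARC files start with a `filedesc://` version block. `sniff_format` is a hypothetical helper and requires a seekable binary stream.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"


def sniff_format(stream):
    """Return (kind, compressed) where kind is 'warc', 'arc' or 'unknown'."""
    compressed = stream.read(2) == GZIP_MAGIC
    stream.seek(0)
    # Wrap gzip members transparently so the first line can be inspected.
    reader = gzip.GzipFile(fileobj=stream) if compressed else stream
    first_line = reader.readline()
    if first_line.startswith(b"WARC/"):
        kind = "warc"
    elif first_line.startswith(b"filedesc://"):
        kind = "arc"
    else:
        kind = "unknown"
    return kind, compressed


# Made-up record headers, used only to exercise the sketch.
warc_gz = io.BytesIO(gzip.compress(b"WARC/1.0\r\nWARC-Type: warcinfo\r\n"))
arc_plain = io.BytesIO(b"filedesc://sample.arc 0.0.0.0 20140218000000 text/plain 0\n")

print(sniff_format(warc_gz))
print(sniff_format(arc_plain))
```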