From 7c1ac10d6f00ba66f58a551c383f28f5699c2942 Mon Sep 17 00:00:00 2001
From: Ilya Kreymer
Date: Tue, 18 Feb 2014 18:13:44 -0800
Subject: [PATCH] update subpackage READMEs

---
 pywb/cdx/README.md     | 18 ++++--------------
 pywb/rewrite/README.md | 21 ++++++++-------------
 pywb/utils/README.md   | 19 ++++++++++---------
 pywb/warc/README.md    | 18 +++++++++++++-----
 4 files changed, 35 insertions(+), 41 deletions(-)

diff --git a/pywb/cdx/README.md b/pywb/cdx/README.md
index 26a41eb1..71737c5b 100644
--- a/pywb/cdx/README.md
+++ b/pywb/cdx/README.md
@@ -1,30 +1,20 @@
-## PyWb CDX v0.2
-
-[![Build Status](https://travis-ci.org/ikreymer/pywb_cdx.png?branch=master)](https://travis-ci.org/ikreymer/pywb_cdx)
-
+### pywb.cdx package
 This package contains the CDX processing suite of the pywb wayback tool suite.
 The CDX Server loads, filters and transforms cdx from multiple sources in response to a given query.
-### Installation and Tests
-
-`pip install -r requirements` -- to install
-
-`python run-tests.py` -- to run all tests
-
-
-### Sample App
+#### Sample App
 A very simple reference WSGI app is included.
-Run: `python -m pywb_cdx.wsgi_cdxserver` to start the app, keyboard interrupt to stop.
+Run: `python -m pywb.cdx.wsgi_cdxserver` to start the app, keyboard interrupt to stop.
 The default [config.yaml](pywb_cdx/config.yaml) points to the sample data directory and uses port 8080
-### CDX Server API Reference
+#### CDX Server API Reference
 Goal is to provide compatiblity with this feature set and more:
 https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server
diff --git a/pywb/rewrite/README.md b/pywb/rewrite/README.md
index dc658ea9..0e459ce0 100644
--- a/pywb/rewrite/README.md
+++ b/pywb/rewrite/README.md
@@ -1,6 +1,4 @@
-## PyWb Rewrite v0.2
-
-[![Build Status](https://travis-ci.org/ikreymer/pywb_rewrite.png?branch=master)](https://travis-ci.org/ikreymer/pywb_rewrite)
+### pywb.rewrite
 This package includes the content rewriting component of the pywb wayback tool suite.
@@ -11,23 +9,19 @@ An additional domain-specific rewritin is planned, especially for JS, to allow f
 replay of difficult pages.
-### Command-Line Rewriter
+#### Command-Line Rewriter
 To enable easier testing of rewriting, this package includes a command-line rewriter which will fetch a live url and apply the registered rewriting rules to that url:
-After installing with:
-
-`pip install -r requirements.txt`
-
 Run:
-`python ./pywb_rewrite/rewrite_live.py http://example.com`
+`python ./pywb.rewrite/rewrite_live.py http://example.com`
 To specify custom timestamp and prefix:
 ```
-python ./pywb_rewrite/rewrite_live.py http://example.com /mycoll/20141026000102/http://mysite.example.com/path.html
+python ./pywb.rewrite/rewrite_live.py http://example.com /mycoll/20141026000102/http://mysite.example.com/path.html
 ```
 This will print to stdout the content of `http://example.com` with all urls rewritten relative to
@@ -37,11 +31,12 @@ Headers are also rewritten, for further details, consult the `get_rewritten` fun
 [pywb_rewrite/rewrite_live.py](pywb_rewrite/rewrite_live.py)
-### Tests
+#### Tests
 Rewriting doctests as well as live rewriting tests (subject to change) are provided.
-To run full test suite: `python run-tests.py`
-
+
+pywb.rewrite is part of a full test suite that can be executed via
+`python run-tests.py`
diff --git a/pywb/utils/README.md b/pywb/utils/README.md
index 35ebca86..b244efb8 100644
--- a/pywb/utils/README.md
+++ b/pywb/utils/README.md
@@ -1,16 +1,17 @@
-## PyWb Utils v0.2 ##
+### pywb.utils
-[![Build Status](https://travis-ci.org/ikreymer/pywb_utils.png?branch=master)](https://travis-ci.org/ikreymer/pywb_utils)
-
-This is a standalone module contains a variety of utils used by pywb wayback tool suite.
-
-`python run-tests.py` will run all tests
+This package contains a utils used by pywb wayback tool suite.
 #### Modules
-[binsearch.py](pywb_utils/binsearch.py) -- Binary search implementation over text files
+* [binsearch.py](pywb.utils/binsearch.py) -- Binary search implementation over text files
-[loaders.py](pywb_utils/loaders.py) -- Loading abstraction for http, local file system, as well as buffered and seekable file readers
+* [loaders.py](pywb.utils/loaders.py) -- Loading abstraction for loading via http or local file system.
-[timeutils.py](pywb_utils/timeutils.py) -- Utility functions for converting between standard datetime formats 14-digit timestamp
+* [bufferedreaders.py](pywb.utils/bufferedreaders.py) -- Buffering wrappers for file-like object, also provide gzip decompression and
+de-chunking facilities.
+
+* [statusandheaders.py](pywb.utils/statusandheaders.py) -- Represent http status line + headers and parsing them out from a stream
+
+* [timeutils.py](pywb.utils/timeutils.py) -- Utility functions for converting between standard datetime formats 14-digit timestamp
diff --git a/pywb/warc/README.md b/pywb/warc/README.md
index fe6bf216..f3a4bad4 100644
--- a/pywb/warc/README.md
+++ b/pywb/warc/README.md
@@ -1,6 +1,4 @@
-## PyWb Warc v0.2
-
-[![Build Status](https://travis-ci.org/ikreymer/pywb_warc.png?branch=master)](https://travis-ci.org/ikreymer/pywb_warc)
+### pywb.warc
 This is the WARC/ARC record loading component of pywb wayback tool suite.
@@ -16,7 +14,17 @@ This package provides the following facilities:
 ### Tests
-This package will include a test suite for different WARC and ARC loading formats.
+This package will includes a test suite for loading a variety of WARC and ARC records.
-To run: `python run-tests.py`
+Tests so far:
+* Compressed WARC, ARC Records
+* Uncompressed ARC Records
+* Compressed WARC created by wget 1.14
+* Same Url revisit record resolving
+
+
+TODO:
+
+* Different url revisit record resolving (TODO)
+* File type detection (no .warc, .arc extensions)