mirror of https://github.com/webrecorder/pywb.git synced 2025-03-28 08:32:29 +01:00
Ilya Kreymer a09dec4b3e cdx: add domain-specific rules at cdx layer for custom canonicalization!
and 'fuzzy' matching when not found
handled via cdxdomainspecific.py
BaseCDXServer contains a canonicalizer object and a fuzzy query
canonicalizer abstracted to separate class (in canonicalizer.py)
clean up cdx related exceptions
default rules read from cdx/rules.yaml
filename configurable via 'domain_specific_rules' setting in config.yaml
fix typo in pywb/rewrite
2014-02-18 14:56:13 -08:00

PyWb CDX v0.2

This package contains the CDX processing suite of the pywb wayback tool suite.

The CDX Server loads, filters, and transforms CDX records from multiple sources in response to a given query.
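
The load, filter, and transform steps can be pictured with a rough sketch. This is not the pywb_cdx implementation: the five-field record layout, the `parse_cdx` and `query` helpers, and the sample lines are all hypothetical and chosen only for illustration. It assumes each source yields CDX lines already sorted by URL key and timestamp, so the sources can be merged in order.

```python
import heapq
from typing import Dict, Iterable, Iterator, List, Optional

# Hypothetical, simplified field layout (an assumption for this sketch);
# real CDX files carry more fields.
FIELDS = ["urlkey", "timestamp", "original", "mimetype", "statuscode"]


def parse_cdx(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Turn space-separated CDX lines into dicts, skipping malformed lines."""
    for line in lines:
        parts = line.split()
        if len(parts) == len(FIELDS):
            yield dict(zip(FIELDS, parts))


def query(
    sources: List[Iterable[str]],
    urlkey: str,
    statuscode: Optional[str] = None,
    output_fields: Optional[List[str]] = None,
) -> Iterator[Dict[str, str]]:
    """Load from several sorted sources, filter, then transform each record."""
    merged = heapq.merge(*sources)  # load: ordered merge of sorted sources
    for record in parse_cdx(merged):
        if record["urlkey"] != urlkey:  # filter: match the queried URL key
            continue
        if statuscode is not None and record["statuscode"] != statuscode:
            continue
        if output_fields:  # transform: keep only the requested fields
            record = {name: record[name] for name in output_fields}
        yield record


# Made-up sample data standing in for two CDX sources.
source_a = [
    "com,example)/ 20140101000000 http://example.com/ text/html 200",
    "com,example)/ 20140301000000 http://example.com/ text/html 404",
]
source_b = [
    "com,example)/ 20140201000000 http://example.com/ text/html 200",
    "org,example)/ 20140101000000 http://example.org/ text/html 200",
]

results = list(
    query(
        [source_a, source_b],
        urlkey="com,example)/",
        statuscode="200",
        output_fields=["timestamp", "original"],
    )
)
for row in results:
    print(row["timestamp"], row["original"])
```

Merging with `heapq.merge` keeps the combined stream sorted without reading every source into memory, which is one plausible way to combine multiple sorted CDX sources.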

Installation and Tests

pip install -r requirements -- to install

python run-tests.py -- to run all tests

Sample App

A very simple reference WSGI app is included.

Run python -m pywb_cdx.wsgi_cdxserver to start the app; press Ctrl+C (keyboard interrupt) to stop.

The default config.yaml points to the sample data directory and uses port 8080.

CDX Server API Reference

The goal is to provide compatibility with this feature set, and more: https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server

TODO