1
0
mirror of https://github.com/webrecorder/pywb.git synced 2025-03-26 07:49:24 +01:00

4 Commits

Author SHA1 Message Date
Ilya Kreymer
22f1f78fca cdx: clean up filters, add '~' modifier for contains
rules: fix regex to be lazy not greedy, turn off unneeded custom
canonicalizer (need tests for custom canon)
cleanup fuzzy match query
fix data package in setup.py
2014-02-27 18:22:10 +00:00
Ilya Kreymer
453ab678ed refactor domain specific rules:
- head insert callback passed in with rule, up to template
to handle additional inserts based on rule properties
- ability to pass in custom rules config to both cdx server
and content rewriter
- move canonicalize to utils pkg
- add wombat, modify wb.js to remove wombat-related settings
2014-02-26 22:04:37 -08:00
Ilya Kreymer
5a41f59f39 new unified config system, via rules.yaml!
contains configs for cdx canon, fuzzy matching and rewriting!
rewriting: ability to add custom regexs per domain
also, ability to toggle js rewriting and custom rewriting file
(default is wombat.js)
2014-02-26 18:02:01 -08:00
Ilya Kreymer
a09dec4b3e cdx: add domain-specific rules at cdx layer for custom canonicalization!
and 'fuzzy' matching when not found
handled via cdxdomainspecific.py
BaseCDXServer contains a canonicalizer object and a fuzzy query
canonicalizer abstracted to seperate class (in canonicalizer.py)
clean up cdx related exceptions
default rules read from cdx/rules.yaml
filename configurable via 'domain_specific_rules' setting in config.yaml
fix typo in pywb/rewrite
2014-02-18 14:56:13 -08:00