mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 16:14:48 +01:00

1818 Commits

Author SHA1 Message Date
Kenji Nagahashi
9eda5ad97e address test cases broken by previous commit.
move py.test fixture and fixture classes (TestExclusionPerms, PrintReporter)
  to tests.fixture module. update test_config.yaml accordingly.
2014-02-28 01:39:04 +00:00
Ilya Kreymer
bff39626b5 add first set of zipnum tests #17
still need to test timed reload, multi sources
2014-02-27 12:33:11 -08:00
Ilya Kreymer
7863b2bade add sample data for zipnum #17 2014-02-27 20:10:44 +00:00
Ilya Kreymer
22f1f78fca cdx: clean up filters, add '~' modifier for contains
rules: fix regex to be lazy not greedy, turn off unneeded custom
canonicalizer (need tests for custom canon)
cleanup fuzzy match query
fix data package in setup.py
2014-02-27 18:22:10 +00:00
Ilya Kreymer
453ab678ed refactor domain specific rules:
- head insert callback passed in with rule, up to template
to handle additional inserts based on rule properties
- ability to pass in custom rules config to both cdx server
and content rewriter
- move canonicalize to utils pkg
- add wombat, modify wb.js to remove wombat-related settings
2014-02-26 22:04:37 -08:00
Ilya Kreymer
5a41f59f39 new unified config system, via rules.yaml!
contains configs for cdx canon, fuzzy matching and rewriting!
rewriting: ability to add custom regexs per domain
also, ability to toggle js rewriting and custom rewriting file
(default is wombat.js)
2014-02-26 18:02:01 -08:00
Kenji Nagahashi
2c40c9b112 refactor cdxserver, add tests focused on wsgi_cdxserver, add docstrings.
align cdxops function interfaces - all cdx_iter.
  move module functions / common ops to class methods
  support both 0/1 and true/false for boolean parameters
  move CDXObject to text conversion to wsgi_cdxserver (may have broken
    embedded cdxserver mode).
  pass config object as function arg rather than as global var.
2014-02-27 01:58:07 +00:00
Ilya Kreymer
349a1a7a3a add unit test to timeutils.py
tweak .travis.yml
2014-02-25 15:30:16 -08:00
Kenji Nagahashi
14f4b4d26e Merge branch 'master' into cdx-server 2014-02-25 23:14:15 +00:00
Ilya Kreymer
b5d8accd1d trying different coveralls 2014-02-25 00:19:21 -08:00
Ilya Kreymer
7b9bc1ee3d fix .travis.yml 2014-02-24 23:45:37 -08:00
Ilya Kreymer
bf3a373e7e testing coveralls 2014-02-24 23:43:32 -08:00
Ilya Kreymer
f24b2e7767 fix typo from merge 2014-02-24 23:40:32 -08:00
Ilya Kreymer
3cd7b6b209 Merge branch 'master' into pkg-reorg 2014-02-24 21:33:11 -08:00
Ilya Kreymer
d702b299ae wburl: split into BaseWbUrl and WbUrl for better extensibility 2014-02-24 21:30:38 -08:00
Ilya Kreymer
21e885b78a statusandheaders: add support for header line continuations with space/tab
add basic unit test for statusandheaders
2014-02-24 21:14:10 -08:00
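The header line continuation handling named in 21e885b78a can be illustrated with a minimal sketch. The function `parse_headers` below is hypothetical and is not pywb's statusandheaders parser; it only assumes that a line starting with a space or tab extends the value of the previous header.

```python
def parse_headers(lines):
    """Parse header lines into (name, value) pairs, folding continuations.

    Hypothetical helper for illustration only.
    """
    headers = []
    for line in lines:
        line = line.rstrip('\r\n')
        if not line:
            # A blank line ends the header block.
            break
        if line[0] in ' \t' and headers:
            # Continuation: append to the value of the previous header.
            name, value = headers[-1]
            headers[-1] = (name, value + ' ' + line.strip())
            continue
        name, _, value = line.partition(':')
        headers.append((name.strip(), value.strip()))
    return headers


raw = [
    'Content-Type: text/html;\r\n',
    '\tcharset=utf-8\r\n',
    'Content-Length: 5\r\n',
    '\r\n',
    'body',
]
parsed = parse_headers(raw)
```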
Ilya Kreymer
7968f360ce timeutils: timestamp_to_datetime() uses custom timestamp parsing
instead of strptime to automatically clamp timestamp to allowed
range (instead of erroring) on invalid timestamps.
returns datetime.datetime as advertised instead of struct_time as well
2014-02-24 16:30:11 -08:00
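The clamping behaviour described in 7968f360ce can be sketched as follows. This is an assumption-laden illustration, not the code in timeutils.py: the name `timestamp_to_datetime_sketch`, the choice of the lowest allowed value for missing fields, and the year bounds are all made up for the example.

```python
import calendar
from datetime import MAXYEAR, MINYEAR, datetime


def timestamp_to_datetime_sketch(ts):
    """Parse a YYYYMMDDhhmmss string, clamping each field to its valid range."""
    digits = ''.join(ch for ch in ts if ch.isdigit())[:14]

    def field(start, end, lo, hi):
        # Assumption: a missing field falls back to its lowest allowed value.
        part = digits[start:end]
        value = int(part) if part else lo
        return max(lo, min(hi, value))

    year = field(0, 4, MINYEAR, MAXYEAR)
    month = field(4, 6, 1, 12)
    # The last valid day depends on the (already clamped) year and month.
    last_day = calendar.monthrange(year, month)[1]
    day = field(6, 8, 1, last_day)
    hour = field(8, 10, 0, 23)
    minute = field(10, 12, 0, 59)
    second = field(12, 14, 0, 59)
    return datetime(year, month, day, hour, minute, second)


clamped = timestamp_to_datetime_sketch('20140231256199')
partial = timestamp_to_datetime_sketch('2014')
```

Unlike `strptime`, which raises on an out-of-range value such as day 31 in February, this approach always returns a `datetime.datetime`.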
Ilya Kreymer
a474335501 fix missing param, typo 2014-02-24 19:42:37 +00:00
Ilya Kreymer
ef062fee7b cdx: add prototype support for redis cdx source (need testing) 2014-02-24 11:05:48 -08:00
Ilya Kreymer
51d61a8738 package reorg!
split up remaining parts of pywb root pkg
into core, dispatch and bootstrap
2014-02-24 03:00:01 -08:00
Ilya Kreymer
9194e867ea - add referrer self-redirect check and test case
- dispatching: cleanup wbrequestresponse, move tests to a separate file
- wbrequest: store both rel_prefix and host_prefix, with wb_prefix either full
or rel path as needed, so that full and relative paths are
both available in wbrequest
- create WbUrlHandler to differentiate handlers which
support WbUrl (timestamp[mod]/url) semantic vs other request handlers.
2014-02-23 23:31:54 -08:00
Ilya Kreymer
a4f1224d16 zipnum: add file mtime check to location loading #17 2014-02-22 18:28:54 -08:00
Ilya Kreymer
d8d7435d77 add zipnum location reloading support
default to 10 min interval #17
2014-02-22 16:49:37 -08:00
Ilya Kreymer
1754f15831 Combine FileLoader/HttpLoader into a single BlockLoader which
delegates based on scheme
2014-02-22 16:49:26 -08:00
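As a rough illustration of the scheme-based delegation mentioned in 1754f15831, the sketch below picks a loading strategy from the URL scheme. `BlockLoaderSketch` and its method names are hypothetical and do not reproduce pywb's BlockLoader; the HTTP branch assumes the server honours `Range` requests.

```python
import os
import tempfile
from urllib.parse import urlsplit
from urllib.request import Request, url2pathname, urlopen


class BlockLoaderSketch:
    """Hypothetical loader that delegates on the URL scheme."""

    def load(self, url, offset=0, length=-1):
        scheme = urlsplit(url).scheme
        if scheme in ('http', 'https'):
            return self._load_http(url, offset, length)
        # Anything else (plain paths, file:// URLs) is treated as a local file.
        return self._load_file(url, offset, length)

    def _load_file(self, url, offset, length):
        if url.startswith('file://'):
            path = url2pathname(urlsplit(url).path)
        else:
            path = url
        with open(path, 'rb') as fh:
            fh.seek(offset)
            return fh.read(length)  # a negative length reads to end of file

    def _load_http(self, url, offset, length):
        # Open-ended range when no length is given.
        end = '' if length < 0 else str(offset + length - 1)
        request = Request(url, headers={'Range': 'bytes={0}-{1}'.format(offset, end)})
        with urlopen(request) as response:
            return response.read()


# Usage with a local file; no network access is needed for this path.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'abcdefgh')
block = BlockLoaderSketch().load(tmp.name, offset=2, length=3)
os.remove(tmp.name)
```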
Ilya Kreymer
434fd23a95 optimize zipnum to support loading multiple continuous blocks,
decompressing each one individually. #17
2014-02-22 10:50:30 -08:00
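Decompressing several contiguous blocks one at a time, as 434fd23a95 describes, can be sketched with the standard `zlib` module. The sketch assumes each block is an independent gzip member and that the blocks were read into one buffer; `decompress_blocks` is a hypothetical name, not a pywb function.

```python
import gzip
import zlib


def decompress_blocks(data):
    """Yield the decompressed content of each gzip member in `data`."""
    while data:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
        yield decompressor.decompress(data)
        # Bytes after the end of this member belong to the next block.
        data = decompressor.unused_data


buffer = gzip.compress(b'first block\n') + gzip.compress(b'second block\n')
blocks = list(decompress_blocks(buffer))
```

A single decompressor object stops at the end of the first member, which is why a fresh one is created per block.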
Ilya Kreymer
8e840ccaaf zipnum first version! #17
split binsearch further into binsearch and linearsearch components
reading blocks one at a time currently, due to zlib decompress limitations
fix bufferedreader.readline() and fileloader bugs
2014-02-22 10:50:03 -08:00
Ilya Kreymer
a56cbcf62e binsearch: add range based matching via iter_range()
support for: exact, prefix, host, domain match types
2014-02-20 21:21:12 -08:00
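Range-based matching over sorted keys, the idea behind `iter_range()` in a56cbcf62e, can be sketched with `bisect` on an in-memory list. This is not pywb's file-based binsearch: the signature here and the `prefix_end` helper are assumptions, and only the prefix match type is shown.

```python
from bisect import bisect_left


def iter_range(sorted_keys, start, end):
    """Yield every key k with start <= k < end from a sorted list."""
    index = bisect_left(sorted_keys, start)
    while index < len(sorted_keys) and sorted_keys[index] < end:
        yield sorted_keys[index]
        index += 1


def prefix_end(prefix):
    """Smallest string greater than every string starting with `prefix`."""
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


keys = [
    'com,example)/',
    'com,example)/about',
    'com,example)/blog',
    'org,iana)/',
]
prefix = 'com,example)/'
matches = list(iter_range(keys, prefix, prefix_end(prefix)))
```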
Ilya Kreymer
922917a631 rename BufferedReader -> DecompressingBufferedReader
remove max_len from DecompressingBufferedReader as it applied to
the compressed size, not original size.
Add integration test for verifying content length of larger file
2014-02-20 11:53:08 -08:00
Kenji Nagahashi
bb87d98b73 Merge remote-tracking branch 'origin/master' into cdx-server 2014-02-20 18:10:51 +00:00
Ilya Kreymer
433b150542 Merge branch 'master' into perms-work 2014-02-20 09:22:13 -08:00
Ilya Kreymer
0cd6588a1d Add exclusions support #24
exclusions: add AllAllowPerms and refactor exclusions interface
add TestExclusionPerms and a sample exclusion integration test
refactor cdx server init params into **kwargs
convert all cdx params to use camelCase
2014-02-20 09:18:10 -08:00
Ilya Kreymer
4c96993411 fix missed param conversion 2014-02-20 09:14:27 -08:00
Kenji Nagahashi
0b768ce11a Merge remote-tracking branch 'origin/perms-work' into cdx-server 2014-02-20 10:05:07 +00:00
Kenji Nagahashi
79eb3be44f rewrite wsgi_cdxserver with werkzeug
use pkg_resources instead of pkgutil because pkgutil breaks with auto-reload.
add --port command line option.
2014-02-20 09:58:08 +00:00
Ilya Kreymer
ff428ed43e exclusions: add AllAllowPerms and refactor exclusions interface
add TestExclusionPerms and a sample exclusion integration test
refactor cdx server init params into **kwargs
convert all cdx params to use camelCase
2014-02-19 20:20:31 -08:00
Ilya Kreymer
be284859be sample perms addition to cdx ops 2014-02-19 17:52:13 -08:00
Kenji Nagahashi
d0229b6b2d cleanup setup.py indent for ease of add/remove things. also use find_package(). 2014-02-19 23:37:44 +00:00
Ilya Kreymer
531464902f add uncompressed warc 2014-02-19 00:14:23 -08:00
Ilya Kreymer
312bd71568 automatic record (warc/arc) format detection and decompression if needed.
no need to rely on file type listing
2014-02-19 00:13:15 -08:00
Ilya Kreymer
84e0121aa5 fixup READMEs, add domain-specific rules to cdx sample app 2014-02-18 18:18:46 -08:00
Ilya Kreymer
7c1ac10d6f update subpackage READMEs 2014-02-18 18:13:44 -08:00
Ilya Kreymer
a09dec4b3e cdx: add domain-specific rules at cdx layer for custom canonicalization!
and 'fuzzy' matching when not found
handled via cdxdomainspecific.py
BaseCDXServer contains a canonicalizer object and a fuzzy query
canonicalizer abstracted to separate class (in canonicalizer.py)
clean up cdx related exceptions
default rules read from cdx/rules.yaml
filename configurable via 'domain_specific_rules' setting in config.yaml
fix typo in pywb/rewrite
2014-02-18 14:56:13 -08:00
Ilya Kreymer
ab95524b7b update README for 0.2! 2014-02-17 15:29:39 -08:00
Ilya Kreymer
5b34803a99 cdx: add support for filter:= and filter:!= for doing exact
(as opposed to regex matches)
eg: filter:urlkey=com,example)/?example=1 matches exact
string 'com,example)/?example=1' in the urlkey field
(as opposed to applying it as a regex)
2014-02-17 15:19:33 -08:00
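The difference between exact and regex filters in 5b34803a99 can be shown with a small sketch. `make_filter` and `regex_is_valid` are hypothetical helpers, not the cdx server's filter code, and records are modelled as plain dicts for the example.

```python
import re


def make_filter(field, value, exact=False, negate=False):
    """Return a predicate over a dict-like record (hypothetical helper)."""
    if exact:
        def matches(record):
            return record.get(field, '') == value
    else:
        pattern = re.compile(value)

        def matches(record):
            return pattern.match(record.get(field, '')) is not None
    if negate:
        return lambda record: not matches(record)
    return matches


def regex_is_valid(value):
    """Report whether `value` compiles as a regular expression."""
    try:
        re.compile(value)
        return True
    except re.error:
        return False


record = {'urlkey': 'com,example)/?example=1'}
exact = make_filter('urlkey', 'com,example)/?example=1', exact=True)
not_exact = make_filter('urlkey', 'com,example)/?example=1', exact=True, negate=True)
```

The urlkey from the commit message contains an unbalanced `)`, so it does not even compile as a regex; this is one reason an exact string comparison is useful for such keys.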
Ilya Kreymer
28187b34d3 fix typos in remotecdxserver, url-agnostic dedup
when raising new exception, pass traceback of original also!
2014-02-17 14:52:13 -08:00
Ilya Kreymer
158b490453 add wsgi_cdxserver_test for testing cdx server app example
pywb.cdx.wsgi_cdxserver
2014-02-17 13:59:57 -08:00
Ilya Kreymer
abea504b04 cleanup cdx server config, refactored such that
a cdx server need implement a single interface:
load_cdx(self, **params)

CDXServer and RemoteCDXServer distinct classes in cdxserver.py
utility function cdxserver.create_cdx_server() to create
appropriate server based on input
2014-02-17 13:58:02 -08:00
Ilya Kreymer
94f1dc3be5 cleanup wbexceptions, remove unused 2014-02-17 10:23:37 -08:00
Ilya Kreymer
5345459298 pywb 0.2!
move to distinct packages: pywb.utils, pywb.cdx, pywb.warc, pywb.util, pywb.rewrite!
each package will have its own README and tests
shared sample_data and install
2014-02-17 10:01:09 -08:00
Ilya Kreymer
2528ee0a7c refactoring of binsearch and cdxserver into separate packages
also move complicated doctests and integration tests to tests/
2014-02-12 13:16:07 -08:00