mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 16:14:48 +01:00

556 Commits

Author SHA1 Message Date
Ilya Kreymer
4e71a0b772 better rules.yaml fix 2014-03-06 02:51:54 -08:00
Ilya Kreymer
3718e1d21b rewrite fixes: html_rewriter do not unescape attrs!
rules: don't rewrite past end of block or line
2014-03-06 02:29:52 -08:00
Ilya Kreymer
673ff35d15 minor fixes: wombat add document.WB_wombat_location
loaders: file 'urls' starting with . and / are always file paths
pep8 fixes for cdx, utils packages
2014-03-05 17:13:14 -08:00
Kenji Nagahashi
28b49f9aeb add doc directory for Sphinx documentation 2014-03-05 23:03:04 +00:00
ikreymer
03ebca47c0 Merge pull request #29 from kngenie/just-a-cleanup
clean up docstrings: fix reST formatting issues.
2014-03-05 14:36:07 -08:00
Kenji Nagahashi
64f4699203 clean up docstrings: fix reST formatting issues.
cherry-picked f03e0a7092 + some more.
2014-03-05 22:07:27 +00:00
Ilya Kreymer
daf868fd61 README tweaks
update setup.py to support setup.py test!
.travis.yml uses python setup.py test
2014-03-05 11:19:26 -08:00
Ilya Kreymer
25a8514352 Update README (move pywb configuration section to wiki),
recommend running pywb.apps.wayback
make uWSGI optional (but included in Vagrant)
rename run.sh -> run-uwsgi.sh
2014-03-05 10:42:08 -08:00
Ilya Kreymer
fe1fa43fef zipnum: remove time-based reloading for now, just look at mtime
and reload if changed
2014-03-04 21:29:05 -08:00
Ilya Kreymer
df2f7ba496 warc: add digest filter only if digest is present for url-agnostic load
ensure cdxobject format set on cdx load callback
limit reader: add length wrapping utility func to limitreader
2014-03-05 05:12:25 +00:00
Ilya Kreymer
9690d84798 travis-ci: attempt to fix 2.6 build 2014-03-04 19:36:29 -08:00
Ilya Kreymer
f25de8af2a tweak travis pip install config 2014-03-04 19:17:33 -08:00
Ilya Kreymer
cc22448cc5 fixes for 2.6 and pypy 2014-03-04 19:11:17 -08:00
Ilya Kreymer
2d48f2d733 add testing of 2.6 and pypy (attempt) 2014-03-04 18:12:36 -08:00
Ilya Kreymer
202f6101e0 coverage work! add additional test for wsgi_wrappers
additional test for zipnum bad location
for now, not testing cli interfaces which depend on opt params
2014-03-04 16:13:49 -08:00
Ilya Kreymer
d702a98bbc url-agnostic revisit testing!
add sample warc and cdx for url-agnostic revisits
add unit test and integration test
resolvingloader: pass callback instead of full cdx server
for use for loading cdx in case of url-agnostic revisit
2014-03-04 20:12:09 +00:00
Ilya Kreymer
cf5aaf5de4 add new perms_handler for supporting direct permissions api
currently just returning ["allow"] or ["block"] for a single url
2014-03-03 19:37:37 -08:00
Ilya Kreymer
577c74be49 cdx: move perms related handling to pywb.perms package, support
custom processing ops, of which perms is a specific type
add lazy_ops test to ensure all cdx processing ops are lazy

perms: set up a 'perms policy' factory and perms policy implementation
perms policy setting results in a custom processing op
update tests to work with new config
IndexReader handles both cdx server + perms policy
2014-03-03 18:27:04 -08:00
Ilya Kreymer
e0d5846484 separate 'perms_checker' config loading as a separate param
simplify IndexReader wrapper init, just init with a cdx server
2014-03-03 13:40:48 -08:00
Ilya Kreymer
331976748e cdxops: make sure sort reverse and closest are lazy (create generators)
perms: allow_url_lookup() only takes key param for simplicity
2014-03-03 12:16:07 -08:00
ikreymer
5a28bc6992 Merge pull request #28 from ikreymer/pkg-reorg
pywb pkg refactoring: create pywb.framework, pywb.core and pywb.apps
2014-03-03 12:04:12 -08:00
Ilya Kreymer
2d4ae62fbe - cdx handler refactoring: factor out CDXHandler and init to
separate cdx_handler module
- Make wsgi app a class, add port as an optional field in wsgi app
and router. (not required to be specified)
2014-03-03 10:35:57 -08:00
Ilya Kreymer
0bf651c2e3 add cdx_server app!
port wsgi cdx server tests to test new app!
move base handlers to basehandlers in framework pkg
(remove werkzeug dependency)
2014-03-02 23:41:44 -08:00
Ilya Kreymer
f0a0976038 more refactoring!
create 'framework' subpackage for general purpose components!
contains routing, request/response, exceptions and wsgi wrappers
update framework package for pep8
dsrules: using load_config_yaml() (pushed to utils)
to init default config
2014-03-02 21:42:05 -08:00
Ilya Kreymer
f1acad53fc wsgi wrapper reorg!
support pluggable wsgi apps
utils: BlockLoader() supports loading from package
exceptions: base WbException moved to utils
2014-03-02 19:26:06 -08:00
Ilya Kreymer
47271bbfab remove extra .gz file, change test to use zipnum file instead 2014-03-02 08:55:26 -08:00
Ilya Kreymer
19f86305bf update pkg-reorg with changes from master, including
CDXQuery configuration
2014-03-02 00:26:29 -08:00
Ilya Kreymer
06a22c845b ensure cdx loading happens lazily
add perms test to ensure 'short-circuiting' in case of
permission exception
2014-03-01 18:40:16 -08:00
Ilya Kreymer
15d2cdd1b3 cdx: cleanup and more consistency for RemoteCDXServer
RemoteCDXServer delegates filter/processing and simply proxies response from remote
RemoteCDXSource (and default usage with CDXServer) only fetches the unfiltered/unprocessed
stream and performs cdx ops locally
2014-03-01 16:35:27 -08:00
Ilya Kreymer
739d0a6f93 move CDXQuery to separate file 2014-03-01 08:57:15 -08:00
Ilya Kreymer
355fa32600 cdx: refactor to create separate CDXQuery object for wrapping
params passed to load_cdx()
2014-03-01 08:41:24 -08:00
Ilya Kreymer
af9cabdc72 Merge branch 'cdx-server' of git://github.com/kngenie/pywb into kngenie-cdx-server
Kengie's cdx server refactoring and wsgi improvements
2014-02-28 15:28:41 -08:00
Ilya Kreymer
502666fd3d Merge branch 'just-a-cleanup' of git://github.com/kngenie/pywb into kngenie-just-a-cleanup
cleanup setup.py indentation
2014-02-28 12:23:48 -08:00
Kenji Nagahashi
1f65eff828 Merge remote-tracking branch 'origin/master' into cdx-server
Conflicts:
	pywb/cdx/cdxdomainspecific.py
	pywb/cdx/cdxserver.py
	pywb/cdx/test/cdxserver_test.py
	setup.py
	tests/test_integration.py
2014-02-28 19:47:24 +00:00
Ilya Kreymer
c084b45298 Merge master into pkg-reorg 2014-02-28 10:25:36 -08:00
Ilya Kreymer
1e3ef6ec5c cdx: add basic test for CustomUrlCanonicalizer for now
(will likely refactor this configuration)
2014-02-28 09:40:51 -08:00
Ilya Kreymer
304a33aa5b add coverage badge 2014-02-27 18:52:41 -08:00
Ilya Kreymer
921b2eb2e1 improve testing and a few fixes:
archivalrouter: support empty collection, with and without SCRIPT_NAME
cdx: remove cdx source test, including access denied
replay: when content-type present, limit the decompressed stream to content-length
(this ensures last 4 bytes in warc/arc record are not read)
integration tests for identity replay
2014-02-27 18:43:55 -08:00
Kenji Nagahashi
9eda5ad97e address test cases broken by previous commit.
move py.test fixture and fixture classes (TestExclusionPerms, PrintReporter)
  to tests.fixture module. update test_config.yaml accordingly.
2014-02-28 01:39:04 +00:00
Ilya Kreymer
bff39626b5 add first set of zipnum tests #17
still need to test timed reload, multi sources
2014-02-27 12:33:11 -08:00
Ilya Kreymer
7863b2bade add sample data for zipnum #17 2014-02-27 20:10:44 +00:00
Ilya Kreymer
22f1f78fca cdx: clean up filters, add '~' modifier for contains
rules: fix regex to be lazy not greedy, turn off unneeded custom
canonicalizer (need tests for custom canon)
cleanup fuzzy match query
fix data package in setup.py
2014-02-27 18:22:10 +00:00
Ilya Kreymer
453ab678ed refactor domain specific rules:
- head insert callback passed in with rule, up to template
to handle additional inserts based on rule properties
- ability to pass in custom rules config to both cdx server
and content rewriter
- move canonicalize to utils pkg
- add wombat, modify wb.js to remove wombat-related settings
2014-02-26 22:04:37 -08:00
Ilya Kreymer
5a41f59f39 new unified config system, via rules.yaml!
contains configs for cdx canon, fuzzy matching and rewriting!
rewriting: ability to add custom regexs per domain
also, ability to toggle js rewriting and custom rewriting file
(default is wombat.js)
2014-02-26 18:02:01 -08:00
Kenji Nagahashi
2c40c9b112 refactor cdxserver, add tests focused on wsgi_cdxserver, add docstrings.
align cdxops function interfaces - all cdx_iter.
  move module functions / common ops to class methods
  support both 0/1 and true/false for boolean parameters
  move CDXObject to text conversion to wsgi_cdxserver (may have broken
    embedded cdxserver mode).
  pass config object as function arg rather than as global var.
2014-02-27 01:58:07 +00:00
Ilya Kreymer
349a1a7a3a add unit test to timeutils.py
tweak .travis.yml
2014-02-25 15:30:16 -08:00
Kenji Nagahashi
14f4b4d26e Merge branch 'master' into cdx-server 2014-02-25 23:14:15 +00:00
Ilya Kreymer
b5d8accd1d trying different coveralls 2014-02-25 00:19:21 -08:00
Ilya Kreymer
7b9bc1ee3d fix .travis.yml 2014-02-24 23:45:37 -08:00
Ilya Kreymer
bf3a373e7e testing coveralls 2014-02-24 23:43:32 -08:00