mirror of https://github.com/webrecorder/pywb.git synced 2025-03-16 00:24:48 +01:00

79 Commits

Author SHA1 Message Date
Ilya Kreymer
ec27ccfbb6 fuzzy match rules: to simplify custom fuzzy match use cases, add support
for matching fuzzy match query params as a list
2014-09-21 14:46:10 -07:00
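Matching fuzzy-match query params given as a list can be illustrated with a small Python sketch. The function name `fuzzy_query_params` and its behaviour are assumptions for illustration, not pywb's actual rule engine: it keeps only the listed params, in list order, and drops everything else (for example cache-busting timestamps).

```python
from urllib.parse import urlsplit, parse_qsl

def fuzzy_query_params(url, match_params):
    """Hypothetical helper: keep only the query params named in
    match_params, in the order they are listed."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    kept = []
    for name in match_params:
        for key, value in pairs:
            if key == name:
                kept.append((key, value))
    return kept

# The cache-buster '_' and callback 'cb' are ignored; only 'id' matters.
print(fuzzy_query_params("http://example.com/api?cb=123&id=5&_=99", ["id"]))
```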
Ilya Kreymer
0b8a8f0ae2 live rewrite: catch errors from live rewrite and raise a new LiveResourceError with a 400 error code,
indicating bad request for live resource. Add test for invalid live rewrite requests
2014-07-21 22:43:34 -07:00
Ilya Kreymer
fb07775d38 tests: add 'bad.cdx' for testing cdx lines with missing original for revisit,
missing/non-existent warc
2014-06-25 12:32:57 -07:00
Ilya Kreymer
913a1e9f31 warc: simplify recordloader a bit more, only response and request records
get parsed as http (excluding dns: and whois: uris)
All others have a '-' status and no header parsing
tests: add test for zero-length revisits
2014-06-25 12:11:26 -07:00
Ilya Kreymer
88d3e94b36 fixes for pep8, name fixes 2014-06-15 11:57:48 -07:00
Ilya Kreymer
bdafe0938d remove accidental debug commits 2014-06-11 12:44:49 -07:00
Ilya Kreymer
e2349a74e2 replay: better POST support via post query append!
record_loader can optionally parse 'request' records
archiveindexer has -a flag to write all records ('request' included),
-p flag to append post query
post-test.warc.gz and cdx
POST redirects using 307
2014-06-10 19:21:46 -07:00
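Appending the POST body to the query string lets a POST request be indexed and looked up like a GET. A minimal sketch, assuming a form-urlencoded body; `append_post_query` is a hypothetical name, and pywb's actual encoding of the appended query may differ.

```python
from urllib.parse import urlsplit, urlunsplit

def append_post_query(url, post_body):
    """Hypothetical helper: append a form-encoded POST body to the URL's
    query string so the request can be matched like a GET."""
    body = post_body.decode("utf-8") if isinstance(post_body, bytes) else post_body
    if not body:
        return url  # nothing to append
    parts = urlsplit(url)
    # Join with '&' if a query already exists, otherwise start a new one.
    query = parts.query + "&" + body if parts.query else body
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))

print(append_post_query("http://example.com/form?a=1", b"b=2"))
```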
Ilya Kreymer
1d674d97d8 pep8 pass! 2014-05-16 22:44:26 -07:00
Ilya Kreymer
7d236af7d7 cdx: fix creation and add test for non-surt cdx (pywb-nonsurt/ test)
archiveindexer: -u option to generate non-surt cdx
tests: full test coverage for cdxdomainspecific (fuzzy and custom canon)
2014-05-16 21:16:50 -07:00
Ilya Kreymer
e7957a5cae remove SeekableTextFileReader, replaced with standard file-like objects
and seek(0, 2) and tell() to get file length
2014-05-06 20:54:42 -07:00
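The `seek(0, 2)` / `tell()` idiom works on any seekable file-like object, which is why a dedicated reader class is no longer needed. A small Python sketch; `file_length` is a hypothetical helper name, and restoring the original position afterwards is an added convenience.

```python
import io

def file_length(fh):
    """Hypothetical helper: return the total length of a seekable
    file-like object, restoring its original position afterwards."""
    pos = fh.tell()
    fh.seek(0, 2)  # whence=2 (os.SEEK_END): move to end of stream
    length = fh.tell()
    fh.seek(pos)
    return length

buf = io.BytesIO(b"hello world")
buf.seek(3)
print(file_length(buf), buf.tell())
```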
Ilya Kreymer
58f261fda4 cdx redis: disable new test until fakeredis supports zrangebylex() 2014-04-25 11:00:49 -07:00
Ilya Kreymer
2b8bea616e when given a redis path of redis://<host>/<db>/<key>, use <key> as a
sorted cdx file with zrangebylex!

modified tests but need zrangebylex() support in fakeredis to finish
2014-04-25 10:52:35 -07:00
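Redis orders members of a sorted set that share the same score lexicographically, which is what makes ZRANGEBYLEX usable as a sorted cdx index. The in-memory Python sketch below emulates that range lookup with `bisect` instead of a live Redis server; the helper names are hypothetical, and the `\xff` upper bound assumes cdx keys are plain ASCII.

```python
import bisect

def range_by_lex(sorted_lines, start, end):
    """Return lines with start <= line < end, mimicking
    ZRANGEBYLEX key [start (end on a same-score sorted set."""
    lo = bisect.bisect_left(sorted_lines, start)
    hi = bisect.bisect_left(sorted_lines, end)
    return sorted_lines[lo:hi]

def prefix_lookup(sorted_lines, key):
    # Every line beginning with `key` sorts between key and key + "\xff"
    # (assumes no characters above U+00FF follow the key).
    return range_by_lex(sorted_lines, key, key + "\xff")

cdx = sorted([
    "com,example)/ 20140101",
    "com,example)/a 20140102",
    "com,other)/ 20140103",
    "com,abc)/ 2014",
])
print(prefix_lookup(cdx, "com,example)/ "))  # exact url match
```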
Ilya Kreymer
e077c23de7 fuzzy match: modify existing params to ensure any custom params are preserved
templates: add ability to set custom global vars, such as 'static_path'
for all templates
2014-04-04 12:20:54 -07:00
Ilya Kreymer
b0b0adb043 refactor: rename pywb.core -> pywb.webapp
move perms/test/test_perms_policy -> tests/perms_fixture
for rules file, use single DEFAULT_RULES_FILE import
2014-04-04 10:09:26 -07:00
Ilya Kreymer
bd21fec6d4 update run-uwsgi.sh and add run-gunicorn.sh
update README and INSTALL, fix typo
only list wb handlers on home page by default
pep8 fixes
2014-04-03 08:56:18 -07:00
Ilya Kreymer
80f2da9548 refactor: move configs/config.yaml to root again
remove cdx-server specific config, instead make cdx server api-only
path configurable from regular config
2014-04-02 21:26:53 -07:00
Ilya Kreymer
399642d719 add missing cdxserver test file 2014-04-02 18:34:05 -07:00
Ilya Kreymer
8b37fef8e0 tests: add explicit cdxserver config testing with different config variations 2014-04-02 15:01:40 -07:00
Ilya Kreymer
91184426b7 test coverage pass:
refactor and cleanup to improve coverage for corner cases
2014-04-02 13:16:54 -07:00
Ilya Kreymer
2c74ea9f23 fuzzy match: make filter string optionally overridable
setup.py: unset PYWB_CONFIG_ENV
2014-03-27 21:43:30 -07:00
Ilya Kreymer
093d8310e5 config: move config files to ./configs/
PYWB_CONFIG_FILE setting overrides passed in config
2014-03-27 14:31:27 -07:00
Ilya Kreymer
4e53c2e9d8 remote cdx refactoring: refactor remote cdx source and server to support
fuzzy matching
test local cdx server, remote cdx source, local and remote filtering
with self-contained unit tests
map remote cdx httperrors to pywb exceptions
2014-03-26 11:33:46 -07:00
Ilya Kreymer
5847087aae add fakeredis mock, test for RedisCDXSource 2014-03-25 11:02:32 -07:00
Ilya Kreymer
87df7c22f1 standardize test scripts to test_*.py instead of *_test.py 2014-03-25 11:01:51 -07:00
Ilya Kreymer
52d99aef57 misc fixes: RemoteCDXServer throws NotFoundException on 404
fix typo in handlers
make WBHandler overridable in pywb_init
make perms_policy optional in IndexReader
2014-03-17 17:35:10 -07:00
Ilya Kreymer
a1ab54c340 first pass at memento support #10!
memento support enabled by default, togglable via 'enable_memento' config property
supporting timegate and memento apis, no timemap yet
supporting pattern 2.3 for archival and pattern 1.3 for proxy modes
also:
simplify exception hierarchy a bit more, move down to utils
make WbRequest and WbResponse extensible with mixins (eg for memento)
2014-03-14 10:46:20 -07:00
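Timegate support involves translating between the HTTP dates Memento uses (`Accept-Datetime` on requests, `Memento-Datetime` on responses, per RFC 7089) and the 14-digit timestamps found in cdx lines. A minimal sketch using only the Python standard library; the function names are hypothetical and not taken from pywb.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def accept_datetime_to_timestamp(header_value):
    """Convert an RFC 1123 Accept-Datetime value to a 14-digit timestamp."""
    dt = parsedate_to_datetime(header_value)
    if dt.tzinfo is None:  # '-0000' may parse as naive; treat as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")

def timestamp_to_memento_datetime(timestamp):
    """Convert a 14-digit timestamp to an RFC 1123 Memento-Datetime value."""
    dt = datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return format_datetime(dt, usegmt=True)

print(accept_datetime_to_timestamp("Thu, 31 May 2007 20:35:00 GMT"))
```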
Ilya Kreymer
e346dfb024 remove accidental logging 2014-03-09 23:03:55 -07:00
Ilya Kreymer
68878fa72a update domain-specific rules to make flickr replay work better! 2014-03-08 15:53:52 -08:00
Ilya Kreymer
3b1afc3e3d replace StringIO with BytesIO 2014-03-08 09:30:19 -08:00
Ilya Kreymer
7b5cbaa878 cdx: clean up closest, reverse ops
closest takes precedence over reverse
'reverse closest' not supported, add test to reflect that
2014-03-06 16:11:46 -08:00
Ilya Kreymer
c42a96386f cdx: fix the 'yield nothing' case when limit==1
add additional test case for limit==1 and reverse=True,
as limit is optimized out
2014-03-06 16:01:49 -08:00
Ilya Kreymer
673ff35d15 minor fixes: wombat add document.WB_wombat_location
loaders: file 'urls' starting with . and / are always file paths
pep8 fixes for cdx, utils packages
2014-03-05 17:13:14 -08:00
Kenji Nagahashi
64f4699203 clean up docstrings: fix reST formatting issues.
cherry-picked f03e0a7092 + some more.
2014-03-05 22:07:27 +00:00
Ilya Kreymer
fe1fa43fef zipnum: remove time-based reloading for now, just look at mtime
and reload if changed
2014-03-04 21:29:05 -08:00
Ilya Kreymer
cc22448cc5 fixes for 2.6 and pypy 2014-03-04 19:11:17 -08:00
Ilya Kreymer
d702a98bbc url-agnostic revisit testing!
add sample warc and cdx for url-agnostic revisits
add unit test and integration test
resolvingloader: pass callback instead of full cdx server
for use for loading cdx in case of url-agnostic revisit
2014-03-04 20:12:09 +00:00
Ilya Kreymer
577c74be49 cdx: move perms related handling to pywb.perms package, support
custom processing ops, of which perms is a specific type
add lazy_ops test to ensure all cdx processing ops are lazy

perms: set up a 'perms policy' factory and perms policy implementation
perms policy setting results in a custom processing op
update tests to work with new config
IndexReader handles both cdx server + perms policy
2014-03-03 18:27:04 -08:00
Ilya Kreymer
e0d5846484 separate 'perms_checker' config loading as a separate param
simplify IndexReader wrapper init, just init with a cdx server
2014-03-03 13:40:48 -08:00
Ilya Kreymer
331976748e cdxops: make sure sort reverse and closest are lazy (create generators)
perms: allow_url_lookup() only takes key param for simplicity
2014-03-03 12:16:07 -08:00
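Making cdx ops lazy means each op returns a generator, so no input is read until the caller iterates. A minimal sketch of generator-based `reverse` and `closest` ops, assuming cdx lines are space-separated strings with the 14-digit timestamp in the second field; the function names and the naive numeric distance between timestamps are simplifications for illustration, not pywb's implementation.

```python
def cdx_reverse(lines):
    """Yield cdx lines in reverse order. The body (and the buffering
    reversal requires) only runs on the first next() call."""
    buffered = list(lines)
    for line in reversed(buffered):
        yield line

def cdx_closest(lines, timestamp, limit):
    """Yield up to `limit` lines ordered by naive numeric distance
    between their timestamp (second field) and `timestamp`."""
    target = int(timestamp)
    ranked = sorted(lines, key=lambda line: abs(int(line.split(" ")[1]) - target))
    for line in ranked[:limit]:
        yield line

ops = cdx_closest(iter(["com,x)/ 20140101000000", "com,x)/ 20140201000000"]),
                  "20140215000000", 1)
print(list(ops))  # input is consumed only here
```

Sorting still requires buffering every line once iteration begins; laziness only defers that work until a consumer actually asks for results.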
Ilya Kreymer
0bf651c2e3 add cdx_server app!
port wsgi cdx server tests to test new app!
move base handlers to basehandlers in framework pkg
(remove werkzeug dependency)
2014-03-02 23:41:44 -08:00
Ilya Kreymer
f1acad53fc wsgi wrapper reorg!
support pluggable wsgi apps
utils: BlockLoader() supports loading from package
exceptions: base WbException moved to utils
2014-03-02 19:26:06 -08:00
Ilya Kreymer
06a22c845b ensure cdx loading happens lazily
add perms test to ensure 'short-circuiting' in case of
permission exception
2014-03-01 18:40:16 -08:00
Ilya Kreymer
15d2cdd1b3 cdx: cleanup and more consistency for RemoteCDXServer
RemoteCDXServer delegates filter/processing and simply proxies response from remote
RemoteCDXSource (and default usage with CDXServer) only fetches the unfiltered/unprocessed
stream and performs cdx ops locally
2014-03-01 16:35:27 -08:00
Ilya Kreymer
739d0a6f93 move CDXQuery to separate file 2014-03-01 08:57:15 -08:00
Ilya Kreymer
355fa32600 cdx: refactor to create separate CDXQuery object for wrapping
params passed to load_cdx()
2014-03-01 08:41:24 -08:00
Ilya Kreymer
af9cabdc72 Merge branch 'cdx-server' of git://github.com/kngenie/pywb into kngenie-cdx-server
Kenji's cdx server refactoring and wsgi improvements
2014-02-28 15:28:41 -08:00
Kenji Nagahashi
1f65eff828 Merge remote-tracking branch 'origin/master' into cdx-server
Conflicts:
	pywb/cdx/cdxdomainspecific.py
	pywb/cdx/cdxserver.py
	pywb/cdx/test/cdxserver_test.py
	setup.py
	tests/test_integration.py
2014-02-28 19:47:24 +00:00
Ilya Kreymer
1e3ef6ec5c cdx: add basic test for CustomUrlCanonicalizer for now
(will likely refactor this configuration)
2014-02-28 09:40:51 -08:00
Ilya Kreymer
921b2eb2e1 improve testing and a few fixes:
archivalrouter: support empty collection, with and without SCRIPT_NAME
cdx: remove cdx source test, including access denied
replay: when content-type present, limit the decompressed stream to content-length
(this ensures last 4 bytes in warc/arc record are not read)
integration tests for identity replay
2014-02-27 18:43:55 -08:00
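Limiting the decompressed stream to the record's content-length keeps a reader from running into the trailing bytes after the payload, such as the `\r\n\r\n` that follows a WARC record block. A minimal sketch of a length-limiting wrapper; `LimitedReader` is a hypothetical name for illustration, not pywb's class.

```python
import io

class LimitedReader:
    """Hypothetical wrapper: allow at most `limit` bytes to be read
    from the underlying stream."""

    def __init__(self, stream, limit):
        self.stream = stream
        self.remaining = limit

    def read(self, size=-1):
        if self.remaining <= 0:
            return b""
        # Clamp unbounded or oversized reads to the remaining budget.
        if size is None or size < 0 or size > self.remaining:
            size = self.remaining
        data = self.stream.read(size)
        self.remaining -= len(data)
        return data

record = io.BytesIO(b"body\r\n\r\n")  # payload followed by record terminator
print(LimitedReader(record, 4).read())  # only the 4-byte payload
```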
Kenji Nagahashi
9eda5ad97e address test cases broken by previous commit.
move py.test fixture and fixture classes (TestExclusionPerms, PrintReporter)
  to tests.fixture module. update test_config.yaml accordingly.
2014-02-28 01:39:04 +00:00