mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 16:14:48 +01:00

1984 Commits

Author SHA1 Message Date
Ilya Kreymer
738fc0e427 Merge pull request #209 from ikreymer/warcio-split
Warcio split
2017-03-08 16:35:08 -08:00
Ilya Kreymer
98c0475806 test: fix test to use closest='now' for live test 2017-03-08 12:50:51 -08:00
Ilya Kreymer
a2ffbde2f6 dockerfile: add portalocker
rewriterapp: don't add memento headers for ajax responses to avoid replay issues
2017-03-08 12:30:20 -08:00
Ilya Kreymer
0784e4e5aa spin-off warcio!
update imports to point to warcio
warcio rename fixes:
- ArcWarcRecord.stream -> raw_stream
- ArcWarcRecord.status_headers -> http_headers
- ArchiveLoadFailed single param init
2017-03-07 10:58:00 -08:00
Ilya Kreymer
4a94699a65 warc refactor: ArchiveLoadFailed no longer derived from WbException
catch separately, set status to 503 Archive Not Available explicitly
2017-03-01 18:02:35 -08:00
Ilya Kreymer
2b3fde028f refactor: split LimitReader into limitreader.py 2017-03-01 15:13:32 -08:00
Ilya Kreymer
b7285b1a77 refactor: split off BlockLoader support into BlockArcWarcRecordLoader, plain ArcWarcRecordLoader only includes parse_record_stream(), no load()
use BlockArcWarcRecordLoader() only when needed for replay
2017-03-01 14:57:44 -08:00
Ilya Kreymer
1213466afb warc & recorder refactor: split BaseWARCWriter from MultiWARCWriter, move to warc/warcwriter.py, recorder/multifilewarcwriter.py
split indexing functionality from base warc iterator, move to archiveindexer.py
2017-03-01 14:18:44 -08:00
Ilya Kreymer
3faa55906a warcwriter: attempt to separate warc writing semantics from the recorder
use StatusAndHeaders instead of requests CaseInsensitiveDict for consistency
refactor writer api: create_warc_record() for creating new record
copy_warc_record() for copying a full record from a stream
add writer tests, separate from recorder
2017-03-01 12:50:32 -08:00
Ilya Kreymer
c66d251a90 warc: make ArchiveIterator an actual iterator
warc indexing test: add test for reading warc with interspersed empty gzip records, ensure they are ignored
2017-03-01 12:48:06 -08:00
Ilya Kreymer
114ef2a637 autoapp: add OSError for py2.7 2017-02-27 22:13:59 -08:00
Ilya Kreymer
a4b770d34e new-pywb refactor!
frontendapp compatibility
- add support for separate not found page for 404s (not_found.html)
- support for exception handling with error template (error.html)
- support for home page (index.html)
- add memento headers for replay
- add referrer fallback check
- tests: port integration tests for front-end replay, cdx server
- not included: proxy mode, exact redirect mode, non-framed replay
- move unused tests to tests_disabled
- cli: add optional werkzeug profiler with --profile flag
2017-02-27 19:07:51 -08:00
Ilya Kreymer
0dbc803422 webagg: default 'memento+' config to /timemap/link/ instead of /timemap/*/ for greater compatibility 2017-02-26 12:18:31 -08:00
Ilya Kreymer
91e45be75d rewriteinputreq: ensure path is not blank, default to '/' 2017-02-24 15:00:37 -08:00
Ilya Kreymer
070ecca7af redis multi-key index source optimization: support optional 'member_key_templ' for retrieving members instead of using scan_iter()
add scan_keys() which checks member key if provided, otherwise falls back to scan_iter()
2017-02-21 11:25:18 -08:00
Ilya Kreymer
35c298011a rewriterapp: buffer upstream source while rewriting! 2017-02-19 20:49:51 -08:00
Ilya Kreymer
2a7da54be9 Merge branch 'master' into new-pywb 2017-02-17 18:49:01 -08:00
Ilya Kreymer
31bf7a47f1 new-wayback cli script, using new FrontEndApp (rewriting) + AutoConfigApp (config-driven aggregator)
support for dynamic collections: check all .cdxj files in /<coll>/indexes/*.cdxj when accessing /<coll>
support for fixed routes: specified in config.yaml as per https://github.com/ikreymer/pywb/wiki/Distributed-Archive-Config
werkzeug routing in FrontEndApp: default query, replay, search pages working
route listing: /_coll_info.json for listing fixed + dynamic routes
autoindexing enabled, indexing WARCs added to archives directory to .cdxj index
Addresses #196
2017-02-17 18:04:07 -08:00
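The dynamic-collection lookup described in commit 31bf7a47f1 can be illustrated roughly as follows. This is a minimal sketch, not pywb's code: the function name `find_collection_indexes` and the `root_dir` parameter are hypothetical, and only the `<coll>/indexes/*.cdxj` layout comes from the commit message.

```python
import glob
import os


def find_collection_indexes(root_dir, coll):
    """Return the sorted .cdxj index files for one collection.

    Hypothetical helper: assumes collections live under root_dir and
    follow the <coll>/indexes/*.cdxj layout named in the commit message.
    """
    pattern = os.path.join(root_dir, coll, "indexes", "*.cdxj")
    # glob returns an empty list when the collection or directory is missing
    return sorted(glob.glob(pattern))
```

Because the glob is evaluated on each call, index files added to the directory later are picked up without a restart, which is the behaviour the message implies for accessing /<coll>.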
Ilya Kreymer
60f3c0a213 setup: further update trove classifiers, add python 2 and 3, switch to production stable, closes #208 2017-02-16 15:12:22 -08:00
Ilya Kreymer
77960c1311 setup: bound webtest for 2.6 support 2017-02-16 14:32:14 -08:00
Ilya Kreymer
7f95396be0 setup: use jinja2<2.9 for now 2017-02-16 11:29:30 -08:00
Ilya Kreymer
14e1dbb268 update CHANGELIST for 0.33.1 2017-02-16 11:09:31 -08:00
Ilya Kreymer
4f6fa3ffd8 client-rewrite: disable eval() override for now, needs more testing 2017-02-16 11:04:42 -08:00
Ilya Kreymer
58b141bd53 add python 3 classifiers (#208) 2017-02-16 11:02:53 -08:00
Ilya Kreymer
531422fc1b client-side rewrite improvements:
- add overrides for document.URL, xhr.responseURL, function for general single property override
- postMessage: add overrides for additional MessageEvent properties, target, srcElement, path, eventPhase
- postMessage: avoid duplicate event listeners registered
- check for duplicate postMessage override inits
2017-02-15 17:03:15 -08:00
Ilya Kreymer
a5bc932e0c memento agg test: fix test to reflect change from link->* 2017-02-06 21:17:48 -05:00
Ilya Kreymer
06c6e0c6f8 memento agg: fix test to reflect change 2017-02-06 21:00:08 -05:00
Ilya Kreymer
1d5b48d3b6 indexsource: improve init_from_config() to always use current class
use '*' instead of 'link' for timemap for compatibility (for now)
2017-02-06 20:52:43 -05:00
Ilya Kreymer
564f548afa wombat improvements:
- xhr responseURL override, extract original url
- Worker override: if using 'blob:', extract blob and remove any postMessage() rewriting (workers won't have the __WB_pmw function)
- eval() override: conv to string before rewriting
2017-02-05 02:26:02 -05:00
Ilya Kreymer
7f8562a39d utils: LimitReader tell() proxies to original stream, available only if original has tell() 2017-02-04 22:54:43 -05:00
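The idea in commit 7f8562a39d, exposing `tell()` only when the wrapped stream has one, can be sketched as below. The class `LimitedStream` is a simplified, hypothetical stand-in and not pywb's `LimitReader`; only the proxying behaviour is taken from the commit message.

```python
import io


class LimitedStream:
    """Read at most `limit` bytes from an underlying stream (illustrative only)."""

    def __init__(self, stream, limit):
        self.stream = stream
        self.limit = limit
        # Proxy tell() to the original stream, but only if it exists there,
        # so hasattr(wrapper, "tell") mirrors the wrapped object.
        if hasattr(stream, "tell"):
            self.tell = stream.tell

    def read(self, length=-1):
        # Clamp the request to the remaining byte budget
        if length < 0 or length > self.limit:
            length = self.limit
        if length == 0:
            return b""
        data = self.stream.read(length)
        self.limit -= len(data)
        return data


reader = LimitedStream(io.BytesIO(b"abcdef"), 4)
first = reader.read(3)
position = reader.tell()  # position reported by the underlying BytesIO
```

Note that the reported position is that of the original stream, not an offset relative to where the limited read began.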
Ilya Kreymer
1a9f66f8b6 mementoindexsource: treat missing Link header as non-memento/not found 2017-01-27 00:07:38 -08:00
Ilya Kreymer
f92782d1dd utils: LimitReader: support tell() 2017-01-26 23:29:40 -08:00
Ilya Kreymer
9773eba47d setup reqs: use webassets==0.12.1 (with pyinstaller support), remove dependency on custom branch 2017-01-26 01:37:57 -08:00
Ilya Kreymer
84796ba810 setup req: fix jinja<2.9 for now due to issues in 2.9+ 2017-01-26 01:14:10 -08:00
Ilya Kreymer
2d54bb87be setup.py: ensure gevent monkey-patch is called before running tests with python setup.py test 2017-01-26 00:37:35 -08:00
Ilya Kreymer
bb64d0de54 url-rewrite cookie store: decode() only if redis returns byte strings in py3 2017-01-26 00:01:39 -08:00
Ilya Kreymer
2cc6f5b4d6 Merge pull request #203 from atomotic/new-pywb
replace fcntl with portalocker
2016-12-27 12:38:12 -08:00
raffaele messuti
524d9bfd26 portalocker for file locking check instead of fcntl. more portable on windows 2016-12-26 10:27:20 +01:00
Ilya Kreymer
0e414acfda setup: remove pyamf as default dep for now 2016-12-21 17:15:41 -08:00
Ilya Kreymer
3b82416ad3 setup: add specific dependencies for webassets, pyamf 2016-12-21 16:11:48 -08:00
Ilya Kreymer
c52efa0f9b loader improvements: add PackageLoader for pkg:// scheme
use pkgutil.get_data() instead of pkg_resources
template loading: load assets file through load() interface, use standard PackageLoader
2016-12-18 20:57:17 -08:00
Ilya Kreymer
fa85793e97 remove chunk_encoding of wsgi response: per pep 3333 (https://www.python.org/dev/peps/pep-3333/#other-http-features), the application/middleware should *not* add Transfer-Encoding header or chunk encode the response 2016-12-16 13:48:51 -08:00
Ilya Kreymer
5f7a62bd5e utils: expandvars only if not empty 2016-12-16 12:22:06 -08:00
Ilya Kreymer
d104b0f367 rewriterapp: ensure correct sized or chunked response:
if no content-length and http 1.1, chunk encode the response
if no content-length and http 1.0, buffer response and add content-length
utils: port buffer_iter() for buffering iter, returning another iter
utils load_config: expand any env vars
2016-12-16 11:19:40 -08:00
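The response-sizing rule in commit d104b0f367 can be sketched as follows. All names here (`buffer_chunks`, `chunk_encode_iter`, `size_response`) are hypothetical and do not reflect pywb's actual signatures; only the decision logic comes from the commit message. Note that the later commit fa85793e97 in this log removes chunk encoding of the WSGI response, citing PEP 3333.

```python
def buffer_chunks(chunks):
    """Consume an iterable of byte chunks; return (total_length, new_iterator)."""
    buffered = list(chunks)
    total = sum(len(chunk) for chunk in buffered)
    return total, iter(buffered)


def chunk_encode_iter(chunks):
    """Yield the chunks in HTTP/1.1 chunked transfer encoding."""
    for chunk in chunks:
        if chunk:  # a zero-length chunk would end the body early
            yield b"%X\r\n" % len(chunk) + chunk + b"\r\n"
    yield b"0\r\n\r\n"


def size_response(headers, body_iter, http_version):
    """Ensure the response is either correctly sized or chunked."""
    headers = dict(headers)
    if "Content-Length" in headers:
        return headers, body_iter
    if http_version == "HTTP/1.1":
        # No content-length and HTTP 1.1: chunk encode the response
        headers["Transfer-Encoding"] = "chunked"
        return headers, chunk_encode_iter(body_iter)
    # No content-length and HTTP 1.0: buffer the body and add content-length
    length, body_iter = buffer_chunks(body_iter)
    headers["Content-Length"] = str(length)
    return headers, body_iter
```

Buffering costs memory proportional to the body size, which is why it is reserved for the HTTP 1.0 case where chunked encoding is unavailable.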
Ilya Kreymer
4ce65c5289 logging: disable excess print statements 2016-12-16 11:13:27 -08:00
Ilya Kreymer
fb91d116a9 urlrewrite cookietracker fix: rewrite Path of cookies retrieved from cookietracker (redis) using custom host scope rewriter (no other filtering) 2016-12-11 18:59:02 -08:00
Ilya Kreymer
bf402e68f6 warc: make ArcWarcRecord a class to allow modifying attribs
warcwriter: add option to not adjust content length if record already prepared
2016-12-09 18:09:42 -08:00
Ilya Kreymer
bbfe3a9d51 bufferedreader: read() op attempts to read entire buff or exact length, retries if boundary reached 2016-12-09 17:51:39 -08:00
Ilya Kreymer
50a3353da3 wsgi server: default to gevent-based wsgi server for all cmd line server apps, add -s command for specifying server #201
cli: add 'webagg-server' cli command for running new webagg system
tests: fix cli test for gevent server
2016-12-09 16:46:33 -08:00
Ilya Kreymer
4f9b963e13 tests: update test to support uncompressed followed after compressed block 2016-12-08 14:20:46 -08:00