Ilya Kreymer
d12f715d81
refactor: split warcserver.utils into utils package:
...
- utils.io for stream/compression related utils
- utils.format for string formatting
- utils.memento for memento
- load_config -> utils.loaders.load_overlay_config
- also: use warcio.utils.to_native_str instead of utils.loaders.to_native_str
2017-06-05 17:43:46 -07:00
Ilya Kreymer
2907ed01c8
refactor:
...
- fix pywb.indexer, pywb.manager, pywb.recorder packages, tests pass
rename geventeventserver -> pywb.utils
move extract_post_query/append_post_query to inputrequest.PostQueryExtractor
remove to_native_str() in pywb.utils, redundant with warcio.utils version
remove obsolete readme, dockerfile
2017-05-23 16:43:41 -07:00
Ilya Kreymer
0784e4e5aa
spin-off warcio!
...
update imports to point to warcio
warcio rename fixes:
- ArcWarcRecord.stream -> raw_stream
- ArcWarcRecord.status_headers -> http_headers
- ArchiveLoadFailed single param init
2017-03-07 10:58:00 -08:00
Ilya Kreymer
2b3fde028f
refactor: split LimitReader into limitreader.py
2017-03-01 15:13:32 -08:00
Ilya Kreymer
7f8562a39d
utils: LimitReader tell() proxies to original stream, available only if original has tell()
2017-02-04 22:54:43 -05:00
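The tell() behaviour described in 7f8562a39d can be sketched roughly as follows. This is a minimal illustration, not the actual LimitReader code: the class name SimpleLimitReader and its internals are assumptions made for the example. It exposes tell() only when the wrapped stream has one, and forwards the call to that stream.

```python
import io


class SimpleLimitReader:
    """Hypothetical stand-in for a limit reader: reads at most `limit` bytes."""

    def __init__(self, stream, limit):
        self.stream = stream
        self.limit = limit
        # Assumption for this sketch: expose tell() only if the wrapped
        # stream provides it, and proxy straight to the original stream.
        if hasattr(stream, 'tell'):
            self.tell = stream.tell

    def read(self, length=None):
        # Never read past the remaining limit.
        if length is None or length < 0:
            length = self.limit
        else:
            length = min(length, self.limit)
        if length == 0:
            return b''
        buff = self.stream.read(length)
        self.limit -= len(buff)
        return buff


class NoTellStream:
    """Hypothetical stream without tell(), used to show the fallback."""

    def read(self, length):
        return b''


reader = SimpleLimitReader(io.BytesIO(b'abcdefgh'), 5)
first = reader.read(3)
position = reader.tell()      # proxied to the BytesIO position
rest = reader.read()          # stops at the 5-byte limit
has_tell = hasattr(SimpleLimitReader(NoTellStream(), 5), 'tell')
```

Because tell() is forwarded, the reported position is that of the original stream, not an offset relative to where the limited region starts.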
Ilya Kreymer
f92782d1dd
utils: LimitReader: support tell()
2017-01-26 23:29:40 -08:00
Ilya Kreymer
c52efa0f9b
loader improvements: add PackageLoader for pkg:// scheme
...
use pkgutil.get_data() instead of pkg_resources
template loading: load assets file through load() interface, use standard PackageLoader
2016-12-18 20:57:17 -08:00
Ilya Kreymer
66ca8d8b26
http block loader: raise exception for 4xx, 5xx responses
...
tests: add tests for limitreader posting, fix charset for frame test
2016-07-31 12:56:00 -04:00
Ilya Kreymer
197ed5be98
loader: profile urls: ensure the profile prefix is removed from url before passing to loader, #180
2016-06-04 14:09:18 -04:00
Ilya Kreymer
8ad66249c7
blockloader: support for loader profiles, specified via 'profile+scheme://...' urls. Profiles specify additional settings (eg. credentials) that are not included in the url. To enable custom profiles, call BlockLoader.set_profile_loader(callable) with a callable that will return custom config, addresses #180
2016-05-18 16:34:58 -07:00
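To illustrate the 'profile+scheme://...' form from 8ad66249c7 (and the prefix stripping noted in 197ed5be98), here is a small sketch of separating the profile name from the url. The helper name split_profile and the sample urls are assumptions for the example; the real parsing in BlockLoader may differ.

```python
def split_profile(url):
    """Split 'profile+scheme://rest' into (profile, 'scheme://rest').

    Returns (None, url) when the url carries no profile prefix.
    Hypothetical helper, not part of the BlockLoader API.
    """
    scheme, sep, rest = url.partition('://')
    if not sep:
        # Plain path or other string without a scheme.
        return None, url
    profile, plus, real_scheme = scheme.partition('+')
    if not plus:
        # Ordinary url such as http://...
        return None, url
    return profile, real_scheme + '://' + rest


with_profile = split_profile('myprofile+s3://bucket/key.warc.gz')
without_profile = split_profile('http://example.com/a+b')
local_path = split_profile('/local/path.warc.gz')
```

Only a '+' inside the scheme part counts as a profile separator, so a '+' later in the path or query leaves the url untouched.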
Ilya Kreymer
d11bd444ad
s3 loader: unurlencode username/password
2016-05-17 19:24:14 -07:00
Ilya Kreymer
119074e0ee
s3 loader improvements: support AWS cred in username and password part of url, stream s3 response directly
2016-05-17 18:55:10 -07:00
Ilya Kreymer
87da25c703
post request mapping improvements: work on #178 , including:
...
- mapping multipart/form-data same as x-www-form-urlencoded
- parsing application/x-amf with pyamf
- RewriteContentAMF for rewriting AMF response to match request
- default encoding of other POST data as base64 encoded __wb_post_data param
2016-05-06 10:19:08 -07:00
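The default mapping mentioned in 87da25c703 (other POST data carried as a base64 encoded __wb_post_data param) might look something like the sketch below. The function names are hypothetical, and whether the value is additionally url-quoted in the real code is an assumption made here so that the result is a valid query string.

```python
import base64
from urllib.parse import urlencode, parse_qs


def post_data_to_query(data):
    """Encode raw POST bytes as a '__wb_post_data=<base64>' query string.

    Hypothetical helper; the url-quoting step is an assumption.
    """
    encoded = base64.b64encode(data).decode('ascii')
    return urlencode({'__wb_post_data': encoded})


def query_to_post_data(query):
    """Reverse of post_data_to_query(), for illustration only."""
    encoded = parse_qs(query)['__wb_post_data'][0]
    return base64.b64decode(encoded)


query = post_data_to_query(b'hello')
binary = b'\x00\xff arbitrary bytes'
round_trip = query_to_post_data(post_data_to_query(binary))
```

Base64 keeps arbitrary binary bodies representable as text, at the cost of roughly a third more characters than the original data.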
Ilya Kreymer
4b753d2612
Merge branch '0.11.5' into develop
2016-03-31 13:16:53 -07:00
Ilya Kreymer
b5cf79072d
loaders: ensure loader stream closed in load_yaml_config()
2016-03-31 12:42:23 -07:00
Ilya Kreymer
f8f0c3a76e
loader: ensure file closed in load_yaml_config()
2016-03-27 13:56:19 -04:00
Ilya Kreymer
e5ca9bf601
Merge branch 'master' into py3
2016-03-10 10:53:30 -08:00
Mat Kelly
96da397456
Quick comment fix
2016-03-04 11:17:35 -05:00
Ilya Kreymer
a6dc57cf4a
post query: ensure post query optional buffer is a byte not string buffer
...
exceptions: move LiveRequestException to wbexceptions
cdx query: support for 'alt_url' which, if set, is used to create start_key and end_key
2016-03-03 13:13:44 -08:00
Ilya Kreymer
3a584a1ec3
py3: all tests pass, at last!
...
but not yet py2... need to resolve encoding in rewriting issues
2016-02-23 13:26:53 -08:00
Ilya Kreymer
bd841b91a9
more python 3 support work -- pywb.cdx, pywb.warc tests succeed
...
most relative imports replaced with absolute
2016-02-18 21:26:40 -08:00
Ilya Kreymer
3c85f7b7ac
py3: make pywb.utils work with python 3!
2016-02-16 14:52:20 -08:00
Ilya Kreymer
75085ad91b
loaders: fix loader inits, don't inherit from BlockLoader #135
2015-10-20 10:33:24 -07:00
Ilya Kreymer
94095e452a
loaders: refactor BlockLoader to use an extensible dict of loaders
...
individual HttpLoader, LocalFileLoader and S3Loader supported by default
Loaders created via BlockLoader also cached for reuse, closes #135
2015-10-19 11:59:35 -07:00
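The design described in 94095e452a, a dict of loaders keyed by scheme with created loaders cached for reuse, can be sketched as below. All names here (SchemeDispatcher, DummyHttpLoader, DummyFileLoader) are hypothetical and only show the dispatch-and-cache idea, not the actual BlockLoader code.

```python
class DummyHttpLoader:
    """Hypothetical placeholder for an http loader."""


class DummyFileLoader:
    """Hypothetical placeholder for a local file loader."""


class SchemeDispatcher:
    """Pick a loader class by url scheme and cache one instance per scheme."""

    def __init__(self, loader_classes):
        # Extensible mapping: scheme -> loader class.
        self.loader_classes = dict(loader_classes)
        self.cached = {}

    def get_loader(self, url):
        # Assumption: anything without '://' is treated as a local file.
        scheme = url.split('://', 1)[0] if '://' in url else 'file'
        loader = self.cached.get(scheme)
        if loader is None:
            loader_cls = self.loader_classes.get(scheme)
            if loader_cls is None:
                raise ValueError('no loader for scheme: ' + scheme)
            loader = loader_cls()
            self.cached[scheme] = loader
        return loader


dispatcher = SchemeDispatcher({
    'http': DummyHttpLoader,
    'file': DummyFileLoader,
})
first = dispatcher.get_loader('http://example.com/a.warc.gz')
second = dispatcher.get_loader('http://example.com/b.warc.gz')
local = dispatcher.get_loader('/data/c.warc.gz')
```

Adding support for another scheme is then a matter of registering one more entry in the dict.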
Ilya Kreymer
e435242d38
wombat: Date: fixes to Date override, guard against double override
...
document.write: use shared rewrite_html() method, issue single write call
loaders: read_http() don't use range request if no range is set
2015-07-17 18:40:25 -07:00
Ilya Kreymer
2d0c526053
post handling: when reading post data in extract_post_query(), add optional buffer_stream which would hold the original POST
...
data. This is necessary to override the `wsgi.input` to allow the post data to be read again via a fallback handler, even
after reading POST query data in replay handler, addresses #117
2015-06-25 15:58:58 -07:00
Ilya Kreymer
52a7dd87c6
loaders: s3: import boto just once, store s3_avail flag
2015-04-17 11:02:57 -07:00
Ilya Kreymer
c8a9a3ddd4
loaders: add support for loading from s3:// using boto
...
if auth connection fails, attempt anon connection, #97
2015-04-17 11:02:57 -07:00
Ilya Kreymer
fc9d659b5d
loaders: switch BlockLoader to use requests instead of urllib2
2015-03-28 16:41:52 -07:00
Ilya Kreymer
2af5a25009
zipnum: support for pagination api! #34 and #83 . cdx server now bounded by pageSize (default 10 blocks),
...
showNumPages=true returns json indicating num pages, page=N can be set to page number 0-numPages - 1
loaders: add read_last_line() to read last line of a seekable file, used to read last line of index file when
at end
tests: additional test for binsearch boundary conditions
zipnum: secondary index output supports json also
2015-03-24 18:56:13 -07:00
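One way to read the last line of a seekable file, as read_last_line() in 2af5a25009 is described as doing, is to seek backwards from the end in growing steps until a full line is visible. The sketch below assumes a binary file object and a starting offset of 256 bytes; the empty-file handling is an addition for the example, and the actual implementation may differ.

```python
import io


def read_last_line(fh, offset=256):
    """Return the last line of a seekable binary file (sketch)."""
    fh.seek(0, 2)                 # jump to end of file
    size = fh.tell()
    while offset < size:
        fh.seek(-offset, 2)       # step back `offset` bytes from the end
        lines = fh.readlines()
        if len(lines) > 1:
            # At least one newline was seen before the final line,
            # so the final line is complete.
            return lines[-1]
        offset += offset          # double the window and retry
    # Window covers the whole file: read it all.
    fh.seek(0, 0)
    lines = fh.readlines()
    return lines[-1] if lines else b''


short = read_last_line(io.BytesIO(b'a\nb\nc\n'), offset=2)
long_lines = read_last_line(
    io.BytesIO(b'x' * 10 + b'\n' + b'y' * 10 + b'\n'), offset=4)
single = read_last_line(io.BytesIO(b'only'))
empty = read_last_line(io.BytesIO(b''))
```

Doubling the window keeps the number of seeks small even when the last line is much longer than the initial guess.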
Ilya Kreymer
ac525b0937
tests: add tests for extract_post_query()
...
add test for HttpsUrlRewriter, remove unnecessary check in
bufferedreader
2015-01-11 23:54:29 -08:00
Ilya Kreymer
cf0a21509b
loaders: add to_file_url() for converting between filename and file://,
...
used in live rewrite and tests
2015-01-11 13:05:48 -08:00
Ilya Kreymer
1eb0f96f92
windows support work: fix loaders to use pathname2url to convert to
...
file:/// url, use urlopen to open file paths
fix some tests to use universal line breaks
2015-01-10 14:06:15 -08:00
Ilya Kreymer
e8d3965269
pep8 style fixes, remove unused methods
2014-10-21 19:06:16 -07:00
Ilya Kreymer
50bf7d2634
rewrite: move extract_client_cookie to utils for access at rewrite
...
root cookie_rewriter: keep max-age
add csrf token copying (experimental)
update tests
2014-10-12 03:07:54 -07:00
Ilya Kreymer
71e8ada57d
rewrite: add test for banner-only mode, rewriting w/o a head using local
...
'sample_no_head' file.
query.html: use client side rewriting for calendar dates
rewrite: remove unused decode stuff
2014-08-04 20:45:02 -07:00
Ilya Kreymer
0c9d88f032
POST replay: treat POST form data same as get query, no '&&&' marker
...
additional testing POST
2014-06-11 11:17:06 -07:00
Ilya Kreymer
e2349a74e2
replay: better POST support via post query append!
...
record_loader can optionally parse 'request' records
archiveindexer has -a flag to write all records ('request' included),
-p flag to append post query
post-test.warc.gz and cdx
POST redirects using 307
2014-06-10 19:21:46 -07:00
Ilya Kreymer
e7957a5cae
remove SeekableTextFileReader, replaced with standard file-like objects
...
and seek(0, 2) and tell() to get file length
2014-05-06 20:54:42 -07:00
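The seek(0, 2) and tell() idiom mentioned in e7957a5cae gets the length of any seekable file-like object without a dedicated reader class. A minimal sketch, with the helper name stream_length chosen for the example:

```python
import io


def stream_length(fh):
    """Return the total length of a seekable stream, preserving position."""
    current = fh.tell()
    fh.seek(0, 2)        # whence=2: relative to end of file
    length = fh.tell()
    fh.seek(current)     # restore the caller's position
    return length


buf = io.BytesIO(b'hello world')
buf.read(5)
length = stream_length(buf)
position_after = buf.tell()
```

Restoring the position afterwards is a choice made for this sketch so that the helper has no side effects on the caller's reads.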
Ilya Kreymer
64eef7063d
record reading: better handling of empty arc (or warc) records
...
for indexing, index empty/invalid length as '-' status code
for reading, serve as 204 no content.
ensure that StatusAndHeaders has a valid statusline when serving
if http content-length is valid, limit stream to that content-length
as well as record content-length (whichever is smaller)
replace content-length when buffering
2014-04-07 17:08:39 -07:00
Ilya Kreymer
28d65ce717
archiveindexer major refactoring using zlib only
...
supports warc.gz, arc.gz, warc, arc and optional sorting
outputs cdx 11 but possible to extend to other formats
(additional edge case testing needed)
DecompressingBufferedReader refactoring to support multi-member gzip
Unit tests for indexer, additional unit tests for bufferedreaders and loaders,
and recordloaders
2014-03-30 23:47:33 -07:00
Ilya Kreymer
14a12f95b2
pep8 fixes, improve docs for proxy
...
move CaptureException into replay_views
2014-03-14 11:02:03 -07:00
Ilya Kreymer
3b1afc3e3d
replace StringIO with BytesIO
2014-03-08 09:30:19 -08:00
Ilya Kreymer
673ff35d15
minor fixes: wombat add document.WB_wombat_location
...
loaders: file 'urls' starting with . and / are always file paths
pep8 fixes for cdx, utils packages
2014-03-05 17:13:14 -08:00
Ilya Kreymer
df2f7ba496
warc: add digest filter only if digest is present for url-agnostic load
...
ensure cdxobject format set on cdx load callback
limit reader: add length wrapping utility func to limitreader
2014-03-05 05:12:25 +00:00
Ilya Kreymer
0bf651c2e3
add cdx_server app!
...
port wsgi cdx server tests to test new app!
move base handlers to basehandlers in framework pkg
(remove werkzeug dependency)
2014-03-02 23:41:44 -08:00
Ilya Kreymer
f0a0976038
more refactoring!
...
create 'framework' subpackage for general purpose components!
contains routing, request/response, exceptions and wsgi wrappers
update framework package for pep8
dsrules: using load_config_yaml() (pushed to utils)
to init default config
2014-03-02 21:42:05 -08:00
Ilya Kreymer
f1acad53fc
wsgi wrapper reorg!
...
support pluggable wsgi apps
utils: BlockLoader() supports loading from package
exceptions: base WbException moved to utils
2014-03-02 19:26:06 -08:00
Ilya Kreymer
5a41f59f39
new unified config system, via rules.yaml!
...
contains configs for cdx canon, fuzzy matching and rewriting!
rewriting: ability to add custom regexs per domain
also, ability to toggle js rewriting and custom rewriting file
(default is wombat.js)
2014-02-26 18:02:01 -08:00
Ilya Kreymer
1754f15831
Combine FileLoader/HttpLoader into a single BlockLoader which
...
delegates based on scheme
2014-02-22 16:49:26 -08:00