Ilya Kreymer
527a3bc89c
bufferedreader: be lenient with partially decompressed data: return what was decompressed rather than just throwing an exception
especially useful if the record was decompressed but the CRC check failed
may add additional options for toggling 'leniency' if needed
2016-06-12 00:37:14 -04:00
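A minimal sketch of the lenient behaviour described above, using only Python's standard `zlib` module. The helper `lenient_decompress` and its `chunk_size` parameter are hypothetical and are not pywb's BufferedReader code. Output produced by the call that raises is lost, so smaller chunks keep more of the partial data.

```python
import zlib


def lenient_decompress(data, chunk_size=16):
    """Decompress gzip data chunk by chunk, keeping partial output on error.

    Hypothetical helper for illustration only.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    out = []
    try:
        for i in range(0, len(data), chunk_size):
            out.append(decomp.decompress(data[i:i + chunk_size]))
        out.append(decomp.flush())
    except zlib.error:
        # Lenient: keep what was already decompressed instead of raising,
        # e.g. when only the trailing CRC check fails
        pass
    return b''.join(out)
```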
Ilya Kreymer
197ed5be98
loader: profile urls: ensure the profile prefix is removed from url before passing to loader, #180
2016-06-04 14:09:18 -04:00
Ilya Kreymer
8ad66249c7
blockloader: support for loader profiles, specified via 'profile+scheme://...' urls. Profiles specify additional settings (e.g. credentials) that are not included in the url. To enable custom profiles, pass a callable that returns the custom config to BlockLoader.set_profile_loader(), addresses #180
2016-05-18 16:34:58 -07:00
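A sketch of how a 'profile+scheme://...' url might be split so that the profile prefix is removed before the plain url reaches the scheme's loader (the #180 follow-up above). `split_profile_url` and `load_with_profile` are hypothetical helpers, and the profile loader's call signature here is an assumption, not the one BlockLoader.set_profile_loader() actually uses.

```python
def split_profile_url(url):
    """Split 'profile+scheme://rest' into ('profile', 'scheme://rest').

    Hypothetical helper; urls without a profile return (None, url).
    """
    scheme, sep, rest = url.partition('://')
    if not sep or '+' not in scheme:
        return None, url
    profile, _, real_scheme = scheme.partition('+')
    return profile, real_scheme + '://' + rest


def load_with_profile(url, profile_loader):
    # profile_loader stands in for the registered profile callable;
    # assumed here to map a profile name to a dict of extra settings
    profile, plain_url = split_profile_url(url)
    config = profile_loader(profile) if profile else {}
    return plain_url, config
```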
Ilya Kreymer
d11bd444ad
s3 loader: url-decode (unquote) username/password
2016-05-17 19:24:14 -07:00
Ilya Kreymer
119074e0ee
s3 loader improvements: support AWS credentials in the username and password part of the url, stream s3 response directly
2016-05-17 18:55:10 -07:00
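A sketch of pulling AWS credentials out of the username and password part of an s3:// url and url-decoding them, as the two s3 loader commits above describe. `parse_s3_url` is hypothetical and only uses `urllib.parse`; it does not call boto. Secrets can contain characters such as '/' or '+' that must be percent-encoded inside a url, hence the unquote step.

```python
from urllib.parse import unquote, urlsplit


def parse_s3_url(url):
    """Return (bucket, key, access_key, secret_key) from an s3:// url.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    if parts.scheme != 's3':
        raise ValueError('not an s3:// url: ' + url)
    # Credentials, if any, come from the userinfo part and are url-decoded
    access_key = unquote(parts.username) if parts.username else None
    secret_key = unquote(parts.password) if parts.password else None
    bucket = parts.hostname
    key = parts.path.lstrip('/')
    return bucket, key, access_key, secret_key
```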
Ilya Kreymer
87da25c703
post request mapping improvements: work on #178, including:
- mapping multipart/form-data same as x-www-form-urlencoded
- parsing application/x-amf with pyamf
- RewriteContentAMF for rewriting AMF response to match request
- default encoding of other POST data as base64 encoded __wb_post_data param
2016-05-06 10:19:08 -07:00
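A sketch of the last bullet above: form-encoded POST bodies map directly to query params, while other POST data is base64-encoded into a single `__wb_post_data` param. `post_data_to_query` is hypothetical; the multipart and AMF handling from the same commit is not shown.

```python
import base64
from urllib.parse import urlencode


def post_data_to_query(content_type, body):
    """Map a POST body to a query string that can be appended to a url.

    Hypothetical sketch for illustration only.
    """
    if content_type.startswith('application/x-www-form-urlencoded'):
        # Already in query-string form
        return body.decode('utf-8')
    # Anything else: base64-encode into one param
    encoded = base64.b64encode(body).decode('ascii')
    return urlencode({'__wb_post_data': encoded})
```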
Ilya Kreymer
4b753d2612
Merge branch '0.11.5' into develop
2016-03-31 13:16:53 -07:00
Ilya Kreymer
b5cf79072d
loaders: ensure loader stream closed in load_yaml_config()
2016-03-31 12:42:23 -07:00
Ilya Kreymer
f8f0c3a76e
loader: ensure file closed in load_yaml_config()
2016-03-27 13:56:19 -04:00
Ilya Kreymer
2051785e6b
statusandheaders: add to_str() method with 'exclude_list' to support converting to str with certain headers
excluded. also supported by to_bytes()
2016-03-11 11:02:13 -08:00
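A sketch of serializing a status line plus headers while skipping an 'exclude_list' of header names. `SimpleStatusAndHeaders` is a hypothetical stand-in, not pywb's StatusAndHeaders. Storing headers as a list of (name, value) tuples, matching names case-insensitively, and encoding as iso-8859-1 are all assumptions.

```python
class SimpleStatusAndHeaders(object):
    """Minimal hypothetical stand-in for illustration."""

    def __init__(self, statusline, headers, protocol='HTTP/1.0'):
        self.statusline = statusline
        self.headers = headers  # list of (name, value) tuples
        self.protocol = protocol

    def to_str(self, exclude_list=None):
        # Header names are compared case-insensitively
        exclude = set(h.lower() for h in (exclude_list or []))
        lines = [self.protocol + ' ' + self.statusline]
        for name, value in self.headers:
            if name.lower() not in exclude:
                lines.append(name + ': ' + value)
        # The header block is terminated by an empty line
        return '\r\n'.join(lines) + '\r\n\r\n'

    def to_bytes(self, exclude_list=None):
        return self.to_str(exclude_list).encode('iso-8859-1')
```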
Ilya Kreymer
e5ca9bf601
Merge branch 'master' into py3
2016-03-10 10:53:30 -08:00
Ilya Kreymer
b3372f64c3
timeutils: add datetime_to_iso_date
2016-03-08 08:39:45 -08:00
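A sketch of a datetime to ISO 8601 conversion in the spirit of datetime_to_iso_date, plus a variant that starts from a 14-digit web archive timestamp. Both function names are hypothetical, and the exact output format (UTC with a 'Z' suffix, no fractional seconds) is an assumption about what the real helper produces.

```python
from datetime import datetime


def datetime_to_iso_date_sketch(dt):
    """Format a naive UTC datetime as e.g. '2016-03-08T08:39:45Z'."""
    return dt.strftime('%Y-%m-%dT%H:%M:%SZ')


def timestamp_to_iso_date_sketch(ts):
    # ts is a 14-digit timestamp such as '20160308083945'
    return datetime_to_iso_date_sketch(datetime.strptime(ts, '%Y%m%d%H%M%S'))
```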
Ilya Kreymer
5ad01f7d64
statusandheaders: add a to_bytes() func for serializing headers
2016-03-08 08:26:51 -08:00
Ilya Kreymer
648e567805
statusandheaders: add __str__ func to reconstruct statusline + headers text
2016-03-04 12:48:36 -08:00
Mat Kelly
96da397456
Quick comment fix
2016-03-04 11:17:35 -05:00
Ilya Kreymer
a6dc57cf4a
post query: ensure the optional post query buffer is a bytes buffer, not a string buffer
exceptions: move LiveRequestException to wbexceptions
cdx query: support for 'alt_url' which, if set, is used to create start_key and end_key
2016-03-03 13:13:44 -08:00
Ilya Kreymer
3a584a1ec3
py3: all tests pass, at last!
but not yet on py2; still need to resolve encoding issues in rewriting
2016-02-23 13:26:53 -08:00
Ilya Kreymer
bd841b91a9
more python 3 support work -- pywb.cdx, pywb.warc tests succeed
most relative imports replaced with absolute
2016-02-18 21:26:40 -08:00
Ilya Kreymer
3c85f7b7ac
py3: make pywb.utils work with python 3!
2016-02-16 14:52:20 -08:00
Ilya Kreymer
79d5ec2b2d
statusheaders: when not verifying protocol line, avoid IndexError when there is no space in the first line, add tests
2015-12-18 21:46:00 -08:00
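A sketch of splitting a first line into protocol and status without indexing past the end of the split result. `split_status_line` is hypothetical. Treating a line with no space as status-only, with an empty protocol, is an assumed fallback.

```python
def split_status_line(line):
    """Split 'HTTP/1.1 200 OK' into (protocol, rest) without IndexError.

    Hypothetical sketch for illustration only.
    """
    parts = line.rstrip('\r\n').split(' ', 1)
    if len(parts) == 1:
        # No space: indexing parts[1] here would raise IndexError
        return '', parts[0]
    return parts[0], parts[1]
```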
Ilya Kreymer
75085ad91b
loaders: fix loader inits, don't inherit from BlockLoader #135
2015-10-20 10:33:24 -07:00
Ilya Kreymer
94095e452a
loaders: refactor BlockLoader to use an extensible dict of loaders
individual HttpLoader, LocalFileLoader and S3Loader supported by default
Loaders created via BlockLoader also cached for reuse, closes #135
2015-10-19 11:59:35 -07:00
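A sketch of the refactor described above: a dict mapping url scheme to loader class, with one loader instance created per scheme and cached for reuse. `BlockLoaderSketch`, `register`, and the toy `EchoLoader` for a made-up 'echo://' scheme are all hypothetical. The real BlockLoader's registry and method names may differ.

```python
import io
from urllib.parse import urlsplit


class BlockLoaderSketch(object):
    """Dispatch loads by url scheme, caching one loader per scheme."""

    # scheme -> loader class; shared registry that callers can extend
    loader_classes = {}

    def __init__(self):
        self.cached_loaders = {}

    @classmethod
    def register(cls, scheme, loader_cls):
        cls.loader_classes[scheme] = loader_cls

    def get_loader(self, url):
        scheme = urlsplit(url).scheme or 'file'
        loader = self.cached_loaders.get(scheme)
        if loader is None:
            loader_cls = self.loader_classes.get(scheme)
            if loader_cls is None:
                raise IOError('No loader for scheme: ' + scheme)
            # Create the loader once and reuse it for later urls
            loader = loader_cls()
            self.cached_loaders[scheme] = loader
        return loader

    def load(self, url, offset=0, length=-1):
        return self.get_loader(url).load(url, offset, length)


class EchoLoader(object):
    """Toy loader for 'echo://' urls: serves the url's own bytes."""

    def load(self, url, offset=0, length=-1):
        data = url.encode('utf-8')[offset:]
        if length >= 0:
            data = data[:length]
        return io.BytesIO(data)


BlockLoaderSketch.register('echo', EchoLoader)
```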
Ilya Kreymer
db4fbe79ec
tests: add test for BufferedReader 'deflate' (w/o gzip header)
2015-10-11 17:47:19 -07:00
Ilya Kreymer
c3aab1514c
query/cdx: support `from` and `to` cdx query arguments, support ranged calendar query,
e.g. /[from]*[to]/[url] or /[from]-[to]/[url], with both from and to optional, closes #130
exposes lower and upper bound timestamps in timeutils, pad_timestamp
2015-10-07 10:44:12 -07:00
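A sketch of padding partial timestamps to lower and upper bounds for a ranged query where both `from` and `to` are optional. The bound constants and function names here are assumptions, not necessarily the values timeutils exposes. A padded upper bound such as '20150231235959' is not a real date, but it still sorts correctly when compared lexicographically against 14-digit cdx timestamps.

```python
# Assumed bounds for illustration; the real constants may differ
PAD_LOWER = '10000101000000'
PAD_UPPER = '29991231235959'


def pad_timestamp_sketch(ts, pad=PAD_LOWER):
    """Pad a partial timestamp (e.g. '2015') to 14 digits using `pad`."""
    return (ts + pad[len(ts):])[:len(pad)]


def calendar_range(from_ts=None, to_ts=None):
    # Missing bounds fall back to the widest possible range
    lower = pad_timestamp_sketch(from_ts or '', PAD_LOWER)
    upper = pad_timestamp_sketch(to_ts or '', PAD_UPPER)
    return lower, upper
```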
Ilya Kreymer
e435242d38
wombat: Date: fixes to Date override, guard against double override
document.write: use shared rewrite_html() method, issue single write call
loaders: read_http(): don't use a range request if no range is set
2015-07-17 18:40:25 -07:00
Ilya Kreymer
2d0c526053
post handling: when reading post data in extract_post_query(), add an optional buffer_stream which holds the original POST
data. This is necessary to override `wsgi.input` so that the post data can be read again by a fallback handler, even
after the POST query data has been read in the replay handler, addresses #117
2015-06-25 15:58:58 -07:00
Ilya Kreymer
06fcc89de6
readers: support 'content-encoding: deflate' using different zlib decompression options
support default and alt settings for attempting to decompress deflate stream
tests: add tests with httpbin.org/deflate Fixes #115
2015-06-24 13:11:33 -07:00
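A sketch of the "default and alt settings" idea above. Servers disagree on what 'content-encoding: deflate' means: some send a zlib-wrapped stream (RFC 1950), others raw deflate (RFC 1951). So one decompression mode is tried first, with a fallback to the other. `decompress_deflate` is hypothetical. The order used here, zlib first and raw second, is an assumption.

```python
import zlib


def decompress_deflate(data):
    """Decompress a 'content-encoding: deflate' body, tolerating both forms.

    Hypothetical sketch for illustration only.
    """
    try:
        # Default: zlib stream with header and adler32 checksum
        return zlib.decompressobj(zlib.MAX_WBITS).decompress(data)
    except zlib.error:
        # Alt: negative wbits means raw deflate, no header or checksum
        return zlib.decompressobj(-zlib.MAX_WBITS).decompress(data)
```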
Ilya Kreymer
08064f3806
warc load: make http response/request protocol/verb validation optional
enabled for replay, disabled by default for cdx-indexing, though it can
be enabled with the -v option #99
2015-04-20 08:29:18 -07:00
Ilya Kreymer
1d49a9fd3b
tests: improved tests for loaders module
2015-04-17 11:02:57 -07:00
Ilya Kreymer
52a7dd87c6
loaders: s3: import boto just once, store s3_avail flag
2015-04-17 11:02:57 -07:00
Ilya Kreymer
c8a9a3ddd4
loaders: add support for loading from s3:// using boto
if auth connection fails, attempt anon connection, #97
2015-04-17 11:02:57 -07:00
Ilya Kreymer
c378cb5188
rewrite: check for closed before any use of readline() (Python 2.6 may throw if closed),
only use readline() if line alignment is needed (non-html), related to #86 work
2015-04-01 07:54:17 -07:00
Ilya Kreymer
8e60a6464c
chunkeddatareader: read(): catch ValueError when attempting to read again in case stream is already closed
2015-03-31 23:31:49 -07:00
Ilya Kreymer
199f552f73
rewrite: if no charset specified, attempt to read first 1024 bytes and set charset in header,
to avoid charset warning if head insert exceeds 1024 bytes (#86)
also encode head insert with detected charset, if possible
chunkeddatareader: add read() function to ensure read will read up to the specified
length across chunks
2015-03-31 22:38:20 -07:00
Ilya Kreymer
30ab27bb1c
indexing: support indexing (and even replay of) records where target-uri is a 'urn:' identifier (#91)
for canonicalization, treat urns as is, already canonical
for wburl, don't add http:// prefix if urn: prefix is present
add example-wpull warc for testing
2015-03-30 17:23:50 -07:00
Ilya Kreymer
fc9d659b5d
loaders: switch BlockLoader to use requests instead of urllib2
2015-03-28 16:41:52 -07:00
Ilya Kreymer
2af5a25009
zipnum: support for pagination api! #34 and #83. cdx server now bounded by pageSize (default 10 blocks),
showNumPages=true returns json indicating num pages, page=N can be set to a page number from 0 to numPages - 1
loaders: add read_last_line() to read last line of a seekable file, used to read last line of index file when at end
tests: additional test for binsearch boundary conditions
zipnum: secondary index output supports json also
2015-03-24 18:56:13 -07:00
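A sketch of reading the last line of a seekable binary file without scanning from the start: seek back a fixed window from the end, and double the window until a line boundary is found. `read_last_line_sketch` and its `offset` parameter are hypothetical. Returning the line with its trailing newline, as readline() would, is an assumed convention.

```python
import os


def read_last_line_sketch(fh, offset=512):
    """Return the last line of a seekable binary file.

    Hypothetical sketch for illustration only.
    """
    fh.seek(0, os.SEEK_END)
    size = fh.tell()
    window = offset
    while True:
        start = max(0, size - window)
        fh.seek(start)
        tail = fh.read()
        # Skip the file's final newline when searching for the line start
        idx = tail.rfind(b'\n', 0, len(tail) - 1)
        if idx >= 0:
            return tail[idx + 1:]
        if start == 0:
            return tail  # whole file is a single line
        window *= 2
```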
Ilya Kreymer
b417b47835
collections manager: support for merge when adding warc, explicit --index-warcs
option to index and merge instead of reindexing whole dir, #74
additional testing for recursive indexing, index merge
timeutils: add timestamp20_now() function
2015-03-14 14:56:15 -07:00
Ilya Kreymer
499e21233e
statusandheaders: make protocol check case-insensitive, e.g. accept HTTP/1.0 and http/1.0 for better compatibility
2015-03-07 11:37:06 -08:00
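A sketch of a case-insensitive protocol check on a status line. `is_valid_protocol` and its default list of accepted protocols are hypothetical. Comparison happens after upper-casing both sides.

```python
def is_valid_protocol(statusline, protocols=('HTTP/1.0', 'HTTP/1.1')):
    """Check the protocol prefix of a status line, ignoring case.

    Hypothetical helper for illustration only.
    """
    proto = statusline.split(' ', 1)[0]
    return proto.upper() in (p.upper() for p in protocols)
```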
Ilya Kreymer
80dcb6ff27
rewrite: improvements to non-exact replay mode, redir_to_exact option set to false
frames: add request_ts to wbinfo and use that as the timestamp in the top-frame. for exact replay, request_ts == timestamp
for latest replay / no timestamp / memento timegate, redirect to current time instead of time of last capture, while serving
last capture.
timeutils: add timestamp_now() function to return timestamp of current datetime
Add extra tests for this mode
Tracked via #72
2015-02-17 17:51:45 -08:00
Ilya Kreymer
ac525b0937
tests: add tests for extract_post_query()
add test for HttpsUrlRewriter, remove unnecessary check in
bufferedreader
2015-01-11 23:54:29 -08:00
Ilya Kreymer
8449647c5f
wbexception: remove unused status in WbException, set default error for
any uncaught exception to 500, instead of 400
2015-01-11 23:53:34 -08:00
Ilya Kreymer
db75bda736
file open() pass: convert all read and write calls to ensure the binary 'b' flag is set (#56)
2015-01-11 18:54:11 -08:00
Ilya Kreymer
cf0a21509b
loaders: add to_file_url() for converting between filename and file://,
used in live rewrite and tests
2015-01-11 13:05:48 -08:00
Ilya Kreymer
d5c22e3649
test loaders: fix file:// prefix
2015-01-10 15:27:45 -08:00
Ilya Kreymer
1eb0f96f92
windows support work: fix loaders to use pathname2url to convert to
file:/// url, use urlopen to open file paths
fix some tests to use universal line breaks
2015-01-10 14:06:15 -08:00
Ilya Kreymer
181c18a1b8
pep8 pass: fix spacing and line length issues
also remove references to obsolete cached_replay, hostnames in pywb_init
2014-12-23 15:14:03 -08:00
Ilya Kreymer
51919ed1e7
replay: make range cache available by default in replay_views since it's initialized on first use. remove separate subclass.
'enable_ranges' can be set to false to disable range cache altogether
improve tests
2014-12-23 14:34:59 -08:00
Ilya Kreymer
0f2c96879c
refactor: split out optional cached replay components into cached_replay,
toggleable via 'enable_cache' in config -- regular replayview does not
need any cache info
move add_range() components to statusandheaders from wbrequestresponse
add 'x-pywb-noredirect' header which disables date-related redirect
video replay works w/o cache if supported by frontend (nginx)
2014-12-19 18:40:45 -08:00
Ilya Kreymer
00121aa165
statusandheaders parsing: properly skip multiline bad headers (missing
header name and ':'), fixes #49
2014-11-05 20:26:23 -08:00
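A sketch of header parsing that tolerates malformed lines. A line starting with whitespace continues the previous header, and any other line missing a header name and ':' is skipped instead of raising. `parse_header_lines` is hypothetical, and joining continuation lines with a single space is an assumed convention.

```python
def parse_header_lines(lines):
    """Parse 'Name: value' lines into a list of (name, value) tuples.

    Hypothetical sketch for illustration only.
    """
    headers = []
    for line in lines:
        line = line.rstrip('\r\n')
        if not line:
            break  # blank line ends the header block
        if line[0] in ' \t':
            # Continuation of the previous header value
            if headers:
                name, value = headers[-1]
                headers[-1] = (name, value + ' ' + line.strip())
            continue
        name, sep, value = line.partition(':')
        if not sep:
            continue  # missing header name and ':', skip this line
        headers.append((name.strip(), value.strip()))
    return headers
```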