mirror of https://github.com/webrecorder/pywb.git synced 2025-04-02 20:16:14 +02:00

80 Commits

Author SHA1 Message Date
Mat Kelly
96da397456 Quick comment fix 2016-03-04 11:17:35 -05:00
Ilya Kreymer
79d5ec2b2d statusheaders: when not verifying protocol line, avoid indexerror when no space in first line, add tests 2015-12-18 21:46:00 -08:00
Ilya Kreymer
75085ad91b loaders: fix loader inits, don't inherit from BlockLoader #135 2015-10-20 10:33:24 -07:00
Ilya Kreymer
94095e452a loaders: refactor BlockLoader to use an extensible dict of loaders
individual HttpLoader, LocalFileLoader and S3Loader supported by default
Loaders created via BlockLoader also cached for reuse, closes #135
2015-10-19 11:59:35 -07:00
Ilya Kreymer
db4fbe79ec tests: add test for BufferedReader 'deflate' (w/o gzip header) 2015-10-11 17:47:19 -07:00
Ilya Kreymer
c3aab1514c query/cdx: support from and to cdx query arguments, support ranged calendar query,
eg. /[from]*[to]/[url] or /[from]-[to]/[url], with both from and to optional, closes #130
exposes lower and upper bound timestamps in timeutils, pad_timestamp
2015-10-07 10:44:12 -07:00
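
A minimal sketch of the padding idea behind the from/to range query in the commit above, assuming 14-digit `YYYYMMDDhhmmss` timestamps. The constants and the function name `pad_partial_timestamp` are hypothetical stand-ins for illustration, not pywb's actual `timeutils` API.

```python
# Hypothetical 14-digit bounds (YYYYMMDDhhmmss); not pywb's actual constants.
LOWER_BOUND = "00000101000000"
UPPER_BOUND = "99991231235959"


def pad_partial_timestamp(ts, bound):
    """Extend a partial timestamp to 14 digits using the tail of `bound`."""
    return (ts + bound[len(ts):])[:14]


# A query such as /2015*201510/[url] could then be turned into two
# comparable strings, with either side optional (empty string).
range_start = pad_partial_timestamp("2015", LOWER_BOUND)
range_end = pad_partial_timestamp("201510", UPPER_BOUND)


def in_range(capture_ts):
    # Plain string comparison works because all values are 14 digits.
    return range_start <= capture_ts <= range_end
```

The upper pad always uses day 31, so the result may not be a real calendar date for shorter months. That is acceptable here only because the value is used as a string comparison bound, not parsed as a date.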
Ilya Kreymer
e435242d38 wombat: Date: fixes to Date override, guard against double override
document.write: use shared rewrite_html() method, issue single write call
loaders: read_http() don't use range request if no range is set
2015-07-17 18:40:25 -07:00
Ilya Kreymer
2d0c526053 post handling: when reading post data in extract_post_query(), add optional buffer_stream which would hold the original POST
data. This is necessary to override the `wsgi.input` to allow the post data to be read again via a fallback handler, even
after reading POST query data in replay handler, addresses #117
2015-06-25 15:58:58 -07:00
Ilya Kreymer
06fcc89de6 readers: support 'content-encoding: deflate' using different zlib decompression options
support default and alt settings for attempting to decompress deflate stream
tests: add tests with httpbin.org/deflate Fixes #115
2015-06-24 13:11:33 -07:00
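
The "default and alt settings" in the commit above can be illustrated with the standard `zlib` module: some servers send `content-encoding: deflate` as a zlib-wrapped stream, others as raw deflate with no header. The sketch below is an assumption about the general technique, not pywb's reader code, and `decompress_deflate` is a hypothetical helper name.

```python
import zlib


def decompress_deflate(data):
    """Try zlib-wrapped deflate first, then fall back to raw deflate."""
    try:
        # Default setting: expects the 2-byte zlib header and Adler-32 trailer.
        return zlib.decompress(data)
    except zlib.error:
        # Alternate setting: negative wbits means raw deflate, no header.
        return zlib.decompress(data, -zlib.MAX_WBITS)


payload = b"hello deflate " * 4

# zlib-wrapped form
wrapped = zlib.compress(payload)

# raw form, produced with negative wbits
raw_compressor = zlib.compressobj(
    zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS
)
raw = raw_compressor.compress(payload) + raw_compressor.flush()
```

Both `decompress_deflate(wrapped)` and `decompress_deflate(raw)` return the original payload.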
Ilya Kreymer
08064f3806 warc load: make http response/request protocol/verb validation optional
enabled for replay, disabled by default for cdx-indexing, though can
be enabled with -v option #99
2015-04-20 08:29:18 -07:00
Ilya Kreymer
1d49a9fd3b tests: improved tests for loaders module 2015-04-17 11:02:57 -07:00
Ilya Kreymer
52a7dd87c6 loaders: s3: import boto just once, store s3_avail flag 2015-04-17 11:02:57 -07:00
Ilya Kreymer
c8a9a3ddd4 loaders: add support for loading from s3:// using boto
if auth connection fails, attempt anon connection, #97
2015-04-17 11:02:57 -07:00
Ilya Kreymer
c378cb5188 rewrite: check for closed before any use of readline() (2.6 may throw if closed),
only use readline() if line alignment needed (non-html), related to #86 work
2015-04-01 07:54:17 -07:00
Ilya Kreymer
8e60a6464c chunkeddatareader: read(): catch ValueError when attempting to read again in case stream is already closed 2015-03-31 23:31:49 -07:00
Ilya Kreymer
199f552f73 rewrite: if no charset specified, attempt to read first 1024 bytes and set charset in header,
to avoid charset warning if head insert exceeds 1024 bytes (#86)
also encode head insert with detected charset, if possible
chunkeddatareader: add read() function to ensure read will read up to specified
length across chunks
2015-03-31 22:38:20 -07:00
Ilya Kreymer
30ab27bb1c indexing: support indexing (and even replay of) records where target-uri is a 'urn:' identifier (#91)
for canonicalization, treat urns as is, already canonical
for wburl, don't add http:// prefix if urn: prefix is present
add example-wpull warc for testing
2015-03-30 17:23:50 -07:00
Ilya Kreymer
fc9d659b5d loaders: switch BlockLoader to use requests instead of urllib2 2015-03-28 16:41:52 -07:00
Ilya Kreymer
2af5a25009 zipnum: support for pagination api! #34 and #83. cdx server now bounded by pageSize (default 10 blocks),
showNumPages=true returns json indicating num pages, page=N can be set to page number 0-numPages - 1
loaders: add read_last_line() to read last line of a seekable file, used to read last line of index file when
at end
tests: additional test for binsearch boundary conditions
zipnum: secondary index output supports json also
2015-03-24 18:56:13 -07:00
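
One way to read the last line of a seekable file without scanning it from the start, as `read_last_line()` in the commit above is described as doing, is to seek backwards from the end and widen the window until a complete final line is found. The sketch below is an assumed illustration of that approach; the name `read_last_line_sketch` and the `offset` parameter are hypothetical and do not reflect pywb's signature.

```python
import io
import os


def read_last_line_sketch(fh, offset=256):
    """Return the last line of a seekable binary file object."""
    fh.seek(0, os.SEEK_END)
    size = fh.tell()

    while offset < size:
        fh.seek(-offset, os.SEEK_END)
        fh.readline()            # discard the (possibly partial) first line
        lines = fh.readlines()   # complete lines after the discarded one
        if lines:
            return lines[-1]
        offset *= 2              # window fell inside the last line; widen it

    # Window covers the whole file: read it from the beginning.
    fh.seek(0)
    lines = fh.readlines()
    return lines[-1] if lines else b""


index = io.BytesIO(b"a\nbb\nccc\n")
last = read_last_line_sketch(index, offset=2)
```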
Ilya Kreymer
b417b47835 collections manager: support for merge when adding warc, explicit --index-warcs
option to index and merge instead of reindexing whole dir, #74
additional testing for recursive indexing, index merge
timeutils: add timestamp20_now() function
2015-03-14 14:56:15 -07:00
Ilya Kreymer
499e21233e statusandheaders: make protocol check case-insensitive, eg. accept HTTP/1.0 and http/1.0 for better compatibility 2015-03-07 11:37:06 -08:00
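
A case-insensitive protocol check of the kind the commit above describes could look roughly like this. It is a sketch under assumptions: `protocol_matches` and its default protocol list are hypothetical, not the actual statusandheaders parser.

```python
def protocol_matches(first_line, protocols=("HTTP/1.0", "HTTP/1.1")):
    """Check the protocol token of a status line, ignoring case."""
    # Take everything before the first space; a line with no space
    # yields the whole line, so no IndexError is possible here.
    token = first_line.split(" ", 1)[0].upper()
    return token in protocols
```

With this, both `HTTP/1.0 200 OK` and `http/1.0 200 OK` are accepted, while an unrelated first line such as `ICY 200 OK` is rejected.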
Ilya Kreymer
80dcb6ff27 rewrite: improvements to non-exact replay mode, redir_to_exact option set to false
frames: add request_ts to wbinfo and use that as the timestamp in the top-frame. for exact replay, request_ts == timestamp
for latest replay / no timestamp / memento timegate, redirect to current time instead of time of last capture, while serving
last capture.
timeutils: add timestamp_now() function to return timestamp of current datetime
Add extra tests for this mode
Tracked via #72
2015-02-17 17:51:45 -08:00
Ilya Kreymer
ac525b0937 tests: add tests for extract_post_query()
add test for HttpsUrlRewriter, remove unnecessary check in
bufferedreader
2015-01-11 23:54:29 -08:00
Ilya Kreymer
8449647c5f wbexception: remove unused status in WbException, set default error for
any uncaught exception to 500, instead of 400
2015-01-11 23:53:34 -08:00
Ilya Kreymer
db75bda736 file open() pass: convert all read and write to ensure binary 'b' flag is set (#56) 2015-01-11 18:54:11 -08:00
Ilya Kreymer
cf0a21509b loaders: add to_file_url() for converting between filename and file://,
used in live rewrite and tests
2015-01-11 13:05:48 -08:00
Ilya Kreymer
d5c22e3649 test loaders: fix file:// prefix 2015-01-10 15:27:45 -08:00
Ilya Kreymer
1eb0f96f92 windows support work: fix loaders to use pathname2url to convert to
file:/// url, use urlopen to open file paths
fix some tests to use universal line breaks
2015-01-10 14:06:15 -08:00
Ilya Kreymer
181c18a1b8 pep8 pass: fix spacing, line length, issues
also remove references to obsolete cached_replay, hostnames in pywb_init
2014-12-23 15:14:03 -08:00
Ilya Kreymer
51919ed1e7 replay: make range cache available by default in replay_views since it's
inited on first use. remove
separate subclass. 'enable_ranges' can be set to false to disable range
cache altogether
improve tests
2014-12-23 14:34:59 -08:00
Ilya Kreymer
0f2c96879c refactor: split out optional cached replay components into cached_replay,
toggleable via 'enable_cache' in config -- regular replayview does not
need any cache info
move add_range() components to statusandheaders from wbrequestresponse
add 'x-pywb-noredirect' header which disables date related redirect
video replay works w/o cache if supported by frontend (nginx)
2014-12-19 18:40:45 -08:00
Ilya Kreymer
00121aa165 statusandheaders parsing: properly skip multiline bad headers (missing
header name and ':'), fixes #49
2014-11-05 20:26:23 -08:00
Ilya Kreymer
e8d3965269 pep8 style fixes, remove unused methods 2014-10-21 19:06:16 -07:00
Ilya Kreymer
50bf7d2634 rewrite: move extract_client_cookie to utils for access at rewrite
root cookie_rewriter: keep max-age
add csrf token copying (experimental)
update tests
2014-10-12 03:07:54 -07:00
Ilya Kreymer
a2d86fa495 Merge branch 'develop' into https-proxy 2014-08-04 22:01:16 -07:00
Ilya Kreymer
e1e8f679b2 rewrite/testing: add additional test for live rewrite post, invalid post
htmlrewrite: annotate untestable sections (unimplemented, 2.6 only exceptions)
2014-08-04 21:59:46 -07:00
Ilya Kreymer
71e8ada57d rewrite: add test for banner-only mode, rewriting w/o a head using local
'sample_no_head' file.
query.html: use client side rewriting for calendar dates
rewrite: remove unused decode stuff
2014-08-04 20:45:02 -07:00
Ilya Kreymer
92726309fc proxy: add 'extra_headers' to be added to proxy responses, customizable via proxy_options
defaults include no-cache and p3p policy (needed for IE default settings)
fix link generation for proxy_select page, better exception handling of ssl errors
2014-08-02 04:27:51 -07:00
Ilya Kreymer
0c9d88f032 POST replay: treat POST form data same as get query, no '&&&' marker
additional testing POST
2014-06-11 11:17:06 -07:00
Ilya Kreymer
e2349a74e2 replay: better POST support via post query append!
record_loader can optionally parse 'request' records
archiveindexer has -a flag to write all records ('request' included),
-p flag to append post query
post-test.warc.gz and cdx
POST redirects using 307
2014-06-10 19:21:46 -07:00
Ilya Kreymer
2600d870d7 improved test: dsrules remove redundant check
static: check invalid static paths and file_wrapper
memento: check non-memento paths
test debug handlers and custom '-cdx' suffix
2014-05-16 22:17:51 -07:00
Ilya Kreymer
89da165467 exceptions: add optional url param to WbException, move handler_exception()
into WSGIApp for easier customization
2014-05-13 01:54:12 -07:00
Ilya Kreymer
e7957a5cae remove SeekableTextFileReader, replaced with standard file-like objects
and seek(0, 2) and tell() to get file length
2014-05-06 20:54:42 -07:00
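
The `seek(0, 2)` and `tell()` idiom mentioned in the commit above works on any standard seekable file-like object. A small sketch follows; `file_length` is a hypothetical helper name, and restoring the previous position is an added convenience rather than something the commit states.

```python
import io
import os


def file_length(fh):
    """Return the length of a seekable file object, preserving its position."""
    current = fh.tell()
    fh.seek(0, os.SEEK_END)   # os.SEEK_END == 2, i.e. seek(0, 2)
    length = fh.tell()
    fh.seek(current)          # put the read position back where it was
    return length


stream = io.BytesIO(b"abcdef")
stream.read(2)
length = file_length(stream)
```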
Ilya Kreymer
9cf5327e88 bufferedreader cleanup:
* BufferedReader defaults to no decompression
* DecompressingBufferedReader defaults to gzip decomp
* ChunkedDataReader defaults to no gzip decomp, but decomp
can be set later via set_decomp().
This allows chunked responses to be de-chunked but not decompressed
(eg for non-text responses)
2014-04-28 20:15:31 -07:00
Ilya Kreymer
e4262502b0 fix ChunkedDataReader chunked + gzip decomp: if reading one chunk yields no data
(due to more data being needed for gzip decomp), keep reading more blocks until there is data
or last block is reached (or error). Ensure a single read() call will return some data if there is any
2014-04-25 10:30:22 -07:00
Ilya Kreymer
cd017669ae bugfix: ChunkedDataReader handles zero-length chunk properly, add test 2014-04-23 10:00:25 -07:00
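
For context on the zero-length chunk handled in the commit above: in HTTP chunked transfer encoding, each chunk is a hexadecimal size line followed by that many bytes and a CRLF, and a chunk of size zero marks the end of the body. The sketch below de-chunks a whole stream at once under that assumption. `dechunk` is a hypothetical helper, far simpler than a streaming reader such as ChunkedDataReader, and it ignores trailers.

```python
import io


def dechunk(stream):
    """Read an HTTP chunked body from a binary stream and return the payload."""
    parts = []
    while True:
        size_line = stream.readline()
        # Drop any chunk extension after ';' and parse the hex size.
        size_field = size_line.split(b";")[0].strip()
        size = int(size_field or b"0", 16)
        if size == 0:
            break                      # zero-length chunk ends the body
        parts.append(stream.read(size))
        stream.readline()              # consume the CRLF after the chunk data
    return b"".join(parts)


body = dechunk(io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))
```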
Ilya Kreymer
bfc2e63793 live rewriter: integrate handler with rewrite_live.py module,
clean up css, add unit and integration tests
clean up cli server now known as 'live-rewrite-server', which performs live rewrite using
iframe paradigm
2014-04-09 15:49:55 -07:00
Ilya Kreymer
b4f30a770f ChunkedDataReader: if determined to be non-chunked, read full buffer
unchunked
2014-04-09 15:49:55 -07:00
Ilya Kreymer
8897a0a7c9 decompressingbufferedreader: default to 'gzip' decompression instead of
none. ChunkedDataReader also automatically attempts decompression, by default
Add tests to verify
2014-04-08 21:49:04 -07:00
Ilya Kreymer
02fe78cb0b update changes, add more tests 2014-04-07 17:41:14 -07:00