mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 16:14:48 +01:00

441 Commits

Author SHA1 Message Date
Ilya Kreymer
377ea33bc8 tests: add test for wombat top 2014-06-28 11:53:23 -07:00
Ilya Kreymer
b0f7fdbed8 regexrewrite: fix rewrite for 'top' 2014-06-28 11:50:11 -07:00
Ilya Kreymer
f2bfc96002 Merge branch 'develop' into binary-parse 2014-06-28 11:04:43 -07:00
Ilya Kreymer
83b69e8447 indexing: don't include records of type 'application/warc-fields' unless all records are being included 2014-06-28 11:03:44 -07:00
Ilya Kreymer
70b7e29b36 pass raw bytes to htmlparser, assuming ascii-compatibility
(todo: add tests for non-ascii compatible encodings)
improved rendering of certain pages, needs more testing

lxml: remove lxml and complexity associated with having the parser,
as it's too unpredictable for older html, does its own decoding.
2014-06-27 19:03:06 -07:00
Ilya Kreymer
dd9f138bab disable decoding, by default, of content for html parser 2014-06-27 16:53:33 -07:00
Ilya Kreymer
fb07775d38 tests: add 'bad.cdx' for testing cdx lines with missing original for revisit,
missing/non-existent warc
2014-06-25 12:32:57 -07:00
Ilya Kreymer
913a1e9f31 warc: simplify recordloader a bit more, only response and request records
get parsed as http (excluding dns: and whois: uris)
All others have a '-' status and no header parsing
tests: add test for zero-length revisits
2014-06-25 12:11:26 -07:00
Ilya Kreymer
6761f5697f indexing: refactor cdxindexer interface to better allow custom writers
record loader: skip whois: and dns: records, better skipping of arc headers
(todo: need more unit tests)
2014-06-24 17:08:10 -07:00
Ilya Kreymer
3965fad4dd cdx indexing: add support for 9-field cdx output,
request merge: store referer if available, check for record id matching
2014-06-19 16:51:23 -07:00
Ilya Kreymer
694b97e67f archive indexing: Refactor, split into ArchiveIterator generic iteration and cdx-indexer,
which writes out CDX specifically
recordloader: always load request, limit stream before headers are loaded
2014-06-19 13:37:42 -07:00
Ilya Kreymer
de65b68edc rules: additions to rules for FB 2014-06-18 16:45:54 -07:00
Ilya Kreymer
22a2da6e0c rewrite: for WB_wombat_top rewriting, select next-to-top instead of self 2014-06-16 19:42:15 -07:00
Ilya Kreymer
e1c1d23a9f framed replay: improved url update support, ensure update url is actually
the url of the frame (ignore ajax requests)
2014-06-16 18:46:01 -07:00
Ilya Kreymer
ac3efec4bc update develop to 0.4.6
improved regex for top -> WB_wombat_top rewriting
2014-06-16 15:57:22 -07:00
Ilya Kreymer
f26b0ddbe4 update setup.py version 2014-06-15 12:35:20 -07:00
Ilya Kreymer
987a9ee58f update README for master 2014-06-15 12:34:14 -07:00
Ilya Kreymer
c4e3f25f9a Merge branch 'develop' for 0.4.5 release 2014-06-15 12:32:47 -07:00
Ilya Kreymer
4767ab0fdd Update CHANGES.rst to 4.5 2014-06-15 12:09:10 -07:00
Ilya Kreymer
88d3e94b36 fixes for pep8, name fixes 2014-06-15 11:57:48 -07:00
Ilya Kreymer
073f1e142e test_config: test lxml parser still 2014-06-14 21:33:08 -07:00
Ilya Kreymer
80e80e97d3 replay: support 'framed_replay' option in config for both replay and live rewrite
split replay view into BaseContentView and ReplayView
refactor RewriteLiveHandler into RewriteLiveView
add additional tests for framed and non-framed mode
default to framed replay!
2014-06-14 18:26:19 -07:00
Ilya Kreymer
d21f8079ca cookie rewrite: remove max-age, add test 2014-06-14 10:04:31 -07:00
Ilya Kreymer
ceeb25a899 rewrite: fix unit tests, add extra closed check for 2.6 (not sure why it's needed now) 2014-06-14 01:02:00 -07:00
Ilya Kreymer
028e274b22 rewrite tests: improve POST test, only add header if not empty 2014-06-14 00:18:35 -07:00
Ilya Kreymer
d7516f4cd7 rewrite: fix <base> rewriting, urlrewriter replacement
turn off lxml rewriter by default
2014-06-13 16:44:37 -07:00
Ilya Kreymer
0d3f663ef1 rewrite: disable refer-redirect in case of POST, handle request w/o redirect
(can't use 307 because of FF)
2014-06-13 16:23:11 -07:00
Ilya Kreymer
dfef05a74d rewrite: live rewrite: switch to including all headers rather than a whitelist for proxying 2014-06-13 16:22:18 -07:00
Ilya Kreymer
41e1809039 update wombat.js (support for write override, fill in WB_wombat_location on new iframe)
disable 307 redirects as FF always displays modal confirmation for these, even for same host
2014-06-11 20:12:05 -07:00
Ilya Kreymer
bdafe0938d remove accidental debug commits 2014-06-11 12:44:49 -07:00
Ilya Kreymer
14ed6c5898 remove accidental changes 2014-06-11 12:42:44 -07:00
Ilya Kreymer
0c9d88f032 POST replay: treat POST form data same as get query, no '&&&' marker
additional testing POST
2014-06-11 11:17:06 -07:00
Ilya Kreymer
e2349a74e2 replay: better POST support via post query append!
record_loader can optionally parse 'request' records
archiveindexer has -a flag to write all records ('request' included),
-p flag to append post query
post-test.warc.gz and cdx
POST redirects using 307
2014-06-10 19:21:46 -07:00
Ilya Kreymer
028cdaa22e bump version to 0.4.1 2014-06-05 14:10:30 -07:00
Ilya Kreymer
cf119174ea rewrite: for rewriting purposes, use original cdx url, not the request url
(significance if trailing '/' is present)
2014-06-05 14:09:30 -07:00
Ilya Kreymer
2c65521ea3 final README.rst edits 0.4.0 2014-05-30 12:52:43 -07:00
Ilya Kreymer
18f7031423 add bullet points to README! 2014-05-30 12:45:59 -07:00
Ilya Kreymer
e3bbf95280 merge develop for 0.4.0, update paths to master branch 2014-05-30 12:39:37 -07:00
Ilya Kreymer
05812060c0 Merge branch 'develop' 2014-05-30 12:37:59 -07:00
Ilya Kreymer
6d6f2452fc update README and CHANGES for release 2014-05-30 12:37:30 -07:00
Ilya Kreymer
9519e8d6f1 Update CHANGES.rst 2014-05-30 12:27:20 -07:00
Ilya Kreymer
f9710d033c fix integration test for 307
update head_insert for new wombat
remove redundant host jinja func, use 'urlsplit' instead
2014-05-30 11:17:12 -07:00
Ilya Kreymer
52040127b3 update wombat.js to latest
rewrite live: add another rewrite live header,
use 307 for archival referer based redirects
2014-05-30 11:03:22 -07:00
Ilya Kreymer
de69372b9f Update CHANGES.rst 2014-05-30 10:54:17 -07:00
Ilya Kreymer
9340165014 Changes for 0.4.0 2014-05-30 10:52:59 -07:00
Ilya Kreymer
eaf9cce261 Update README.rst
update for 0.4.0
2014-05-30 10:29:22 -07:00
Ilya Kreymer
9b732def93 cookie_rewriting: if domain is specified, apply cookie to coll root
rather than rewritten path.. needed in order for subdomain cookies to be
detected properly
2014-05-18 21:51:07 -07:00
Ilya Kreymer
8c15ac16fd search page template: add 'prefix' to search page template 2014-05-18 21:27:53 -07:00
Ilya Kreymer
1d674d97d8 pep8 pass! 2014-05-16 22:44:26 -07:00
Ilya Kreymer
923421d637 rewrite_content: add a few tests for cs_, js_, remove redundant except 2014-05-16 22:43:53 -07:00