Ilya Kreymer
daffc7ff5d
header rewrite: pass through 'content-range' header
2014-07-07 17:02:44 -07:00
Ilya Kreymer
02326a2b12
bump dev version to 0.4.8
2014-07-07 17:02:28 -07:00
Ilya Kreymer
7694bf0678
update README.rst for master 0.4.7
2014-07-01 16:22:38 -07:00
Ilya Kreymer
46b16c61d5
update changelist, version to 0.4.7
2014-07-01 16:15:25 -07:00
Ilya Kreymer
2a2240a23a
fix 'bad.cdx' sorting order
2014-07-01 15:36:13 -07:00
Ilya Kreymer
1a42331e69
Merge branch 'develop' into binary-parse
2014-07-01 10:00:05 -07:00
Ilya Kreymer
1980b66127
warc indexing: in include_all mode, pass 'warcinfo' records to writer, allowing it to handle or ignore them
2014-07-01 09:59:16 -07:00
Ilya Kreymer
57a38dedce
Merge branch 'develop' into binary-parse
2014-06-28 11:53:50 -07:00
Ilya Kreymer
377ea33bc8
tests: add test for wombat top
2014-06-28 11:53:23 -07:00
Ilya Kreymer
b0f7fdbed8
regexrewrite: fix rewrite for 'top'
2014-06-28 11:50:11 -07:00
Ilya Kreymer
f2bfc96002
Merge branch 'develop' into binary-parse
2014-06-28 11:04:43 -07:00
Ilya Kreymer
83b69e8447
indexing: don't include records of type 'application/warc-fields' unless all records are being included
2014-06-28 11:03:44 -07:00
Ilya Kreymer
70b7e29b36
pass raw bytes to htmlparser, assuming ascii-compatibility
(todo: add tests for non-ascii compatible encodings)
improved rendering of certain pages, needs more testing
lxml: remove lxml and the complexity associated with having the parser,
as it's too unpredictable for older html and does its own decoding.
2014-06-27 19:03:06 -07:00
Ilya Kreymer
dd9f138bab
disable decoding, by default, of content for html parser
2014-06-27 16:53:33 -07:00
Ilya Kreymer
fb07775d38
tests: add 'bad.cdx' for testing cdx lines with missing original for revisit,
missing/non-existent warc
2014-06-25 12:32:57 -07:00
Ilya Kreymer
913a1e9f31
warc: simplify recordloader a bit more, only response and request records
get parsed as http (excluding dns: and whois: uris)
All others have a '-' status and no header parsing
tests: add test for zero-length revisits
2014-06-25 12:11:26 -07:00
Ilya Kreymer
6761f5697f
indexing: refactor cdxindexer interface to better allow custom writers
record loader: skip whois: and dns: records, better skipping of arc headers
(todo: need more unit tests)
2014-06-24 17:08:10 -07:00
Ilya Kreymer
3965fad4dd
cdx indexing: add support for 9-field cdx output,
request merge: store referer if available, check for record id matching
2014-06-19 16:51:23 -07:00
Ilya Kreymer
694b97e67f
archive indexing: Refactor, split into ArchiveIterator generic iteration and cdx-indexer,
which writes out CDX specifically
recordloader: always load request, limit stream before headers are loaded
2014-06-19 13:37:42 -07:00
Ilya Kreymer
de65b68edc
rules: additions to rules for FB
2014-06-18 16:45:54 -07:00
Ilya Kreymer
22a2da6e0c
rewrite: for WB_wombat_top rewriting, select next-to-top instead of self
2014-06-16 19:42:15 -07:00
Ilya Kreymer
e1c1d23a9f
framed replay: improved url update support, ensure update url is actually
the url of the frame (ignore ajax requests)
2014-06-16 18:46:01 -07:00
Ilya Kreymer
ac3efec4bc
update develop to 0.4.6
improved regex for top -> WB_wombat_top rewriting
2014-06-16 15:57:22 -07:00
Ilya Kreymer
f26b0ddbe4
update setup.py version
2014-06-15 12:35:20 -07:00
Ilya Kreymer
987a9ee58f
update README for master
2014-06-15 12:34:14 -07:00
Ilya Kreymer
c4e3f25f9a
Merge branch 'develop' for 0.4.5 release
2014-06-15 12:32:47 -07:00
Ilya Kreymer
4767ab0fdd
Update CHANGES.rst to 4.5
2014-06-15 12:09:10 -07:00
Ilya Kreymer
88d3e94b36
fixes for pep8, name fixes
2014-06-15 11:57:48 -07:00
Ilya Kreymer
073f1e142e
test_config: test lxml parser still
2014-06-14 21:33:08 -07:00
Ilya Kreymer
80e80e97d3
replay: support 'framed_replay' option in config for both replay and live rewrite
split replay view into BaseContentView and ReplayView
refactor RewriteLiveHandler into RewriteLiveView
add additional tests for framed and non-framed mode
default to framed replay!
2014-06-14 18:26:19 -07:00
Ilya Kreymer
d21f8079ca
cookie rewrite: remove max-age, add test
2014-06-14 10:04:31 -07:00
Ilya Kreymer
ceeb25a899
rewrite: fix unit tests, add extra closed check for 2.6 (not sure why it's needed now)
2014-06-14 01:02:00 -07:00
Ilya Kreymer
028e274b22
rewrite tests: improve POST test, only add header if not empty
2014-06-14 00:18:35 -07:00
Ilya Kreymer
d7516f4cd7
rewrite: fix <base> rewriting, urlrewriter replacement
turn off lxml rewriter by default
2014-06-13 16:44:37 -07:00
Ilya Kreymer
0d3f663ef1
rewrite: disable refer-redirect in case of POST, handle request w/o redirect
(can't use 307 because of FF)
2014-06-13 16:23:11 -07:00
Ilya Kreymer
dfef05a74d
rewrite: live rewrite: switch to including all headers rather than a whitelist for proxying
2014-06-13 16:22:18 -07:00
Ilya Kreymer
41e1809039
update wombat.js (support for write override, fill in WB_wombat_location on new iframe)
disable 307 redirects as FF always displays modal confirmation for these, even for same host
2014-06-11 20:12:05 -07:00
Ilya Kreymer
bdafe0938d
remove accidental debug commits
2014-06-11 12:44:49 -07:00
Ilya Kreymer
14ed6c5898
remove accidental changes
2014-06-11 12:42:44 -07:00
Ilya Kreymer
0c9d88f032
POST replay: treat POST form data same as get query, no '&&&' marker
additional testing POST
2014-06-11 11:17:06 -07:00
Ilya Kreymer
e2349a74e2
replay: better POST support via post query append!
record_loader can optionally parse 'request' records
archiveindexer has -a flag to write all records ('request' included),
-p flag to append post query
post-test.warc.gz and cdx
POST redirects using 307
2014-06-10 19:21:46 -07:00
Ilya Kreymer
028cdaa22e
bump version to 0.4.1
2014-06-05 14:10:30 -07:00
Ilya Kreymer
cf119174ea
rewrite: for rewriting purposes, use original cdx url, not the request url
(significance if trailing '/' is present)
2014-06-05 14:09:30 -07:00
Ilya Kreymer
2c65521ea3
final README.rst edits
0.4.0
2014-05-30 12:52:43 -07:00
Ilya Kreymer
18f7031423
add bullet points to README!
2014-05-30 12:45:59 -07:00
Ilya Kreymer
e3bbf95280
merge develop for 0.4.0, update paths to master branch
2014-05-30 12:39:37 -07:00
Ilya Kreymer
05812060c0
Merge branch 'develop'
2014-05-30 12:37:59 -07:00
Ilya Kreymer
6d6f2452fc
update README and CHANGES for release
2014-05-30 12:37:30 -07:00
Ilya Kreymer
9519e8d6f1
Update CHANGES.rst
2014-05-30 12:27:20 -07:00
Ilya Kreymer
f9710d033c
fix integration test for 307
update head_insert for new wombat
remove redundant host jinja func, use 'urlsplit' instead
2014-05-30 11:17:12 -07:00