Ilya Kreymer
1980b66127
warc indexing: in include_all mode, pass 'warcinfo' records to writer, giving it the option to handle or ignore them
2014-07-01 09:59:16 -07:00
Ilya Kreymer
377ea33bc8
tests: add test for wombat top
2014-06-28 11:53:23 -07:00
Ilya Kreymer
b0f7fdbed8
regexrewrite: fix rewrite for 'top'
2014-06-28 11:50:11 -07:00
Ilya Kreymer
83b69e8447
indexing: don't include records of type 'application/warc-fields' unless all records are being included
2014-06-28 11:03:44 -07:00
Ilya Kreymer
fb07775d38
tests: add 'bad.cdx' for testing cdx lines with missing original for revisit,
...
missing/non-existent warc
2014-06-25 12:32:57 -07:00
Ilya Kreymer
913a1e9f31
warc: simplify recordloader a bit more, only response and request records
...
get parsed as http (excluding dns: and whois: uris)
All others have a '-' status and no header parsing
tests: add test for zero-length revisits
2014-06-25 12:11:26 -07:00
Ilya Kreymer
6761f5697f
indexing: refactor cdxindexer interface to better allow custom writers
...
record loader: skip whois: and dns: records, better skipping of arc headers
(todo: need more unit tests)
2014-06-24 17:08:10 -07:00
Ilya Kreymer
3965fad4dd
cdx indexing: add support for 9-field cdx output,
...
request merge: store referer if available, check for record id matching
2014-06-19 16:51:23 -07:00
Ilya Kreymer
694b97e67f
archive indexing: Refactor, split into ArchiveIterator generic iteration and cdx-indexer,
...
which writes out CDX specifically
recordloader: always load request, limit stream before headers are loaded
2014-06-19 13:37:42 -07:00
Ilya Kreymer
de65b68edc
rules: additions to rules for FB
2014-06-18 16:45:54 -07:00
Ilya Kreymer
22a2da6e0c
rewrite: for WB_wombat_top rewriting, select next-to-top instead of self
2014-06-16 19:42:15 -07:00
Ilya Kreymer
e1c1d23a9f
framed replay: improved url update support, ensure update url is actually
...
the url of the frame (ignore ajax requests)
2014-06-16 18:46:01 -07:00
Ilya Kreymer
ac3efec4bc
update develop to 0.4.6
...
improved regex for top -> WB_wombat_top rewriting
2014-06-16 15:57:22 -07:00
Ilya Kreymer
f26b0ddbe4
update setup.py version
2014-06-15 12:35:20 -07:00
Ilya Kreymer
987a9ee58f
update README for master
2014-06-15 12:34:14 -07:00
Ilya Kreymer
c4e3f25f9a
Merge branch 'develop' for 0.4.5 release
2014-06-15 12:32:47 -07:00
Ilya Kreymer
4767ab0fdd
Update CHANGES.rst to 0.4.5
2014-06-15 12:09:10 -07:00
Ilya Kreymer
88d3e94b36
fixes for pep8, name fixes
2014-06-15 11:57:48 -07:00
Ilya Kreymer
073f1e142e
test_config: test lxml parser still
2014-06-14 21:33:08 -07:00
Ilya Kreymer
80e80e97d3
replay: support 'framed_replay' option in config for both replay and live rewrite
...
split replay view into BaseContentView and ReplayView
refactor RewriteLiveHandler into RewriteLiveView
add additional tests for framed and non-framed mode
default to framed replay!
2014-06-14 18:26:19 -07:00
Ilya Kreymer
d21f8079ca
cookie rewrite: remove max-age, add test
2014-06-14 10:04:31 -07:00
Ilya Kreymer
ceeb25a899
rewrite: fix unit tests, add extra closed check for 2.6 (not sure why it's needed now)
2014-06-14 01:02:00 -07:00
Ilya Kreymer
028e274b22
rewrite tests: improve POST test, only add header if not empty
2014-06-14 00:18:35 -07:00
Ilya Kreymer
d7516f4cd7
rewrite: fix <base> rewriting, urlrewriter replacement
...
turn off lxml rewriter by default
2014-06-13 16:44:37 -07:00
Ilya Kreymer
0d3f663ef1
rewrite: disable refer-redirect in case of POST, handle request w/o redirect
...
(can't use 307 because of FF)
2014-06-13 16:23:11 -07:00
Ilya Kreymer
dfef05a74d
rewrite: live rewrite: switch to including all headers rather than a whitelist for proxying
2014-06-13 16:22:18 -07:00
Ilya Kreymer
41e1809039
update wombat.js (support for write override, fill in WB_wombat_location on new iframe)
...
disable 307 redirects as FF always displays modal confirmation for these, even for same host
2014-06-11 20:12:05 -07:00
Ilya Kreymer
bdafe0938d
remove accidental debug commits
2014-06-11 12:44:49 -07:00
Ilya Kreymer
14ed6c5898
remove accidental changes
2014-06-11 12:42:44 -07:00
Ilya Kreymer
0c9d88f032
POST replay: treat POST form data same as get query, no '&&&' marker
...
additional testing POST
2014-06-11 11:17:06 -07:00
Ilya Kreymer
e2349a74e2
replay: better POST support via post query append!
...
record_loader can optionally parse 'request' records
archiveindexer has -a flag to write all records ('request' included),
-p flag to append post query
post-test.warc.gz and cdx
POST redirects using 307
2014-06-10 19:21:46 -07:00
Ilya Kreymer
028cdaa22e
bump version to 0.4.1
2014-06-05 14:10:30 -07:00
Ilya Kreymer
cf119174ea
rewrite: for rewriting purposes, use original cdx url, not the request url
...
(significant if a trailing '/' is present)
2014-06-05 14:09:30 -07:00
Ilya Kreymer
2c65521ea3
final README.rst edits
0.4.0
2014-05-30 12:52:43 -07:00
Ilya Kreymer
18f7031423
add bullet points to README!
2014-05-30 12:45:59 -07:00
Ilya Kreymer
e3bbf95280
merge develop for 0.4.0, update paths to master branch
2014-05-30 12:39:37 -07:00
Ilya Kreymer
05812060c0
Merge branch 'develop'
2014-05-30 12:37:59 -07:00
Ilya Kreymer
6d6f2452fc
update README and CHANGES for release
2014-05-30 12:37:30 -07:00
Ilya Kreymer
9519e8d6f1
Update CHANGES.rst
2014-05-30 12:27:20 -07:00
Ilya Kreymer
f9710d033c
fix integration test for 307
...
update head_insert for new wombat
remove redundant host jinja func, use 'urlsplit' instead
2014-05-30 11:17:12 -07:00
Ilya Kreymer
52040127b3
update wombat.js to latest
...
rewrite live: add another rewrite live header,
use 307 for archival referer based redirects
2014-05-30 11:03:22 -07:00
Ilya Kreymer
de69372b9f
Update CHANGES.rst
2014-05-30 10:54:17 -07:00
Ilya Kreymer
9340165014
Changes for 0.4.0
2014-05-30 10:52:59 -07:00
Ilya Kreymer
eaf9cce261
Update README.rst
...
update for 0.4.0
2014-05-30 10:29:22 -07:00
Ilya Kreymer
9b732def93
cookie_rewriting: if domain is specified, apply cookie to coll root
...
rather than rewritten path.. needed in order for subdomain cookies to be
detected properly
2014-05-18 21:51:07 -07:00
Ilya Kreymer
8c15ac16fd
search page template: add 'prefix' to search page template
2014-05-18 21:27:53 -07:00
Ilya Kreymer
1d674d97d8
pep8 pass!
2014-05-16 22:44:26 -07:00
Ilya Kreymer
923421d637
rewrite_content: add a few tests for cs_, js_, remove redundant except
2014-05-16 22:43:53 -07:00
Ilya Kreymer
2600d870d7
improved test: dsrules remove redundant check
...
static: check invalid static paths and file_wrapper
memento: check non-memento paths
test debug handlers and custom '-cdx' suffix
2014-05-16 22:17:51 -07:00
Ilya Kreymer
ca33287051
test: move non-surt-cdx sample to non-surt-cdx/ dir for clarity / avoid confusion
...
when bulk loading cdx/ dir (surt and non-surt cdx should NOT be mixed)
2014-05-16 21:21:14 -07:00