1
0
mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 08:04:49 +01:00

13 Commits

Author SHA1 Message Date
Ilya Kreymer
626da99899
POST request handling and indexing improvements (#636)
* post append improvements:
- parse json primitives for post query
- for text/plain, attempt to parse as json, then as binary
- standardize post append indexing
- include '__wb_method' in urlkey
- add 'requestBody' and 'method' to cdxj
- support unique dupe params for json-to-query conversion

* test fixes:
- update tests for test_inputreq,
- update post-test.cdxj and post-test.cdx

* ci: fixes
- tox: run full test suite!
- disable appveyor

* inputrequest buffering fix:
- never truncate reading POST request, must read entire POST data to avoid hung request in live mode
- truncate final query string to 4096
2021-04-27 20:52:24 -07:00
Ilya Kreymer
7e56ca8ca2
RC7 Fixes (#561)
* misc fixes for 2.4.0rc7:
- warcserver: when parsing headers to check for redirect, reserialized headers
may be of different length then original, causing warcserver->app response to hang
now adjusting the content-length on the warc record and also not including a fixed
length when serving warcserver->app, possible fix for ukwa/ukwa-pywb#53
- undo change in path resolvers to use os.path.join, just concatenate full_path + filename
- rewrite 'date' -> 'x-orig-archive-date' header to avoid confusion (eg. #548)
- bump version to rc7

* ci: attempt to fix travis build for 27, 35
2020-04-30 22:39:47 -07:00
Ilya Kreymer
9eba59d8b4 warcserver: resource load: only read headers for self-redirect for response or revisit records
tests: add test with resource record (new warc/cdxj) to ensure correct read of resource records
2017-11-30 14:13:47 -08:00
Ilya Kreymer
2a2240a23a fix 'bad.cdx' sorting order 2014-07-01 15:36:13 -07:00
Ilya Kreymer
fb07775d38 tests: add 'bad.cdx' for testing cdx lines with missing original for revisit,
missing/non-existant warc
2014-06-25 12:32:57 -07:00
Ilya Kreymer
913a1e9f31 warc: simplify recordloader a bit more, only response and request records
get parsed as http (excluding dns: and whois: uris)
All others have an '-' status and no headers parsing
tests: add test for zero-length revisits
2014-06-25 12:11:26 -07:00
Ilya Kreymer
0c9d88f032 POST replay: treat POST form data same as get query, no '&&&' marker
additional testing POST
2014-06-11 11:17:06 -07:00
Ilya Kreymer
e2349a74e2 replay: better POST support via post query append!
record_loader can optionally parse 'request' records
archiveindexer has -a flag to write all records ('request' included),
-p flag to append post query
post-test.warc.gz and cdx
POST redirects using 307
2014-06-10 19:21:46 -07:00
Ilya Kreymer
79da12348f limit stream by warc/arc record length instead of
http content length.
track length of StatusAndHeaders also.
add tests to verify content length correct for identity
arc and arcgz replays as well
2014-03-22 11:30:51 -07:00
Ilya Kreymer
d702a98bbc url-agnostic revisit testing!
add sample warc and cdx for url-agnostic revisits
add unit test and integration test
resolvingloader: pass callback instead of full cdx server
for use for loading cdx in case of url-agnostic revisit
2014-03-04 20:12:09 +00:00
Ilya Kreymer
47271bbfab remove extra .gz file, change test to use zipnum file instead 2014-03-02 08:55:26 -08:00
Ilya Kreymer
5345459298 pywb 0.2!
move to distinct packages: pywb.utils, pywb.cdx, pywb.warc, pywb.util, pywb.rewrite!
each package will have its own README and tests
shared sample_data and install
2014-02-17 10:01:09 -08:00
Ilya Kreymer
43a46b373d move sample/test data to ./sample_archive/warcs and ./sample_archive/cdx
pywb_init now driven by config.yaml! (#14)

Not yet supporting customized handlers, views, etc...
2014-01-28 22:03:01 -08:00