mirror of https://github.com/webrecorder/pywb.git synced 2025-03-25 23:47:47 +01:00

10 Commits

Author SHA1 Message Date
Ilya Kreymer
181c18a1b8 pep8 pass: fix spacing, line length, issues
also remove references to obsolete cached_replay, hostnames in pywb_init
2014-12-23 15:14:03 -08:00
Ilya Kreymer
49e98e0cdc archiveiterator/cdxindexer: cleaner load path for compressed and
uncompressed, ability to distinguish between chunked and non-chunked
warcs/arcs
Raise error for non-chunked gzip warcs as they can not be indexed for
replay, addressing #48
add 'bad' non-chunked gzip file for testing, using custom ext
2014-11-06 01:32:42 -08:00
Ilya Kreymer
841fd3f7b4 warc: add ability to set read block size (def 16384) in archiveiterator 2014-11-01 13:29:37 -07:00
Ilya Kreymer
61ce53a0e0 warc/cdx: include metadata and resource records in default cdx index
emit 200 and 204 responses for metadata and resource, though write '-'
to cdx (for compatibility for now)
include content-length in resource/metadata records
2014-10-28 10:29:50 -07:00
Ilya Kreymer
fa813bdd19 pep8 cleanup pass 2014-07-20 18:26:16 -07:00
Ilya Kreymer
1980b66127 warc indexing: in include_all mode, pass 'warcinfo' records to writer, allowing it the option to handle or ignore them 2014-07-01 09:59:16 -07:00
Ilya Kreymer
83b69e8447 indexing: don't include records of type 'application/warc-fields' unless all records are being included 2014-06-28 11:03:44 -07:00
Ilya Kreymer
6761f5697f indexing: refactor cdxindexer interface to better allow custom writers
record loader: skip whois: and dns: records, better skipping of arc headers
(todo: need more unit tests)
2014-06-24 17:08:10 -07:00
Ilya Kreymer
3965fad4dd cdx indexing: add support for 9-field cdx output,
request merge: store referer if available, check for record id matching
2014-06-19 16:51:23 -07:00
Ilya Kreymer
694b97e67f archive indexing: Refactor, split into ArchiveIterator generic iteration and cdx-indexer,
which writes out CDX specifically
recordloader: always load request, limit stream before headers are loaded
2014-06-19 13:37:42 -07:00