mirror of https://github.com/webrecorder/pywb.git synced 2025-03-25 23:47:47 +01:00

Author	SHA1	Message	Date
Ilya Kreymer	181c18a1b8	pep8 pass: fix spacing, line length, issues; also remove references to obsolete cached_replay, hostnames in pywb_init	2014-12-23 15:14:03 -08:00
Ilya Kreymer	49e98e0cdc	archiveiterator/cdxindexer: cleaner load path for compressed and uncompressed, ability to distinguish between chunked and non-chunked warcs/arcs; raise error for non-chunked gzip warcs as they cannot be indexed for replay, addressing #48; add 'bad' non-chunked gzip file for testing, using custom ext	2014-11-06 01:32:42 -08:00
Ilya Kreymer	841fd3f7b4	warc: add ability to set read block size (def 16384) in archiveiterator	2014-11-01 13:29:37 -07:00
Ilya Kreymer	61ce53a0e0	warc/cdx: include metadata and resource records in default cdx index; emit 200 and 204 responses for metadata and resource, though write '-' to cdx (for compatibility for now); include content-length in resource/metadata records	2014-10-28 10:29:50 -07:00
Ilya Kreymer	fa813bdd19	pep8 cleanup pass	2014-07-20 18:26:16 -07:00
Ilya Kreymer	1980b66127	warc indexing: in include_all mode, pass 'warcinfo' records to writer, giving it the option to handle or ignore them	2014-07-01 09:59:16 -07:00
Ilya Kreymer	83b69e8447	indexing: don't include records of type 'application/warc-fields' unless all records are being included	2014-06-28 11:03:44 -07:00
Ilya Kreymer	6761f5697f	indexing: refactor cdxindexer interface to better allow custom writers; record loader: skip whois: and dns: records, better skipping of arc headers (todo: need more unit tests)	2014-06-24 17:08:10 -07:00
Ilya Kreymer	3965fad4dd	cdx indexing: add support for 9-field cdx output; request merge: store referer if available, check for record id matching	2014-06-19 16:51:23 -07:00
Ilya Kreymer	694b97e67f	archive indexing: refactor, split into ArchiveIterator generic iteration and cdx-indexer, which writes out CDX specifically; recordloader: always load request, limit stream before headers are loaded	2014-06-19 13:37:42 -07:00

10 Commits