mirror of https://github.com/webrecorder/pywb.git synced 2025-03-23 14:42:24 +01:00
pywb/pywb/warc/README.md
2014-02-19 00:13:15 -08:00


### pywb.warc
This is the WARC/ARC record loading component of the pywb wayback tool suite.
The package provides the following facilities:
* Resolve relative WARC/ARC filenames to a full path based on configurable resolvers
* Resolve 'revisit' records from provided index to find a full record with headers and payload content
* Load WARC/ARC records either locally or via HTTP, using HTTP/1.1 range requests
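
As a rough illustration of the first and last facilities above, the following minimal sketch resolves a relative filename against a list of prefixes and then reads one record's bytes by offset and length, either from disk or with an HTTP/1.1 `Range` header. The names `resolve_path`, `range_header`, and `read_record_bytes` are hypothetical and are not part of the pywb API; the actual resolvers and loaders in this package may be structured differently.

```python
import os
from urllib.request import Request, urlopen


def resolve_path(filename, prefixes):
    """Return the first usable location for filename (hypothetical helper).

    HTTP prefixes are accepted without a check; local prefixes must
    point at an existing file.
    """
    for prefix in prefixes:
        candidate = prefix + filename
        if candidate.startswith(("http://", "https://")):
            return candidate
        if os.path.isfile(candidate):
            return candidate
    raise IOError("could not resolve " + filename)


def range_header(offset, length):
    """Build a Range header value covering bytes [offset, offset + length)."""
    # HTTP byte ranges are inclusive on both ends, hence the "- 1".
    return "bytes={0}-{1}".format(offset, offset + length - 1)


def read_record_bytes(location, offset, length):
    """Read length bytes starting at offset, locally or over HTTP."""
    if location.startswith(("http://", "https://")):
        request = Request(location, headers={"Range": range_header(offset, length)})
        with urlopen(request) as response:
            return response.read()
    with open(location, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)
```

The offset and length would typically come from an index entry, so only the bytes of the requested record are transferred.
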

When loading archived content, the format type (WARC vs ARC) is detected automatically,
and compressed ARCs/WARCs are decompressed automatically.
No assumption is made about the format based on filename, content type,
or other external parameters; only the content itself is examined.

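
Content-based detection of this kind can be sketched as follows. This is only an illustration, not the package's actual implementation: it assumes that gzip data can be recognized by its two-byte magic number, that WARC records begin with `WARC/`, and that anything else is treated as ARC. The function name `sniff_format` is hypothetical.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"


def sniff_format(raw):
    """Return (format, compressed) by looking only at the record bytes."""
    compressed = raw[:2] == GZIP_MAGIC
    if compressed:
        # Decompress just enough of the stream to see the leading bytes.
        with gzip.GzipFile(fileobj=io.BytesIO(raw)) as stream:
            head = stream.read(16)
    else:
        head = raw[:16]
    if head.startswith(b"WARC/"):
        return "warc", compressed
    # Assumption: anything that is not a WARC record is treated as ARC.
    return "arc", compressed
```

For example, `sniff_format(gzip.compress(b"WARC/1.0\r\n"))` reports a compressed WARC record even though no filename or content type was supplied.
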
### Tests
This package includes a test suite for loading a variety of WARC and ARC records.
Tests so far:
* Compressed WARC, ARC Records
* Uncompressed ARC Records
* Compressed WARC created by wget 1.14
* Same URL revisit record resolving

TODO:
* Different URL revisit record resolving
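
The difference between the two revisit cases can be sketched as below. In the WARC format, a revisit record may name its original capture through the `WARC-Refers-To-Target-URI` and `WARC-Refers-To-Date` headers; when these are absent, the original is looked up under the revisit's own URL. The dictionary keys and the function `resolve_revisit` are hypothetical stand-ins for whatever index format is in use, not the pywb API.

```python
def resolve_revisit(revisit, index):
    """Find the index entry holding the full payload for a revisit (sketch)."""
    # Different-URL revisits name their original explicitly;
    # same-URL revisits fall back to the revisit's own URL.
    target_url = revisit.get("refers_to_url") or revisit["url"]
    target_date = revisit.get("refers_to_date")
    for entry in index:
        if entry["type"] == "revisit" or entry["url"] != target_url:
            continue
        if entry["digest"] != revisit["digest"]:
            continue
        if target_date is None or entry["timestamp"] == target_date:
            return entry
    return None
```

The returned entry would then supply the filename, offset, and length needed to load the record with headers and payload content.
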