1
0
mirror of https://github.com/webrecorder/pywb.git synced 2025-03-21 19:12:10 +01:00

18 Commits

Author SHA1 Message Date
Sara Tavares
c441d83435
chore(typos): fix typos across codebase (#811)
Co-authored-by: stavares843 <stavares843@users.noreply.github.com>
2023-02-15 13:04:20 -05:00
oskarhek
3050fd2b2b
issue_792 catch warcio exception (#793) 2023-01-05 17:15:49 -05:00
Lauren Ko
5c35a43dac
Modify examples in cdx-indexer help text to do as stated (#683) 2021-12-07 16:09:44 -08:00
Ilya Kreymer
626da99899
POST request handling and indexing improvements (#636)
* post append improvements:
- parse json primitives for post query
- for text/plain, attempt to parse as json, then as binary
- standardize post append indexing
- include '__wb_method' in urlkey
- add 'requestBody' and 'method' to cdxj
- support unique dupe params for json-to-query conversion

* test fixes:
- update tests for test_inputreq,
- update post-test.cdxj and post-test.cdx

* ci: fixes
- tox: run full test suite!
- disable appveyor

* inputrequest buffering fix:
- never truncate reading POST request, must read entire POST data to avoid hung request in live mode
- truncate final query string to 4096
2021-04-27 20:52:24 -07:00
Ilya Kreymer
4683d95580
cdx sorted output: switch to default list.sort() for cdx output, fixes #608 (#609) 2021-01-26 14:35:30 -08:00
Lauren Ko
b66608c5f3
Handle Content-Type multipart/form-data without boundary (#599)
* Handle Content-Type multipart/form-data without boundary

* Add tests for multipart/form-data change
2020-12-16 19:00:02 -08:00
Ilya Kreymer
9b8c187b3a
2.4.2 Develop->Master (#572)
* ensure that the RemoteCDXIndexSource also adds a 'matchType=' param, fix for ukwa-pywb/ukwa#57

* 2.4.2 fixes:
- cdxindexer: don't treat first param as output, require '-o <output>' instead, update tests
- cleanup: move url-polyfill.min.js to correct static dir, addresses #571
- update to latest wombat
- move logo to ./pywb/static, fix README path
- tests: update indexing tests for cdx-indexer fix
- bump version to 2.4.2
- Fix link in access-control docs to use RST instead of MD syntax (#568) (by @machawk1)
2020-07-10 20:22:58 -07:00
Ilya Kreymer
7e56ca8ca2
RC7 Fixes (#561)
* misc fixes for 2.4.0rc7:
- warcserver: when parsing headers to check for redirect, reserialized headers
may be of different length then original, causing warcserver->app response to hang
now adjusting the content-length on the warc record and also not including a fixed
length when serving warcserver->app, possible fix for ukwa/ukwa-pywb#53
- undo change in path resolvers to use os.path.join, just concatenate full_path + filename
- rewrite 'date' -> 'x-orig-archive-date' header to avoid confusion (eg. #548)
- bump version to rc7

* ci: attempt to fix travis build for 27, 35
2020-04-30 22:39:47 -07:00
Ilya Kreymer
cf5aceb4f5
HTML Unescape Improvements (#500)
* html-unescape fix:
- unescape any url that contains '&#' as it may be html-encoded
- unescape css blocks that contain '&#' as well, as they may contain css urls that need rewriting
* misc fixes:
- Update CHANGES
- Update to latest wombat
- Update reqs to surt 0.3.1, fix tests
2019-08-22 18:35:32 -07:00
Ilya Kreymer
455efb17ad
Support for default timestamp/date for proxy mode (#454)
* proxy: add option to set default timestamp for proxy mode, fixes #452
- set via flag --proxy-default-timestamp or config 'proxy_options.default_timestamp'
- can be iso date or all-digit timestamp
- overridable via accept-datetime header

* docs: update docs for proxy timestamp
- add docs on memento support in proxy mode

* update-version: script can update version only, commit with 'update-version.sh commit'

* indexer post append: remove 'WB_wombat_' from POST query, could have been added in previous versions of pywb!
2019-03-11 16:28:09 -07:00
John Berlin
dfc3033117 Added skipping of metadata records with mime = text/anvl to cdxindexer.py. (#366)
Updated test_indexing.py to include a test for no-indexing metadata records with mime == text/anvl
Fixes https://github.com/webrecorder/webrecorderplayer-electron/issues/63.
2018-08-20 15:04:09 -07:00
Ilya Kreymer
9acad27801 indexing: py2 fix: if decoding error while writing utf-8 encoded url, try decoding as utf-8. avoids indexing error in py2 when if warc has non-ascii urls, fix for #312
test: add test for decoding utf-8 url
2018-04-28 23:31:42 -07:00
Ilya Kreymer
9eba59d8b4 warcserver: resource load: only read headers for self-redirect for response or revisit records
tests: add test with resource record (new warc/cdxj) to ensure correct read of resource records
2017-11-30 14:13:47 -08:00
Ilya Kreymer
3e9087df3c http OPTIONS and HEAD canonicalization: (#260)
* http OPTIONS canonicalization:
- rename PostQueryExtractor to generic MethodQueryCanonicalizer, handles OPTIONS verb in addition to POST
- use more generic 'query' instead of 'post_query' for method-query canonicalization
- append '__pywb_method=options' to OPTIONS responses to distinguish from get in MethodQueryCanonicalizer

* method canon: also add HEAD to __pywb_method query canonicalization
2017-10-23 17:15:06 -07:00
Sebastian Nagel
09295747b7 Extract WARC field "WARC-Identified-Payload-Type" (#251)
and add it as field "mime-detected" to index entry
2017-10-13 08:13:55 -07:00
Ilya Kreymer
d12f715d81 refactor: split warcserver.utils into utils package:
- utils.io for stream/compression related utils
- utils.format for string formatting
- utils.memento for memento
- load_config -> utils.loaders.load_overlay_config
- also: use warcio.utils.to_native_str instead of utils.loaders.to_native_str
2017-06-05 17:43:46 -07:00
Ilya Kreymer
2907ed01c8 refactor:
- fix pywb.indexer, pywb.manager, pywb.recorder packages, tests pass
rename geventeventserver -> pywb.utils
move extract_post_query/append_post_query to inputrequest.PostQueryExtractor
remove to_native_str() in pywb.utils, redundant with warcio.utils version
remove obsolete readme, dockerfile
2017-05-23 16:43:41 -07:00
Ilya Kreymer
ad33dc6728 refactor: webagg -> warcserver rename
- ResAggApp -> BaseWarcServer
- AutoApp -> WarcServer
- move index related files to warcserver.index package, tests to warcserver.index.test
- move resource loading related files to warcserver.resource package, tests to warcserver.resource.test
- pywb.cdx -> pywb.warcserver.index
- split pywb.warc -> pywb.warcserver.resource or pywb.indexer (for cdx generation)
- bump to 0.51.0 for now!
- tests for pywb.warcserver should be working
2017-05-23 09:21:43 -07:00