mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 00:03:28 +01:00

Commit Graph

  • fb07775d38 tests: add 'bad.cdx' for testing cdx lines with missing original for revisit, missing/non-existent warc Ilya Kreymer 2014-06-25 12:32:57 -07:00
  • 913a1e9f31 warc: simplify recordloader a bit more, only response and request records get parsed as http (excluding dns: and whois: uris) All others have an '-' status and no headers parsing tests: add test for zero-length revisits Ilya Kreymer 2014-06-25 12:11:26 -07:00
  • 6761f5697f indexing: refactor cdxindexer interface to better allow custom writers record loader: skip whois: and dns: records, better skipping of arc headers (todo: need more unit tests) Ilya Kreymer 2014-06-24 17:08:10 -07:00
  • 3965fad4dd cdx indexing: add support for 9-field cdx output, request merge: store referer if available, check for record id matching Ilya Kreymer 2014-06-19 16:51:23 -07:00
  • 694b97e67f archive indexing: Refactor, split into ArchiveIterator generic iteration and cdx-indexer, which writes out CDX specifically recordloader: always load request, limit stream before headers are loaded Ilya Kreymer 2014-06-19 13:37:42 -07:00
  • de65b68edc rules: additions to rules for FB Ilya Kreymer 2014-06-18 16:45:54 -07:00
  • 22a2da6e0c rewrite: for WB_wombat_top rewriting, select next-to-top instead of self Ilya Kreymer 2014-06-16 19:42:15 -07:00
  • e1c1d23a9f framed replay: improved url update support, ensure update url is actually the url of the frame (ignore ajax requests) Ilya Kreymer 2014-06-16 18:46:01 -07:00
  • ac3efec4bc update develop to 0.4.6 improved regex for top -> WB_wombat_top rewriting Ilya Kreymer 2014-06-16 15:57:22 -07:00
  • f26b0ddbe4 update setup.py version 0.4.5 Ilya Kreymer 2014-06-15 12:35:20 -07:00
  • 987a9ee58f update README for master Ilya Kreymer 2014-06-15 12:34:14 -07:00
  • c4e3f25f9a Merge branch 'develop' for 0.4.5 release Ilya Kreymer 2014-06-15 12:32:47 -07:00
  • 4767ab0fdd Update CHANGES.rst to 4.5 Ilya Kreymer 2014-06-15 12:09:10 -07:00
  • 88d3e94b36 fixes for pep8, name fixes Ilya Kreymer 2014-06-15 11:57:48 -07:00
  • 073f1e142e test_config: test lxml parser still Ilya Kreymer 2014-06-14 21:33:08 -07:00
  • 80e80e97d3 replay: support 'framed_replay' option in config for both replay and live rewrite split replay view into BaseContentView and ReplayView refactor RewriteLiveHandler into RewriteLiveView add additional tests for framed and non-framed mode default to framed replay! Ilya Kreymer 2014-06-14 18:26:19 -07:00
  • d21f8079ca cookie rewrite: remove max-age, add test Ilya Kreymer 2014-06-14 10:04:31 -07:00
  • ceeb25a899 rewrite: fix unit tests, add extra closed check for 2.6 (not sure why its needed now) Ilya Kreymer 2014-06-14 01:02:00 -07:00
  • 028e274b22 rewrite tests: improve POST test, only add header if not empty Ilya Kreymer 2014-06-14 00:18:35 -07:00
  • d7516f4cd7 rewrite: fix <base> rewriting, urlrewriter replacement turn off lxml rewriter by default Ilya Kreymer 2014-06-13 16:44:37 -07:00
  • 0d3f663ef1 rewrite: disable refer-redirect in case of POST, handle request w/o redirect (can't use 307 because of FF) Ilya Kreymer 2014-06-13 16:23:11 -07:00
  • dfef05a74d rewrite: live rewrite: switch to including all headers rather than a whitelist for proxying Ilya Kreymer 2014-06-13 16:22:18 -07:00
  • 41e1809039 update wombat.js (support for write override, fill in WB_wombat_location on new iframe) disable 307 redirects as FF always displays modal confirmation for these, even for same host Ilya Kreymer 2014-06-11 20:12:05 -07:00
  • bdafe0938d remove accidental debug commits Ilya Kreymer 2014-06-11 12:44:49 -07:00
  • 14ed6c5898 remove accidental changes Ilya Kreymer 2014-06-11 12:42:44 -07:00
  • 0c9d88f032 POST replay: treat POST form data same as get query, no '&&&' marker additional testing POST Ilya Kreymer 2014-06-11 11:17:06 -07:00
  • e2349a74e2 replay: better POST support via post query append! record_loader can optionally parse 'request' records archiveindexer has -a flag to write all records ('request' included), -p flag to append post query post-test.warc.gz and cdx POST redirects using 307 Ilya Kreymer 2014-06-10 19:21:46 -07:00
  • 028cdaa22e bump version to 0.4.1 Ilya Kreymer 2014-06-05 14:10:30 -07:00
  • cf119174ea rewrite: for rewriting purposes, use original cdx url, not the request url (significance if trailing '/' is present) Ilya Kreymer 2014-06-05 14:09:30 -07:00
  • 2c65521ea3 final README.rst edits 0.4.0 0.4.0 Ilya Kreymer 2014-05-30 12:52:43 -07:00
  • 18f7031423 add bullet points to README! Ilya Kreymer 2014-05-30 12:45:59 -07:00
  • e3bbf95280 merge develop for 0.4.0, update paths to master branch Ilya Kreymer 2014-05-30 12:39:37 -07:00
  • 05812060c0 Merge branch 'develop' Ilya Kreymer 2014-05-30 12:37:59 -07:00
  • 6d6f2452fc update README and CHANGES for release Ilya Kreymer 2014-05-30 12:37:30 -07:00
  • 9519e8d6f1 Update CHANGES.rst Ilya Kreymer 2014-05-30 12:27:20 -07:00
  • f9710d033c fix integration test for 307 update head_insert for new wombat remove redundant host jinja func, use 'urlsplit' instead Ilya Kreymer 2014-05-30 11:17:12 -07:00
  • 52040127b3 update wombat.js to latest rewrite live: add another rewrite live header, use 307 for archival referer based redirects Ilya Kreymer 2014-05-30 11:03:22 -07:00
  • de69372b9f Update CHANGES.rst Ilya Kreymer 2014-05-30 10:54:17 -07:00
  • 9340165014 Changes for 0.4.0 Ilya Kreymer 2014-05-30 10:52:59 -07:00
  • eaf9cce261 Update README.rst Ilya Kreymer 2014-05-30 10:29:22 -07:00
  • 9b732def93 cookie_rewriting: if domain is specified, apply cookie to coll root rather than rewritten path.. needed in order for subdomain cookies to be detected properly Ilya Kreymer 2014-05-18 21:51:07 -07:00
  • 8c15ac16fd search page template: add 'prefix' to search page template Ilya Kreymer 2014-05-18 21:27:53 -07:00
  • 1d674d97d8 pep8 pass! Ilya Kreymer 2014-05-16 22:44:26 -07:00
  • 923421d637 rewrite_content: add a few tests for cs_, js_, remove redundant except Ilya Kreymer 2014-05-16 22:43:53 -07:00
  • 2600d870d7 improved test: dsrules remove redundant check static: check invalid static paths and file_wrapper memento: check non-memento paths test debug handlers and custom '-cdx' suffix Ilya Kreymer 2014-05-16 22:17:51 -07:00
  • ca33287051 test: move non-surt-cdx sample to non-surt-cdx/ dir for clarity / avoid confusion when bulk loading cdx/ dir (surt and non-surt cdx should NOT be mixed) Ilya Kreymer 2014-05-16 21:21:14 -07:00
  • 7d236af7d7 cdx: fix creation and add test for non-surt cdx (pywb-nonsurt/ test) archiveindexer: -u option to generate non-surt cdx tests: full test coverage for cdxdomainspecific (fuzzy and custom canon) Ilya Kreymer 2014-05-16 21:16:50 -07:00
  • 8758e60590 update to latest wombat.js Ilya Kreymer 2014-05-16 09:58:07 -07:00
  • 5285723ccf cookie_rewriter: catch CookieError and ignore erroring cookies Ilya Kreymer 2014-05-15 22:37:08 -07:00
  • 1d8c68b745 rewrite: only translate non-empty header values Ilya Kreymer 2014-05-13 17:42:55 -07:00
  • 871cc26fa4 rewrite: add optional cookie_rewriter, created by urlrewriter and called from header_rewriter cookie_rewriter works correctly with a concatenated set-cookie list, returns a list of rewritten 'set-cookie' headers rewrite_live: add proxying of Host, Origin, additional headers split header rewriter tests into test_header_rewriter, add test_cookie_rewriter bump version to 0.4.0! Ilya Kreymer 2014-05-13 17:07:41 -07:00
  • 89da165467 exceptions: add optional url param to WbException, move handler_exception() into WSGIApp for easier customization Ilya Kreymer 2014-05-13 01:54:12 -07:00
  • e7957a5cae remove SeekableTextFileReader, replaced with standard file-like objects and seek(0, 2) and tell() to get file length Ilya Kreymer 2014-05-06 20:54:42 -07:00
  • 46449ac188 rewrite: pass wburl mod to rewriter, so that css/js rewriting rules may override default content-type (in cases where it is incorrect) allows for rule based customization (to be added later) Ilya Kreymer 2014-05-05 22:12:45 -07:00
  • d2795dfdaa minor cleanup: wburl: add is_url_query() check views: add kwargs to J2HtmlCapturesView for better extensibility query_handler: simplify make_cdx_response() arguments Ilya Kreymer 2014-05-01 11:58:34 -07:00
  • 4c075d14af views: actually encode template result as utf-8! Ilya Kreymer 2014-04-30 21:16:05 -07:00
  • 9cf5327e88 bufferedreader cleanup: * BufferedReader defaults to no decompression * DecompressingBufferedReader defaults to gzip decomp * ChunkedDataReader defaults to no gzip decomp, but decomp can be set later via set_decomp(). This allow chunked responses to be de-chunked but not decompressed (eg for non-text responses) Ilya Kreymer 2014-04-28 20:15:31 -07:00
  • 53ad67eb9c rewrite: disable one 'top' rewriting rule (should move to separate mixin) views: add urlsplit jinja2 filter Ilya Kreymer 2014-04-27 01:04:20 -07:00
  • 09653cf77e rewrite: more nuanced 'top' rewriting, fix wombat frame mode detection Ilya Kreymer 2014-04-26 18:43:25 -07:00
  • 58f261fda4 cdx redis: disable new test until fakeredis supports zrangebylex() Ilya Kreymer 2014-04-25 11:00:49 -07:00
  • 2b8bea616e when given a redis path of redis://<host>/<db>/<key>, use <key> as a sorted cdx file with zrangebylex! Ilya Kreymer 2014-04-25 10:52:35 -07:00
  • e4262502b0 fix ChunkedDataReader chunked + gzip decomp: if reading one chunk yields no data (due to more data being needed for gzip decomp), keep reading more blocks until there is data or last block is reached (or error). Ensure a single read() call will return some data if there is any Ilya Kreymer 2014-04-25 10:30:22 -07:00
  • 53f0cb540f url rewriter: add optional 'full prefix', check and don't rewrite urls if starting with prefix or full prefix wbrequest: if no scheme present (shouldn't happen with wsgi) default to http Ilya Kreymer 2014-04-24 10:44:08 -07:00
  • cd017669ae bugfix: ChunkedDataReader handles zero-length chunk properly, add test Ilya Kreymer 2014-04-23 10:00:25 -07:00
  • 48e8e8eb1c allow passing optional kwargs to render search page add configurable 'default_mod' param Ilya Kreymer 2014-04-22 16:33:47 -07:00
  • 2ad41e2b94 rewrite: rewrite data-* attributes if they look like links (http, https, //) Ilya Kreymer 2014-04-22 16:32:36 -07:00
  • 6eef0afb86 add new custom rewriting rule (flickr) Ilya Kreymer 2014-04-20 21:40:27 -07:00
  • e1e55ac061 minor tweaks: rewrite 'crossorigin' -> '_crossorigin' param to disable crossorigin as it may interfere with loading rewritten content, add tests for html and lxml parsers add server_cls as optional param to QueryHandler.init_from_config() for easier customization views: dont create template if empty template file specified Ilya Kreymer 2014-04-19 12:04:43 -07:00
  • 23bb5bd175 rewrite: wombat update 2.0! Using Object.defineProperty() to better override .href and .hash properties when possible. .href returns original url, but on assignment rewrites before redirecting .hash proxies to location.hash Also added: - window.top -> window.WB_wombat_top - document.referrer -> document.WB_wombat_referrer - <source> html tag rewriting Ilya Kreymer 2014-04-18 19:30:48 -07:00
  • e011da43f2 live rewrite: use custom REL_REFERER field don't override HTTP_REFERER if REL_REFERER not set, don't send any referrer Ilya Kreymer 2014-04-15 16:44:02 -07:00
  • 85593696fa remove rfc3987 validation, was rejecting valid urls add extract_referer_wburl_str() to extract WbUrl str, if any, from the referrer. Use that for live_rewrite_handler to override default referrer Ilya Kreymer 2014-04-15 16:38:53 -07:00
  • 611b9093bd html insert: add include_ts option to optionally not add timestamp Ilya Kreymer 2014-04-13 18:17:31 -07:00
  • d8c9a803f6 add support for optional proxies (verify set to false for now) Ilya Kreymer 2014-04-13 17:50:26 -07:00
  • 7636c9d3f7 fix: when reading response, only readline() if previous read() was non-empty Ilya Kreymer 2014-04-09 16:44:45 -07:00
  • bfc2e63793 live rewriter: integrate handler with rewrite_live.py module, clean up css, add unit and integration tests clean up cli server now known as 'live-rewrite-server', which performs live rewrite using iframe paradigm Ilya Kreymer 2014-04-09 15:46:03 -07:00
  • 11202c462f support both frames and non-frames mode add automatic framing when in framed mode Ilya Kreymer 2014-04-09 10:57:43 -07:00
  • b4f30a770f ChunkDataReader: if determined to be non-chunked, read full buffer unchunked Ilya Kreymer 2014-04-09 10:06:09 -07:00
  • 19f2df4717 refactor: - move is_identity(), is_embed() to wburl from wbrequest - add is_mainpage() predicate - add create_template() to each J2TemplateView to create itself - add HeadInsertView to create a reusable head insert for RewriteContent - add 'mp_' as modifier for frames mode to be used as possible modifier with HTMLRewriter Ilya Kreymer 2014-04-09 10:01:44 -07:00
  • 1fb6f5eff7 add rewriter_handler, frame wrapper support! Ilya Kreymer 2014-04-08 22:43:32 -07:00
  • 8897a0a7c9 decompressingbufferedreader: default to 'gzip' decompression instead of none. ChunkedDataReader also automatically attempts decompression, by default Add tests to verify Ilya Kreymer 2014-04-08 21:49:04 -07:00
  • 02fe78cb0b update changes, add more tests Ilya Kreymer 2014-04-07 17:41:14 -07:00
  • a331061691 minor tweaks: add default static_path for jinja, remove unused import Ilya Kreymer 2014-04-07 17:19:07 -07:00
  • c23dd7bda4 wombat update: - support scheme-relative (//) urls - override dom manipulation (appendChild, insertBefore, replaceChild) - disable Worker() interface for now Ilya Kreymer 2014-04-07 17:17:08 -07:00
  • 2a318527df lxml: use lxml's parse interface instead of feed interface to allow xml to handle decoding unicode data, better address #36 Ilya Kreymer 2014-04-07 17:13:43 -07:00
  • 890c323617 update bad.arc with empty record example Ilya Kreymer 2014-04-07 17:12:33 -07:00
  • 64eef7063d record reading: better handling of empty arc (or warc) records for indexing, index empty/invalid length as '-' status code for reading, serve as 204 no content. ensure that StatusAndHeaders has a valid statusline when serving if http content-length is valid, limit stream to that content-length as well as record content-length (whichever is smaller) replace content-length when buffering Ilya Kreymer 2014-04-07 17:08:39 -07:00
  • d8c20a59cf update to version 0.3.1 Ilya Kreymer 2014-04-06 11:46:43 -07:00
  • dd8396a339 update to 0.3.0 version instead of 0.2.2 0.3.0 Ilya Kreymer 2014-04-06 11:24:49 -07:00
  • d6006acdc3 rewrite: when using lxml parser, just pass raw stream to lxml without decoding. lxml parser expects to have raw bytes and will determine encoding on its own. then serve back as utf-8 if no encoding specified. should address #36 Ilya Kreymer 2014-04-06 09:47:34 -07:00
  • 3c0ca9d874 update README.rst for master 0.3.0 0.2.2 Ilya Kreymer 2014-04-04 13:04:30 -07:00
  • e077c23de7 fuzzy match: modify existing params to ensure any custom params are preserved templates: add ability to set custom global vars, such as 'static_path' for all templates Ilya Kreymer 2014-04-04 12:20:54 -07:00
  • b0b0adb043 refactor: rename pywb.core -> pywb.webapp move perms/test/test_perms_policy -> tests/perms_fixture for rules file, use single DEFAULT_RULES_FILE import Ilya Kreymer 2014-04-04 10:09:26 -07:00
  • 3aa4a4da7a rewrite: ensure lxml parser closes gracefully on no input Ilya Kreymer 2014-04-03 13:00:22 -07:00
  • 5388a0b03b Merge branch 'develop' of https://github.com/ikreymer/pywb into develop Ilya Kreymer 2014-04-03 12:45:54 -07:00
  • 5dd586cf07 refactor: simplify rewrite_content and replay_views, remove redundant code.. everything goes through rewrite_content(), is sanitized (for transfer encoding) if needed additional testing for decode_buff fix failed_files bug in resolvingloader, add tests Ilya Kreymer 2014-04-03 12:44:00 -07:00
  • 5155a5c842 fix README headings Ilya Kreymer 2014-04-03 09:25:10 -07:00
  • bd21fec6d4 update run-uwsgi.sh and add run-gunicorn.sh update README and INSTALL, fix typo only list wb handlers on home page by default pep8 fixes Ilya Kreymer 2014-04-03 08:56:18 -07:00
  • 1e7ecb901a tweak README, add no cover pragmas to blocking cli apps (for now) Ilya Kreymer 2014-04-02 21:43:09 -07:00
  • 80f2da9548 refactor: move configs/config.yaml to root again remove cdx-server specific config, instead make cdx server api-only path configurable from regular config Ilya Kreymer 2014-04-02 21:26:53 -07:00
  • 8bdafeb040 Update README.rst Ilya Kreymer 2014-04-02 20:29:00 -07:00