mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 16:14:48 +01:00

108 Commits

Author SHA1 Message Date
Ilya Kreymer
ccc13b427f dockerfile: update to latest pywb
urlrewrite: upstream url avoid adding empty '&'
2016-10-02 11:29:51 -07:00
Ilya Kreymer
5c499753f8 webrecore Docker: update Docker file to latest pywb, python, starting to use versioning! 2016-09-16 18:43:26 -07:00
Ilya Kreymer
2fb1df34c9 recorder: add upload/streaming support with put_record=stream where the content being uploaded is already in WARC record form 2016-08-12 21:23:25 -04:00
Ilya Kreymer
c8b6a48005 webagg: use prepare_auth() to ensure Authorization header is set for http://user:pass@host urls 2016-08-12 21:22:17 -04:00
Ilya Kreymer
82d3b61523 recorder: catch exception in close_idle_files() if file no longer exists and ensure it's removed 2016-08-12 01:19:30 -04:00
Ilya Kreymer
594aff86d3 webagg: response self-redir: don't check if live, throw correct exception 2016-08-10 00:50:43 -04:00
Ilya Kreymer
cca0c01547 urlrewrite misc fixes:
- ensure content-length is converted to str
- templateview: support optional extensions
- fix test
2016-08-09 19:53:22 -04:00
Ilya Kreymer
c93d7ecafc webagg: Fix loading of url-lookup (url agnostic) revisits, ensure all params passed to cdx lookup, add tests for url-agnostic revisit lookup 2016-08-04 16:53:24 -04:00
Ilya Kreymer
20b161bf90 debug: print stacktrace when debugging 2016-08-01 02:12:15 -04:00
Ilya Kreymer
db3b92e228 writing: add write_stream_to_file() function to be able to write an existing input stream to a WARC
refactor _do_write_req_resp to pass callback to actual writing (eg. _write_to_file)
2016-07-31 00:49:57 -04:00
Ilya Kreymer
1b09015954 recorder: split up _open_file() into get_new_filename() and allow_new_file() to customize skipping recording by returning false
from allow_new_file()
create_warcinfo_record() - switch to dict args over kwargs, update tests
2016-07-30 13:11:12 -04:00
Ilya Kreymer
498f87fb54 add Dockerfile to git! 2016-07-26 19:42:59 -04:00
Ilya Kreymer
a5696fc2d4 rewriter: range massage for patch as well as record 2016-07-26 19:42:32 -04:00
Ilya Kreymer
14cf68e4e5 custom record: don't override WARC-Date if provided in request header,
return chosen WARC-Date in json response
2016-07-26 19:41:47 -04:00
Ilya Kreymer
34a710e51a custom response: add utf-8 encoding, unless framed replay 2016-07-24 00:14:43 -04:00
Ilya Kreymer
9588e8622f responseloader: quote/unquote Webagg-Source-Coll header as source may contain unicode chars 2016-07-23 21:57:24 -04:00
Ilya Kreymer
ae290587f6 temp cookie store: add add_cookie() function for explicitly adding cookie, make expiry configurable
related to webrecorder/webrecorder#79
2016-07-01 10:15:59 -04:00
Ilya Kreymer
bc36ae1302 rewriter: update for moved RewriterAMF in pywb 2016-06-14 00:14:29 -04:00
Ilya Kreymer
c1d7111841 webagg: store original 'source' value in cdx for properly mapping in WARC file resolver
error handling: ensure 'last_exc' is a string
2016-06-14 00:13:01 -04:00
Ilya Kreymer
4c7da0f6ef recorder: support overriding get_params() in subclass
multiwarcwriter: support multiple warcs in same dir, support random component in path, and a custom
key template for selecting current warc file, not related to current directory
2016-06-07 12:55:04 -04:00
Ilya Kreymer
3fec766e39 webagg: redis lookup: if url contains wildcard, scan redis keys to check multiple keys until one is found
webagg tests: fix test to include mime in live cdx
2016-06-07 12:54:28 -04:00
Ilya Kreymer
d7c74b68de video loader support: add VideoLoader, which uses youtube-dl to create a metadata record
of video info. Activated with explicit content_type param 'application/vnd.youtube-dl_formats+json'
2016-05-28 15:01:33 -07:00
Ilya Kreymer
30f9d0aca7 recorder put custom record: add support for put/post of a custom record. If put_record= param is included, the request body
is written to the specified record type.
move record creation functions to the warcwriter
add tests for custom record
2016-05-26 20:49:40 -07:00
Ilya Kreymer
ea3efdf84d responseloader: use PreparedRequest() to ensure url properly formatted
tests: update tests for latest, live data
2016-05-24 18:01:44 -07:00
Ilya Kreymer
80d9805a58 webagg: tests: flush fakeredis for reentrancy
utils: add load_config() with option for main and override configs
2016-05-19 17:01:09 -07:00
Ilya Kreymer
45c8fcddbd recorder: add max_idle_secs / close_idle_files() to close any open files that have not been modified longer than set threshold, in prep for webrecorder/webrecorder#92
indexer: add 'full_warc_prefix' for setting full path prefix in add_warc_file() (eg. for http load) for webrecorder/webrecorder#95
2016-05-11 21:40:02 -07:00
Ilya Kreymer
94d6098238 app: separate json_encode() func
compat: py2 fixes
2016-05-11 11:38:59 -07:00
Ilya Kreymer
c45f5cb749 webagg: use werkzeug routing instead of wrapping Bottle app 2016-05-10 16:31:44 -07:00
Ilya Kreymer
464eca2fa0 test apps: enable debugging for test apps
test recorder: write to a temp dir for each run
2016-05-06 16:33:18 -07:00
Ilya Kreymer
e64ae780c6 urlrewrite: improve POST request support for ikreymer/pywb#178 2016-05-06 16:32:13 -07:00
Ilya Kreymer
ab3af90df2 cookie_tracker: add support for redis-based subdomain cookie tracker, which temp caches cookies with Domain= set in redis and passes them upstream
when rewriting. addresses webrecorder/webrecorder#79
2016-05-04 16:39:47 -07:00
Ilya Kreymer
228ca58c5b recorder: actually fix content-type on warcinfo, add to test! 2016-04-30 13:07:53 -07:00
Ilya Kreymer
0fbae1c7f8 recorder: ensure warcinfo record has a content-type 2016-04-30 10:19:20 -07:00
Ilya Kreymer
7a0dd463cd webagg: responseloader: use urllib3 directly instead of requests to
take advantage of connection pooling w/o storing/sharing cookies
2016-04-27 10:16:54 -07:00
Ilya Kreymer
9010e52663 urlrewrite: refactor simpleapp to support live/record/replay 2016-04-27 10:15:48 -07:00
Ilya Kreymer
f119d05724 recorder: fix simplerec init
tests: improve tests for skipping request and response headers
2016-04-27 09:52:56 -07:00
Ilya Kreymer
a82e2785c7 tests: add basic test for rewriterapp 2016-04-25 14:29:28 -07:00
Ilya Kreymer
3b6cab1730 urlrewrite: remove dependency on bottle from rewriterapp,
add overridable error and query views, with extensible get_query_params() and process_cdx_query()
to extend cdx for query view
add get_top_url() for adding custom top_url for frame insert
add call_with_params() for adding custom params to environ
2016-04-25 12:05:43 -07:00
Ilya Kreymer
b056acd88e urlrewrite: add support for index query 2016-04-15 04:01:36 +00:00
Ilya Kreymer
0370470e68 urlrewrite: http range: support skipping record for range requests not starting at 0-
and performing async request,
support converting unbounded 0- to non-ranged and back
2016-04-15 02:21:39 +00:00
Ilya Kreymer
0b255819ff recorder warcwriter: allow skipping writing of only request or only response by overriding _is_write_req and _is_write_resp in subclass
(todo: rethink the interface)
2016-04-15 02:19:34 +00:00
Ilya Kreymer
a93f75dca2 webagg: add preliminary 'fuzzy matching' fallback support, currently enabled for all sources
(todo: need to only include sources that support it)
2016-04-15 02:18:20 +00:00
Ilya Kreymer
00bdddd1e9 recorder: SkipDupePolicy only skips if url is an exact match (not just by urlkey) 2016-04-07 10:44:05 -07:00
Ilya Kreymer
f4cc143dc7 urlrewrite: generalize support for overridable handle_custom_response() callback for handling modifiers (default support top-frame)
pass headers to add_custom_params, include error message on error if available
headers: use add_header() to support multiple headers with same name
is_ajax(): check for X-Pywb-Requested-With header to mark as ajax and not pass to upstream
2016-04-07 10:39:12 -07:00
Ilya Kreymer
fa5d5e6bcc urlrewrite templates: add get_top_frame_params() callback for adding custom params for top frame,
also inject env['webrec.template_params'] if set
2016-04-05 02:45:00 -07:00
Ilya Kreymer
d40edfc22d warcwriter: add create_warcinfo_record() for creating a warcinfo and a SimpleTempWARCWriter for writing records to temp buff/file 2016-04-03 12:19:54 -07:00
Ilya Kreymer
fd76030cb3 urlrewriter: allow passing in existing jinja_env wrapper 2016-04-02 21:36:54 -07:00
Ilya Kreymer
01c21d3a43 recorder: redis indexer accepts arg list, supports separate redis and key_template args
add length param to add_urls_to_index() in redis indexer, return cdx list
2016-04-02 21:36:36 -07:00
Ilya Kreymer
6157cebcc9 testutils: when mock patching FakeStrictRedis, use a subclass with a shared pubsub (to match real redis) 2016-04-02 21:33:39 -07:00
Ilya Kreymer
ddee9236c6 webagg: rename key_prefix -> key_template 2016-04-02 21:33:23 -07:00