Ilya Kreymer
8fe2c1b5bd
apps & cli: remove old apps, keep:
- webagg-server
- wayback
- live-rewrite-server
support adding custom settings to AutoApp
support for --live flag that automatically adds live-web source at '/live'
tests: disable cdx_server tests as old cdx_server removed
2017-03-12 12:21:54 -07:00
Ilya Kreymer
0784e4e5aa
spin-off warcio!
update imports to point to warcio
warcio rename fixes:
- ArcWarcRecord.stream -> raw_stream
- ArcWarcRecord.status_headers -> http_headers
- ArchiveLoadFailed single param init
2017-03-07 10:58:00 -08:00
Ilya Kreymer
8ef6eb97b8
cdx: encoding: use to_native_str() consistently for better py2 compat
2016-05-23 11:47:44 -07:00
Ilya Kreymer
dd8ac42f2c
encoding: ensure cdx fields are in the native encoding, except filename, which should stay as unicode in py2 for further use
2016-04-30 16:08:43 -07:00
Ilya Kreymer
e8c77c0538
encoding: encode before quote
setup: enable zip_safe=True again
2016-04-30 15:15:35 -07:00
Ilya Kreymer
ab8b4efaec
encoding: cdx: only quote-encode 'url'
warc: ensure path index loads are utf-8 decoded
2016-04-30 14:38:48 -07:00
Ilya Kreymer
4b753d2612
Merge branch '0.11.5' into develop
2016-03-31 13:16:53 -07:00
Ilya Kreymer
e5ef51363c
zipnum: backport fix for #173, paths specified in a zipnum .loc file are relative to the .loc file, not to the working dir of the application
warnings: don't warn on .gz cdx files
2016-03-31 13:09:57 -07:00
Ilya Kreymer
5fd49f35ee
zipnum: when using .loc file, resolve shard paths relative to the .loc file, not from working directory, fixes #173
2016-03-22 11:31:08 -07:00
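The .loc path handling described in the zipnum commits above can be sketched as follows. This is a minimal illustration under stated assumptions, not pywb's actual code: the helper name `resolve_shard_path` and the example paths are hypothetical.

```python
import os


def resolve_shard_path(loc_file, shard_path):
    """Resolve a shard path listed in a .loc file (hypothetical helper).

    Relative shard paths are interpreted relative to the directory that
    contains the .loc file, not relative to the process working directory.
    """
    if os.path.isabs(shard_path):
        # Absolute shard paths are used unchanged.
        return shard_path
    loc_dir = os.path.dirname(os.path.abspath(loc_file))
    return os.path.normpath(os.path.join(loc_dir, shard_path))
```

With this approach, a .loc file at /data/index/zipnum.loc that lists shards/part-00.gz resolves to /data/index/shards/part-00.gz regardless of where the application was started.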
Ilya Kreymer
0f6e3da127
cdx: tests: add tests for comparison ops
2016-03-10 12:47:36 -08:00
Ilya Kreymer
c1bdeac92b
redis: fix redis key lookup, add tests for zrangebylex with new fakeredis
2016-03-09 18:33:04 -08:00
Ilya Kreymer
3af2979cf1
cdx: skip any fields starting with '_' when serializing
2016-03-08 08:38:55 -08:00
Ilya Kreymer
a6dc57cf4a
post query: ensure post query optional buffer is a byte not string buffer
exceptions: move LiveRequestException to wbexceptions
cdx query: support for 'alt_url' which, if set, is used to create start_key and end_key
2016-03-03 13:13:44 -08:00
Ilya Kreymer
3a584a1ec3
py3: all tests pass, at last!
but not yet py2... need to resolve encoding in rewriting issues
2016-02-23 13:26:53 -08:00
Ilya Kreymer
0dff388e4e
cdx: CDXQuery takes params dict not **params
CDXObject comparison using to_json()
2016-02-23 01:36:39 -08:00
Ilya Kreymer
57991fd0cf
cdx: ensure url param required check is performed on init
2016-02-22 13:59:07 -08:00
Ilya Kreymer
af7c876263
cdx: ensure CDXQuery computes key and end_key automatically
key and end_key encoded as utf-8 by default
2016-02-22 13:39:47 -08:00
Ilya Kreymer
bd841b91a9
more python 3 support work -- pywb.cdx, pywb.warc tests succeed
most relative imports replaced with absolute
2016-02-18 21:26:40 -08:00
Ilya Kreymer
a3a8b777d2
cdx: don't warn on .loc files, zipnum: add newline to page info response
2015-10-07 17:16:39 -07:00
Ilya Kreymer
c3aab1514c
query/cdx: support 'from' and 'to' cdx query arguments, support ranged calendar query, eg. /[from]*[to]/[url] or /[from]-[to]/[url], with both from and to optional, closes #130
exposes lower and upper bound timestamps in timeutils, pad_timestamp
2015-10-07 10:44:12 -07:00
Ilya Kreymer
e201824be6
cdxops: when resolving cdx fields, use get with default '-' for old cdxs where some fields (eg. length) may be missing
2015-08-26 15:24:28 +03:00
Ilya Kreymer
bc40352bed
cdx: add support for 10-field cdx format (old OpenWayback format) to ensure it can be converted to cdxj
manager: fix convert-cdx -> cdx-convert as explained in the README
2015-08-25 22:54:38 +03:00
Ilya Kreymer
e9d04c71d3
uwsgi.ini: check if VIRTUAL_ENV actually set
remove debug print for fuzzy matching
2015-07-28 14:25:45 -07:00
Ilya Kreymer
27212488e3
tests: zipnum: better test coverage for incorrect idx or loc files, add invalid sample files zipnum-bad{.idx, .loc}, #112
2015-06-05 17:46:45 -07:00
Ilya Kreymer
bb250cafbc
zipnum: add query arg to location resolver
2015-05-29 12:52:35 -07:00
Ilya Kreymer
a51b2936f3
zipnum: fix bug with urls in last block not being accessible. when iter_range() fails, check to see if last_line == end_line, and if so, check if start_line should also be end_line, #112
support non-linenumbered idx files w/o pagination queries
add new zipnum-sample to test cdx lines in last block (previous sample had only one line in last block except the first)
2015-05-29 11:46:00 -07:00
Ilya Kreymer
179f11198b
fuzzy match: look at first occurrence, not last, of match separator
rules: add new rule for yt comments
2015-05-21 23:52:09 +00:00
Ilya Kreymer
5aee4a193b
cdx fuzzy match: fix when fuzzy replace string is >1 char, keep full replace string (to be examined further)
2015-04-13 09:45:02 -07:00
Ilya Kreymer
8513575dea
cdx: small refactor to CDXFile and RedisCDXSource to facilitate better extensions, move generic methods to statics, add overridable params
2015-04-10 14:36:19 -07:00
Ilya Kreymer
97b4081d89
cdx redis: for empty, use iter instead of list for consistency
2015-04-04 12:56:15 -07:00
Ilya Kreymer
273176bce5
cdx: when reading cdxj, and run into non-ascii chars in url, utf-8 encode and %-encode
2015-03-29 09:21:50 -07:00
Ilya Kreymer
f3a066f58b
cdx-server query & zipnum: fixes for showNumPages query:
- if query contained in <1 secondary index block, must read first line of cdx to determine if any matches
- if no matches, don't throw 404 exception but always return json info with 0 pages
2015-03-28 16:15:24 -07:00
Ilya Kreymer
d2be90d4a1
test case tweak
2015-03-27 08:56:43 -07:00
Ilya Kreymer
41487dd9d4
update changelist for 0.9.2
cdx: include match type in cdx query error
2015-03-27 07:58:51 -07:00
Ilya Kreymer
85082e46bf
cdxj: ensure revisit resolve is skipped if the digest is missing, as may be case in cdxj ( #85 )
2015-03-26 11:11:10 -07:00
Ilya Kreymer
1cfe73c9db
zipnum: fix block count off-by-1 error in showNumPages query
2015-03-25 20:43:59 -07:00
Ilya Kreymer
6a3ca566db
zipnum: cleanup shared location resolution: in addition to .loc file, support a prefix resolver, which can be a regex replacement on the index path (default is unchanged index path) (#83)
2015-03-25 09:07:54 -07:00
Ilya Kreymer
1a8211d752
cdx server: add simplified matchType notation, using host* for prefix and *.host for domain matchType (#34)
2015-03-24 19:49:54 -07:00
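The simplified matchType notation from the commit above could be interpreted roughly like this. A minimal sketch only: the function name `parse_match_notation` is hypothetical, and falling back to an 'exact' match type for plain urls is an assumption not stated in the commit message.

```python
def parse_match_notation(url):
    """Map simplified notation to a (url, matchType) pair (hypothetical helper).

    '*.host' selects a domain match, 'host*' selects a prefix match.
    Anything else is treated as an exact match (an assumption).
    """
    if url.startswith('*.'):
        # Leading '*.' is stripped, the remainder is the domain to match.
        return url[2:], 'domain'
    if url.endswith('*'):
        # Trailing '*' is stripped, the remainder is the url prefix.
        return url[:-1], 'prefix'
    return url, 'exact'
```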
Ilya Kreymer
2af5a25009
zipnum: support for pagination api! #34 and #83. cdx server now bounded by pageSize (default 10 blocks), showNumPages=true returns json indicating num pages, page=N can be set to page number from 0 to numPages - 1
loaders: add read_last_line() to read last line of a seekable file, used to read last line of index file when at end
tests: additional test for binsearch boundary conditions
zipnum: secondary index output supports json also
2015-03-24 18:56:13 -07:00
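The page arithmetic implied by the pagination commit above can be sketched as follows. This is an illustration under stated assumptions, not pywb's implementation: the function names are hypothetical, and only the default of 10 blocks per page and the 0 to numPages - 1 page range come from the commit message.

```python
import math


def num_pages(total_blocks, page_size=10):
    """Number of pages needed to cover total_blocks index blocks."""
    return int(math.ceil(total_blocks / float(page_size)))


def page_block_range(page, total_blocks, page_size=10):
    """Return the half-open (start, end) block range for a 0-based page."""
    pages = num_pages(total_blocks, page_size)
    if page < 0 or page >= pages:
        raise ValueError('page must be between 0 and numPages - 1')
    start = page * page_size
    # The last page may be shorter than page_size.
    end = min(start + page_size, total_blocks)
    return start, end
```

For example, 25 matching blocks with the default page size give 3 pages, and the last page (page=2) covers blocks 20 through 24.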
Ilya Kreymer
ea460bb0f0
cdxj: support cdx json output from cdx server with output='json' (not yet default)
cdx field renaming: canonical cdx field name changes
statuscode -> status
mimetype -> mime
original -> url
old names still accepted for query/filtering, however, cdx json will use new names
ensures consistency between .cdxj field names and names used by cdx server json output
collections manager now creates .cdxj by default
bump version to 0.9.0b2!
2015-03-19 13:33:49 -07:00
Ilya Kreymer
fe1c32c8f7
cdxj: support loading cdxj ( #76 )
cdx obj: allow alt field names to be used (eg. mime, mimetype, m)
(status/statuscode/s) in querying and reading cdx
cdx minimal: (#75 ) now implies cdxj to avoid more formats
minimal includes digest always and mime when warc/revisit
tests for cdxj loading
indexing optimization: reuse same entry obj for records of same type
2015-03-19 12:36:49 -07:00
Ilya Kreymer
db75bda736
file open() pass: convert all read and write to ensure binary 'b' flag is set ( #56 )
2015-01-11 18:54:11 -08:00
Ilya Kreymer
8d6845a552
fuzzy match: add support for specifying regex and args separately for fuzzy_lookup match
2014-12-26 14:29:51 -08:00
Ilya Kreymer
181c18a1b8
pep8 pass: fix spacing, line length, issues
also remove references to obsolete cached_replay, hostnames in pywb_init
2014-12-23 15:14:03 -08:00
Ilya Kreymer
3e3a74619f
various fixes: wombat: add Date.UTC and Date.parse
rewrite: support vi_ https -> metadata
video: fallback to vi_ call on current page
remove debug logging
2014-11-25 00:21:28 -08:00
Ilya Kreymer
c10df57e07
rules: add support for customizing matchType prefix, adding multiple filters
2014-11-24 11:10:49 -08:00
Ilya Kreymer
5e4b830fa7
cdx: ensure cdx file is closed when iterator is done, since cdx files are opened per-lookup, related to #45
2014-11-04 09:42:53 -08:00
Ilya Kreymer
e8d3965269
pep8 style fixes, remove unused methods
2014-10-21 19:06:16 -07:00
Ilya Kreymer
319b8124be
cdxobject: add ability to create empty CDXObject(), add tests for CDXObject/IDXObject checking for supported and unsupported number of fields
2014-09-22 21:12:25 -07:00
Ilya Kreymer
e2f8594ea7
rules: add [?&] prefix to query match, use {0} instead of {} for 2.6 compatibility
2014-09-21 20:04:51 -07:00