mirror of https://github.com/webrecorder/pywb.git synced 2025-03-15 16:14:48 +01:00

1052 Commits

Author SHA1 Message Date
Ilya Kreymer
b43a7f94f3 manager: add cdx -> cdxj migration tool #80, which will convert all cdxs in a directory to cdxj, removing original files
migration will also recanonicalize the urlkey to surt form
add migration test using non-surt, 9-field cdx (created from samples)
cdxindexer: fix multi warc->multi cdx indexing options
2015-03-19 20:57:33 -07:00
Ilya Kreymer
c5b5c8ee4b manager: fix index path to index.cdxj 2015-03-19 13:41:48 -07:00
Ilya Kreymer
ea460bb0f0 cdxj: support cdx json output from cdx server with output='json' (not yet default)
cdx field renaming: canonical cdx field name changes
statuscode -> status
mimetype -> mime
original -> url
old names still accepted for query/filtering, however, cdx json will use new names
ensures consistency between .cdxj field names and names used by cdx server json output
collections manager now creates .cdxj by default
bump version to 0.9.0b2!
2015-03-19 13:33:49 -07:00
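
The field renaming described in ea460bb0f0 can be pictured as a small alias table. This is only a minimal sketch: `FIELD_ALIASES`, `canonical_field` and `canonicalize_fields` are hypothetical names for illustration, not pywb's actual code, and the table covers only the three renames listed in the commit message.

```python
# Hypothetical alias table built from the renames listed in the commit
# message (old name -> new canonical name). Not taken from pywb itself.
FIELD_ALIASES = {
    'statuscode': 'status',
    'mimetype': 'mime',
    'original': 'url',
}


def canonical_field(name):
    """Map an old cdx field name to its new name; new names pass through."""
    return FIELD_ALIASES.get(name, name)


def canonicalize_fields(fields):
    """Return a copy of a field dict keyed by the new canonical names."""
    return {canonical_field(key): value for key, value in fields.items()}
```

Under this sketch a query filter written against `statuscode` and one written against `status` resolve to the same field, which is the behavior the commit message describes for querying and filtering.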
Ilya Kreymer
5221cbc64a add cdxj sample 2015-03-19 12:49:46 -07:00
Ilya Kreymer
fe1c32c8f7 cdxj: support loading cdxj (#76)
cdx obj: allow alt field names to be used (eg. mime, mimetype, m)
(status/statuscode/s) in querying and reading cdx
cdx minimal: (#75) now implies cdxj to avoid more formats
minimal includes digest always and mime when warc/revisit
tests for cdxj loading
indexing optimization: reuse same entry obj for records of same type
2015-03-19 12:36:49 -07:00
Ilya Kreymer
73f24f5a2b manager: fixes for windows: use shutil.move instead of os.rename to allow move to
existing file
tests: reset workdir before deleting temp dir
2015-03-18 13:14:05 -07:00
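
The Windows fix in 73f24f5a2b relies on a difference between two standard library calls: on Windows, `os.rename` raises an error when the destination already exists, while `shutil.move` can replace an existing file. A minimal sketch, assuming the destination is a file path and not a directory (`move_over_existing` is a hypothetical helper, not a pywb function):

```python
import os
import shutil
import tempfile


def move_over_existing(src, dst):
    """Move src to dst, replacing dst if it is an existing file.

    Assumes dst is a file path; if dst were an existing directory,
    shutil.move would move src inside it instead.
    """
    shutil.move(src, dst)
    return dst


def demo():
    """Create an old and a new file in a temp dir, then move new over old."""
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, 'new.cdxj')
    dst = os.path.join(workdir, 'index.cdxj')
    with open(src, 'w') as fh:
        fh.write('new')
    with open(dst, 'w') as fh:
        fh.write('old')
    move_over_existing(src, dst)
    with open(dst) as fh:
        content = fh.read()
    still_there = os.path.exists(src)
    shutil.rmtree(workdir)
    return content, still_there
```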
Ilya Kreymer
3f084625b0 indexing: cdx json support (#76): use OrderedDict when indexing json to ensure consistent ordering
skip empty or '-' fields
add tests for cdx json
2015-03-17 21:11:35 -07:00
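
The two rules in 3f084625b0 (consistent key ordering via `OrderedDict`, and skipping empty or '-' fields) can be sketched as follows. The line layout used here, a key and a timestamp followed by a JSON dict, is an assumption for illustration, and `to_cdxj_line` is a hypothetical helper, not the indexer's real function.

```python
import json
from collections import OrderedDict


def to_cdxj_line(urlkey, timestamp, fields):
    """Build one json-based cdx line from (name, value) pairs.

    fields is an iterable of pairs in the desired output order.
    Empty, None or '-' values are skipped, as the commit describes.
    The "key timestamp {json}" layout is assumed for this sketch.
    """
    out = OrderedDict()
    for name, value in fields:
        if value in ('', '-', None):
            continue
        out[name] = value
    return urlkey + ' ' + timestamp + ' ' + json.dumps(out)
```

Because the dict preserves insertion order, indexing the same record twice yields byte-identical lines, which keeps sorted and merged indexes stable.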
Ilya Kreymer
6f9808f090 indexing: refactor ArchiveIndexEntry to be a dict instead of adding attrib. Allows for better tracking of indexed properties.
Add json-based cdx! (cdxj) output where all fields except url + key are in json dict. Support for both minimal and full json cdx, tracked via #76
2015-03-17 19:11:55 -07:00
Ilya Kreymer
bfe590996b auto-config: add support for loading from root ./static/ directory,
available under /static/__shared/ path
default path changed from /static/default -> /static/__pywb/
rename wayback-manager to wb-manager
2015-03-17 19:05:39 -07:00
Ilya Kreymer
0b8fd1e82e fix readme typos 2015-03-17 09:58:18 -07:00
Ilya Kreymer
0345e36daa readme: improve samples section 2015-03-17 01:13:10 -07:00
Ilya Kreymer
5b7215a6b1 readme tweaks and typo fixes 2015-03-17 01:06:06 -07:00
Ilya Kreymer
32ed176988 Update CHANGELIST for 0.9.0b1 2015-03-17 00:39:24 -07:00
Ilya Kreymer
e9e0412e1d More README tweaks 2015-03-17 00:28:14 -07:00
Ilya Kreymer
a60a735bd0 Update INSTALL.rst for 0.9.0 2015-03-17 00:14:10 -07:00
Ilya Kreymer
ab89ecd445 Brand new README for 0.9.0! 2015-03-17 00:01:32 -07:00
Ilya Kreymer
4b45e789df templates: ensure shared templates are loaded from root templates/ subdir
manager: add shared templates to templates subdir, not root dir #55 and #74
2015-03-16 19:57:28 -07:00
Ilya Kreymer
138aed3ddd change version to 0.9.0b1 2015-03-16 19:15:00 -07:00
Ilya Kreymer
2f6780a576 rename for 0.9.0:
rename default templates package from ui/* templates to templates/*
rename default subdirs: warcs -> archive, cdx -> indexes
2015-03-16 18:48:09 -07:00
Ilya Kreymer
19b8650891 manager: templates: add collections manager (#74) commands for adding, removing and listing
available ui templates. Support for both collection and shared templates.
confirmation for overwrite/remove
updated full template list in default_config and added tests
2015-03-16 16:55:06 -07:00
Ilya Kreymer
3d53fdde9e cleanup: remove unused __str__ from Handlers / Route, not as useful anymore 2015-03-15 22:55:23 -07:00
Ilya Kreymer
be5139b635 fix tests for coll listing, #78
config override: when loading from coll-specific config.yaml, resolve
relative paths to that collection, not to root #55
2015-03-15 22:23:08 -07:00
Ilya Kreymer
30454abb6b metadata: add support for user-defined per-collection metadata! #78
metadata stored in wbrequest.user_metadata and available to all templates

collections manager: refactor to use subparsers, add list collections and set metadata commands
update tests for new commands
index template: use user metadata title for collections listing
search template: display all metadata and title, if available
2015-03-15 21:24:15 -07:00
Ilya Kreymer
b417b47835 collections manager: support for merge when adding warc, explicit --index-warcs
option to index and merge instead of reindexing whole dir, #74
additional testing for recursive indexing, index merge
timeutils: add timestamp20_now() function
2015-03-14 14:56:15 -07:00
Ilya Kreymer
759d151551 tests: add test for directory auto collection loader,
collection manager and new 6-field minimal cdx format
2015-03-13 19:53:50 -07:00
Ilya Kreymer
1ba24de357 Merge branch 'develop' into config-work 2015-03-13 11:53:27 -07:00
Ilya Kreymer
b4b92482ad Merge branch 'develop' for 0.8.3 2015-03-13 11:06:52 -07:00
Ilya Kreymer
b2ce3feb80 readme fix 2015-03-13 11:05:32 -07:00
Ilya Kreymer
3e3794d4dc Update CHANGES for 0.8.3 2015-03-13 11:04:37 -07:00
Ilya Kreymer
24021fcd57 html rewrite: add trailing slash for <base> tag rewrite if url is a scheme://host
with no path component #77
cleanup: remove unused code path for tags with no rewriting -- all tags
now checked for dynamic attrs which may need rewriting
update tests, including live rewrite test dependent on live site (FB)
2015-03-13 10:53:57 -07:00
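
The `<base>` tag fix in 24021fcd57 amounts to appending a slash when a url has a scheme and host but no path. A minimal sketch using the standard library url parser; `ensure_trailing_slash` is a hypothetical name, and leaving urls with a query or fragment untouched is a simplification made here, not something the commit states.

```python
from urllib.parse import urlsplit


def ensure_trailing_slash(url):
    """Append '/' to a scheme://host url that has no path component.

    Urls that already have a path, or that carry a query or fragment,
    are returned unchanged (a simplification for this sketch).
    """
    parts = urlsplit(url)
    bare_host = parts.scheme and parts.netloc and not parts.path
    if bare_host and not parts.query and not parts.fragment:
        return url + '/'
    return url
```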
Ilya Kreymer
fe1683da56 indexing: for minimal index, use a single -m flag to create a 6 field index.
minimal index also skips parsing contents of warc/arc records altogether
add cli docs for minimal index, tracked via #75
2015-03-07 11:56:17 -08:00
Ilya Kreymer
499e21233e statusandheaders: make protocol check case-insensitive, eg. accept HTTP/1.0 and http/1.0 for better compatibility 2015-03-07 11:37:06 -08:00
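
The case-insensitive protocol check from 499e21233e can be sketched by upper-casing the status line before comparing. `is_known_protocol` and its default protocol list are assumptions for illustration, not the actual statusandheaders code.

```python
def is_known_protocol(statusline, protocols=('HTTP/1.0', 'HTTP/1.1')):
    """Return True if statusline starts with a known protocol, any case.

    The protocol list is an assumed default for this sketch.
    """
    upper = statusline.upper()
    return any(upper.startswith(proto) for proto in protocols)
```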
Ilya Kreymer
5aa497dc68 Merge branch 'develop' into config-work 2015-03-06 21:09:21 -08:00
Ilya Kreymer
1fb631870b wb_frame: fix extra slash typo in replaced frame url 2015-03-05 17:04:44 -08:00
Ilya Kreymer
f2d7bd074a bump version to 0.8.3
cookie rewrite: remove 'secure' flag if present
2015-03-05 16:18:56 -08:00
Ilya Kreymer
1eadd35598 Merge branch 'develop' for 0.8.2 2015-02-28 09:05:09 -08:00
Ilya Kreymer
6c8cb806d9 update 0.8.2 changelist, minor fixes 2015-02-28 09:04:15 -08:00
Ilya Kreymer
48eab2662d cdx indexer: refactor indexer into mixins for different formats for easier customization 2015-02-25 16:45:47 -08:00
Ilya Kreymer
ee1fabf600 config fix: check for existence of root 'collections dir', #55 2015-02-25 13:51:12 -08:00
Ilya Kreymer
11c8cc92f3 add beta to README 2015-02-25 13:33:42 -08:00
Ilya Kreymer
671f45f69f cdx indexing: wrap record iterator global functions in class DefaultRecordIter to allow for better extensibility
add 'minimal' option to skip digest/mime/status extraction and only include minimal data (url+timestamp)
cdx-indexer: add -6 option to create 6-field index
2015-02-25 13:31:37 -08:00
Ilya Kreymer
1d4c54deaa frames ui: update frames to use <!DOCTYPE html>, improved css and html5 compatibility 2015-02-25 13:25:05 -08:00
Ilya Kreymer
60f33412ff collections manager: add new collections manager, first pass #74
add cli 'wb-manager' tool
very preliminary, needs testing still
2015-02-25 13:19:20 -08:00
Ilya Kreymer
69613a0e25 tests: disable 'invalid config' test as it's no longer applicable, fix default banner to just 'banner.html' 2015-02-25 13:18:32 -08:00
Ilya Kreymer
5c67782a2c config system: some fixes for auto-init, add trailing '/' for dir paths, #55 2015-02-25 13:15:48 -08:00
Ilya Kreymer
7c60bf17f7 bump version to 0.9.0-beta! 2015-02-24 16:54:49 -08:00
Ilya Kreymer
e39d6e207c config & collections: auto static path and templates working! #55 2015-02-24 14:32:51 -08:00
Ilya Kreymer
a932235f85 Merge branch 'develop' into config-work 2015-02-24 10:40:58 -08:00
Ilya Kreymer
cb857df125 memento: fix MementoTimemapView to have consistent signature with other query views 2015-02-24 10:35:49 -08:00
Ilya Kreymer
39824711f0 memento tweak: ensure rel=memento link for timegate uses exact in Location (cdx original) as opposed to url from request 2015-02-23 23:21:39 -08:00