diff --git a/.gitignore b/.gitignore
index 3979206a..7c9e4a8a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -37,3 +37,5 @@ nosetests.xml
 
 # Other
 globalwb.py
+
+.vagrant
diff --git a/README.md b/README.md
index c35486dd..e91b0a15 100644
--- a/README.md
+++ b/README.md
@@ -58,17 +58,39 @@ To start a pywb with sample data
 2. Install with `python setup.py install`
 
 3. Run pywb by via script `run.sh` (script currently assumes a default python and uwsgi install, feel free to edit as needed)
-
+
 4. Test pywb in your browser! (pywb is set to run on port 8080 by default.)
 
 If everything worked, the following pages should be loading (served from *sample_archive* dir):
 
 | Original Url | Latest Capture | List of All Captures |
-| ------------- | ------------- | ----------------------- |
+| ------------- | ------------- | ----------------------- |
 | `http://example.com` | [http://localhost:8080/pywb/example.com](http://localhost:8080/pywb/example.com) | [http://localhost:8080/pywb/*/example.com](http://localhost:8080/pywb/*/example.com) |
 | `http://iana.org` | [http://localhost:8080/pywb/iana.org](http://localhost:8080/pywb/iana.org) | [http://localhost:8080/pywb/*/iana.org](http://localhost:8080/pywb/*/iana.org) |
 
 
+
+### Vagrant
+
+pywb comes with a Vagrantfile to help you set up a VM quickly for testing.
+If you have [Vagrant](http://www.vagrantup.com/) and [VirtualBox](https://www.virtualbox.org/)
+installed, then you can start a test instance of pywb like so:
+
+```bash
+git clone https://github.com/ikreymer/pywb.git
+cd pywb
+vagrant up
+```
+
+After pywb and all its dependencies are installed, the uwsgi server will start up and you should see:
+
+```
+spawned uWSGI worker 1 (and the only) (pid: 123, cores: 1)
+```
+
+At this point, you can open a web browser and navigate to `http://localhost:8080` for testing.
+
+
 ### Automated Tests
 
 Currently pywb consists of numerous doctests against the sample archive.
@@ -129,8 +151,8 @@ the location of those files.
 
 #### SURT
 
-By default, pywb expects the cdx files to be Sort-Friendly-Url-Transform (SURT) ordering.
-This is an ordering that transforms: `example.com` -> `com,example)/` to faciliate better search.
+By default, pywb expects the cdx files to be Sort-Friendly-Url-Transform (SURT) ordering.
+This is an ordering that transforms: `example.com` -> `com,example)/` to faciliate better search.
 It is recommended for future indexing, but is not required.
 
 Non-SURT ordered cdx indexs will work as well, but be sure to specify:
@@ -164,7 +186,7 @@ The directions are for running in a shell:
 
 4. Ensure sort order set to byte-order `export LC_ALL=C` to ensure proper sorting.
 
-5. From the directory of the warc(s), run `/warctools/hanzo/cdx_writer mypath/warcs/mywarc.gz | sort > mypath/cdx/mywarc.cdx`
+5. From the directory of the warc(s), run `/warctools/hanzo/cdx_writer mypath/warcs/mywarc.gz | sort > mypath/cdx/mywarc.cdx`
 
 This will create a sorted `mywarc.cdx` for `mywarc.gz`. Then point `pywb` to the `mypath/warcs` and `mypath/cdx` directories in the yaml config.
 
diff --git a/Vagrantfile b/Vagrantfile
new file mode 100644
index 00000000..5bd21e51
--- /dev/null
+++ b/Vagrantfile
@@ -0,0 +1,137 @@
+# -*- mode: ruby -*-
+# vi: set ft=ruby :
+
+$script = <