Add Vagrantfile
globalwb.py

@@ -58,17 +58,39 @@ To start a pywb with sample data
2. Install with `python setup.py install`
3. Run pywb via the script `run.sh` (the script currently assumes a default python and uwsgi install; feel free to edit as needed)
4. Test pywb in your browser! (pywb is set to run on port 8080 by default.)
If everything worked, the following pages should load (served from the *sample_archive* dir):
| Original Url | Latest Capture | List of All Captures |
| ------------- | ------------- | ----------------------- |
| `http://example.com` | [http://localhost:8080/pywb/example.com](http://localhost:8080/pywb/example.com) | [http://localhost:8080/pywb/*/example.com](http://localhost:8080/pywb/*/example.com) |
| `http://iana.org` | [http://localhost:8080/pywb/iana.org](http://localhost:8080/pywb/iana.org) | [http://localhost:8080/pywb/*/iana.org](http://localhost:8080/pywb/*/iana.org) |
### Vagrant
pywb comes with a Vagrantfile to help you set up a VM quickly for testing.
If you have [Vagrant](http://www.vagrantup.com/) and [VirtualBox](https://www.virtualbox.org/)
installed, then you can start a test instance of pywb like so:
    git clone https://github.com/ikreymer/pywb.git
    cd pywb
    vagrant up
After pywb and all its dependencies are installed, the uwsgi server will start up and you should see:
    spawned uWSGI worker 1 (and the only) (pid: 123, cores: 1)
At this point, you can open a web browser and navigate to `http://localhost:8080` for testing.
### Automated Tests
Currently pywb consists of numerous doctests against the sample archive.
@@ -129,8 +151,8 @@ the location of those files.
#### SURT
By default, pywb expects the cdx files to be in Sort-Friendly-Url-Transform (SURT) ordering.
This is an ordering that transforms: `example.com` -> `com,example)/` to facilitate better search.
It is recommended for future indexing, but is not required.
Non-SURT ordered cdx indexes will work as well, but be sure to specify:
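As a rough illustration of the transform described above, the following is a minimal sketch and not pywb's own canonicalizer. The function name `simple_surt` is hypothetical, and the sketch only reverses the host labels and appends the path; a full SURT implementation applies further canonicalization rules that are omitted here.

```python
from urllib.parse import urlsplit


def simple_surt(url):
    """Return a simplified SURT-style key for a URL (illustrative only)."""
    # Assumption: bare hosts such as "example.com" are treated as http URLs.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Reverse the host labels: "example.com" becomes "com,example".
    reversed_host = ",".join(reversed(host.split(".")))
    # An empty path is normalized to "/".
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return reversed_host + ")" + path
```

Because the most significant host label comes first in the key, sorting such keys places captures from the same domain next to each other, which is what makes prefix lookups over a sorted index convenient.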
@@ -164,7 +186,7 @@ The directions are for running in a shell:
4. Set the sort order to byte order with `export LC_ALL=C` to ensure proper sorting.
5. From the directory of the warc(s), run `<FULL PATH>/warctools/hanzo/cdx_writer mypath/warcs/mywarc.gz | sort > mypath/cdx/mywarc.cdx`
This will create a sorted `mywarc.cdx` for `mywarc.gz`. Then point `pywb` to the `mypath/warcs` and `mypath/cdx` directories in the yaml config.
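To show what the byte-order requirement in step 4 means, here is a small sketch that sorts index lines the way `sort` does under `LC_ALL=C`. The helper name `sort_cdx_lines` and the sample lines are made up for illustration; real cdx lines carry more fields than shown.

```python
def sort_cdx_lines(lines):
    """Sort cdx lines by their raw bytes, mirroring `LC_ALL=C sort`."""
    # Comparing the UTF-8 encoded bytes avoids any locale-dependent collation.
    return sorted(lines, key=lambda line: line.encode("utf-8"))


# Hypothetical, heavily shortened cdx lines (key followed by a timestamp).
sample = [
    "org,iana)/ 20140126200624",
    "com,example)/ 20140127171200",
    "com,example)/ 20130729195151",
]
sorted_sample = sort_cdx_lines(sample)
```

With a locale-aware sort, punctuation such as `,` and `)` may be ordered differently or ignored, so an index sorted that way may not match the order a byte-wise lookup expects.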

@@ -0,0 +1,137 @@
# -*- mode: ruby -*-
# vi: set ft=ruby :
$script = <<SCRIPT
apt-get update
apt-get install -y python-dev
apt-get install -y git
apt-get install -y python-pip
pip install virtualenv
sudo -u vagrant virtualenv pywb_env
echo Installing pywb and dependencies via pip... This may take a while.
if [ ! -d pywb ]; then
git clone https://github.com/ikreymer/pywb.git;
cd pywb
sudo -u vagrant ../pywb_env/bin/pip install .
sudo -u vagrant -H sh -c ". ../pywb_env/bin/activate; ./run.sh"
