archive.is is a golang package for archiving web pages via archive.is

archiveis

About

archiveis is a Go package for archiving web pages via archive.is.

Please be mindful and responsible, and go easy on them; we want archive.is to last forever!

Created by Jay Taylor.

Also see: archive.org golang package

TODO

  • Add a timeout to Capture.
  • Consider unifying the command-line programs into a single binary.
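
Until Capture has a timeout of its own, a caller can bound how long it waits from the outside. The following is a minimal sketch of one way to do that, not part of the package: captureWithTimeout, captureFunc, errTimeout, slowStub and demoTimeout are hypothetical names introduced for illustration, and slowStub stands in for the real capture call so the example runs without network access.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is a hypothetical sentinel error; it is not defined by archiveis.
var errTimeout = errors.New("capture timed out")

// demoTimeout is an arbitrary value chosen for this sketch.
var demoTimeout = 50 * time.Millisecond

// captureFunc has the shape of the capture call used in this README:
// it takes a URL and returns the archive URL or an error.
type captureFunc func(url string) (string, error)

// captureWithTimeout runs capture in a goroutine and stops waiting after d.
// The underlying call is NOT cancelled; it finishes in the background and
// its result is discarded.
func captureWithTimeout(capture captureFunc, url string, d time.Duration) (string, error) {
	type result struct {
		archiveURL string
		err        error
	}
	// Buffered so the goroutine can always send, even after a timeout.
	ch := make(chan result, 1)
	go func() {
		archiveURL, err := capture(url)
		ch <- result{archiveURL, err}
	}()
	select {
	case r := <-ch:
		return r.archiveURL, r.err
	case <-time.After(d):
		return "", errTimeout
	}
}

// slowStub is a stand-in for a slow capture call (assumption: no network).
func slowStub(url string) (string, error) {
	time.Sleep(200 * time.Millisecond)
	return "stub-archive:" + url, nil
}

func main() {
	_, err := captureWithTimeout(slowStub, "https://jaytaylor.com/", demoTimeout)
	fmt.Println(err)
}
```

To use this with the real package, wrap the call in a closure such as func(u string) (string, error) { return archiveis.Capture(u) } and pass that in place of slowStub. Because the background call keeps running after the timeout, this only bounds the caller's wait; it does not free the underlying connection.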

Requirements

  • Go version 1.9 or newer

Installation

go get jaytaylor.com/archive.is/...

Usage

Command-line programs

archive.is <url>

Archive a fresh new copy of an HTML page

archive.is-snapshots <url>

Search for existing page snapshots

Search query examples:

  • microsoft.com for snapshots from the host microsoft.com
  • *.microsoft.com for snapshots from microsoft.com and all its subdomains (e.g. www.microsoft.com)
  • http://twitter.com/burgerking for snapshots of that exact URL (search is case-sensitive)
  • http://twitter.com/burg* for snapshots of URLs starting with http://twitter.com/burg
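
The four query forms above can be modelled locally to make their semantics concrete. This is only a sketch of the rules as described in the list, not how archive.is implements search; matchesQuery and hostOf are hypothetical helpers, and the sketch assumes that a query containing "://" is a URL query and anything else is a host query.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hostOf returns the hostname of rawURL, or "" if it cannot be parsed.
func hostOf(rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		return ""
	}
	return u.Hostname()
}

// matchesQuery reports whether snapshotURL would be selected by query under
// the four query forms described above (a local model, for illustration only).
func matchesQuery(query, snapshotURL string) bool {
	switch {
	case strings.Contains(query, "://") && strings.HasSuffix(query, "*"):
		// URL prefix query: everything before the trailing "*" must match.
		return strings.HasPrefix(snapshotURL, strings.TrimSuffix(query, "*"))
	case strings.Contains(query, "://"):
		// Exact URL query: case-sensitive string comparison.
		return snapshotURL == query
	case strings.HasPrefix(query, "*."):
		// Wildcard host query: the bare domain and any of its subdomains.
		base := strings.TrimPrefix(query, "*.")
		host := hostOf(snapshotURL)
		return host == base || strings.HasSuffix(host, "."+base)
	default:
		// Plain host query: the host must match exactly.
		return hostOf(snapshotURL) == query
	}
}

func main() {
	target := "http://www.microsoft.com/en-us"
	for _, q := range []string{"microsoft.com", "*.microsoft.com"} {
		fmt.Printf("%-18s matches %s: %v\n", q, target, matchesQuery(q, target))
	}
}
```

Under this model the plain host query does not select www.microsoft.com, while the wildcard query does, which is the distinction the first two list items draw.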

Go package interfaces

Capture URL HTML Page Content

capture.go:

package main

import (
	"fmt"

	"github.com/jaytaylor/archive.is"
)

var captureURL = "https://jaytaylor.com/"

func main() {
	// Capture submits captureURL to archive.is and returns the URL of the
	// resulting snapshot.
	archiveURL, err := archiveis.Capture(captureURL)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Successfully archived %v via archive.is: %v\n", captureURL, archiveURL)
}

// Output:
//
// Successfully archived https://jaytaylor.com/ via archive.is: https://archive.is/i2PiW
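
When archiving more than one page, spacing the requests out is one way to go easy on archive.is. The sketch below pauses between calls and skips URLs that fail. It is an illustration under stated assumptions, not part of the package: captureAll, stubCapture and demoGap are hypothetical names, the gap value is arbitrary, and stubCapture replaces the real capture call so the example runs offline.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// demoGap is an arbitrary pause used in this sketch; a real run would
// likely want a much longer gap.
var demoGap = 10 * time.Millisecond

// captureAll calls capture for each URL in order, sleeping for gap between
// consecutive calls. It returns a map from each successfully captured URL
// to its archive URL; failed URLs are reported and skipped.
func captureAll(urls []string, gap time.Duration, capture func(string) (string, error)) map[string]string {
	archived := make(map[string]string, len(urls))
	for i, u := range urls {
		if i > 0 {
			time.Sleep(gap) // pace the requests
		}
		archiveURL, err := capture(u)
		if err != nil {
			fmt.Printf("skipping %q: %v\n", u, err)
			continue
		}
		archived[u] = archiveURL
	}
	return archived
}

// stubCapture is a stand-in for the real capture call (assumption: no network).
func stubCapture(u string) (string, error) {
	if u == "" {
		return "", errors.New("empty URL")
	}
	return "stub-archive:" + u, nil
}

func main() {
	urls := []string{"https://jaytaylor.com/", ""}
	for u, a := range captureAll(urls, demoGap, stubCapture) {
		fmt.Printf("%v -> %v\n", u, a)
	}
}
```

With the real package, a closure such as func(u string) (string, error) { return archiveis.Capture(u) } can be passed in place of stubCapture.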

Search for Existing Snapshots

search.go:

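package main

import (
	"fmt"
	"time"

	"github.com/jaytaylor/archive.is"
)

var searchURL = "https://jaytaylor.com/"

func main() {
	// Search looks up existing snapshots matching searchURL; the second
	// argument bounds how long the lookup may take.
	snapshots, err := archiveis.Search(searchURL, 10*time.Second)
	if err != nil {
		panic(err)
	}
	// Print the result in Go-syntax form.
	fmt.Printf("%# v\n", snapshots)
}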

// Output:
//
//

Running the test suite

go test ./...

License

Permissive MIT license; see the LICENSE file for more information.