From fad7292ef338b99cffa893ad2e85f96a0123a5c2 Mon Sep 17 00:00:00 2001
From: Andrew Jackson
Date: Fri, 16 Jun 2017 15:10:57 +0100
Subject: [PATCH 1/4] Some clean up and added Slack.

---
 README.md | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index b9699c2..af88571 100644
--- a/README.md
+++ b/README.md
@@ -2,9 +2,10 @@
 
 ## Introduction
 
-An Awesome List for getting started with web archiving. Inspired by the [awesome](https://github.com/sindresorhus/awesome) list.
+An [Awesome List](https://github.com/sindresorhus/awesome/blob/master/awesome.md) for getting started with web archiving. Inspired by the [awesome](https://github.com/sindresorhus/awesome) list.
 
 ## Table of Contents
+
 * [Training/Documentation](#)
 * [Tools & Software](#)
 * [Community Resources](#)
@@ -84,7 +85,7 @@ To the extent possible under law, the owner has waived all copyright and related
 
 * [Warcat](https://github.com/chfoo/warcat) (Stable) - Tool and library for handling Web ARChive (WARC) files.
 
-### Analysis
+#### Analysis
 
 * [ArchiveSpark](https://github.com/helgeho/ArchiveSpark) (Stable) - An Apache Spark framework (not only) for Web Archives that enables easy data processing, extraction as well as derivation.
 
@@ -103,6 +104,12 @@ To the extent possible under law, the owner has waived all copyright and related
 * [IIPC](https://twitter.com/NetPreserve)
 * [#webarchives](https://twitter.com/search?f=tweets&vertical=default&q=%23webarchives&src=typd)
 
+#### Slack
+
+* Ask [@netpreserve](https://twitter.com/NetPreserve) for access to the [IIPC Slack](https://iipc.slack.com/)
+
+----
+
 ### Deprecated
 
 * [pywb Wayback Web Recorder (Archiver)](https://github.com/ikreymer/pywb-webrecorder) (Sunsetted) - A bare-bones example of how to create a simple web recording and replay system.

From 9566307af5b341d97f2d71b4565446c8c6be7ad3 Mon Sep 17 00:00:00 2001
From: Andrew Jackson
Date: Fri, 16 Jun 2017 16:14:52 +0100
Subject: [PATCH 2/4] Separate the basic and more advanced stuff, and add the intro video in.

---
 README.md | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index af88571..e17ec3d 100644
--- a/README.md
+++ b/README.md
@@ -33,9 +33,12 @@ To the extent possible under law, the owner has waived all copyright and related
 
 ### Training/Documentation
 
-* [Awesome Memento](https://github.com/machawk1/awesome-memento)
-* [Glossary of Archive-It and Web Archiving Terms](https://support.archive-it.org/hc/en-us/articles/208111686-Glossary-of-Archive-It-and-Web-Archiving-Terms)
-* [The WARC Ecosystem](http://www.archiveteam.org/index.php?title=The_WARC_Ecosystem)
+* Introductions to web archiving concepts:
+  * [What is a web archive?](https://youtu.be/ubDHY-ynWi0) video from [the UK Web Archive YouTube Channel](https://www.youtube.com/channel/UCJukhTSw8VRj-VNTpBcqWkw)
+  * [Glossary of Archive-It and Web Archiving Terms](https://support.archive-it.org/hc/en-us/articles/208111686-Glossary-of-Archive-It-and-Web-Archiving-Terms)
+* More advanced material:
+  * [Awesome Memento](https://github.com/machawk1/awesome-memento)
+  * [The WARC Ecosystem](http://www.archiveteam.org/index.php?title=The_WARC_Ecosystem)
 
 ### Tools & Software
 

From f1989be9a474028afa4d5c8dfe8cee41929d46ca Mon Sep 17 00:00:00 2001
From: Andrew Jackson
Date: Mon, 26 Jun 2017 16:35:58 +0100
Subject: [PATCH 3/4] Added some new links and detail responding to #22.

---
 README.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/README.md b/README.md
index 7991aec..19f902e 100644
--- a/README.md
+++ b/README.md
@@ -35,15 +35,25 @@ To the extent possible under law, the owner has waived all copyright and related
 
 * Introductions to web archiving concepts:
   * [What is a web archive?](https://youtu.be/ubDHY-ynWi0) video from [the UK Web Archive YouTube Channel](https://www.youtube.com/channel/UCJukhTSw8VRj-VNTpBcqWkw)
+  * [Wikipedia's List of Web Archiving Initiatives](https://en.wikipedia.org/wiki/List_of_Web_archiving_initiatives)
   * [Glossary of Archive-It and Web Archiving Terms](https://support.archive-it.org/hc/en-us/articles/208111686-Glossary-of-Archive-It-and-Web-Archiving-Terms)
+  * [The Web Archiving Lifecycle Model](https://archive-it.org/blog/post/announcing-the-web-archiving-life-cycle-model/) -- The Web Archiving Lifecycle Model is an attempt to incorporate the technological and programmatic arms of the web archiving into a framework that will be relevant to any organization seeking to archive content from the web. Archive-It, the web archiving service from the Internet Archive, developed the model based on its work with memory institutions around the world.
+* [Awesome Memento](https://github.com/machawk1/awesome-memento)
 * More advanced material:
   * [Awesome Memento](https://github.com/machawk1/awesome-memento)
   * [docs.warcbase.org](https://lintool.github.io/warcbase-docs/)
   * [The WARC Ecosystem](http://www.archiveteam.org/index.php?title=The_WARC_Ecosystem)
+  * [The Web Crawl section of COPTR](http://coptr.digipres.org/Category:Web_Crawl)
   * [warcbase workshop](https://github.com/web-archive-group/warcbase_workshop_vagrant)
 
 ### Tools & Software
 
+This list of tools and software is intended to briefly describe some of the most important and widely-used tools related to web archiving. For more details, we recommend you refer to (and contribute to!) these excellent resources from other groups:
+
+* [Comparison of web archiving software](https://github.com/archivers-space/research/tree/master/web_archiving)
+* [Awesome Website Change Monitoring](https://github.com/edgi-govdata-archiving/awesome-website-change-monitoring)
+* [Web Crawl @ COPTR](http://coptr.digipres.org/Category:Web_Crawl)
+
 #### Acquisition
 
 * [ArchiveFacebook](https://addons.mozilla.org/en-US/firefox/addon/archivefacebook/) (Stable) - A [Mozilla Firefox](http://www.mozilla.org/en-US/firefox/new/) add-on for individuals to archive their Facebook accounts.

From 46f6d4ac197edc7cd2726c946387f0ba2538df0f Mon Sep 17 00:00:00 2001
From: Andrew Jackson
Date: Wed, 26 Sep 2018 10:07:44 +0100
Subject: [PATCH 4/4] Add specific section for web publishers.

---
 README.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/README.md b/README.md
index 603f329..463a003 100644
--- a/README.md
+++ b/README.md
@@ -7,6 +7,7 @@ An [Awesome List](https://github.com/sindresorhus/awesome/blob/master/awesome.md
 ## Table of Contents
 
 * [Training/Documentation](#trainingdocumentation)
+  * [Resources for Web Publishers](#resources-for-web-publishers)
 * [Tools & Software](#tools--software)
 * [Community Resources](#community-resources)
 * [Deprecated](#deprecated)
@@ -47,6 +48,13 @@ To the extent possible under law, the owner has waived all copyright and related
   * [The Web Crawl section of COPTR](http://coptr.digipres.org/Category:Web_Crawl)
   * [warcbase workshop](https://github.com/web-archive-group/warcbase_workshop_vagrant)
 
+### Resources for Web Publishers
+
+These resources can help when working with individuals or organisations who publish on the web, and who want to make sure their site can be archived.
+
+* [Stanford Libraries' Archivability pages](https://library.stanford.edu/projects/web-archiving/archivability)
+* The [Archive Ready](http://archiveready.com/) tool, for estimating how likely a web page will be archived successfully.
+
 ### Tools & Software
 
 This list of tools and software is intended to briefly describe some of the most important and widely-used tools related to web archiving. For more details, we recommend you refer to (and contribute to!) these excellent resources from other groups: