url pattern matcher
ianmilligan1 committed Mar 22, 2016
1 parent 93783c4 commit d0e9212
Showing 1 changed file with 7 additions and 0 deletions.
7 changes: 7 additions & 0 deletions code/Warcbase/Graphx-URL-Pattern.scala
@@ -0,0 +1,7 @@
import org.warcbase.spark.rdd.RecordRDD._
import org.warcbase.spark.matchbox.RecordLoader
import org.warcbase.spark.matchbox.ExtractGraph

val recs = RecordLoader.loadArchives("/collections/webarchives/geocities/warcs/*", sc).keepUrlPatterns(Set("http://geocities\\.com/EnchantedForest/.*".r))
val graph = ExtractGraph(recs)
graph.writeAsJson("nodes-accession01-static", "links-accession01-static")
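
To illustrate what the URL pattern filter in this commit selects, here is a minimal sketch in plain Scala, with no Spark or Warcbase dependency. It assumes that `keepUrlPatterns` keeps a record only when its whole URL matches at least one of the supplied regexes; that behaviour is an assumption, not something this commit confirms. The `UrlPatternSketch` object, its `keepUrlPatterns` helper, and the sample URLs are hypothetical and exist only for illustration.

```scala
import scala.util.matching.Regex

object UrlPatternSketch {
  // Hypothetical stand-in for the Warcbase filter: keep a URL only if the
  // entire string matches at least one of the patterns (assumed semantics).
  def keepUrlPatterns(urls: Seq[String], patterns: Set[Regex]): Seq[String] =
    urls.filter(url => patterns.exists(_.pattern.matcher(url).matches()))

  // Made-up sample URLs, not taken from the GeoCities collection.
  val sample: Seq[String] = Seq(
    "http://geocities.com/EnchantedForest/Glade/1234/index.html",
    "http://geocities.com/Athens/Forum/5678/",
    "http://www.geocities.com/EnchantedForest/Cottage/"
  )

  // Same pattern string as in the script above.
  val kept: Seq[String] =
    keepUrlPatterns(sample, Set("http://geocities.com/EnchantedForest/.*".r))
}
```

Under that whole-string assumption, only the first sample URL is kept: the second is in a different neighbourhood, and the third begins with `www.`, which the pattern does not allow for. If the archive records URLs with a `www.` prefix, the pattern would need something like an optional `(www\.)?` group to catch them.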

0 comments on commit d0e9212
