use footnotes for some of the references
ruebot committed Jun 23, 2017
1 parent 157c744 commit 22ae65e
Showing 5 changed files with 21 additions and 53 deletions.
@@ -17,19 +17,13 @@
\providecommand\HyField@AuxAddToFields[1]{}
\providecommand\HyField@AuxAddToCoFields[2]{}
\citation{Milligan_etal_JCDL2016}
\citation{twitter_filter}
\citation{twitter_search}
\citation{Driscoll_etal_IJC2014}
\citation{summers_twarc_2015}
\citation{ruest_2016}
\bibstyle{ACM-Reference-Format}
\bibdata{paper7}
\bibcite{Driscoll_etal_IJC2014}{{1}{2014}{{Driscoll and Walker}}{{Driscoll and Walker}}}
\bibcite{Milligan_etal_JCDL2016}{{2}{2016}{{Milligan et~al\unhbox \voidb@x \hbox {.}}}{{Milligan, Ruest, and Lin}}}
\bibcite{ruest_2016}{{3}{2016}{{Ruest}}{{Ruest}}}
\bibcite{summers_twarc_2015}{{4}{2017}{{Summers et~al\unhbox \voidb@x \hbox {.}}}{{Summers, van Kemenadem, Binkley, Ruest, recrm, Costa, Phetteplace, Badger, Matienzo, Blakk, Chudnov, and Nelson}}}
\bibcite{twitter_filter}{{5}{[n. d.]a}{{Twitter}}{{Twitter}}}
\bibcite{twitter_search}{{6}{[n. d.]b}{{Twitter}}{{Twitter}}}
\newlabel{tocindent-1}{0pt}
\newlabel{tocindent0}{0pt}
\newlabel{tocindent1}{4.185pt}
@@ -25,42 +25,10 @@ @article{Driscoll_etal_IJC2014
year = 2014,
}

@misc{summers_twarc_2015,
author = {Summers, Ed and
van Kemenadem, Hugo and
Binkley, Peter and
Ruest, Nick and
recrm and
Costa, Stefano and
Phetteplace, Eric and
The Gitter Badger and
Matienzo, Mx. A and
Blakk, Lukas and
Chudnov, Dan and
Nelson, Chad},
title = {twarc},
year = {2013--2017},
url = {http://github.com/docnow/twarc}
}

@misc{ruest_2016,
author = "Nick Ruest",
title = "1,203,867 \#elxn42 images",
month = mar,
year = 2016,
url = {http://ruebot.net/post/1203867-elxn42-images}
}

@misc{twitter_search,
author = "Twitter",
title = "The Search API",
url = {https://dev.twitter.com/rest/public/search},
urldate = {2017}
}
@misc{twitter_filter,
author = "Twitter",
title = "Public Streams",
url = {https://dev.twitter.com/streaming/public},
urldate = {2017}
}
@@ -1,4 +1,4 @@
This is pdfTeX, Version 3.14159265-2.6-1.40.16 (TeX Live 2015/Debian) (preloaded format=pdflatex 2017.6.22) 22 JUN 2017 15:33
This is pdfTeX, Version 3.14159265-2.6-1.40.16 (TeX Live 2015/Debian) (preloaded format=pdflatex 2017.6.22) 23 JUN 2017 08:57
entering extended mode
restricted \write18 enabled.
%&-line parsing enabled.
@@ -953,6 +953,12 @@ Underfull \hbox (badness 10000) in paragraph at lines 45--46

[]

LaTeX Font Info: Font shape `OT1/LinuxLibertineT-TLF/m/n' will be
(Font) scaled to size 9.0pt on input line 47.
LaTeX Font Info: Font shape `T1/LinuxLibertineT-TLF/m/n' will be
(Font) scaled to size 7.3pt on input line 47.
LaTeX Font Info: Font shape `T1/LinuxLibertineT-TLF/m/n' will be
(Font) scaled to size 5.5pt on input line 47.

Underfull \hbox (badness 10000) in paragraph at lines 47--48

@@ -987,13 +993,13 @@ Package rerunfilecheck Info: File `paper7.out' has not changed.
Package atveryend Info: Empty hook `AtVeryVeryEnd' on input line 58.
)
Here is how much of TeX's memory you used:
17134 strings out of 493029
257255 string characters out of 6136234
368364 words of memory out of 5000000
20089 multiletter control sequences out of 15000+600000
104151 words of font info for 158 fonts, out of 8000000 for 9000
17161 strings out of 493029
257768 string characters out of 6136234
369364 words of memory out of 5000000
20107 multiletter control sequences out of 15000+600000
119035 words of font info for 172 fonts, out of 8000000 for 9000
1302 hyphenation exceptions out of 8191
60i,11n,94p,673b,466s stack positions out of 5000i,500n,10000p,200000b,80000s
60i,11n,94p,753b,466s stack positions out of 5000i,500n,10000p,200000b,80000s
{/usr/share/texlive/texmf-dist/fonts/enc/dvips/libertine/lbtn_naooyc.enc}{/us
r/share/texlive/texmf-dist/fonts/enc/dvips/libertine/lbtn_7grukw.enc}{/usr/shar
e/texlive/texmf-dist/fonts/enc/dvips/libertine/lbtn_nh77jq.enc}{/usr/share/texl
@@ -1003,10 +1009,10 @@ st/fonts/type1/public/libertine/LinBiolinumTB.pfb></usr/share/texlive/texmf-dis
t/fonts/type1/public/libertine/LinLibertineT.pfb></usr/share/texlive/texmf-dist
/fonts/type1/public/libertine/LinLibertineTB.pfb></usr/share/texlive/texmf-dist
/fonts/type1/public/libertine/LinLibertineTI.pfb>
Output written on paper7.pdf (1 page, 292767 bytes).
Output written on paper7.pdf (1 page, 292060 bytes).
PDF statistics:
75 PDF objects out of 1000 (max. 8388607)
58 compressed objects within 1 object stream
9 named destinations out of 1000 (max. 500000)
34329 words of extra memory for PDF output out of 35830 (max. 10000000)
72 PDF objects out of 1000 (max. 8388607)
55 compressed objects within 1 object stream
10 named destinations out of 1000 (max. 500000)
37401 words of extra memory for PDF output out of 42996 (max. 10000000)

Binary file not shown.
@@ -44,11 +44,11 @@

\#WomensMarch, \#Aleppo, \#paris, \#bataclan, \#parisattacks, \#porteouverte, \#jesuischarlie, \#jesuisahmed, \#jesuisjuif, \#charliehebdo, \#panamapapers, and \#elxn42 are all different hashtags, but they have several things in common. They all correspond to large newsworthy events. They each yield datasets containing over a million tweets. Most importantly, these collections offer interesting insights into collecting, processing, and analyzing large newsworthy events\cite{Milligan_etal_JCDL2016}.\\

Collecting tweets from these events can be challenging because of timing. Tweets can be collected from the Filter API\cite{twitter_filter} and the Search API\cite{twitter_search}, each with its own caveats. The Filter API captures only the live Twitter stream, and is limited to collecting up to 1\% of the overall stream. The Search API allows one to collect more than 1\% of the overall stream\cite{Driscoll_etal_IJC2014}, but is limited to 18,000 tweets every 15 minutes and to a 7-day window. Generally, the best strategy for capturing a given event is to combine the Filter and Search APIs.\\
Collecting tweets from these events can be challenging because of timing. Tweets can be collected from the Filter API\footnote{https://dev.twitter.com/streaming/public} and the Search API\footnote{https://dev.twitter.com/rest/public/search}, each with its own caveats. The Filter API captures only the live Twitter stream, and is limited to collecting up to 1\% of the overall stream. The Search API allows one to collect more than 1\% of the overall stream\cite{Driscoll_etal_IJC2014}, but is limited to 18,000 tweets every 15 minutes and to a 7-day window. Generally, the best strategy for capturing a given event is to combine the Filter and Search APIs.\\
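The 18,000-tweets-per-15-minutes figure above corresponds to the Search API's per-window budget (180 requests per 15-minute window at up to 100 tweets per request). A minimal sketch of that arithmetic, in Python; the helper names and the even-spacing approach are illustrative assumptions, not part of any official client:

```python
# Search API rate-limit budget described above:
# 180 requests per 15-minute window, up to 100 tweets per request.
WINDOW_SECONDS = 15 * 60
REQUESTS_PER_WINDOW = 180
TWEETS_PER_REQUEST = 100

def max_tweets_per_window() -> int:
    """Upper bound on tweets collectable in one rate-limit window."""
    return REQUESTS_PER_WINDOW * TWEETS_PER_REQUEST

def seconds_between_requests() -> float:
    """Evenly spacing requests across the window stays under the limit."""
    return WINDOW_SECONDS / REQUESTS_PER_WINDOW

print(max_tweets_per_window())     # 18000
print(seconds_between_requests())  # 5.0
```

A collector that sleeps this long between requests can run indefinitely without tripping the rate limit, which matters given the 7-day search window.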

DocNow's twarc\cite{summers_twarc_2015} includes a number of utilities for processing a dataset after collection. These tools allow a researcher, librarian, or archivist to filter their dataset(s) down to what is needed for appraisal, and then accession. Noteworthy tools include: deduplication, source, retweets, date/times, users, and hashtags.\\
DocNow's twarc\footnote{http://github.com/docnow/twarc} includes a number of utilities for processing a dataset after collection. These tools allow a researcher, librarian, or archivist to filter their dataset(s) down to what is needed for appraisal, and then accession. Noteworthy tools include: deduplication, source, retweets, date/times, users, and hashtags.\\
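The deduplication step can be illustrated with a short Python sketch (not twarc's actual implementation): keep the first occurrence of each tweet id in a line-oriented JSON dataset and drop repeats. The `id_str` field is the tweet identifier in Twitter's v1.1 JSON format.

```python
import json

def deduplicate(jsonl_lines):
    """Keep the first occurrence of each tweet id, drop repeats."""
    seen = set()
    unique = []
    for line in jsonl_lines:
        tweet = json.loads(line)
        if tweet["id_str"] not in seen:
            seen.add(tweet["id_str"])
            unique.append(tweet)
    return unique

lines = [
    '{"id_str": "1", "text": "first"}',
    '{"id_str": "2", "text": "second"}',
    '{"id_str": "1", "text": "first (duplicate)"}',
]
print([t["id_str"] for t in deduplicate(lines)])  # ['1', '2']
```

Duplicates are common when the same event is captured through both the Filter and Search APIs, so this is typically the first filter applied before appraisal.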

DocNow's utilities can be further used to curate related collections. One can extract all the URLs in a dataset, unshorten them, and extract the unique URLs to use as a seed list for a web crawler that captures websites related to a given event. One can also extract all of the image URLs and download every image associated with a dataset, which can then be used for image analysis\cite{ruest_2016}, presentation, and/or preservation.\\
DocNow's utilities\footnote{https://github.com/DocNow/twarc/tree/master/utils} can be further used to curate related collections. One can extract all the URLs in a dataset, unshorten them, and extract the unique URLs to use as a seed list for a web crawler that captures websites related to a given event. One can also extract all of the image URLs and download every image associated with a dataset, which can then be used for image analysis\cite{ruest_2016}, presentation, and/or preservation.\\
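The URL and image extraction steps above can be sketched in Python. This is an illustrative example, not twarc's own code; it assumes the Twitter v1.1 JSON layout, where shared links live under `entities.urls` (with an `expanded_url`) and attached photos under `extended_entities.media`:

```python
import json

def unique_urls(tweets):
    """Collect the distinct expanded URLs shared in a set of tweets."""
    urls = set()
    for tweet in tweets:
        for u in tweet.get("entities", {}).get("urls", []):
            urls.add(u["expanded_url"])
    return sorted(urls)

def image_urls(tweets):
    """Collect the distinct photo URLs attached to a set of tweets."""
    urls = set()
    for tweet in tweets:
        for m in tweet.get("extended_entities", {}).get("media", []):
            if m.get("type") == "photo":
                urls.add(m["media_url"])
    return sorted(urls)

tweets = [json.loads(line) for line in [
    '{"entities": {"urls": [{"expanded_url": "http://example.com/a"}]}}',
    '{"entities": {"urls": [{"expanded_url": "http://example.com/a"}]},'
    ' "extended_entities": {"media": [{"type": "photo",'
    ' "media_url": "http://example.com/img.jpg"}]}}',
]]
print(unique_urls(tweets))  # ['http://example.com/a']
print(image_urls(tweets))   # ['http://example.com/img.jpg']
```

The deduplicated URL list is what would feed a crawler's seed list; the image list is what would drive a bulk download for analysis or preservation.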

In conclusion, this presentation will provide an overview of collection strategy, insights from processing and analysis, ensuing web crawls, and image presentation from each collection.\\
