@radianttap
Created June 23, 2020 17:21
Script to download all WWDC 2020 session videos (HD files) and, where available, the accompanying PDFs.
#!/bin/bash
#Setup the environment
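mkdir -p tmp_download
#Stop here if the directory is unusable, so the cleanup at the end never runs in the wrong place
cd tmp_download || exit 1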
#Extract IDs
echo "Downloading the index"
wget -q https://developer.apple.com/videos/wwdc2020/ -O index.html
# take the parts of the document from data-released="true" up to the next class="video-image-link" line,
# keep the lines containing "videos/play/wwdc2020", strip everything except the session number, then drop adjacent duplicate lines
sed -n '/data-released="true"/,/class="video-image-link"/p' index.html | grep videos/play/wwdc2020 | sed -e 's/.*wwdc2020\///' -e 's/\/" .*//' | uniq > ../downloadData
rm index.html
#Iterate through the talk IDs
while read -r line
do
echo "Trying $line"
#Download the page with the real download URL and the talk name
wget -q "https://developer.apple.com/videos/play/wwdc2020/$line/" -O webpage
#We grab the title of the page, then clean it up
#(the angle brackets stay unescaped: GNU sed treats \< and \> as word boundaries)
talkName=$(grep "<title" webpage | sed -e "s/.*<title>//" -e "s/ - WWDC 2020.*//")
#We grep "_hd", which brings up the download URL, then do some cleanup
#If we wanted SD video, all we would have to do is replace _hd with _sd
dlURL=$(grep _hd webpage | sed -e "s/.*href=//" -e "s/>.*//" -e "s/\"//g")
#The dot is escaped so that only a literal ".pdf" matches
pdfURL=$(grep '\.pdf' webpage | grep devstreaming | sed -e "s/.*href=//" -e "s/>.*//" -e "s/\"//g" -e "s/ .*$//")
rm webpage
#Is there a video URL?
if [ -z "$dlURL" ]; then
echo
else
echo "Video $line ($talkName)"
echo " url: $dlURL"
#Great, we download the file
wget -c "$dlURL" -O "../$line - $talkName.mp4"
fi
#Is there a PDF URL?
if [ -z "$pdfURL" ]; then
echo
else
echo "PDF $line ($talkName)"
echo " url: $pdfURL"
#Great, we download the file
wget -c "$pdfURL" -O "../$line - $talkName.pdf"
fi
done < "../downloadData"
#cleanup
cd ..
rm -rf tmp_download
rm downloadData
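
The parsing steps in the script above could also be factored into small functions that read HTML on stdin, which makes them testable without hitting the network. The following is only a sketch under stated assumptions: the function names `extract_ids`, `extract_title`, and `extract_href` are hypothetical and not part of the original script, `extract_ids` deliberately skips the `data-released` filter and simply collects every session number it sees, and `extract_title` additionally replaces `/` and `:` with `-` because those characters are awkward in file names.

```shell
#!/bin/bash
# Hypothetical helpers; the names are illustrative, not from the original script.
# Each function reads HTML on stdin and prints its result on stdout.

# Print every session number found in links to videos/play/wwdc2020/<id>,
# one per line, in first-seen order with all duplicates removed.
# Assumption: unlike the original pipeline, this ignores data-released.
extract_ids() {
  grep -o 'videos/play/wwdc2020/[0-9][0-9]*' \
    | sed 's|.*/||' \
    | awk '!seen[$0]++'
}

# Print the text of the first <title> line with the " - WWDC 2020..." suffix
# removed, and with "/" and ":" replaced by "-" for use in a file name.
extract_title() {
  grep -m 1 '<title' \
    | sed -e 's/.*<title>//' \
          -e 's/ - WWDC 2020.*//' \
          -e 's/<\/title>.*//' \
          -e 's|[/:]|-|g'
}

# Print the first href value that contains the literal marker given as $1,
# for example "_hd" for the HD video or ".pdf" for the slides.
extract_href() {
  grep -o 'href="[^"]*"' \
    | grep -F -- "$1" \
    | head -n 1 \
    | sed -e 's/^href="//' -e 's/"$//'
}
```

With these in place, the loop body could call, for instance, `talkName=$(extract_title < webpage)` and `dlURL=$(extract_href _hd < webpage)`. Taking only the first matching href guards against a page that lists the same download link twice, which would otherwise put a multi-line value into `$dlURL`.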