Recursive GoIndex Downloader: Download all files in folders and sub-folders from a GoIndex website

Description

This source code was developed and improved based on the script posted by pankaj260 (thanks to pankaj260 for that post).

The main features are:

  • Recursive crawler (atlonxp)
  • Download all folders and files in a given URL (atlonxp)
  • Adaptive delay when fetching URLs (atlonxp)
  • Store folders/files directly to your Google Drive (pankaj260)
  • Folder and file exclusion filters (see the sketch below)
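As a rough illustration of the exclusion filters, here is a minimal sketch; the variable and function names are illustrative, not the script's actual identifiers:

EXCLUDED_FOLDERS = {'samples', 'extras'}    # hypothetical folder-name filter
EXCLUDED_EXTENSIONS = {'.nfo', '.srt'}      # hypothetical file-extension filter

def is_excluded(name, is_folder):
    # Return True if the crawler should skip this folder or file
    name = name.lower()
    if is_folder:
        return name in EXCLUDED_FOLDERS
    return any(name.endswith(ext) for ext in EXCLUDED_EXTENSIONS)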

Instructions/Installation/Usage

Go to my colab:
https://colab.research.google.com/drive/1VpAOxP_sM6bTmRM8xmrmbowznZP-mX4u

Disclaimer

This post is provided to help you and is for personal use only. Sharing the content of your subscribed materials or other purchased content is strictly prohibited under the 1Hack Terms of Use.

By using the provided material, you accept that the website 1Hack.us is not responsible for any law infringement caused by users of this material.


Great work guys. Keep 'em coming. :ok_hand: :clap:

Cheers!!!

… And Over Every Possessor of Knowledge, There is (Some) One (Else) More Knowledgeable.

General instructions

  1. Open my code in Google Colab.
  2. Run Section 1; it will ask you to connect to your Google Drive and enter an authorization code.
  3. Log in with your Google account; you will get the authorization code.
  4. Copy the authorization code and paste it in Step 2.
  5. In the left sidebar, open the file browser (folder icon); you will see your Google Drive there.
  6. Find the folder where you want to put all downloaded files, e.g. “/content/drive/Shared drives/{your team drive}”.
  7. Right-click on the folder you want and copy its path.
  8. In Section 6, set destination = “{the path from Step 7}”.
  9. In Section 6, set download_url = “{the GoIndex website}”.
  10. Run Sections 2–6 (a rough sketch of Sections 1 and 6 follows below).
  11. Done.
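For orientation, Sections 1 and 6 roughly boil down to the sketch below. The Drive mount is the standard Colab call; the destination and download_url values are the placeholders from the steps above:

# Section 1 (roughly): mount Google Drive in Colab.
# This triggers the authorization prompt described in Steps 2-4.
from google.colab import drive

drive.mount('/content/drive')

# Section 6 (roughly): point the downloader at your Drive folder and the site.
destination = '/content/drive/Shared drives/{your team drive}'  # path copied in Step 7
download_url = 'https://edu.tuts.workers.dev/'                  # the GoIndex website (Step 9)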

Note

This code is written just for “edu.tuts.workers.dev”. It is just a proof of concept; you will need to do a little work if you want to use it on other websites.

Update

New features

  • Download queue support
  • Auto-domain URL detection

Coming soon

  • tqdm multiple/parallel downloader
  • Aria2 integration

Please use this link instead


How do we download a Udacity course with this tool? Could you please guide us?

I believe the instructions are given; please put in some effort.


Version 2 has just been added

https://colab.research.google.com/drive/1P9R82sEpOuOK27ZGcZ93x2JnfdC35Eif

If you want to help me improve this code, go to my GitHub: https://github.com/atlonxp/recursive-goIndex-downloader

Features

  • Recursive crawler (atlonxp)
  • Download all folders and files in a given URL (atlonxp)
  • Download all folders and files in sub-folders (atlonxp)
  • Adaptive delay when fetching URLs (atlonxp)
  • Store folders/files directly to your Google Drive (pankaj260)
  • Folder and file exclusion filters
  • Download queue support
  • Auto-domain URL detection (see the sketch below)
  • API-based GoIndex crawler
  • Parallel/multiple file downloader
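The auto-domain URL detection can be pictured with the standard library; a minimal sketch, assuming the crawler only needs the site root derived from whatever URL you paste in:

from urllib.parse import urlparse

def base_domain(url):
    # Derive the site root (scheme + host) from any starting URL
    parts = urlparse(url)
    return f'{parts.scheme}://{parts.netloc}/'

# e.g. base_domain('https://edu.tuts.workers.dev/some/folder/') -> 'https://edu.tuts.workers.dev/'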

Version 2:

16 April 2020

+ crawler_v2:
	* API-based GoIndex crawler
	* Collects all URLs to be downloaded
+ parallel downloader
	* tqdm progress bar
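For readers curious what a parallel downloader with a tqdm progress bar looks like in outline, here is a minimal sketch; the helper names and pool size are illustrative, not the notebook's exact code:

import multiprocessing
import requests
from tqdm import tqdm

def download_file(task):
    # Download a single (url, path) pair by streaming to disk
    url, path = task
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(path, 'wb') as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                f.write(chunk)

if __name__ == '__main__':
    tasks = [('https://example.com/file.bin', '/tmp/file.bin')]  # (url, destination) pairs from the crawler
    with multiprocessing.Pool(processes=8) as pool:
        # imap_unordered yields results as tasks finish, which drives the progress bar
        for _ in tqdm(pool.imap_unordered(download_file, tasks), total=len(tasks)):
            pass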

May the blood of 1000 virgins bless you and your descendants.


Can we use this script to copy a shared folder and its content from Google Drive (not GoIndex) to our own Google Drive?

It is a possibility. If you look at the script carefully, you will find the solution; the implication is that, using some of the code here, you can do that :smiley:

I too have created a similar script to download a complete site or any course from GoIndex.

22 April 2020 (v2.3.2)
---------------------

+ added a summary
+ added an Exception when a file fails to download

21 April 2020 (v2.3.1)
---------------------
While crawling, fetching may occasionally fail because requests come too quickly or the server is busy.
This causes an error when parsing the JSON, so we re-fetch the URL (up to MAX_RETRY_CRAWLING times)
or until we find the key "files" in the response. If the maximum number of retries is reached and
the key "files" is still not found, we ignore this link (return []).

In the end, if you find any failures, just re-run the download section. Unless you set
OVERWRITE = True, files that were already downloaded will be skipped rather than fetched again.
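A minimal sketch of that existing-file check, assuming the flag is the OVERWRITE boolean above (the helper name is hypothetical):

import os

OVERWRITE = False  # set True to force every file to be downloaded again

def should_download(path):
    # Skip files that already exist on Drive unless OVERWRITE is set
    return OVERWRITE or not os.path.exists(path)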

+ added MAX_RETRY_CRAWLING (v2.3)
+ fixed a (stupid) FILE_EXISTING_CHECK bug
+ added failure-links download task

20 April 2020 (v2.2)
---------------------
Some sub-folders may be password-protected, which would cause an error while crawling, so we skip such folders (sketched below).

+ added auto-skip password-protected folder
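Roughly, the auto-skip treats a listing that never yields usable data as a dead end; a small sketch, where fetch_listing is the hypothetical retrying helper from the v2.3.1 note above:

def crawl_folder(url, fetch_listing):
    # fetch_listing: the retrying fetch helper sketched under v2.3.1
    files = fetch_listing(url)
    if not files:
        # Likely password-protected (or persistently failing): skip it
        print(f'Skipping protected/unreachable folder: {url}')
        return []
    return files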

17 April 2020 (v2.1)
---------------------
+ fixed duplicated URLs when crawling
+ added a search for the 'files' key, since some websites do not have a proper file structure, so we search for it (sketched below)
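That search can be pictured as a recursive walk over the returned JSON; a minimal sketch (the function name is hypothetical):

def find_files_key(obj):
    # Recursively search a JSON structure for the first 'files' list
    if isinstance(obj, dict):
        if 'files' in obj:
            return obj['files']
        for value in obj.values():
            found = find_files_key(value)
            if found is not None:
                return found
    elif isinstance(obj, list):
        for item in obj:
            found = find_files_key(item)
            if found is not None:
                return found
    return None  # no 'files' key anywhere in this structure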

16 April 2020 (v2.0)
---------------------
+ crawler_v2:
	* API-based GoIndex crawler
	* Collects all URLs to be downloaded
+ parallel downloader
	* tqdm progress bar

Thank you, but where are those GoIndex sites?

Interesting! So where are they? :thinking:

It looks like you’re laughing at my zero level :pensive: The problem is that I’m not a native English speaker.
I don’t know English very well. So please tell me: where are the GoIndex sites?
By googling, I found this link: https://index.gd.workers.dev/ but they ask for a password

It looks like you’re laughing at my zero level

My apologies if you think I was laughing; I am actually not. It is my way of teaching you guys. Every time I reply, my aim is to

  • convince you to do some research yourself before asking or bothering others, and
  • guide or teach, not give solutions (these two have completely different meanings).

Give a man a fish, and you feed him for a day. Teach a man to fish, and you feed him for a lifetime.

Interesting! So where are they? :thinking:

If you look carefully at my script, you will find out immediately what the GoIndex site is. FYI, when I first joined this community, I had no idea what a GoIndex website was either.

By googling, I found this link: https://index.gd.workers.dev/

See, you find the answer yourself :smiley: :+1:

but they ask for a password

Not always. Some GoIndex websites (hint: free courses, or the one in my script) are not password-protected.

Note that if you want to download files from password-protected folders, you will need a different solution (probably version 1 + amendments + the password from localStorage).
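As a hedged sketch of that amendment: GoIndex listings are fetched via POST, and the web UI keeps folder passwords client-side, so passing the password along in the request body would look roughly like this (the exact field name is an assumption):

import requests

def fetch_protected_listing(folder_url, password):
    # Assumes the backend accepts the password in the POST body,
    # mirroring what the web UI stores in localStorage
    resp = requests.post(folder_url, json={'password': password}, timeout=30)
    resp.raise_for_status()
    return resp.json()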


Thank you, dear.
I was on a website called coursehunter, and after a while they changed all their rules: there were no more free courses after a month, and you have to pay for a subscription.
Now I see a lot of their courses in Google shared drives on 1H, posted by some members… By googling I found a site called webpremium, and when I try to download from it, it’s impossible; it just streams at URLs like lol.coronavirus.worker.dev/coursestitle or worker.dev/coursetitle.
When I compare those addresses with the one in your script, I conclude that GoIndex is related to free access to paid services because of the COVID-19 pandemic.

Well done!

So many people have downloaded and uploaded those courses online. FYI, someone has created a Coursehunter downloader.

It’s for free courses, I think.

MAX_DOWNLOAD_TASKS = 32
pool = multiprocessing.Pool(processes=MAX_DOWNLOAD_TASKS) # Num of CPUs

Hi brother, I have a doubt. The Google Colab free version has only 2 cores by default, but in this code it is set to 32. Could you explain why you chose 32 for MAX_DOWNLOAD_TASKS?
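One general observation that may answer this: downloading is network-bound rather than CPU-bound, so a pool larger than the core count can still help, because workers spend most of their time waiting on I/O rather than computing. A thread pool gives the same effect more cheaply than 32 processes, e.g.:

from multiprocessing.pool import ThreadPool

MAX_DOWNLOAD_TASKS = 32  # tune to your bandwidth, not your core count

# Threads are a lighter-weight alternative to processes for I/O-bound work
pool = ThreadPool(processes=MAX_DOWNLOAD_TASKS)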
