Free to code: Multitask file downloader in Bash in 2 minutes

Thursday, November 15, 2012

Suppose you have a file with 5,000 URLs, one per line, and you want to download them in parallel over at most 50 connections. You can do it in Bash in 2 minutes. Here is how:

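MAXJOBS=50
# Read one URL per line; this avoids the word splitting and globbing of $(cat ...)
while IFS= read -r url; do
    # Poll until fewer than MAXJOBS jobs are running (jobs -r lists running jobs only)
    while (( $(jobs -r | wc -l) >= MAXJOBS )); do
        sleep 0.2
    done
    curl .... & # download command goes here. NOTE THE "&" sign
done < "$UrlsFile"
wait # let the remaining downloads finish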

The idea is simple:

Use the shell's job control to run each download in the background
Track the number of current jobs by counting the lines in the output of the jobs command
If 50 jobs are already running in parallel, wait until one of them completes
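
To see the counting trick in isolation, here is a minimal sketch with sleep standing in for the real downloads. The variable names running and remaining are made up for this illustration, and the numbers assume the three sleeps are still alive when they are counted.

```shell
#!/usr/bin/env bash
# Start three background jobs; sleep is only a stand-in for a download.
sleep 1 &
sleep 1 &
sleep 1 &

# jobs -r prints one line per running job, so wc -l gives the job count.
running=$(jobs -r | wc -l)
echo "running: $((running))"      # prints "running: 3"

# wait with no arguments blocks until every background job has finished.
wait
remaining=$(jobs -r | wc -l)
echo "remaining: $((remaining))"  # prints "remaining: 0"
```

Wrapping the count in $(( )) also strips the leading spaces that some wc implementations print, which is why an arithmetic comparison is safer here than a string comparison.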

Update

Before Bash 4.3, wait could block only on a specific job/PID or on all background jobs. That's why I used sleep in the example above. Since Bash 4.3 you can use wait -n to wait for any single job to terminate, so the loop can block instead of polling:

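MAXJOBS=50
while IFS= read -r url; do
    # At the limit: block until any one job terminates
    if (( $(jobs -r | wc -l) >= MAXJOBS )); then
        wait -n
    fi
    curl .... & # download command goes here. NOTE THE "&" sign
done < "$UrlsFile"
wait # let the remaining downloads finish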
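
For completeness, here is one way the wait -n loop could be packaged as a reusable function. This is a sketch under assumptions, not the exact command from this post: the names download_all and fetch_cmd are hypothetical, and the curl flags are just one plausible choice (fail on HTTP errors, stay quiet, follow redirects, save under the remote file name). It needs Bash 4.3 or later.

```shell
#!/usr/bin/env bash
# Hypothetical knob: the command that fetches one URL. Swap in your own.
fetch_cmd=(curl --fail --silent --show-error --location --remote-name)

# download_all FILE [MAX]: fetch every URL listed in FILE (one per line),
# keeping at most MAX (default 50) fetches running at the same time.
download_all() {
    local urls_file=$1 max_jobs=${2:-50} url
    # The "|| [[ -n $url ]]" part handles a last line without a newline.
    while IFS= read -r url || [[ -n $url ]]; do
        [[ -z $url ]] && continue        # skip blank lines
        if (( $(jobs -r | wc -l) >= max_jobs )); then
            # Block until any one job ends; its exit status is ignored here.
            wait -n || true
        fi
        "${fetch_cmd[@]}" "$url" &
    done < "$urls_file"
    wait                                 # let the remaining fetches finish
}
```

You would call it as download_all urls.txt 50, where urls.txt is a placeholder file name. Keeping the fetch command in an array means it can be replaced, for example with a dry-run echo, without touching the loop.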
