Suppose you have a file with 5,000 URLs and you want to download them in parallel, using 50 connections.
You can do it in Bash in about 2 minutes. Here is how:
MAXJOBS=50
for url in $(cat $UrlsFile); do
    CurJobs=$(jobs | wc -l)
    while [[ "$CurJobs" == "$MAXJOBS" ]]; do
        sleep 0.2
        CurJobs=$(jobs | wc -l)
    done
    curl .... &   # download command goes here. NOTE THE "&" sign
done
The idea is plain and simple:
- Use the shell's job control to run jobs in the background
- Monitor the number of current jobs by simply counting the lines in the jobs command output
- If 50 jobs are already running in parallel, wait until some job completes (see the sketch below)
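For completeness, here is a minimal self-contained sketch of this polling approach. The urls.txt file name and the curl flags (-s silent, -L follow redirects, -O save under the remote file name) are just illustrative assumptions, not part of the original snippet:

#!/bin/bash
MAXJOBS=50
UrlsFile=urls.txt               # assumed input file, one URL per line

for url in $(cat $UrlsFile); do
    # Throttle: while we are at the job limit, poll until some download finishes
    while [[ "$(jobs | wc -l)" -ge "$MAXJOBS" ]]; do
        sleep 0.2
    done
    curl -sLO "$url" &          # assumed flags: silent, follow redirects, keep remote name
done

wait    # block until the downloads still running after the loop have finished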
Update
Until Bash 4.3 you could wait either for a specific job/PID or for all background jobs. That's why I've used sleep in the example above. Since Bash 4.3 you can use wait -n to wait for any single job to terminate. So the code can be rewritten in a more optimal way as follows:
MAXJOBS=50
for url in $(cat $UrlsFile); do
    # If we are already at the limit, wait for any one job to finish
    if [[ "$(jobs | wc -l)" -ge "$MAXJOBS" ]]; then
        wait -n
    fi
    curl .... &   # download command goes here. NOTE THE "&" sign
done
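One detail worth noting: when the loop ends, up to 50 downloads may still be running, so a plain wait with no arguments after the loop keeps the script alive until they all finish. A minimal end-to-end sketch of the wait -n version, again assuming an urls.txt input file and illustrative curl flags:

#!/bin/bash
MAXJOBS=50
UrlsFile=urls.txt               # assumed input file, one URL per line

for url in $(cat $UrlsFile); do
    # At the limit? Block until any single background job terminates (Bash 4.3+)
    if [[ "$(jobs | wc -l)" -ge "$MAXJOBS" ]]; then
        wait -n
    fi
    curl -sLO "$url" &          # assumed flags: silent, follow redirects, keep remote name
done

wait    # collect the jobs that are still running when the loop ends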