Wednesday, September 26, 2018

Setting GCE snapshot labels en masse

I'm working on our cloud cost analysis, and one of the things to do here is to assign labels, e.g. service=ci, to our resources. We rely heavily on GCE PD snapshots for database backups, and I want to label them as well, per service.

You can do it through:

  • Cloud console, max 200 snapshots at a time
  • gcloud, one at a time

    The problem is that I have thousands of snapshots to label. Running gcloud in a simple for loop takes up to several seconds per iteration, so the whole process would take a day. So I wrote a script to do it in parallel, which, thanks to some lesser-known features of xargs, turned out to be really simple:

    
    #!/bin/bash
    
    set -e
    
    NAME=$1
    LABELS=$2
    
    JOBS=${JOBS:-10}
    
    # Both arguments are required; print usage to stderr otherwise.
    if [[ -z "$NAME" || -z "$LABELS" ]]; then
      echo "Usage: $0 <name substring> <labels>" >&2
      echo "Label format: k1=v1,k2=v2" >&2
      exit 1
    fi
    
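    # List matching snapshot names, then label them with up to $JOBS gcloud processes at once.
    # -I @ runs one command per name, -P sets the parallelism, -t echoes each command.
    gcloud compute snapshots list \
          --filter "name~$NAME" \
          --format="table[no-heading](name)" | \
      xargs -I @ -P "$JOBS" -t gcloud compute snapshots update --update-labels="$LABELS" @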
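
    The two xargs options doing the work here, -I and -P, can be tried without touching GCP at all. The sketch below is only an illustration: the snapshot names are made up, and echo stands in for the real gcloud call.

```shell
#!/bin/bash
# Stand-in for the snapshot list: four hypothetical names, one per line.
names=$'snap-a\nsnap-b\nsnap-c\nsnap-d'

# -I @  runs one command per input line, substituting the line for "@".
# -P 2  keeps up to two of those commands running at the same time.
# echo stands in for "gcloud compute snapshots update".
out=$(printf '%s\n' "$names" |
  xargs -I @ -P 2 echo update --update-labels=service=ci @)

# One line per snapshot; the order may vary between runs because jobs run in parallel.
printf '%s\n' "$out"
```

    Because the jobs finish in whatever order they happen to, the output order is not guaranteed, which is harmless when each job labels a different snapshot.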
    
    
    Note that each gcloud process runs for several seconds and occupies ~80MB of RAM, so running 10 jobs can easily consume about 1GB of RAM. Obviously, doing it through the GCP APIs with a dedicated program, say in Python, would not have that RAM issue, but it is not worth the effort in this case.
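
    If RAM is the limiting factor, the job count can be derived from a memory budget instead of being picked by hand. This is a rough sketch: the ~80MB figure is the observation above, while the 1024MB budget and the script name label-snapshots.sh are assumptions made for the example.

```shell
#!/bin/bash
# Assumptions: ~80MB per gcloud process (observed above) and a
# hypothetical RAM budget of 1024MB for the whole run.
PER_JOB_MB=80
BUDGET_MB=${BUDGET_MB:-1024}

# Integer division rounds down, so the budget is not exceeded;
# always allow at least one job.
JOBS=$(( BUDGET_MB / PER_JOB_MB ))
(( JOBS >= 1 )) || JOBS=1
echo "JOBS=$JOBS"

# Hypothetical invocation, assuming the script is saved as label-snapshots.sh:
#   JOBS=$JOBS ./label-snapshots.sh ci-db service=ci
```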