Tuesday, November 22, 2016

Elasticsearch pipeline aggregations - monitoring used capacity

Let's say I want to set up a simple monitoring system for my desktop. The desktop uses LVM and has three volumes v1, v2 and v3, all belonging to the vg1 volume group. I would like to monitor the used capacity of these volumes, and of the whole system, over time. It's easy to write a script that samples the used capacity of the volumes and pushes it to ElasticSearch. All I need to store is:

{
  "name": "v1",
  "ts": 1479762877,
  "used_capacity": 1288404287488
}
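
A minimal sketch of such a sampling script (my own illustration: it assumes the volumes are mounted and uses filesystem usage from statvfs as a good-enough proxy for LVM used capacity; the mount points and the "sample" doc type are made up, while the capacity_history index name matches the queries later in the post):

#!/usr/bin/env python
# Sample the used capacity of each volume and push it to Elasticsearch.
import os
import time

import requests  # assumes the requests library is installed

ES_URL = "http://localhost:9200/capacity_history/sample"
VOLUMES = {  # volume name -> mount point; adjust to your layout
    "v1": "/mnt/v1",
    "v2": "/mnt/v2",
    "v3": "/mnt/v3",
}

def used_bytes(mount_point):
    """Used capacity of the filesystem behind mount_point, in bytes."""
    st = os.statvfs(mount_point)
    return (st.f_blocks - st.f_bfree) * st.f_frsize

ts = int(time.time())
for name, mount_point in VOLUMES.items():
    doc = {"name": name, "ts": ts, "used_capacity": used_bytes(mount_point)}
    requests.post(ES_URL, json=doc).raise_for_status()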

OK, so I've put the script into cron to run every 5 minutes and the data starts pouring in. Let's do some BI on it! The first thing to find out is how full my desktop is, i.e. the total used capacity across all volumes. Sounds like an easy job for Kibana, doesn't it? Well, not really.

Part 1: Naive failure

Let's say each of my volumes is ~1TB full. Trying to chart an area viz in Kibana with the Average aggregation over used_capacity returns useless results:

The real total system capacity is ~3TB, but Kibana, rightfully, shows that AVG(v1, v2, v3) => AVG(1TB, 1TB, 1TB) => 1TB. So maybe I need Sum? Not good either:

I got a ~17TB capacity number, which is not even close to reality. This happens because Kibana uses a simple Date Histogram with a nested Sum aggregation, i.e.:

  • Divide the selected date range into ts buckets - 30 minutes each in my example.
  • Calculate the Sum of the used_capacity values of all documents that fall into each bucket.
That's why the larger the bucket, the weirder the results look: with 30-minute buckets and 5-minute samples, each bucket holds about 6 samples per volume, so the Sum is roughly 6 × 3 × ~1TB ≈ 18TB - right in the ballpark of the ~17TB above.

This happens because Kibana is only capable of either: $$ \underbrace{\text{SUM}\left(\begin{array}{c}v1, v1, v1,...\\ v2, v2, v2,...\\ v3, v3, v3,...\\ \end{array}\right)}_{ts\ bucket} \quad\text{or}\quad \underbrace{\text{AVG}\left(\begin{array}{c}v1, v1, v1,...\\ v2, v2, v2,...\\ v3, v3, v3,...\\ \end{array}\right)}_{ts\ bucket} $$ While what I need is: $$ \underbrace{\text{SUM}\left(\begin{array}{c}\text{AVG}(v1, v1, v1,...)\\ \text{AVG}(v2, v2, v2,...)\\ \text{AVG}(v3, v3, v3,...)\\ \end{array}\right)}_{ts\ bucket} $$ So how to achieve this?

Part 2: Poor man's solution

The post title promised pipeline aggregations and I'll get there. The problem with pipeline aggregations is that they are not supported in Kibana. So, is there still a way to get along with Kibana? Sort of. I can leverage the fact that my sampling script takes capacity values of all volumes at exactly the same time, i.e. each batch of volume metrics is pushed to ES with the same ts value. Now, if I force Kibana to use a ts bucket length of 1 minute, I can guarantee that any given bucket will only contain documents belonging to a single sample batch (since I send measurements to ES every 5 minutes, which is much larger than the 1m bucket size).

One can argue that this generates LOTS of buckets - and that's true, but there is one optimization to consider. The ES Date Histogram aggregation supports automatic pruning of buckets that do not have a minimum number of documents. The default is 0, which means empty buckets are returned, but Kibana wisely sets it to 1. Now let's say I want to see a capacity chart for the last 7 days, which is 7*24*60=10080 points (buckets); however, since I take measurements only every 5 minutes, most of the buckets are pruned and we are left with only ~2000, which is fine for a Full HD screen. The nice side effect is that it forces Kibana to draw really smooth charts :) Let's see it in action:

The above graph shows capacity data for the last 7 days. The key point is to open the Advanced section of the X-Axis dialog and put {"interval": "1m"} in the JSON Input field - this overrides Kibana's automatic interval. The bottom legend, which says "ts per 3 hours", is lying, but that's the least of evils. Also note how smooth the graph line is.

Part 3: Pipeline aggregations!

The above solution works, but does not scale well beyond a single system - getting measurements from multiple systems at exactly the same time is tricky. Another drawback is that looking at several months of data results in tens of thousands of buckets, which burdens ES, the network and Kibana.

The right solution is to implement the correct formula. I need something like this:

SELECT SUM(used_capacity), ts FROM
    (SELECT AVG(used_capacity) AS used_capacity, DATE(ts) AS ts FROM capacity_history GROUP BY DATE(ts), name)
GROUP BY ts
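
To make the intent concrete, here is the same two-level computation spelled out in plain Python over raw samples (an illustration only - the bucket keys and numbers are made up; the Elasticsearch query below is the real solution):

from collections import defaultdict

# (ts_bucket, volume_name, used_capacity) samples, with one-hour buckets
samples = [
    ("2016-11-21T00", "v1", 1.0e12), ("2016-11-21T00", "v1", 1.1e12),
    ("2016-11-21T00", "v2", 1.0e12),
    ("2016-11-21T00", "v3", 1.0e12), ("2016-11-21T00", "v3", 0.9e12),
]

# Inner aggregation: AVG of used_capacity per (bucket, volume)
per_volume = defaultdict(list)
for bucket, name, used in samples:
    per_volume[(bucket, name)].append(used)
averages = {key: sum(vals) / len(vals) for key, vals in per_volume.items()}

# Outer aggregation: SUM of the per-volume averages within each bucket
totals = defaultdict(float)
for (bucket, name), value in averages.items():
    totals[bucket] += value

print(dict(totals))  # {'2016-11-21T00': 3000000000000.0}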

Elasticsearch has supported this since version 2.0 with pipeline aggregations:

GET capacity_history/_search
{
  "size": 0,
  "aggs": {
    "ts": {
      "date_histogram": {"interval": "1h", "field": "ts"},
      "aggs": {
        "vols": {
          "terms": {"field": "name.raw", "size": 0},
          "aggs": {
            "cap": {
              "avg": {"field": "logical_capacity"}
            }
          }
        },
        "total_cap": {
          "sum_bucket": {
            "buckets_path": "vols>cap"
}}}}}}

Response

  "aggregations": {
    "ts": {
      "buckets": [
        {
          "key_as_string": "1479600000",
          "key": 1479600000000,
          "doc_count": 36,
          "vols": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "v1",
                "doc_count": 12,
                "cap": {
                  "value": 1073741824000
                }
              },
              {
                "key": "v2",
                "doc_count": 12,
                "cap": {
                  "value": 1073741824000
                }
              },
              {
                "key": "v3",
                "doc_count": 12,
                "cap": {
                  "value": 1072459894784
                }
              }
            ]
          },
          "total_cap": {
            "value": 3219943542784
          }
        },
        ...
Since we only need the ts bucket key and the value of the total_cap aggregation, we can ask ES to filter the results to include only the data we need. If we have lots of volumes, this can reduce the amount of returned data by orders of magnitude!
GET capacity_history/_search?filter_path=aggregations.ts.buckets.key,aggregations.ts.buckets.total_cap.value,took,_shards,timed_out
...
{
  "took": 92,
  "timed_out": false,
  "_shards": {
    "total": 70,
    "successful": 70,
    "failed": 0
  },
  "aggregations": {
    "ts": {
      "buckets": [
        {
          "key": 1479600000000,
          "total_cap": {
            "value": 3219943542784
          }
        },
        {
          "key": 1479603600000,
          "total_cap": {
            "value": 3220228083712
          }
        },
        ...
NOTE: I suggest always returning the meta timed_out and _shards fields to make sure you do not get partial data.

This method is generic and works regardless of the time alignment of the samples; the bucket size can be adjusted to return the same amount of data points regardless of the date range. The major drawback is that it is not supported by stock Kibana, so you will need your own custom framework to visualize it.
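
For example, a small script feeding such a custom visualization could look like this (a sketch of mine, assuming the requests library and the same index and field names as above):

import requests

QUERY = {
    "size": 0,
    "aggs": {
        "ts": {
            "date_histogram": {"interval": "1h", "field": "ts"},
            "aggs": {
                "vols": {
                    "terms": {"field": "name.raw", "size": 0},
                    "aggs": {"cap": {"avg": {"field": "used_capacity"}}},
                },
                "total_cap": {"sum_bucket": {"buckets_path": "vols>cap"}},
            },
        }
    },
}

FILTER_PATH = ("aggregations.ts.buckets.key,"
               "aggregations.ts.buckets.total_cap.value,"
               "took,_shards,timed_out")

resp = requests.post("http://localhost:9200/capacity_history/_search",
                     params={"filter_path": FILTER_PATH},
                     json=QUERY)
resp.raise_for_status()
data = resp.json()
# Refuse partial data - see the NOTE above
assert not data["timed_out"] and data["_shards"]["failed"] == 0

points = [(b["key"], b["total_cap"]["value"])
          for b in data["aggregations"]["ts"]["buckets"]]
print(points)  # [(1479600000000, 3219943542784.0), ...]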

Thursday, July 14, 2016

You better have persistent storage for ElasticSearch master nodes

This is a followup to my previous post about whether ElasticSearch master nodes should have persistent storage - they'd better! The rest of the post demonstrates how you can suffer spectacular data loss with ES if your master nodes do not save their state to persistent storage.

The theory

Let's say you have the following cluster with a single index (a single primary shard). You also have an application that constantly writes data to the index.

Now what happens if all your master nodes evaporate? Well, you relaunch them with clean disks. The moment the masters are up, the cluster is red, since there are no data nodes yet, and your application cannot index data.

Now the data nodes start to join. In our example, the second one joins slightly before the first. The cluster becomes green, since the fresh masters have no idea that there is another data node that has data and is about to join.

Your application happily continues to index data, into a newly created index on data node 2.

Now data node 1 joins - the masters see that it carries some old version of our index and discard it. Data loss!!!

Sounds too esoteric to happen in real life? Here is a sad-but-true story - back in the day we ran our ES master nodes in Kubernetes without persistent disks, i.e. on local EmptyDir volumes only. One day there was a short network outage - less than an hour. Kubelets lost connection to the K8s master node and killed the pods. Once the network was back, the pods were started - with clean disk volumes! - and our application resumed running. The only catch is that we'd lost tons of data :)

The reproduction

Let's try to simulate this in practice to see what happens. I'll use a minimal ES cluster, just running three ES instances on my laptop:

  • 1 master node that also serves as a client node
  • 2 data nodes. Let's call them dnode1 and dnode2

Open three shells and let's go:

  1. Start the nodes - each in separate shell
    Master:
    /usr/share/elasticsearch/bin/elasticsearch -Des.node.data=false -Des.node.master=true -Des.node.name=master-client --path.conf=/etc/elasticsearch --default.path.logs=/tmp/master-client/logs --default.path.data=/tmp/master-client
    
    Data 01:
    /usr/share/elasticsearch/bin/elasticsearch -Des.http.enabled=false -Des.node.data=true -Des.node.master=false -Des.node.name=data-01 --path.conf=/etc/elasticsearch --default.path.logs=/tmp/data-01/logs --default.path.data=/tmp/data-01
    
    Data 02:
    /usr/share/elasticsearch/bin/elasticsearch -Des.http.enabled=false -Des.node.data=true -Des.node.master=false -Des.node.name=data-02   --path.conf=/etc/elasticsearch --default.path.logs=/tmp/data-02/logs --default.path.data=/tmp/data-02
    
  2. Index a document:
    curl -XPUT 127.0.0.1:9200/users?pretty -d '{"settings": {"number_of_shards": 1, "number_of_replicas": 0}}'
    curl -XPUT 127.0.0.1:9200/users/user/1 -d '{"name": "Zaar"}'
    
  3. Check which data node the index has landed on. In my case, it was dnode2. Shut down this data node and the master node (just hit CTRL-C in the shells).
  4. Simulate master data loss by issuing rm -rf /tmp/master-client/
  5. Bring master back (launch the same command)
  6. Index another document:
    curl -XPUT 127.0.0.1:9200/users?pretty -d '{"settings": {"number_of_shards": 1, "number_of_replicas":0}}'
    curl -XPUT 127.0.0.1:9200/users/user/2 -d '{"name": "Hai"}'
    

Now, while dnode2 is still down, we can see that the users index exists in the data directories of both nodes:

$ ls /tmp/data-0*/elasticsearch/nodes/0/indices/
/tmp/data-01/elasticsearch/nodes/0/indices/:
users

/tmp/data-02/elasticsearch/nodes/0/indices/:
users

However, the data on dnode2 is now in a "Schrödinger's cat" state - not exactly dead, but not exactly alive either.

Let's bring data node 2 back and see what happens (I've also set the gateway loglevel to TRACE in /etc/elasticsearch/logging.yml for better visibility):

$ /usr/share/elasticsearch/bin/elasticsearch -Des.http.enabled=false -Des.node.data=true -Des.node.master=false -Des.node.name=data-02   --path.conf=/etc/elasticsearch --default.path.logs=/tmp/data-02/logs --default.path.data=/tmp/data-02
[2016-07-01 17:07:13,528][INFO ][node                     ] [data-02] version[2.3.3], pid[11826], build[218bdf1/2016-05-17T15:40:04Z]
[2016-07-01 17:07:13,529][INFO ][node                     ] [data-02] initializing ...
[2016-07-01 17:07:14,265][INFO ][plugins                  ] [data-02] modules [reindex, lang-expression, lang-groovy], plugins [kopf], sites [kopf]
[2016-07-01 17:07:14,296][INFO ][env                      ] [data-02] using [1] data paths, mounts [[/ (/dev/mapper/kubuntu--vg-root)]], net usable_space [21.9gb], net total_space [212.1gb], spins? [no], types [ext4]
[2016-07-01 17:07:14,296][INFO ][env                      ] [data-02] heap size [990.7mb], compressed ordinary object pointers [true]
[2016-07-01 17:07:14,296][WARN ][env                      ] [data-02] max file descriptors [4096] for elasticsearch process likely too low, consider increasing to at least [65536]
[2016-07-01 17:07:16,285][DEBUG][gateway                  ] [data-02] using initial_shards [quorum]
[2016-07-01 17:07:16,513][DEBUG][indices.recovery         ] [data-02] using max_bytes_per_sec[40mb], concurrent_streams [3], file_chunk_size [512kb], translog_size [512kb], translog_ops [1000], and compress [true]
[2016-07-01 17:07:16,563][TRACE][gateway                  ] [data-02] [upgrade]: processing [global-7.st]
[2016-07-01 17:07:16,564][TRACE][gateway                  ] [data-02] found state file: [id:7, legacy:false, file:/tmp/data-02/elasticsearch/nodes/0/_state/global-7.st]
[2016-07-01 17:07:16,588][TRACE][gateway                  ] [data-02] state id [7] read from [global-7.st]
[2016-07-01 17:07:16,589][TRACE][gateway                  ] [data-02] found state file: [id:1, legacy:false, file:/tmp/data-02/elasticsearch/nodes/0/indices/users/_state/state-1.st]
[2016-07-01 17:07:16,598][TRACE][gateway                  ] [data-02] state id [1] read from [state-1.st]
[2016-07-01 17:07:16,599][TRACE][gateway                  ] [data-02] found state file: [id:7, legacy:false, file:/tmp/data-02/elasticsearch/nodes/0/_state/global-7.st]
[2016-07-01 17:07:16,602][TRACE][gateway                  ] [data-02] state id [7] read from [global-7.st]
[2016-07-01 17:07:16,602][TRACE][gateway                  ] [data-02] found state file: [id:1, legacy:false, file:/tmp/data-02/elasticsearch/nodes/0/indices/users/_state/state-1.st]
[2016-07-01 17:07:16,604][TRACE][gateway                  ] [data-02] state id [1] read from [state-1.st]
[2016-07-01 17:07:16,605][DEBUG][gateway                  ] [data-02] took 5ms to load state
[2016-07-01 17:07:16,613][INFO ][node                     ] [data-02] initialized
[2016-07-01 17:07:16,614][INFO ][node                     ] [data-02] starting ...
[2016-07-01 17:07:16,714][INFO ][transport                ] [data-02] publish_address {127.0.0.1:9302}, bound_addresses {[::1]:9302}, {127.0.0.1:9302}
[2016-07-01 17:07:16,721][INFO ][discovery                ] [data-02] elasticsearch/zcQx-01tRrWQuXli-eHCTQ
[2016-07-01 17:07:19,848][INFO ][cluster.service          ] [data-02] detected_master {master-client}{V1gaCRB8S9yj_nWFsq7uCg}{127.0.0.1}{127.0.0.1:9300}{data=false, master=true}, added {{data-01}{FnGrtAwDSDSO2j_B53I4Xg}{127.0.0.1}{127.0.0.1:9301}{master=false},{master-client}{V1gaCRB8S9yj_nWFsq7uCg}{127.0.0.1}{127.0.0.1:9300}{data=false, master=true},}, reason: zen-disco-receive(from master [{master-client}{V1gaCRB8S9yj_nWFsq7uCg}{127.0.0.1}{127.0.0.1:9300}{data=false, master=true}])
[2016-07-01 17:07:19,868][TRACE][gateway                  ] [data-02] [_global] writing state, reason [changed]
[2016-07-01 17:07:19,905][INFO ][node                     ] [data-02] started

At 17:07:16 we see the node found some data on its own disk, but discarded it at 17:07:19 after joining the cluster. Its data dir is in fact empty:

$ ls /tmp/data-0*/elasticsearch/nodes/0/indices/
/tmp/data-01/elasticsearch/nodes/0/indices/:
users

/tmp/data-02/elasticsearch/nodes/0/indices/:

Invoking stat confirms that the data directory was changed right after the "writing state" message above:

$ stat /tmp/data-02/elasticsearch/nodes/0/indices/
  File: ‘/tmp/data-02/elasticsearch/nodes/0/indices/’
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: fc01h/64513d    Inode: 1122720     Links: 2
Access: (0775/drwxrwxr-x)  Uid: ( 1000/ haizaar)   Gid: ( 1000/ haizaar)
Access: 2016-07-01 17:08:39.093619141 +0300
Modify: 2016-07-01 17:07:19.920869352 +0300
Change: 2016-07-01 17:07:19.920869352 +0300
 Birth: -

Conclusions

  • Masters' cluster state is at least as important as data. Make sure your master node disks are backed up.
  • If running on K8s - use persistent external volumes (GCEPersistentDisk if running on GKE).
  • If possible, pause indexing after complete master outages until all of the data nodes come back.

Tuesday, April 26, 2016

How Kubernetes applies resource limits

We are building one of our products in the cloud and decided to run it entirely on a Kubernetes cluster. One of the big pains relieved by containers is resource separation between the different processes (modules) of your system. Let's say we have a product that comprises several services that talk to each other ("microservices", as it is now fashionably called). Before containers, or, to be more precise, before Linux kernel control groups were introduced, we had several options to try to ensure that they do not step on each other:

  • Run each microservice on a separate VM, which is usually wasteful
  • Play with CPU affinity for each microservice, on each VM - this saves you only from CPU hogs, but not from memory leeches, fork bombs, I/O swappers, etc.

This is where containers come into play - they allow you to share your machine between different applications by allocating the required portion of resources to each of them.

Back to Kubernetes

Kubernetes supports defining limit enforcement on two resource types: CPU and RAM. For each container you can provide a requested (i.e. minimum required) amount of CPU and memory, and a limit that the container should not pass. The requested amount is also used for pod scheduling, to ensure that a node provides the minimum amount of resources that the pod requested. All these parameters are of course translated to docker parameters under the hood.

Since Kubernetes is quite a new gorilla on the block, I decided to test how enforcement behaves, to get first-hand experience with it.

So first I created a container cluster on GKE with Kubernetes 1.1.8:

gcloud container clusters create limits-test --machine-type n1-highcpu-4 --num-nodes 1

Now let's see what we got on our node:

$ kubectl describe nodes
Non-terminated Pods:            (5 in total)
  Namespace                     Name                                                                    CPU Requests    CPU Limits      Memory Requests Memory Limits
  ─────────                     ────                                                                    ────────────    ──────────      ─────────────── ─────────────
  kube-system                   fluentd-cloud-logging-gke-limits-test-aec280e3-node-2tdw                100m (2%)       100m (2%)       200Mi (5%)      200Mi (5%)
  kube-system                   heapster-v11-9rqvl                                                      100m (2%)       100m (2%)       212Mi (5%)      212Mi (5%)
  kube-system                   kube-dns-v9-kbzpd                                                       310m (7%)       310m (7%)       170Mi (4%)      170Mi (4%)
  kube-system                   kube-ui-v4-7q12m                                                        100m (2%)       100m (2%)       50Mi (1%)       50Mi (1%)
  kube-system                   l7-lb-controller-v0.5.2-imjry                                           110m (2%)       110m (2%)       70Mi (1%)       120Mi (3%)
Allocated resources:
  (Total limits may be over 100%, i.e., overcommitted...)
  CPU Requests  CPU Limits      Memory Requests Memory Limits
  ────────────  ──────────      ─────────────── ─────────────
  720m (18%)    720m (18%)      702Mi (19%)     752Mi (21%)

That's quite interesting already - the minimal resource overhead of Kubernetes is 720 millicores of CPU and 702 megabytes of RAM (not including kubelet and kube-proxy, of course). However, the second node onwards will only run one daemon pod - fluentd for log collection - so its resource reservation will be significantly lower.

CPU

Kubernetes defines the CPU resource as compressible, i.e. a pod can get a larger share of the CPU if there is CPU available, and this can be changed back on the fly, without a process restart/kill.

I've created a simple CPU loader that calculates squares of the integers from 1 to 1000 in a loop on every core and prints a loops/second number; packaged it into a docker image; and launched it into k8s using the following pod file:

apiVersion: v1
kind: Pod
metadata:
  name: cpu-small
spec:
  containers:
  - image: docker.io/haizaar/cpu-loader:1.1
    name: cpu-small
    resources:
      requests:
        cpu: "500m"

I've created another pod similar to this - just called it cpu-large. Attaching to pods shortly afterwards, I saw that they get a fair share of CPU:

$ kubectl attach cpu-small
13448 loops/sec
13841 loops/sec
13365 loops/sec
13818 loops/sec
14937 loops/sec

$ kubectl attach cpu-large
14615 loops/sec
14448 loops/sec
14089 loops/sec
13755 loops/sec
14267 loops/sec

That makes sense - they both requested only 0.5 cores and the rest was split between them, since nobody else was interested. So in total this node can crunch ~30k loops/second. Now let's make cpu-large really large and reserve at least 2.5 cores for it, by changing its requests.cpu to 2500m and re-launching it into k8s. According to our settings, this pod should now be able to crunch at least ~25k loops/sec:

$ kubectl attach cpu-large
23310 loops/sec
23000 loops/sec
25822 loops/sec
23834 loops/sec
25153 loops/sec
24741 loops/sec

And this is indeed the case. Let's see what happened to cpu-small:

$ kubectl attach cpu-small
30091 loops/sec
28609 loops/sec
30219 loops/sec
27051 loops/sec
27885 loops/sec
29091 loops/sec
28699 loops/sec
18216 loops/sec
4213 loops/sec
4188 loops/sec
4296 loops/sec
4347 loops/sec
4141 loops/sec

First it got all of the CPU while I was re-launching cpu-large, but once the latter was up, the CPU share for cpu-small was reduced. Together they still produce the same ~30k loops/second, but now we control the share ratio.

What about limits? Well, it turns out that currently CPU limits are not enforced. This is not a big problem for us, because in our deployment strategy we prefer to provide the minimum required CPU share for every pod and, for the rest - be my guest. However, at this point I was glad I did this test, since the documentation was misleading with regard to CPU limits.

RAM

The RAM resource is incompressible, because there is no way to throttle a process on memory usage or ask it gently to unmalloc some of it. That's why, if a process reaches its RAM limit, it is simply killed.

To see how it's enforced in practice, I, again, created a simple script that allocates memory in chunks up to a predefined limit.
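
A rough sketch of such a memory loader (again, my own illustration, driven by the MAXMEM environment variable that the pod definitions below pass in; not necessarily the exact code of the haizaar/mem-loader image):

#!/usr/bin/env python
# Allocate memory in 1MB chunks until MAXMEM bytes are reached, then idle.
import os
import time

MAXMEM = int(os.environ.get("MAXMEM", 2 * 1024 ** 3))  # bytes
CHUNK = 1024 * 1024

chunks = []
allocated = 0
while allocated < MAXMEM:
    chunks.append(bytearray(CHUNK))  # bytearray really touches the pages
    allocated += CHUNK
    print("Reached %d megabytes" % (allocated // CHUNK))
    time.sleep(0.01)

time.sleep(3600)  # keep holding the memory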

First I tested how requests.memory is enforced. I created the following mem-small pod:

apiVersion: v1
kind: Pod
metadata:
  name: mem-small
spec:
  containers:
  - image: docker.io/haizaar/mem-loader:1.2
    name: mem-small
    resources:
      requests:
        memory: "100Mi"
    env:
    - name: MAXMEM
      value: "2147483648"

and launched it. It happily allocated 2GB of RAM and stood by. Then I created a mem-large pod with a similar configuration, where requests.memory is set to "2000Mi". After I launched the large pod, the following happened:

  • mem-large started allocating its desired 2GB of RAM.
  • Since my k8s node only had 3.6GB of RAM, the system froze for a dozen seconds or so.
  • Since there was no free memory in the system, the kernel Out Of Memory Killer kicked in and killed the mem-small pod:
[  609.739039] Out of memory: Kill process 5410 (python) score 1270 or sacrifice child
[  609.746918] Killed process 5410 (python) total-vm:1095580kB, anon-rss:1088056kB, file-rss:0kB

I.e. enforcement took place and my small pod was killed, since it consumed more RAM than it had requested while another pod was legitimately asking for memory. However, such behavior is unsuitable in practice, since it causes a "stop-the-world" effect for everything that runs on that particular k8s node.

Now let's see how resources.limits are enforced. To verify that, I killed both of my pods and changed mem-small as follows:

apiVersion: v1
kind: Pod
metadata:
  name: mem-small
spec:
  containers:
  - image: docker.io/haizaar/mem-loader:1.2
    name: mem-small
    resources:
      requests:
        memory: "100Mi"
      limits:
        memory: "100Mi"
    env:
    - name: MAXMEM
      value: "2147483648"

After launching it, I saw the following in its output:

Reached 94 megabytes
Reached 95 megabytes
Reached 96 megabytes
Reached 97 megabytes
Reached 98 megabytes
Reached 99 megabytes
Reached 99 megabytes
Reached 99 megabytes
Killed

I.e. the process was immediately killed after reaching its RAM limit. There is nice evidence of that in the dmesg output:

[  898.665335] Task in /214bc7c0bcdb1d0bf10b8ab4cff06b451850f9af0894472a403412ea295324ea killed as a result of limit of /214bc7c0bcdb1d0bf10b8ab4cff06b451850f9af0894472a403412ea295324ea
[  898.689794] memory: usage 102400kB, limit 102400kB, failcnt 612
[  898.697490] memory+swap: usage 0kB, limit 18014398509481983kB, failcnt 0
[  898.705930] kmem: usage 0kB, limit 18014398509481983kB, failcnt 0
[  898.713672] Memory cgroup stats for /214bc7c0bcdb1d0bf10b8ab4cff06b451850f9af0894472a403412ea295324ea: cache:84KB rss:102316KB rss_huge:0KB mapped_file:4KB writeback:0KB inactive_anon:4KB active_anon:102340KB inactive_file:20KB active_file:16KB unevictable:0KB
[  898.759180] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[  898.768961] [ 6679]     0  6679      377        1       6        0          -999 sh
[  898.778387] [ 6683]     0  6683    27423    25682      57        0          -999 python
[  898.788280] Memory cgroup out of memory: Kill process 6683 (python) score 29 or sacrifice child

Conclusions

The Kubernetes documentation is a bit misleading with regard to CPU limits (resources.limits.cpu). Nevertheless, this mechanism looks perfectly useful for our application. All of the code and configuration used in this post is available in the following gists:

Sunday, April 17, 2016

Kubernetes cluster access by fixed IP

If you:

  • Have a Kubernetes cluster running in GKE
  • Have connected GKE to your company network through a VPN
  • Are puzzled about how to assign a fixed IP to a particular k8s service

Then read on.

Prologue

The ideal solution would be to configure the k8s service to use a GCP LoadBalancer and have the latter provide a private IP only. However, as of April 2016, LoadBalancers on GCP do not provide an option for a private IP only, though GCP solution engineers said this feature "is coming".

Therefore the only option we have is to run a dedicated VM with a fixed IP and proxy traffic through it.

The approach

A Kubernetes service itself provides two relevant ways to access the pods behind it:
ClusterIP
By default, every service has a virtual ClusterIP (which can be manually set to a predefined address) that can be used to access the pods behind the service. However, for this to work, a client has to have kube-proxy running on its host, as explained here.
NodePort
A k8s service can be configured to expose a certain port on every k8s node, which will be redirected to the service's pods (this comes on top of ClusterIP).

The ClusterIP approach obviously is not feasible outside the k8s cluster, so we are left with the NodePort approach only. The problem is that k8s node IPs are not static and may change. That's why we need a dedicated VM which has a fixed IP.

After we have a VM, we can either

  • Join it to the k8s cluster, so the service's NodePort will be exposed on the VM's fixed IP as well.
  • Run a reverse HTTP proxy on the VM that forwards traffic to the k8s nodes, together with a script that monitors the k8s nodes and updates the proxy configuration when necessary.

I chose the second option because it allows a single VM to proxy requests for multiple k8s clusters and is easier to set up.

The setup

Create an instance

Let's create a VM and assign it a static IP. Below is my interpretation of the official guide.

Create an instance first:

gcloud compute instances create fixed-ip-proxy --can-ip-forward
The last switch is crucial here.

I chose the IP for my testing cluster to be 10.10.1.1. Let's add it to the instance:

cat <<EOF >>/etc/network/interfaces.d/eth0-0
auto eth0:0
iface eth0:0 inet static
address 10.10.1.1
netmask 255.255.255.255
EOF

Now change /etc/network/interfaces and make sure that source-directory /etc/network/interfaces.d line comes last. Apply your new configuration by running:

sudo service networking restart

The final step is to instruct GCE to forward traffic destined for 10.10.1.1 to the new instance:

gcloud compute routes create fixed-ip-production \
                                --next-hop-instance fixed-ip-proxy \
                                --next-hop-instance-zone us-central1-b \
                                --destination-range 10.10.1.1/32

To add more IPs (adding a dedicated IP per cluster is a good practice), add another file under /etc/network/interfaces.d/ and add a GCE route.

NGINX configuration

Install NGINX:
sudo apt-get install nginx

Install Google Cloud Python SDK:

sudo easy_install pip
sudo pip install --upgrade google-api-python-client

Now download the IP watcher script:

sudo wget -O /root/nginx-ip-watch https://gist.githubusercontent.com/haizaar/f19bdf9e5a6e278c57b96cce945b4fd9/raw/79f11225825607ba78ba84221d27439c1669a492/nginx-ip-watch
sudo chmod 755 /root/nginx-ip-watch

NOTE: You are downloading my script that will run as root on your machine - read its contents first!
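
For context, the core idea is just listing GKE node IPs by instance name prefix via the Google Cloud Python SDK, roughly like this (a simplified sketch of mine, not the actual gist contents):

from googleapiclient import discovery

def gke_node_ips(project, zone, gke_prefix):
    """Return the internal IPs of instances whose name starts with gke_prefix."""
    compute = discovery.build("compute", "v1")
    result = compute.instances().list(project=project, zone=zone).execute()
    return sorted(inst["networkInterfaces"][0]["networkIP"]
                  for inst in result.get("items", [])
                  if inst["name"].startswith(gke_prefix))

# The actual script then updates the NGINX configuration with these IPs
# when they change - see the gist for details.
print(gke_node_ips("my-project", "us-central1-a", "gke-testing"))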

Test the script:

$ sudo /root/nginx-ip-watch -h 
usage: Watch GKE node IPs for changes [-h] -p PROJECT -z ZONES
                                      name gke-prefix listen-ip listen-port
                                      target-port

positional arguments:
  name                  Meaningful name of your forwarding rule
  gke-prefix            GKE node prefix to monitor and forward to
  listen-ip             IP listen on
  listen-port           Port to listen on
  target-port           IP listen on

optional arguments:
  -h, --help            show this help message and exit
  -p PROJECT, --project PROJECT
                        Project to list instances for
  -z ZONES, --zones ZONES
                        Zones to list instances for

Now let's set up NGINX to listen for HTTP traffic on 10.10.1.1:5601 and forward it to the GKE testing cluster nodes on port 30601, by adding the following to /etc/cron.d/nginx-ip-watch:

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

* * * * * root /root/nginx-ip-watch kibana-testing -p my-project -z us-central1-a gke-testing 10.10.1.1 5601 30601

After that, within one minute, your forwarding should be up and running. For more services, just keep adding more lines in the cron file. This will work well for a dozen or so services. After that, I would refactor the solution to issue only one gcloud compute instances list command per minute.

Since we are using NGINX in load-balancer mode, checking the GKE hosts only once a minute is good enough even during cluster upgrades - NGINX will detect and blacklist a shutting-down GKE node by itself.

Epilogue

Create a snapshot of your instance to keep a backup of your work every time you change it. Don't forget to issue the sync command on the system before taking a snapshot of the disk.

Update

  • The first version of my script used the gcloud command-line util to fetch the instance list. It turned out that gcloud writes logs to ~/.config/gcloud/logs and spits out 500KB on every invocation. To mitigate this, I've updated my script to use the Google Cloud Python SDK and bypass the gcloud util completely.
  • As Vadim points out below, you can now specify a fixed internal IP at instance creation time. Though you'll still need the setup above if you want to have more than one IP per instance.

Tuesday, April 5, 2016

Persistent storage for ElasticSearch master nodes?

ElasticSearch master nodes hold the cluster state. I was trying to understand whether these nodes are required to have persistent storage, or whether they can recover from whatever exists on the data nodes. The short answer is: probably yes, they can recover - but see the update below.

Update

You better have persistent disks for your ES master nodes - read here why.

Below I describe the tests I've done. But before that - some background on how I got to this question in the first place.

ElasticSearch on Kubernetes

We are working on running ElasticSearch 2.x on Kubernetes on Google Container Engine. There are two options to store data for a container:
EmptyDir
Part of the local storage on the Kubernetes node is allocated for the pod. If the pod's container restarts, the data survives. If the pod is killed - the data is lost.
gcePersistentDisk
A Compute Engine persistent disk (created in advance) can be attached to a pod. The data persists. However, there is a limitation - as of Kubernetes 1.2, a ReplicaSet cannot attach a different disk to each pod it creates, thus to run ElasticSearch data nodes, for example, you need to create a separate ReplicaSet (of size 1) for each ES data node.

ES data nodes should have persistent disks - this is a no-brainer. However, with regard to ES master nodes it's not clear. I've tried to understand where master nodes persist cluster state, and this thread states "on every node including client nodes". There is also a resolved issue about storing index metadata on data nodes.

Run, Kill, Repeat

So let's see how it behaves in reality.

I created a two-node Kubernetes 1.2 cluster running n1-standard-2 instances (2 CPUs, 7.5GB RAM) and used Paulo Pires' Kubernetes setup for ElasticSearch:

$ git clone https://github.com/pires/kubernetes-elasticsearch-cluster.git
$ cd kubernetes-elasticsearch-cluster
$ vim es-data-rc.yaml  # set replicas to 2
$ vim es-master-rc.yaml  # set replicas to 3
$ vim es-svc.yaml  # set type to ClusterIP

Let's launch it into the air:

$ for i in *.yaml; do kubectl create -f $i; done
$ sleep 1m; kubectl get pods
NAME              READY     STATUS    RESTARTS   AGE
es-client-ats2b   1/1       Running   0          1h
es-data-teodq     1/1       Running   0          1h
es-data-zwml2     1/1       Running   0          1h
es-master-3bosq   1/1       Running   0          1h
es-master-a47om   1/1       Running   0          1h
es-master-c1dy1   1/1       Running   0          1h

We are all good. Let's ingest some data and alter the cluster settings:

$ CLIENTIP=$(kubectl describe pods es-client |grep '^IP' |head -n 1|awk '{print $2}')
$ curl -XPUT $CLIENTIP:9200/_cluster/settings?pretty -d '{"transient": {"discovery.zen.minimum_master_nodes": 2}}'
{
  "acknowledged" : true,
  "persistent" : { },
  "transient" : {
    "discovery" : {
      "zen" : {
        "minimum_master_nodes" : "2"
      }
    }
  }
}
$ curl -XPUT $CLIENTIP:9200/tweets/tweet/1 -d '{"foo": "bar"}'          
{
  "_id": "1",
  "_index": "tweets",
  "_shards": {
    "failed": 0,
    "successful": 2,
    "total": 2
  },
  "_type": "tweet",
  "_version": 1,
  "created": true
}
$ curl $CLIENTIP:9200/_cluster/health?pretty
{
  "cluster_name" : "myesdb",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 6,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

The data is there and the cluster is green. Now let's kill all of the masters and recreate them (their data disks will be lost):

$ kubectl delete -f es-master-rc.yaml 
replicationcontroller "es-master" deleted
$ kubectl create -f es-master-rc.yaml       
replicationcontroller "es-master" created

After a few dozen seconds, the new masters are up again and we see the following in the leader's log:

[2016-04-04 15:57:44,880][INFO ][cluster.service          ] [Elaine Grey] new_master {Elaine Grey}{5NIL5jBbTYadefGzjDLb5A}{10.224.1.7}{10.224.1.7:9300}{data=false, master=true}, added {{Lyja}{CUyGl7w-R86qcOsNSj0xPA}{10.224.1.4}{10.224.1.4:9300}{master=false},{Typhoid Mary}{cWmlEtHuSdCImjHMNM6FsA}{10.224.1.3}{10.224.1.3:9300}{master=false},{Slug}{fQPe2C1FSH2UkuveBFJtbw}{10.224.1.5}{10.224.1.5:9300}{data=false, master=false},}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-04-04 15:57:45,056][INFO ][node                     ] [Elaine Grey] started
[2016-04-04 15:57:45,058][INFO ][cluster.service          ] [Elaine Grey] added {{Stacy X}{U4sn1pkWRlGV-zVMW1OeAA}{10.224.1.8}{10.224.1.8:9300}{data=false, master=true},}, reason: zen-disco-join(pending joins after accumulation stop [election closed])
[2016-04-04 15:57:45,711][INFO ][gateway                  ] [Elaine Grey] recovered [0] indices into cluster_state
[2016-04-04 15:57:45,712][INFO ][cluster.service          ] [Elaine Grey] added {{Kiss}{MaNkKlQWR82QHFVhz38Ohg}{10.224.1.6}{10.224.1.6:9300}{data=false, master=true},}, reason: zen-disco-join(join from node[{Kiss}{MaNkKlQWR82QHFVhz38Ohg}{10.224.1.6}{10.224.1.6:9300}{data=false, master=true}])
[2016-04-04 15:57:46,077][INFO ][gateway                  ] [Elaine Grey] auto importing dangled indices [tweets/OPEN] from [{Lyja}{CUyGl7w-R86qcOsNSj0xPA}{10.224.1.4}{10.224.1.4:9300}{master=false}]
[2016-04-04 15:57:47,073][INFO ][cluster.routing.allocation] [Elaine Grey] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[tweets][3]] ...]).
[2016-04-04 15:57:47,567][INFO ][cluster.routing.allocation] [Elaine Grey] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[tweets][3]] ...]).
[2016-04-04 15:58:16,355][INFO ][io.fabric8.elasticsearch.discovery.kubernetes.KubernetesDiscovery] [Elaine Grey] updating discovery.zen.minimum_master_nodes from [-1] to [2]

So we see that:

  • The new master recovered 0 indices from the cluster state - i.e. the cluster state was indeed lost.
  • The new master auto-imported the existing indices, which were "dangling" in ES terminology.
  • It also restored our transient quorum setting.

And our cluster is green:


$ curl $CLIENTIP:9200/_cluster/health?pretty
{
  "cluster_name" : "myesdb",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 6,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 5,
  "active_shards" : 10,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Now let's kill the masters without destroying their disks and see if it behaves any differently:

$ for pod in $(kubectl get pods |grep es-master |awk '{print $1}'); do kubectl exec $pod killall java & done
The master log shows the following:
[2016-04-04 16:33:55,695][INFO ][cluster.service          ] [Neptune] new_master {Neptune}{51QU1wf4T2Ky9QUzK5MkEw}{10.224.1.7}{10.224.1.7:9300}{data=false, master=true}, added {{Slug}{fQPe2C1FSH2UkuveBFJtbw}{10.224.1.5}{10.224.1.5:9300}{data=false, master=false},{Typhoid Mary}{cWmlEtHuSdCImjHMNM6FsA}{10.224.1.3}{10.224.1.3:9300}{master=false},{Lyja}{CUyGl7w-R86qcOsNSj0xPA}{10.224.1.4}{10.224.1.4:9300}{master=false},{Comet Man}{cOLRxsOKTC2OYR6Kuiplxw}{10.224.1.6}{10.224.1.6:9300}{data=false, master=true},}, reason: zen-disco-join(elected_as_master, [1] joins received)
[2016-04-04 16:33:55,867][INFO ][node                     ] [Neptune] started
[2016-04-04 16:33:56,428][INFO ][gateway                  ] [Neptune] recovered [1] indices into cluster_state
[2016-04-04 16:33:57,233][INFO ][cluster.routing.allocation] [Neptune] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[tweets][0], [tweets][0]] ...]).
[2016-04-04 16:33:57,745][INFO ][cluster.routing.allocation] [Neptune] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[tweets][4]] ...]).
[2016-04-04 16:34:01,417][INFO ][cluster.service          ] [Neptune] added {{Nicole St. Croix}{5VcQT4H8RHeAf3R0PW3K4A}{10.224.1.8}{10.224.1.8:9300}{data=false, master=true},}, reason: zen-disco-join(join from node[{Nicole St. Croix}{5VcQT4H8RHeAf3R0PW3K4A}{10.224.1.8}{10.224.1.8:9300}{data=false, master=true}])
[2016-04-04 17:01:55,811][INFO ][io.fabric8.elasticsearch.discovery.kubernetes.KubernetesDiscovery] [Neptune] updating discovery.zen.minimum_master_nodes from [-1] to [2]

So 1 index was recovered from the cluster state and there are no dangling indices this time.

Conclusions

While the master nodes were able to recover both the indices and the transient cluster settings, I was testing only the simplest scenario. This is not enough to decide whether we should maintain persistent disks for master nodes. On the other hand, if we do have persistent disks for masters - do we need to back up the metadata? And what about ES 5.x? One of the promised features is that the master will hold an ID of the latest index change, to prevent stale data nodes from becoming primaries during network partitioning. This kind of metadata cannot be stored on data nodes.

I'll update this post when I have the answers.

Friday, March 25, 2016

Caveat with ElasticSearch nGram tokenizer

Finally got some time to blog about ElasticSearch. I've been using it extensively for the last two years, but my findings are rather lengthy. Finally I've got something small to share.

The ElasticSearch nGram tokenizer is very useful for efficient substring matching (at the cost of index size, of course). For example, I have an event message field like this:

/dev/sda1 has failed due to ...
and I would like to find all failure events for all SCSI disks. One option is to store the message field as a not-analyzed string (i.e. one single term) and use a wildcard query:
GET /events
{
  "query": {
    "wildcard": {
      "message.raw": {
        "value": "/dev/sd?? has failed*"
      }
    }
  }
}
This will do the job perfectly, but to complete it, ElasticSearch has to scan every value of the message field looking for the pattern at search time. Once the number of documents gets big enough, this becomes slow.

One solution is to split the message into substrings at indexing time, with (2, 20) as the (min, max) gram length in our example:

# Analyzer definition in settings
"analysis": {
    "analyzer": {
        "substrings": {
            "tokenizer": "standard",
            "filter": ["lowercase", "thengram"]
        }   
    },  
    "filter": {
        "thengram": {
            "type": "nGram",
            "min_gram": 2,
            "max_gram": 20
        }   
    }   
} 

# message field definition in mappings
"message": {
    "type": "string",
    "index": "analyzed",
    "analyzer": "substrings"
}
and use match_phrase query:
GET events/_search
{
  "query": {
    "match": {
      "message": {
        "query": "dev sd has failed",
        "type": "phrase",
      }
    }
  }
}

The caveat

The above query will return weirdly irrelevant results and, at first glance, it's not obvious why. The caveat is that our custom analyzer is applied both during indexing and during search. So instead of searching for the sequence of terms "dev", "sd", "has", "failed", we are searching for the sequence "de", "ev", "dev", "sd", "ha", "as", "has", etc. To fix this we need to tell Elastic to use a different analyzer during search (and search only). This can be done either by adding "analyzer": "standard" to the query itself (which is error-prone, since it can easily be forgotten) or by specifying it in the mapping definition:
"message": {
    "type": "string",
    "index": "analyzed",
    "analyzer": "substrings",
    "search_analyzer": "standard"
}
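
A quick way to see the difference is to run both analyzers through the _analyze API and compare the emitted tokens (a sketch using the requests library, with the index and analyzer names from above):

import requests

text = "/dev/sda1 has failed"
for analyzer in ("substrings", "standard"):
    resp = requests.get("http://localhost:9200/events/_analyze",
                        params={"analyzer": analyzer, "text": text})
    resp.raise_for_status()
    print(analyzer, [t["token"] for t in resp.json()["tokens"]])

# "substrings" emits all the 2..20-character grams ("de", "dev", "ev", ...),
# while "standard" emits whole words - which is what we want at search time.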

Worth it?

I took a sample data set of 1,000,000 events and ran both the wildcard and the phrase query, each matching a 1,000-doc subset of it. While both are fast for such a small data set, the difference is quite striking nevertheless:
  • wildcard query - 30ms
  • phrase query - 5ms

A 6x speed-up! Another bonus of using the phrase query is that you get result highlighting (which is not supported for wildcard queries).

Thursday, February 25, 2016

Patching binaries

A friend of mine asked for help - he has a legacy system that he wants to migrate to new hardware. His Linux OS is 10 years old and it is becoming more and more challenging to find hardware to run it on. Long story short, I was asked to make his ten-year-old binaries run on a modern Ubuntu.

Fortunately, Linux has very impressive ABI compatibility, so my job came down to arranging the executables and their dependent libraries. Well, almost.

There are three ways of telling an executable (or actually its ld.so interpreter) where to search for its libraries:

  • Setting rpath on the executable itself.
  • Setting LD_LIBRARY_PATH environment variable.
  • Changing system wide configuration for ld.so to look into additional directories.

The binaries were setuid, and thus LD_LIBRARY_PATH was ruled out.

Next, I tried putting the libraries in /opt/old-stuff/lib and adding that directory to /etc/ld.so.conf.d/z-old-stuff.conf. This gave me some progress, but I hit a wall with naming collisions - my old binary relied on an older libreadline and I had two libreadline.so.5 libs - one in /lib and one in /opt/old-stuff/lib. The latter was obviously further down the search path, since otherwise it would break practically every command-line tool in the system.

So I needed to make my binary use its own specific version of libreadline while leaving the others using the default one. The only way to go was rpath. Fortunately, there is a nifty utility out there called patchelf:

patchelf --set-rpath /opt/old-stuff/lib /opt/old-stuff/bin/foo
That almost did the trick. The caveat was that foo was using another library, and only that library itself used libreadline. So the solution was to set rpath on all the libraries as well:
for file in /opt/old-stuff/lib/*; do
    patchelf --set-rpath /opt/old-stuff/lib "$file"
done
Overall, that was quite a shift from my current daily programming routine. I have not had to think about linkers for quite a while now, and it was fun to get a taste of this stuff again.