Tuesday, October 2, 2018

How Google turned a simple feature into a configuration disaster

A grain of frustration with a light at the end of the tunnel.

A while back, G Suite had a simple way to create email distribution lists: in the admin panel you could create a group of users, e.g. support@example.com, add a couple of members, decide whether users outside of your organization can send email to the group address, and you were done.

Today I tried to do the same, to create a distribution list, with the current version of G Suite for Business. It ended up taking hours of effort and a lengthy conversation with G Suite support.

First I tried to create a group and to send an email to its address - nope, does not work:

Grou-what? A Google Group? I don't want no Google Group! I want, you know, a distribution list!

Clicking on "Access Settings" of the group properties in G Suite leads to a particular settings page on... groups.google.com. Just one of 21(!) other setting pages! My day schedule didn't include a ramp up on Google Groups, so I hooked into support chat straight away. The Google guy explained to me that with G Suite Business the only option is to configure a particular Google Group to behave as a distribution list.

First we worked on enabling people outside of the organization to post (send emails) to the group. Once we configured the setting I was about to jump away to test it, but the support guy stopped me. Quoting the chat:

Zaar Hai: OK. Saving and testing. Can you please hold on? 
G Suite Support, Jay: Wait. 
Zaar Hai: OK :)
G Suite Support, Jay: That is not going to work right away.
G Suite Support, Jay: We have what we called propagation.
G Suite Support, Jay: You need to wait for 24 hours propagation for the changes to take effect.
G Suite Support, Jay: Most of the time it works in less than 24 hours.
Zaar Hai: Seriously??
Zaar Hai: I thought I'm dealing with Google... 
G Suite Support, Jay: Yes, we are Google.

24 hours! After soothing myself, I decided to give it a shot - it worked! The configuration propagated quite fast it seems.

It was still not a classic distribution list though, since all of the correspondence was archived and ready to be seen on groups.google.com. I didn't want this behaviour, so we kept digging. Eventually the guy asked me to reset the group to the Email list type:

The messages still got archived though, so we blamed it on the propagation and the guy advised me to come back next day if it still does not work.

Well, after taking a 24-hour break, it still didn't work. I did a bit of settings exploration myself and found that there is a dedicated toggle responsible for message archiving. It turns out the reset does not untoggle it. Once I disabled it, the change propagated within a minute.

That was the frustrating part. Now the light: a guide on how to have a distribution list with G Suite.

How to configure a G Suite group to behave like a distribution list

Step 1: Create a group

Create a group in G Suite admin console. If you need just an internal mailing list, that is for members only, and are fine with the message archiving, then you are done. If you need outside users to be able to send emails to it (like you probably do with, e.g. sales@example.com), then read on.

Step 2: Enabling external access

  • Go to groups.google.com.
  • Click on "My groups" and then on manage link under the name of the group in question
  • On the settings page, navigate to Permissions -> Basic permis... in the menu on the left
  • In the Post row drop-down select "Anyone on the web". Click Save and you should be done

This is almost a classic distribution list - we only need to disable archiving.

Step 3: Disable archiving

Eventually I discovered that archiving is controlled by the toggle located under Information -> Content control in the settings menu:

In my case, the change went into effect immediately.


  • Doing all of the above steps may be quite daunting for a system administrator who needs to manage many groups. Why not have a shortcut right in the G Suite admin console to make it easier?
  • A 24h propagation period sounds like a blast from the past. The Google guy told me that any G Suite setting change can take up to 24 hours to take effect. Back to the future: Google now offers a distributed database with cross-continental ACID transactions, which makes me wonder about the reasons behind the 24h propagation period.

Wednesday, September 26, 2018

Setting GCE snapshot labels en masse

I'm working on our cloud cost analysis and one of the things to do here is to assign labels, e.g. service=ci, to our resources. We use GCE PD snapshots heavily for database backups and I want to label them as well, per service.

You can do it through:

  • Cloud console, max 200 snapshots at a time
  • gcloud, one at a time

    The problem is... I have thousands of snapshots to label. Running gcloud in a simple for loop takes up to several seconds per iteration, so the whole process would take a day. Therefore I crafted a script to do it in parallel, which, thanks to lesser-known features of xargs, turned out to be really simple:

    #!/bin/bash
    set -e
    # Positional arguments; the JOBS default of 10 is an assumption
    # and can be overridden from the environment.
    NAME="$1"
    LABELS="$2"
    JOBS="${JOBS:-10}"
    if ! ([[ "$NAME" ]] && [[ "$LABELS" ]]); then
     echo "Usage: $0 <name substring> <labels>"
     echo "Label format: k1=v1,k2=v2"
     exit 1
    fi
    # -I @ substitutes each snapshot name, -P runs up to $JOBS updates in parallel,
    # -t prints every command before running it
    gcloud compute snapshots list \
          --filter "name~$NAME" \
          --format="table[no-heading](name)" | \
     xargs -I @ -P "$JOBS" -t gcloud compute snapshots update --update-labels="$LABELS" @

    Note that each gcloud instance is in the air for several seconds and occupies ~80MB of RAM, i.e. running 10 jobs can easily consume about 1GB of RAM. Doing it through GCP APIs with dedicated code, say in Python, would not have that RAM issue, but it is not worth the effort in this case.
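
For readers who prefer Python to xargs, the same fan-out can be sketched with a thread pool. This is only an illustration of the bounded-parallelism idea: it still shells out to gcloud once per snapshot, so the RAM cost per job is unchanged. The function names and the snapshot names in the usage note are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_update_command(snapshot, labels):
    """Return the gcloud argv that applies `labels` (k1=v1,k2=v2) to one snapshot."""
    return ["gcloud", "compute", "snapshots", "update",
            f"--update-labels={labels}", snapshot]


def label_snapshots(snapshots, labels, jobs=10, run=subprocess.run):
    """Run one gcloud update per snapshot, at most `jobs` at a time.

    `run` is injectable so the fan-out can be exercised without gcloud.
    Returns the commands in the same order as `snapshots`.
    """
    commands = [build_update_command(name, labels) for name in snapshots]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # list() waits for all jobs and re-raises the first failure, if any
        list(pool.map(lambda cmd: run(cmd, check=True), commands))
    return commands
```

With gcloud installed and authenticated, `label_snapshots(["db-backup-1", "db-backup-2"], "service=ci")` would be the rough equivalent of the xargs pipeline for two snapshots.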
    Friday, September 21, 2018

    Docker multi-stage builds for Python apps

    In the previous post I spoke highly of the multi-stage Docker image build approach, though it was not immediately clear how to apply it to Python applications.

    In Python you install application dependencies and (preferably) the application itself using pip tool. When we run it during image build, pip just installs everything under /usr so there is no immediate way to copy artifacts (that is the app and its dependencies installed by pip) into the next build stage.

    The solution that I came up with is to coerce pip to install everything into a dedicated directory. There are many ways of doing so, but from my experiments I found installing with --user flag and properly setting PYTHONUSERBASE as the most convenient way to install both Python libraries and app binaries (e.g. entrypoint scripts).
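
To see what PYTHONUSERBASE changes, here is a small sketch that asks a fresh interpreter where its user base and user site-packages live once the variable is set. The helper name `user_install_dirs` is made up for this example; `/pyroot` is simply the directory used later in the Dockerfile.

```python
import os
import subprocess
import sys


def user_install_dirs(pyroot):
    """Return (user base, user site-packages) as seen by a fresh interpreter
    started with PYTHONUSERBASE pointing at `pyroot`."""
    env = dict(os.environ, PYTHONUSERBASE=pyroot)
    probe = "import site; print(site.getuserbase()); print(site.getusersitepackages())"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        env=env, check=True, capture_output=True, text=True,
    )
    base, site_packages = result.stdout.splitlines()[:2]
    return base, site_packages


if __name__ == "__main__":
    base, site_packages = user_install_dirs("/pyroot")
    print("pip install --user would target:", site_packages)
    print("console scripts would land under:", os.path.join(base, "bin"))
```

Since `pip install --user` installs into the user base, both libraries and entrypoint scripts end up under one directory that is easy to copy between build stages.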

    In the end it's quite straightforward and I wonder why I didn't find any formal guides on this.

    Without further ado, let's see how it can be done.


    Let's use a sample Python Hello World project that contains a proper setup.py to install both the app's libs and the entrypoint script.

    Note: I urge you to use setup.py even if you don't plan to distribute your app. Simply copying your Python sources into docker image will eventually break - you may end up copying __pycache__ directories, tests, tests fixtures, etc. Having a working setup.py makes it easy to use your app as an installable component in other apps/images.

    Let's set up our test environment:

    git clone git@github.com:haizaar/python-helloworld.git
    cd python-helloworld/
    # Add some artificial requirements to make the example more real
    echo pycrypto==2.6.1 > requirements.txt

    The Dockerfile

    All the "magic" happens below. I've added inline comments to make it easier to read.

    FROM alpine:3.8 AS builder
    # This is our runtime
    RUN apk add --no-cache python3
    RUN ln -sf /usr/bin/pip3 /usr/bin/pip
    RUN ln -sf /usr/bin/python3 /usr/bin/python
    # This is dev runtime
    RUN apk add --no-cache --virtual .build-deps build-base python3-dev
    # Using latest versions, but pinning them
    RUN pip install --upgrade pip==18.0
    RUN pip install --upgrade setuptools==40.4.1
    # This is where pip will install to
    ENV PYROOT /pyroot
    ENV PYTHONUSERBASE $PYROOT
    # A convenience to have console_scripts in PATH
    ENV PATH $PATH:$PYROOT/bin
    WORKDIR /build
    # Install dependencies
    COPY requirements.txt ./
    RUN pip install --user -r requirements.txt
    # Install our application
    COPY . ./
    RUN pip install --user .
    # Production image #
    FROM alpine:3.8 AS prod
    # This is our runtime, again
    # It's better be refactored to a separate image to avoid instruction duplication
    RUN apk add --no-cache python3
    RUN ln -sf /usr/bin/pip3 /usr/bin/pip
    RUN ln -sf /usr/bin/python3 /usr/bin/python
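    ENV PYROOT /pyroot
    # A convenience to have console_scripts in PATH
    ENV PATH $PATH:$PYROOT/bin
    # This is crucial for pkg_resources to work
    ENV PYTHONUSERBASE $PYROOT
    # Finally, copy artifacts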
    COPY --from=builder $PYROOT/lib/ $PYROOT/lib/
    # In most cases we don't need entry points provided by other libraries
    COPY --from=builder $PYROOT/bin/helloworld_in_python $PYROOT/bin/
    CMD ["helloworld_in_python"]

    Let's see that it works:

    $ docker build -t pyhello .
    $ docker run --rm -ti pyhello
    Hello, world

    As I mentioned before, it's really straightforward. I've already packed one of our real apps with this approach and it works well so far.

    Using pipenv?

    If you use pipenv, which I like a lot, you can happily apply the same approach. It's a bit tricky to coerce pipenv to install into a separate dir, but this command does the trick:
    WORKDIR /build
    # Install dependencies
    COPY Pipfile Pipfile.lock ./
    RUN PIP_USER=1 pipenv install --system --deploy
    # Install our application
    COPY . ./
    RUN pip install --user .

    Tuesday, September 18, 2018

    Reducing docker image sizes

    Several years ago I started a new greenfield project. Based on the state of technology affairs back then, I decided to go all in on container technology - Docker and Kubernetes. We dived head first into all of the new technologies and had our application up and running pretty fast. Back then the majority of Docker Library was based on Debian and it resulted in quite large images - our average Python app container image weighs about 700-1000MB. Finally the time has come to rectify it.

    Why do you care

    Docker images are not pulled too often and 1GB is not too big of a number in the age of clouds, so why do you care? Your mileage may vary, but these are our reasons:
    • Image pull speed - on GCE it takes about 30 seconds to pull 1GB image. While downloads when pulling from GCR are almost instant, extraction takes a notable amount of time. When a GKE node crashes and pods migrate to other nodes, the image pull time adds to your application downtime. To compare - pulling of 40MB coredns image from GCR takes only 1.3 seconds.
    • Disk space on GKE nodes - when you have lots of containers and update them often, you may end up with disk space pressure. Same goes for developers' laptops.
    • Deploying off-cloud - pulling gigabytes of data is no fun when you try that over saturated 4G network during a conference demo.

    Here are the strategies currently available on the market.

    Use alpine based images

    Sounds trivial right? - they are around for quite some time already and the majority of the Docker Library has an -alpine variant. But not all alpine images were born the same:

    Docker Library alpine variant of Python:

    $ docker pull python:3.6-alpine
    $ docker images python:3.6-alpine --format '{{.Size}}'

    DIY alpine Python:

    $ cat Dockerfile
    FROM alpine:3.8
    RUN apk add --no-cache python3
    $ docker build -t alpython .
    $ docker images alpython --format '{{.Size}}'

    This is a 25% size reduction compared to Docker Library Python!

    Note: There is another "space-saving" project that provides a bit different approach - instead of providing a complete Linux distro, albeit a smaller one, they provide a minimal runtime base image for each Language. Have a look at Distroless.

    Avoid unnecessary layers

    It's quite natural to write your Dockerfile as follows:
    FROM alpine:3.8
    RUN apk add --no-cache build-base
    COPY hello.c /
    RUN gcc -Wall -o hello hello.c
    RUN apk del build-base
    RUN rm -rf hello.c

    It provides nice reuse of the layer cache, e.g. if hello.c changes, then we can still reuse the installation of the build-base package from cache. There is one problem though - in the above example, the resulting image weighs 157MB(!) though the actual hello binary is just 10KB:

    $ cat hello.c 
    #include <stdio.h>
    #include <stdlib.h>
    int main() {
     printf("Hello world!\n");
     return EXIT_SUCCESS;
    }
    $ docker build -t layers .
    $ docker images layers --format '{{.Size}}'
    $ docker run --rm -ti layers ls -lah /hello
    -rwxr-xr-x    1 root     root       10.4K Sep 18 10:45 /hello

    The reason is that each line in a Dockerfile produces a new layer that constitutes a part of the image, even though the final FS layout may not contain all of the files. You can see the hidden "convicts" using docker history:

    $ docker history layers
    IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
    8c85cd4cd954        16 minutes ago      /bin/sh -c rm -rf hello.c                       0B                  
    b0f981eae17a        17 minutes ago      /bin/sh -c apk del build-base                   20.6kB              
    5f5c41aaddac        17 minutes ago      /bin/sh -c gcc -Wall -o hello hello.c           10.6kB              
    e820eacd8a70        18 minutes ago      /bin/sh -c #(nop) COPY file:380754830509a9a2…   104B                
    0617b2ee0c0b        18 minutes ago      /bin/sh -c apk add --no-cache build-base        153MB               
    196d12cf6ab1        6 days ago          /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B                  
    <missing>           6 days ago          /bin/sh -c #(nop) ADD file:25c10b1d1b41d46a1…   4.41MB       

    The last - a missing one - is our base image, the third line from the top is our binary and the rest is just junk.
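
As a quick sanity check, the SIZE column above can be added up to see where the 157MB comes from. The sketch below assumes docker reports sizes in decimal units (1kB = 1000 bytes); the helper `to_bytes` is made up for this example.

```python
UNITS = {"B": 1, "kB": 1000, "MB": 1000 ** 2, "GB": 1000 ** 3}


def to_bytes(size):
    """Convert a docker-style size such as '20.6kB' or '153MB' to bytes."""
    for unit in ("kB", "MB", "GB", "B"):  # plain 'B' goes last: every unit ends with it
        if size.endswith(unit):
            return float(size[: -len(unit)]) * UNITS[unit]
    raise ValueError(f"unrecognised size: {size!r}")


# SIZE column of the `docker history layers` output, top to bottom
layer_sizes = ["0B", "20.6kB", "10.6kB", "104B", "153MB", "0B", "4.41MB"]

total_mb = sum(to_bytes(s) for s in layer_sizes) / 1000 ** 2
# Only the gcc output layer and the alpine base carry files we actually need
useful_mb = (to_bytes("10.6kB") + to_bytes("4.41MB")) / 1000 ** 2

print(f"total:  {total_mb:.1f} MB")   # total:  157.4 MB
print(f"useful: {useful_mb:.2f} MB")  # useful: 4.42 MB
```

Nearly all of the image is the build-base layer, which stays in the image even though a later layer deletes its files.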

    Squash those squishy bugs!

    You can build docker images with the --squash flag. What it does is essentially leave your image with just two layers - the one you started FROM, and another one that contains only the files that are visible in the resulting FS (minus the FROM image).

    It plays nicely with the layer cache - all intermediate images are still cached, so building similar docker images will yield a high cache hit rate. A small catch - it's still considered experimental, though the feature has been available since Docker 1.13 (Jan 2017). To enable it, run your dockerd with --experimental or add "experimental": true to your /etc/docker/daemon.json. I'm also not sure about its support in SaaS container builders, but you can always spin up your own docker daemon.

    Let's see it in action:

    # Same Dockerfile as above
    $ docker build --squash -t layers:squashed .
    $ docker images layers:squashed --format '{{.Size}}'

    This is exactly our alpine image with 10KB of hello binary:

    $ docker inspect layers:squashed | jq '.[].RootFS.Layers'  # Just two layers as promised
    $ mkdir /tmp/img && docker save layers:squashed | tar -xC /tmp/img; du -hsc /tmp/img/*
    52K /tmp/img/118227640c4bf55636e129d8a2e1eaac3e70ca867db512901b35f6247b978cdd
    4.5M /tmp/img/1341a124286c4b916d8732b6ae68bf3d9753cbb0a36c4c569cb517456a66af50
    4.0K /tmp/img/712000f83bae1ca16c4f18e025c0141995006f01f83ea6d9d47831649a7c71f9.json
    4.0K /tmp/img/manifest.json
    4.0K /tmp/img/repositories
    4.6M total


    Nothing is perfect though. Squashing your layers reduces potential for reusing them when pulling images. Consider the following:

    $ cat Dockerfile
    FROM alpine:3.8
    RUN apk add --no-cache python3
    RUN apk add --no-cache --virtual .build-deps build-base openssl-dev python3-dev
    RUN pip3 install pycrypto==2.6.1
    RUN apk del .build-deps
    # my.py is just one "import Crypto" line
    COPY my.py /
    $ docker build -t mycrypto .
    $ docker build --squash -t mycrypto:squashed .
    $ docker images mycrypto
    REPOSITORY          TAG                 IMAGE ID            CREATED              SIZE
    mycrypto            squashed            9a1e85fa63f0        11 seconds ago       58.6MB
    mycrypto            latest              53b3803aa92f        About a minute ago   246MB

    The difference is very positive - compared to the basic Python Alpine image I built earlier, the squashed one here is just 2 megabytes larger. The squashed image has, again, just two layers: the alpine base, and the rest (Python, pycrypto, and our code) squashed together.

    And here is the downside: If you have 10 such Python apps on your Docker/Kubernetes host, you are going to download and store Python 10 times, and instead of having 1 alpine layer (2MB), one Python layer (~50MB) and 10 app layers (10x2MB) which is ~75MB, we end up with ~600MB.
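
The back-of-the-envelope numbers above can be reproduced with a few lines. This is a rough sketch using the figures from the text; like the text, it ignores that squashed images still share the small alpine base layer. The function names are made up for this example.

```python
def shared_layers_mb(apps, base_mb, python_mb, app_mb):
    """Disk usage when the base and Python layers are stored once and shared."""
    return base_mb + python_mb + apps * app_mb


def squashed_mb(apps, image_mb):
    """Disk usage when every app ships its own squashed copy of everything."""
    return apps * image_mb


shared = shared_layers_mb(apps=10, base_mb=2, python_mb=50, app_mb=2)
squashed = squashed_mb(apps=10, image_mb=58.6)
print(f"shared layers: ~{shared} MB, squashed: ~{squashed:.0f} MB")
# shared layers: ~72 MB, squashed: ~586 MB
```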

    One way to avoid this is to use proper base images, e.g. instead of basing on alpine, we can build our own Python base image and work FROM it.

    Let's combine

    Another widely employed technique is combining RUN instructions to avoid "spilling over" unnecessary layers, i.e. the above Dockerfile can be rewritten as follows:
    $ cat Dockerfile-comb 
    FROM alpine:3.8
    RUN apk add --no-cache python3  # Other Python apps will reuse it
    RUN set -ex && \
     apk add --no-cache --virtual .build-deps build-base openssl-dev python3-dev && \
     pip3 install pycrypto==2.6.1 && \
     apk del .build-deps
    COPY my.py /
    $ docker build -f Dockerfile-comb -t mycrypto:comb .
    $ docker images mycrypto
    REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
    mycrypto            comb                4b89e6ea6f72        7 seconds ago       59MB
    mycrypto            squashed            9a1e85fa63f0        38 minutes ago      58.6MB
    mycrypto            latest              53b3803aa92f        39 minutes ago      246MB
    $ docker inspect  mycrypto:comb | jq '.[].RootFS.Layers'

    The end result is similar to a squashed one and now we can control the layers.

    Downsides? There are some.

    One is cache reuse, or lack thereof. Every single image will have to install build-base over and over. Consider a real example with a 70-line RUN instruction. Your image may take 10 minutes to build, and changing a single line in that huge instruction will start it all over.

    The second is that the development experience is somewhat hackish - you descend from Dockerfile mastery to shell witchery. E.g. you can easily overlook a space character that crept in after a trailing backslash. This increases development times and ups our frustration - we are all human.
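
A tiny lint pass can catch the stray-whitespace slip before a long build starts. The sketch below only flags the pattern described above and makes no claim about how a particular docker version treats it; the function name is made up for this example.

```python
import re

# A backslash followed by spaces or tabs at the very end of a line
WHITESPACE_AFTER_BACKSLASH = re.compile(r"\\[ \t]+$")


def find_broken_continuations(dockerfile_text):
    """Return 1-based line numbers where whitespace follows a trailing backslash."""
    return [
        number
        for number, line in enumerate(dockerfile_text.splitlines(), start=1)
        if WHITESPACE_AFTER_BACKSLASH.search(line)
    ]
```

Running it over the text of a Dockerfile returns an empty list when every continuation backslash is the last character on its line.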

    Multi-stage builds

    This feature is so amazing that I wonder why it is not very famous. It seems like only hard-core docker builders are aware of it.

    The idea is to allow one image to borrow artifacts from another image. Let's apply it to the example that compiles C code:

    $ cat Dockerfile-multi 
    FROM alpine:3.8 AS builder
    RUN apk add --no-cache build-base
    COPY hello.c /
    RUN gcc -Wall -o hello hello.c
    RUN apk del build-base
    FROM alpine:3.8
    COPY --from=builder /hello /
    $ docker build -f Dockerfile-multi -t layers:multi .
    $ docker images layers
    REPOSITORY          TAG                 IMAGE ID            CREATED              SIZE
    layers              multi               98329d4147f0        About a minute ago   4.42MB
    layers              squashed            712000f83bae        2 hours ago          4.44MB
    layers              latest              a756fa351578        2 hours ago          157MB

    That is, the size is as good as it gets (even a bit better, since our squashed variant still has a couple of pieces of apk metadata left behind). It works just great for toolchains that produce clearly distinguishable artifacts. Here is another (simplified) example for nodejs:

    FROM alpine:3.8 AS builder
    RUN apk add --no-cache nodejs
    COPY src /src
    WORKDIR /src
    RUN npm install
    RUN ./node_modules/.bin/jspm install
    RUN ./node_modules/.bin/gulp export  # Outputs to ./build
    FROM alpine:3.8
    RUN apk add --no-cache nginx
    COPY --from=builder /src/build /srv/www

    It's trickier for other toolchains like Python, where it's not immediately clear how to copy artifacts after pip-install'ing your app. The proper way to do it for Python is yet to be discovered (by me).

    I will not describe other perks of this feature since Docker's documentation on the subject is quite verbose.


    As you can probably tell there is no one ultimate method to rule them all. Alpine images are no-brainer; multi-stage provides nice & clean separation, but I lack RUN --from=...; squashing has its trade-offs; and humongous RUN instructions are still a necessary evil.

    We use multi-stage approach for our nodejs images and mega-RUNs for Python ones. When I find a clean way to extract pip's artifacts, I will definitely move to multi-stage builds there as well.

    Monday, August 6, 2018

    Running docker multi-stage builds on GKE

    I recently worked on reducing docker image sizes for our applications, and one of the approaches is to use docker multi-stage builds. It all worked well on my dev machine, but then I pushed the new Dockerfiles to CI and it all shattered, complaining that our docker server is way too old.

    The thing is that GKE K8s nodes still use docker server v17.03, even on the latest K8s 1.10 they have available. If you, like us, run your Jenkins on GKE as well and use the K8s node's docker server for image builds, then this GKE lag will bite you one day.
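
Multi-stage builds need Docker 17.05 or newer, which is why a 17.03 server rejects them. A CI job could fail fast with a check along these lines; this is a sketch, and the function names are made up for the example.

```python
MULTI_STAGE_MIN = (17, 5)  # multi-stage builds arrived in Docker 17.05


def parse_version(version):
    """Turn '17.03.2-ce' or '18.06.0-ce' into a comparable (major, minor) tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def supports_multi_stage(server_version):
    """True if a docker server with this version string can run multi-stage builds."""
    return parse_version(server_version) >= MULTI_STAGE_MIN
```

The version string can be obtained with `docker version --format '{{.Server.Version}}'` and passed to `supports_multi_stage`.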

    There is a solution though - run your own docker server and make Jenkins use it. Fortunately the community thought about it before, and the official docker images for docker itself include a -dind flavour, which stands for Docker-in-Docker.

    Our Jenkins used to talk to the host's docker server through /var/run/docker.sock mounted from the host. Now we run DInD as a deployment instead and talk to it through TCP:

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: dind
    spec:
      replicas: 1
      strategy:
        type: Recreate
      template:
        metadata:
          labels:
            component: dind
        spec:
          containers:
          - name: dind
            image: docker:18.06.0-ce-dind
            env:
            - name: DOCKER_HOST
              value: tcp://
            args:
              - dockerd
              - --storage-driver=overlay2
              - -H tcp://
            ports:
            - name: http
              containerPort: 2375
            securityContext:
              privileged: true
            volumeMounts:
            - name: varlibdocker
              mountPath: /var/lib/docker
            livenessProbe:
              httpGet:
                path: /v1.38/info
                port: http
            readinessProbe:
              httpGet:
                path: /v1.38/info
                port: http
          volumes:
          - name: varlibdocker
            emptyDir: {}
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: dind
      labels:
        component: dind
    spec:
      selector:
        component: dind
      ports:
      - name: http
        targetPort: http
        port: 2375

    After loading it into your cluster you can add the following environment variable to your Jenkins containers: DOCKER_HOST=tcp://dind:2375 and verify that you are now talking to your new & shiny docker server 18.06:

    root@jenkins-...-96d867487-rb5r8:/# docker version
    Client:
     Version: 17.12.0-ce
     API version: 1.35
     Go version: go1.9.2
     Git commit: c97c6d6
     Built: Wed Dec 27 20:05:38 2017
     OS/Arch: linux/amd64
    Server:
     Engine:
      Version: 18.06.0-ce
      API version: 1.38 (minimum version 1.12)
      Go version: go1.10.3
      Git commit: 0ffa825
      Built: Wed Jul 18 19:13:39 2018
      OS/Arch: linux/amd64
      Experimental: false

    Caveat: the setup I'm describing uses emptyDir to store built docker images and cache, i.e. restarting the pod will empty the cache. It's good enough for my needs, but you may consider using PV/PVC for persistence, which on GKE is trivial to set up. Using emptyDir will also consume disk space from your K8s node - something to watch for if you don't have an automatic job that purges older images.

    Another small bonus of this solution is that running docker images on your Jenkins pod will now only return images you have built. Previously this list would also include images of containers that currently run on the node.

    Thursday, December 7, 2017

    Quick test for GCP inter-zone networking

    Prologue: It took a year to move to Down Under and another 6 months to settle here, or at least to start feeling settled, but it looks like I'm back to writing, at least.

    I'm in the process of designing how to move our systems to multi-zone deployment in GCP and wanted to have a brief understanding of the network latency and speed impacts. My Google-fu didn't yield any recent benchmarks on the subject, so I decided to run a couple of quick checks myself and share the results.


    We are running in the us-central1 region and using n1-highmem-8 (8 vCPUs / 52GB RAM) instances as our main workhorse. I've set up one instance in each of the zones - a, b, and c - with an additional instance in zone a to measure intra-zone latency.

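    # No backslashes inside the quotes: the unquoted $VMCREATOR below is split on whitespace.
    # The image selection flag that accompanied --image-project is missing from this snippet.
    VMCREATOR='gcloud compute instances create
                      --machine-type=n1-highmem-8
                      --image-project=ubuntu-os-cloud'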
    $VMCREATOR --zone=us-central1-a us-central1-a-1 us-central1-a-2
    $VMCREATOR --zone=us-central1-b us-central1-b
    $VMCREATOR --zone=us-central1-c us-central1-c


    I used ping to measure latency, the flooding version of it:

    root@us-central1-a-1 $ ping -f -c 100000 us-central1-b

    Here are the results:
    A A
    rtt min/avg/max/mdev = 0.041/0.072/2.882/0.036 ms, ipg/ewma 0.094/0.066 ms
    A B
    rtt min/avg/max/mdev = 0.132/0.193/7.032/0.073 ms, ipg/ewma 0.209/0.213 ms
    A C
    rtt min/avg/max/mdev = 0.123/0.189/4.110/0.060 ms, ipg/ewma 0.205/0.190 ms
    B C
    rtt min/avg/max/mdev = 0.123/0.176/4.399/0.047 ms, ipg/ewma 0.189/0.161 ms

    While inter-zone latency is twice as big as intra-zone latency, it's still within typical LAN figures. Mean deviation is quite low as well. Too bad that ping can't count percentiles.
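
Percentiles are easy to compute offline if you capture per-packet output from a regular (non-flood) ping run, since flood mode only prints the summary. A minimal sketch, assuming Linux-style `time=0.193 ms` fields in the output and Python 3.8+; the function name is made up for this example.

```python
import re
import statistics

RTT = re.compile(r"time=([\d.]+) ms")


def rtt_percentiles(ping_output, points=(50, 95, 99)):
    """Extract per-packet RTTs (ms) from ping output and return {percentile: ms}."""
    samples = [float(value) for value in RTT.findall(ping_output)]
    # quantiles(n=100) returns the 99 cut points p1..p99
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in points}
```

Feeding it the text of something like `ping -c 1000 us-central1-b` gives the median and tail latencies alongside the min/avg/max that ping itself reports.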


    I used iperf tool to measure throughput. Both unidirectional (each way) and bidirectional throughputs were measured.
    • Server side: iperf -s
    • Client side: iperf -c <server-host> -t 60 -r and iperf -c <server-host> -t 60 -d

    Note: iperf has a bug where in client mode it ignores any parameters specified before client host, therefore it's crucial to specify the host as a first parameter.

    Here are the results. All throughput numbers are in gigabits per second.

    Zones    Send    Receive    Send + Receive
    A & A    12.0    13.9       8.12 + 10.1
    A & B    7.96    8.22       4.57 + 6.30
    A & C    6.87    8.51       3.97 + 5.98
    B & C    5.75    7.51       3.05 + 3.96


    I remember reading in the GCP docs that their zones are kilometers away from each other, yet, according to the above quick tests, they can still be treated as one huge 10Gbit LAN - that's pretty impressive. I know such technology has been available for quite some time already, but it's still impressive to have it readily available to anyone, anytime.

    Saturday, April 15, 2017

    My sugar findings

    The posts in this blog are usually about technology subjects. However, I've been on vacation for the last week and have spent several days reading about sugar and products containing it, mostly on Wikipedia. Below is a summary of my findings. Please note that I have studied neither chemistry nor biology since 9th grade, so please bear with me for possible inaccuracies.

    Appetizer: In 2015, the world produced 177 million tons of sugar (all types combined). This is 24 kilograms per person per year, or roughly 66 grams per day, and surely much higher in industrialized countries.


    Monosaccharides, AKA “simple sugars”, are the most basic types of sugar - they can not be further hydrolyzed to simpler compounds. Those relevant for humans are glucose, fructose and galactose - they are the only ones that the human body can directly absorb through the small intestine. Glucose can be used directly by body cells, while fructose and galactose are directed to the liver for further pre-processing.

    Glucose is not “bad” per se - it’s the fuel of most living organisms on earth, including humans. However, high amounts of glucose, as well as other monosaccharides, can lead to insulin resistance (diabetes) and obesity. Another problem related to the intake of simple sugars is that they fuel acid-producing bacteria living in the mouth, which leads to dental caries.


    Primary sources of monosaccharides in human diet are fruits (both fresh and dried), honey and, recently, HFCS - High Fructose Corn Syrup. On top of that, inverted sugar is also in use, but I will cover it separately later on.

    While fruits contain high percentage of fructose, it comes together with good amount of other beneficial nutrients, e.g. dietary fiber, vitamin C and potassium. For that, fruits should not be discarded because of their fructose content - they overall are healthy products and commonly are not a reason for overweight or obesity. For example, two thirds of Australians are overweight or obese, while an average Australian eats only about one piece of fruit a day.

    Note: It’s quite common in the food industry to treat dried fruits with sulfur dioxide, which is a toxic gas in its natural form. The health effects of this substance are still disputed, but since it’s done to increase shelf life and enhance visual appeal of the product, i.e. to benefit producer and not end user, I do not see a reason to buy dried fruits treated with it. Moreover, I’ve seen products labeled as organic, that still contained sulfur dioxide, i.e. the fruits themselves were from organic origin, but were treated with sulfur dioxide.

    Honey, on the other hand, while generally perceived as “healthy food”, is actually a bunch of empty calories. An average honey consists of 80% sugars and 17% water, in particular 38% fructose and 31% glucose. Since honey is a supersaturated liquid, containing more sugar than water, glucose tends to crystallize into solid granules floating in fructose syrup.

    Note: one interesting source of honey is a honeydew secretion.

    Finally, HFCS is a sweetener produced from corn starch by breaking its carbohydrates into glucose and fructose. The resulting solution is about 50/50 glucose/fructose (in their free form), but may vary between manufacturers. This sweetener has been generally available since 1970, shortly after the discovery of the enzymes necessary for its manufacturing process. There were some health concerns about HFCS, however nowadays they are generally dismissed - i.e. HFCS is no better or worse than any other added sugar, which, again, in case of excess intake can lead to obesity and diabetes.


    Disaccharide is a sugar that is formed by two joined monosaccharides. The most common examples are:
    • Lactose: glucose + galactose
    • Maltose: glucose + glucose
    • Sucrose: glucose + fructose
    Disaccharides can not be absorbed by the human body as they are, but need to be broken down, or hydrolyzed, into monosaccharides. To speed up the process and allow fast enough absorption, enzymes are secreted by the small intestine, where disaccharides are hydrolyzed and absorbed. A dedicated enzyme is secreted for each disaccharide type, e.g. lactase, maltase and sucrase. Insufficient secretion, or lack thereof, results in intolerance to certain types of disaccharides, i.e. inability to absorb them in the small intestine. In such a case they are passed on into the large intestine, where various bacteria metabolize them and the resulting fermentation process produces gases leading to detrimental health effects.

    Another issue with disaccharides is that they, together with monosaccharides, provide food to acid-producing bacteria, leading to dental caries. Sucrose particularly shines here, allowing anaerobic environments that boost acid production by the bacteria.

    Lactose is naturally found in dairy products, but some sources say that it’s often added to bread, snacks, cereals, etc. I don’t quite remember lactose being listed on products, at least in Israel, and though I did not research the subject, my guess is that this is because it would make products milk-kosher, and thus could limit their consumption by the end user. I did not study lactose any further. Maltose is a major component of brown rice syrup - this is how I initially stumbled upon it.

    Sucrose, or “table sugar”, or just “sugar”, is the king of disaccharides, and of all sweeteners together. The rest of this post will be mainly dedicated to it, but let's finish with maltose first.


    My discovery of maltose started with reading the nutrition facts of an organic, i.e. perceived as “healthy”, candy that listed “rice syrup”. Reading further, I found out that it’s a sweetener produced by breaking down the starch of whole brown rice. The traditional way to produce the syrup is to cook the rice and then add a small amount of sprouted barley grains - something that I should definitely try at home some time. Most of the current production is performed using industrial methods, as one would expect.

    The outcome is, again, sweet, empty calories, for better or worse. Traditionally prepared syrup can contain up to 10% protein, however it’s usually removed in industrial products. Other than that, again - empty calories.


    Without further ado, let's get to sucrose, the most common of all sugars. Since Wikipedia has quite a good and succinct article on sucrose, I will only mention topics that particularly thrilled me.

    Note: Interestingly enough, before the introduction of industrial sugar manufacturing methods, honey was the primary source of sweeteners in most parts of the world.

    Humans have been extracting sucrose from sugar cane since about 500 BC. The process is quite laborious: juice is extracted from crushed canes and boiled to reduce its water content, and then, while it cools, sucrose crystallizes out. Such sugar is called non-centrifugal cane sugar (NCS). Today's processes are quite optimized and use agents like lime (not to be confused with the citrus fruit) and activated carbon for purification and filtering. The result is raw sugar, which is then further purified into pure sucrose and molasses (residues).

    In the 19th century, the sugar beet plant joined the sugar party. A slightly different process is used, but it also results in sucrose and molasses. Beet molasses is considered unpalatable to humans, while cane molasses is heavily used in the food industry.

    While it’s generally agreed that regular white sugar (sucrose) is “bad”, in recent years there has been a trend to substitute it with various kinds of brown sugars, which are considered healthier. Let’s explore what brown sugars are.

    Brown sugar is a sucrose-based sugar that has a distinctive brown color due to the presence of molasses. It’s obtained either by stopping the refinement process at different stages, or by re-adding molasses to pure white sugar. Regardless of the method, the only non-sugar nutritional value of brown sugars comes from their molasses, and since typical brown sugar does not contain more than 10% molasses, its difference from white sugar is negligible, nutrition-wise. Bottom line - use brown sugars, e.g. demerara, muscovado, panela, etc., because you like their taste and not because they are healthier.

    This leads to the conclusion that molasses is the only health-beneficial product of the sugar industry. The strongest, blackstrap molasses, contains a significant amount of vitamin B6 and minerals like calcium, magnesium, iron, and manganese, with one tablespoon providing 20% of the daily value.

    The only outstanding detrimental effect of sucrose that I have discovered (compared to other sugars) is its increased effect on tooth decay.



    Heating sugars, particularly sucrose, produces caramel. Sucrose first decomposes into glucose and fructose, which then build up new compounds. Surprisingly enough, this process is not well understood.

    Inverted sugar

    Inverted sugar syrup is produced by splitting sucrose into its components - fructose and glucose. The resulting product is alluringly sweet, even compared to sucrose. The simplest way to obtain inverted sugar is to dissolve some sucrose in water and heat it. Citric acid (1g per kg of sugar) can be added to catalyze the process. Baking soda can be used later to neutralize the acid and thus remove the sour taste.
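
    To put some numbers on the recipe above, here is a small sketch that scales the citric acid dose and estimates the mass balance of the reaction (sucrose + water -> glucose + fructose). It assumes complete inversion, which a home kitchen will not necessarily reach, and uses rounded standard molar masses; the function name `inversion_plan` is made up for this example:

```python
# Rounded standard molar masses, in g/mol.
M_SUCROSE = 342.30  # C12H22O11
M_WATER = 18.02     # H2O
M_HEXOSE = 180.16   # C6H12O6, the same for glucose and fructose

CITRIC_ACID_G_PER_KG = 1.0  # dose quoted in the text: 1 g per kg of sugar


def inversion_plan(sugar_kg):
    """Estimate quantities for inverting `sugar_kg` of sucrose.

    Assumes every sucrose molecule is split (complete inversion).
    """
    moles = sugar_kg * 1000 / M_SUCROSE
    return {
        "citric_acid_g": sugar_kg * CITRIC_ACID_G_PER_KG,
        "water_consumed_g": moles * M_WATER,  # water bound by the reaction itself
        "glucose_g": moles * M_HEXOSE,
        "fructose_g": moles * M_HEXOSE,
    }


plan = inversion_plan(1.0)
for name, grams in plan.items():
    print(f"{name:>18}: {grams:7.1f}")
```

    For 1 kg of sucrose this comes out at roughly 526 g each of glucose and fructose, with about 53 g of water bound in the process, so fully inverted sugar weighs about 5% more than the sucrose it started from.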

    Sucrose inversion occurs when preparing jams, since fruits naturally contain acids. Inverted sugar provides strong preserving qualities for products that use it - this is what gives jams a relatively long shelf life even without additional preservatives.