Monday, June 1, 2015

AWS Elastic Beanstalk is amateur?

A friend asked me to help him set up AWS Elastic Beanstalk for his Python app. I worked with this service a couple of years ago, and it was interesting to see how it has developed since then. Good news - Python 3.4 is supported! But then I started to uncover some grim stuff.

Changing Apache config is a nightmare

Want to change the Apache config? Particularly /etc/httpd/conf.d/wsgi.conf? It's the one that launches your app, and it is autogenerated. So to change it:
  • Create a config file under .ebextensions, say app.config
  • Add the following to it:
    commands:
      create_post_dir:
        command: "mkdir -p /opt/elasticbeanstalk/hooks/appdeploy/post"
    
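    since the only way to run something after wsgi.conf has been generated is to hook into EB's own appdeploy hooks
  • Create wsgi-aux.conf somewhere - I just used the app directory. In this file you'll put additional directives for Apache
  • Now add some more stuff to your ebextensions config (if it goes into the same file, merge the new entry under the existing commands: key, since YAML keys must be unique):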
    commands:
      rm_old_fixer_cron:
        command: "rm -rf /etc/cron.d/wsgi-fixer.bak"
    files:
      "/opt/elasticbeanstalk/hooks/appdeploy/post/99_wsgi_conf_aux.sh":
        mode: "000755"
        owner: root
        group: root
        content: |
          #!/usr/bin/env bash
          grep -q 'Include /opt/python/current/app/wsgi-aux.conf' /etc/httpd/conf.d/wsgi.conf || sed -i.back '/<VirtualHost/aInclude /opt/python/current/app/wsgi-aux.conf' /etc/httpd/conf.d/wsgi.conf
          /usr/bin/pkill -HUP -P $(cat /opt/python/run/supervisord.pid)
        encoding: plain
      "/etc/cron.d/wsgi-fixer":
        mode: "000644"
        owner: root
        group: root
        content: |
          # From time to time wsgi.conf is regenerated after our hook has run, especially on a fresh boot.
          # Fixing it the hard way :(
          * * * * * root grep -q 'Include /opt/python/current/app/wsgi-aux.conf' /etc/httpd/conf.d/wsgi.conf || /opt/elasticbeanstalk/hooks/appdeploy/post/99_wsgi_conf_aux.sh
        encoding: plain
    

And that's all! (*Sarcasm*) Pretty trivial to figure out, huh? Especially the last pkill ... part. Beware that Elastic Beanstalk runs Apache using its own supervisord scripts, so if you run service httpd reload it will happily tell you [ OK ], but nothing will actually happen.
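
The core of the hook is the grep -q ... || sed ... pair, which makes the edit idempotent: the Include line is added only when it is missing. Below is a minimal sketch for trying that logic on a scratch file before deploying. It assumes GNU sed (the one-line form of the a command is a GNU extension); the function name add_include and the temp-file setup are hypothetical and only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add an Include line right after the <VirtualHost ...>
# line, but only if the file does not already contain it.
add_include() {
    local conf="$1" line="$2"
    grep -qF -- "$line" "$conf" || sed -i.back "/<VirtualHost/a$line" "$conf"
}

# Work on a scratch copy, never on the live wsgi.conf.
demo_conf="$(mktemp)"
printf '<VirtualHost *:80>\n</VirtualHost>\n' > "$demo_conf"

add_include "$demo_conf" "Include /opt/python/current/app/wsgi-aux.conf"
# Second call is a no-op because grep already finds the line.
add_include "$demo_conf" "Include /opt/python/current/app/wsgi-aux.conf"

cat "$demo_conf"
```

Running the function twice and still seeing a single Include line is the property the cron job relies on: it can fire every minute without piling up duplicates.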

Now, the above is only for adding stuff. To change existing settings you'll need to craft even more scripts.

It rotates logs ONCE A DAY

Yes, this is the case - plain old logrotate in /etc/cron.daily/logrotate. Elastic Beanstalk instances come with 7.8G root partitions. On a 100 QPS server with an average log line size of 1k, that is more than 8 GB of logs per day, so you'll fill the disk in under 24h. Yep... run your app, and the logs will bloat your disk. To fix it, I used the rotatelogs tool together with Apache piped logs. So again:
  • Create a file compresslogs.sh in your app folder containing:
    #!/bin/bash
    
    # rotatelogs runs us with <new_file> <rotated_file> when a file is rotated
    # and with just <new_file> every time a new log file is opened.
    # We only act in the former case.
    ROTATED_FILE=$2
    if [[ "$ROTATED_FILE" != "" ]]; then
        gzip -f "$ROTATED_FILE"
    fi
    
  • Add the following lines to the aforementioned wsgi-aux.conf:
    CustomLog "|/usr/sbin/rotatelogs -p /opt/python/current/app/compresslogs.sh -n 10 logs/app_access_log 20M" combined
    ErrorLog "|/usr/sbin/rotatelogs -p /opt/python/current/app/compresslogs.sh -n 10 logs/app_error_log 20M"
    
  • EB's logrotate script will still try to rotate everything under /var/log/httpd, i.e. it will create an additional copy of every log file created by rotatelogs. To disable this, add the following entry under the files section of the config file:
      "/etc/cron.d/kill-http-logrotate":
        mode: "000644"
        owner: root
        group: root
        content: |
          # Get rid of EB logrotate for http that blows up our disk space
          * * * * * root rm -rf /etc/logrotate.d/logrotate.elasticbeanstalk.httpd.conf
        encoding: plain
    
    This is a bit ugly, but bulletproof.
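
The compresslogs.sh logic can be exercised locally before it goes anywhere near Apache. rotatelogs passes the newly opened file as the first argument and, after a rotation, the old file as the second. The sketch below wraps the same check in a function and calls it both ways; the function name compress_rotated and the file names in the scratch directory are hypothetical and not necessarily the names rotatelogs produces with -n.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for compresslogs.sh, written as a function so both
# call shapes can be tried by hand.
compress_rotated() {
    local rotated_file="${2:-}"
    if [[ -n "$rotated_file" ]]; then
        gzip -f -- "$rotated_file"
    fi
}

work_dir="$(mktemp -d)"
echo "old entries" > "$work_dir/app_access_log.1"
echo "new entries" > "$work_dir/app_access_log"

# New log file opened, nothing rotated: one argument, nothing is compressed.
compress_rotated "$work_dir/app_access_log"

# After a rotation: the old file arrives as the second argument and is gzipped.
compress_rotated "$work_dir/app_access_log" "$work_dir/app_access_log.1"

ls "$work_dir"
```

Only the rotated file ends up gzipped; the live log is left alone, which is what keeps Apache writing to an intact file.
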
Again, hmm... easy :( Something like this should come standard! No production server should depend on the pace of the logrotate cron.
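
The "before 24h" estimate from the start of this section is easy to re-check with your own numbers. A rough sketch using the figures above (100 QPS, about 1k per log line, a 7.8G root partition); the variable names are mine, and it pretends the disk starts out empty, so the real margin is smaller.

```shell
#!/usr/bin/env bash
qps=100                                           # requests per second
bytes_per_line=1024                               # ~1k per access log line
disk_bytes=$(( 78 * 1024 * 1024 * 1024 / 10 ))    # 7.8 GiB root partition

bytes_per_day=$(( qps * bytes_per_line * 86400 ))
hours_to_fill=$(( disk_bytes / (qps * bytes_per_line * 3600) ))

echo "log volume per day: $(( bytes_per_day / 1024 / 1024 )) MiB"   # prints 8437 MiB
echo "whole hours until the disk is full: $hours_to_fill"           # prints 22
```

With these inputs a single day of logs is already bigger than the whole partition, so a once-a-day rotation never gets a chance to help.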

Load balancer IPs are logged by default

Looking at your access logs and wondering why you get so much traffic from only a couple of IPs? Well, those are the load balancer's IPs, and they are probably not what you are after. So, let's use our handy wsgi-aux.conf again:
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" custom
This is similar to the standard definition of combined from the main Apache config, but with the host parameter (%h) replaced by the X-Forwarded-For header.
NOTE: Change your CustomLog line in wsgi-aux.conf to use the custom log format instead of combined.
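
Once the first field holds the X-Forwarded-For value, a quick tally shows whether real client addresses are being logged. A small sketch, assuming the log format above; the helper name top_clients and the sample lines are made up for illustration. The header can carry a comma-separated list when a request passes through more than one proxy, so the sketch keeps only the first entry, and since the header is client-supplied it should not be treated as trustworthy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count requests per client address, taking the first
# whitespace-separated field and dropping a trailing comma if the header
# held a list of addresses.
top_clients() {
    local log_file="$1" limit="${2:-10}"
    awk '{ sub(/,$/, "", $1); print $1 }' "$log_file" \
        | sort | uniq -c | sort -rn | head -n "$limit"
}

# Made-up sample lines in the custom format.
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
203.0.113.7 - - [01/Jun/2015:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.40"
203.0.113.7 - - [01/Jun/2015:10:00:01 +0000] "GET /a HTTP/1.1" 200 128 "-" "curl/7.40"
198.51.100.2, 10.0.0.5 - - [01/Jun/2015:10:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.40"
EOF

top_clients "$sample_log" 5
```

If the top of that list is still one or two private addresses, the CustomLog line is probably still using combined.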

Wrapping up

This service is supposed to help people without devops support get rolling fast. While it definitely lets you start quickly, watch it carefully - with default settings it will let you (literally) down pretty quickly.