Blog von Slagwerks

Ruby & NoSQL @ Vermonster

Update: Vermonster has a nice recount, chock full of code & explanations.

A fine time was had the other night in the offices of Boston’s Vermonster, when a few Vermonsters generously helped some folks from Boston.rb get up to speed on the use of some NoSQL projects from Ruby.

Up until now, I’ve been a little leery of NoSQL. Probably due to painful past experience with ZODB failing to keep up with moderate loads, and reading too many Philip Greenspun essays at an impressionable age. Happily, it appears that the projects collected under the NoSQL banner can actually walk and chew gum at the same time, without rendering your data unreasonably inconsistent.

The whole question of the Consistency of one’s data is addressed by the CAP theorem, which I understand to roughly say

Consistency, Availability, Partition tolerance: pick (at most) two, particularly under certain challenging conditions such as running Google or Amazon.

Even if you aren’t running something quite that big, there seem to be some situations where you’d want to think about this stuff – for example, running an app on Google’s App Engine (right? Haven’t yet myself.) Plus, all the cool kids are into it.

We worked with the locally-written Riak (looks like it’s the topic of the April Boston.rb meeting!) and with CouchDB. Both are ridiculously easy to get running locally, have Ruby client libraries, and are powered mainly by Erlang, with JavaScript Map/Reduce. For the latter, we used the couch_potato library, which seems to do a nice job of writing your JavaScript for you in the most common cases.

We wrapped the evening up with a coding challenge. My brain was fried & I gave up 2/3rds of the way through, but still had a blast & learned plenty. As a side benefit, beyond the exposure to NoSQL, my state-of-the-art-circa-2008 Ruby habits got challenged by working with RSpec, Ruby 1.9.1, and RVM, all of which should prove handy for future things.

Big ups to Vermonster for hosting, feeding, and educating us. They are good guys, skilled teachers, and have excellent taste in beverages.

Snow Leopard Still a Mixed Bag

I’ve been trying out Mac OS X 10.6 a.k.a. Snow Leopard for a few weeks now. For the most part it looks and acts… just like Leopard! Still, I have run into the following annoyances:

  • Doesn’t really want to do more than one thing if you only have 1 GB RAM, very noticeably worse than Tiger in this regard (never ran Leopard much on only 1 GB).  I guess there are more ints running in the OS & in basic apps than I would have thought, if it is the 64bitness to blame.
  • Doesn’t work with our older b/g Airport Extreme. Says it’s on the wireless network, but doesn’t configure TCP/IP settings – this is after much experimenting with various Airport settings. Search for ‘snow leopard wireless’ for a variety of related complaints.
  • Doesn’t work with the Citrix XenApp web plugin. To be fair, this seems to be due to Citrix expecting Java 1.5 to be installed, which is kind of lame. Workarounds are reported on the internets, but then you’re managing your own Java installation, which seems to be one of the most vulnerability-plagued pieces of OS X.

My conclusion, as of 10.6.2: no reason to upgrade from Leopard, unless you’ve bought brand-new hardware that requires SL.

Latest 201 CMR 17 Hotness

You could be excused for having missed the news, but the 201 CMR 17 that was just about to go into effect over a year ago… is now just about to go into effect!

some tidbits:

Fortunately, there doesn’t seem to be anything particularly unreasonable in the requirements, so organizations following good data security procedures shouldn’t have to do much work (if any) to be compliant.

    Testing Backups

    I’m putting together our backup testing plan, and marveling at the suggestions in Preston’s Backup and Recovery. Here’s my paraphrase:

    • restore many single files
    • restore older versions of files
    • restore entire drive / filesystem, compare to original (same size? etc.)
    • recreate entire system
    • pretend a given backup volume is bad, use alternate
    • restore without touching backup server (as if it were destroyed)
    • include database restores, inc. database at different point in time
    • dream up painful scenarios with pessimists, test for those regularly

    To actually do these tests, he suggests making a list & randomly picking a subset to test on a monthly basis.
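A minimal sketch of the "make a list & randomly pick a subset" idea in Python. The test names are taken from the paraphrased list above; the function name, the subset size of three, and the optional seed are assumptions made for illustration, not anything the book prescribes.

```python
import random

# Test scenarios paraphrased from the list above.
BACKUP_TESTS = [
    "restore many single files",
    "restore older versions of files",
    "restore entire drive / filesystem and compare to original",
    "recreate entire system",
    "pretend a given backup volume is bad, use alternate",
    "restore without touching backup server",
    "database restore, including a different point in time",
    "painful scenario dreamed up with pessimists",
]


def pick_monthly_tests(tests, how_many=3, seed=None):
    """Return a random subset of `tests` without repeats.

    `how_many` and `seed` are illustrative knobs: pass a seed only if
    you want the same picks again (for example, to re-run a failed month).
    """
    rng = random.Random(seed)
    return rng.sample(tests, min(how_many, len(tests)))


if __name__ == "__main__":
    for test in pick_monthly_tests(BACKUP_TESTS):
        print("This month:", test)
```

Run from cron on the first of the month, something like this could mail out the month's homework.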

    Fun, huh? Beats holding the bag when your organization’s vital data goes missing.

    Python Script for Importing Maildirs to Gmail

    In fact the script in question should work also for mboxes and for other SMTP servers, but maildir-to-gmail was the problem I was trying to solve.

    The most promising starting point was an old script by Mark Lyon. After a little rejiggering so I could see what error was coming back from Google, I made a couple more tweaks to use TLS & to take the user’s password.

    If anyone’s interested in what seemed to me a strange hoop to hop through before connecting, check the src.
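For readers who want the general shape without opening the source, here is a rough sketch of the maildir-to-SMTP idea using only Python's standard library. It is not the script described above: the function names are made up for illustration, the host, port, and credentials are left as parameters, and it does not reproduce whatever extra step the real script needed before connecting.

```python
import mailbox
import smtplib


def iter_maildir_messages(maildir_path):
    """Yield each message stored in the maildir at `maildir_path`."""
    box = mailbox.Maildir(maildir_path, create=False)
    for key in box.iterkeys():
        yield box[key]


def send_maildir(maildir_path, recipient, smtp):
    """Send every message in the maildir to `recipient` over an open SMTP session.

    Returns the number of messages handed to the server.
    """
    sent = 0
    for message in iter_maildir_messages(maildir_path):
        # The envelope recipient decides where the message is delivered;
        # the message's own headers are passed along unchanged.
        smtp.sendmail(recipient, [recipient], message.as_bytes())
        sent += 1
    return sent


def open_tls_session(host, port, user, password):
    """Connect, upgrade the connection with STARTTLS, then log in."""
    smtp = smtplib.SMTP(host, port)
    smtp.ehlo()
    smtp.starttls()
    smtp.ehlo()
    smtp.login(user, password)
    return smtp
```

Swapping `mailbox.Maildir` for `mailbox.mbox` should be all it takes to read an mbox file instead, since both classes share the same interface.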

    Considering How to Reliably Jam Stuff Into FileMaker From the Web

    I’m sure I’m not the only person with this situation:

    1. FileMaker database sitting behind a firewall (though similar issues would pertain for other internal databases / services)
    2. Website hosted elsewhere (i.e. other side of firewall)
    3. Need to get data from #2 to #1 reliably and securely

    Up until today, I’ve only had one instance of #2 in this situation. I dealt with it by storing data collected on the website (which happened to be written in Rails) in a database on the web server, and then running a periodic PHP script on the FileMaker server that connects to the Rails app via phpactiveresource, pulls in pending data, and inserts it into FileMaker via its PHP api.

    That instance was such a roaring success that the requests have been pouring in for more of the same. Some of the new requests will be handled by a site running PHP, so I’ve got a bit of rewiring to do – I can’t see any sense in getting the data from the PHP app into something the Active Resource client can talk to.

    Stepping back and looking at the bigger picture, issues here include:

    • the connection from the website to the FileMaker server could be down, so data collected by the website needs to be stored until it can be confirmed to have made it to FileMaker.
    • it would be nice for this to happen in a timely fashion
    • multiple technologies on the web side (PHP & ruby) are going to be collecting data to be submitted to FileMaker, so it’d be nice if the transfer machinery can be agnostic and just accept JSON or XML or something.

    Sounds like a problem for a queue system, huh? So my current plan is to run a beanstalkd instance on the webserver, deposit JSON-encoded data into it from the web sites, and run workers that write to FileMaker using the Ruby FM API. I have no experience with beanstalkd, but a bit of googling suggests that it’s at a nice point in simplicity to configure & run, maturity, light weight, and easy access from PHP & Ruby.
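The contract that matters here is "keep the data until delivery is confirmed." The sketch below shows that contract in Python, with the standard library's sqlite3 standing in for the queue. It is not beanstalkd and not the FileMaker API: the table name, function names, and the `deliver` callable are all hypothetical. In beanstalkd terms, the equivalent is reserving a job and deleting it only after the write succeeds.

```python
import json
import sqlite3


def open_queue(path=":memory:"):
    """Open (or create) the table holding submissions awaiting delivery."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS pending "
        "(id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
    )
    db.commit()
    return db


def enqueue(db, record):
    """Store one submission as JSON, so any web-side language could produce it."""
    db.execute("INSERT INTO pending (body) VALUES (?)", (json.dumps(record),))
    db.commit()


def drain(db, deliver):
    """Try to deliver every pending job, oldest first.

    `deliver` is a hypothetical callable that writes one record to the
    internal database and raises an exception if the write is not confirmed.
    A job is deleted only after `deliver` returns, so failures stay queued.
    """
    delivered = 0
    rows = db.execute("SELECT id, body FROM pending ORDER BY id").fetchall()
    for job_id, body in rows:
        try:
            deliver(json.loads(body))
        except Exception:
            break  # destination unreachable: keep this job and the rest for later
        db.execute("DELETE FROM pending WHERE id = ?", (job_id,))
        db.commit()
        delivered += 1
    return delivered
```

Because the payload is plain JSON, the PHP and Ruby sites would only need to agree on the field names, not on each other's libraries.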

    A further benefit of working in beanstalkd is that, based on a quick perusal of the recommended Rails integration, it should be really easy to break Observers out to async code, thus making my Rails apps snappier.

    Any advice to the contrary is of course welcome. I’ll try to remember to update y’all on how this turns out.

    Custom Flickr Sidebar via Wget, Cron & PHP

    The new site launched with a sidebar that shows two random photos from our flickr account, using their javascript widget. This was a great way to get things going, but now we’ve developed slightly more involved needs and I’ve had to come up with a custom solution.

    Getting the list of photos

    You need a Flickr API key, which is quick & easy to get. Then wget & cron to get ‘em:

    wget --quiet 'http://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=YOUR_API_KEY&user_id=YOUR_USER_ID&tags=website&per_page=500' -O photos.xml

    Note that this includes a tags argument. The thing that pushed me to switch the workflow was the desire to be able to upload photos to our Flickr account that don’t necessarily fit into the sidebar format, such as panoramics. To handle this, everything that belongs on the website gets the tag website, and we only fetch those. We’ve also talked about just getting landscape-oriented photos, but haven’t implemented that.

    I’m running this daily, which is plenty often to update the available photo list. I believe this gets the newest 500, which seems more than adequate, particularly since we don’t have close to 500 photos yet.

    Parsing the list & generating the HTML

    PHP5’s SimpleXML is pretty nice – here’s what we’re doing:

    try {
      $xml = new SimpleXMLElement(file_get_contents('photos.xml'));
      $number_of_photos = count($xml->photos->photo);
      // Pick two photos at random (the same photo can come up twice).
      $displayed_photos = array();
      array_push(
        $displayed_photos,
        $xml->photos->photo[rand(0, $number_of_photos - 1)]);
      array_push(
        $displayed_photos,
        $xml->photos->photo[rand(0, $number_of_photos - 1)]);
      foreach ($displayed_photos as $photo) { ?>
      <div>
    <?php
        // Escape the title: it is free text going into an HTML attribute.
        print "<a href=\"http://www.flickr.com/photos/8562013@N07/" .
          $photo['id'] . "\"><img src=\"http://farm" . $photo['farm'] .
          ".static.flickr.com/" . $photo['server'] . "/" . $photo['id'] . "_" .
          $photo['secret'] . "_m.jpg\" alt=\"" .
          htmlspecialchars((string) $photo['title']) . "\" /></a>";
    ?>
      </div>
    <?php
      }
    } catch (Exception $e) {
      error_log("flickr badge had some troubles: " .
        $e->getMessage());
    }

    This snippet takes my laptop less than 1/20th of a second to run from the command line, which suits me fine. The actual code sits in page.tpl.php.
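For comparison, the same parse-and-pick step can be sketched in Python. This is not what runs on the site: the function names are invented for illustration. It assumes the usual shape of a Flickr REST response (an rsp element containing photos, containing photo elements) and reuses the URL pattern from the PHP. One deliberate difference is that random.sample never returns the same photo twice.

```python
import random
import xml.etree.ElementTree as ET
from html import escape


def pick_photos(xml_text, how_many=2):
    """Parse a flickr.photos.search response and pick distinct random photos."""
    root = ET.fromstring(xml_text)
    photos = root.findall("./photos/photo")
    # sample() picks without replacement; cap the count for short lists.
    return random.sample(photos, min(how_many, len(photos)))


def photo_html(photo, user_id):
    """Build the linked thumbnail markup for one <photo> element.

    `user_id` is the account's Flickr ID as it appears in photo page URLs.
    """
    page = "http://www.flickr.com/photos/%s/%s" % (user_id, photo.get("id"))
    img = "http://farm%s.static.flickr.com/%s/%s_%s_m.jpg" % (
        photo.get("farm"),
        photo.get("server"),
        photo.get("id"),
        photo.get("secret"),
    )
    # Escape everything that lands inside an attribute value.
    return '<a href="%s"><img src="%s" alt="%s" /></a>' % (
        escape(page),
        escape(img),
        escape(photo.get("title", "")),
    )


if __name__ == "__main__":
    with open("photos.xml") as handle:
        for photo in pick_photos(handle.read()):
            print("<div>%s</div>" % photo_html(photo, "YOUR_USER_ID"))
```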

    Flickr’s API docs, in particular the API Explorer, were awful handy in figuring this all out.

    Joe’s Hacky Approach to Getting Arbitrary Behavior Out of Drupal

    Wherein I disqualify myself for work as a “drupal developer”

    Self indulgent history part

    Way back in 1999 when I started out writing web apps, Embperl seemed like the logical tool to use – so I’m used to having direct access to every step in the request-to-response process. In years since, I’ve progressed through a bunch of different tools and approaches, which have included writing a few rudimentary content management systems and using a bunch of preexisting ones.

    Over the last few years I’ve become a big fan of Ruby on Rails, so if a site does much of anything (simple, recent example: The Food Project’s online summer program application), that’d be my starting point.

    So, why use Drupal?

    Even so, when a site is basically about some content that’s written and edited on an ongoing basis by other folks, Drupal starts looking like a reasonable tool. It’s pretty easy to get it to generate semantic output with reasonable URLs, and the editor-facing UI does a fair job of offering a large degree of control and customizability without being overwhelmingly complex.

    Despite having done some custom work on a few live Drupal sites, I still get confused about the best way to do some of the arbitrary web interaction tasks that were so straightforward back in the Embperl days. There are a billion modules in the Drupal ecosystem, but I often find that it takes more time to figure out if there is an appropriate one for my task & if it works with the current version of Drupal than it would to code up a solution on my own. Also, while I’ve made several attempts to solve programming tasks in what I understand to be the Drupal way, wrapping my head around all the relevant APIs (which can totally change every 6-12 months, with each Drupal release) similarly tends to take more time than bypassing Drupal’s functionality & handling things in Plain Old PHP.

    Here’s a common approach I’ll take, then. I have no interest in extensive coding through a web browser (i.e. putting a bunch of PHP in a node or block), so I’ll set up a Drupal module to hold my application’s arbitrary PHP functions. Then I’ll create a page at the desired URL and enter a line of PHP to route the GET and/or POST to my code, as appropriate.

    Example

    I’m working on an email list signup that needs to communicate with a FileMaker database. An HTML form POSTs to http://thefoodproject.org/mailing_list_signup, which is just a Drupal page using the PHP input filter (which you now need to turn on via admin settings) containing <?php tfp_process_list_signup($_POST); The code for that function lives in sites/all/modules/tfp/tfp.module, and does the normal PHP things to pull information out of the POST & hand that info off to the FileMaker database. Here’s a bit of it:

    function tfp_process_list_signup($POST) {
      $reply = '';
      $sent_plausible_address = FALSE;
      if (!empty($POST['email'])) {
        // Use the array that was passed in, not the $_POST superglobal.
        $supplied_email = $POST['email'];
        if (preg_match(PLAUSIBLE_EMAIL, $supplied_email)) {
          $sent_plausible_address = TRUE;
          $reply = "Thanks! We'll add you to our mailing list.";
        } else {
          // Escape the echoed address so it can't inject markup into the page.
          $reply = "'" . htmlspecialchars($supplied_email) .
            "' doesn't look like an email address to me. ";
        }
        //. . .
      }
      return $reply;
    }

    On the upside, this is quick & easy. The downside with this approach vs. writing a module the Drupal way is that I’m losing out on Drupal’s form API and its associated validation, antispoofing, etc. services. Then again, last time I checked the linked documentation for the form API, it said

    Warning - this page has only been partially updated for the Drupal 6.x API. Until it has been fully updated, reference this page as well: Drupal 5.x to 6.x FormAPI changes

    so I’m quite happy to not have to wade through all of that!

    Lazyweb: 5 Hr Layover in Amsterdam Airport?

    Any readers out there who know what options exist for a 5 hr layover in Amsterdam? This is before an international flight, so it’s not like I have 5 actual hours to play with. A (not this) Friday morning.

    Wikipedia claims it’s a 20 minute trip into the city, which suggests that some minimal tourism should be possible…

    Checking Auth in Apache Over LDAP With OS X

    Here’s the configuration I’ve been working on: control access to Apache webserver by checking (over LDAP) against our existing user database, held in an OS X Open Directory. It’s taken me more casting about than I’d expected, but it looks like I’m finally there.

    In the beginning, I got a little confused by the HTTP auth options. I’d been hoping to use Digest mode, but a comment on this post points out the logical problem with that: Digest doesn’t involve the password making its way to Apache, so there’s no way for it to pass the password along over LDAP.
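To make the Basic-vs-Digest point concrete, here is a small Python sketch of what each scheme puts on the wire. It has nothing to do with Apache's internals, and the function names are made up. The Digest calculation follows RFC 2617 in its simplest form (no qop). The takeaway: a server can decode a Basic header back into the password and replay it in an LDAP bind, while a Digest response is a one-way hash with nothing to pass along.

```python
import base64
import hashlib


def md5_hex(text):
    """Hex MD5 of a string, as used throughout Digest auth."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def basic_header(user, password):
    """Basic auth: credentials are merely base64-encoded."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def password_from_basic(header):
    """What a server can do with a Basic header: recover the password itself."""
    encoded = header.split(" ", 1)[1]
    _user, password = base64.b64decode(encoded).decode("utf-8").split(":", 1)
    return password


def digest_response(user, realm, password, method, uri, nonce):
    """Digest auth (RFC 2617, no qop): only this hash crosses the wire."""
    ha1 = md5_hex("%s:%s:%s" % (user, realm, password))
    ha2 = md5_hex("%s:%s" % (method, uri))
    return md5_hex("%s:%s:%s" % (ha1, nonce, ha2))


if __name__ == "__main__":
    header = basic_header("joe", "hunter2")
    print(header, "->", password_from_basic(header))
    print(digest_response("joe", "Some Realm", "hunter2", "GET", "/somewhere", "abc123"))
```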

    BTW this is under Tiger (OS X 10.4) – I’m not sure if anything changes with other versions of OS X.

    Once I figured out that I did need to use Basic auth, Production Monkeys got me most of the way with my Apache config. What I missed is that, at least with our OD configuration, it’s necessary to include the server name in the dc list. Here’s what worked for me:

    <Location "/somewhere">
      AuthType Basic
      AuthName "Whatever You Call This Auth"
      Require valid-user
      AuthBasicProvider ldap
      AuthLDAPURL ldap://servername.yourdomain.org/cn=users,dc=servername,dc=yourdomain,dc=org?uid
      AuthzLDAPAuthoritative off
    </Location>