Log in

No account? Create an account

JavaScript Validation and Compression

Feb. 11th, 2007 | 06:38 pm

I know it, you’re an AJAX geek. Just like me. You think it’s the next Win32, or GTK, or .NET, .ABC, .DEF, or .whatever. There’s one problem, though: Web 2.0 sites are not just about AJAX. They are about JavaScript and client-side code, as much as they are about client-server interaction.

Look at Google Analytics. Their sexiness comes from their gorgeous client-side Flash, not because they get stuff asynchronously from the server. Look at Google Maps, the original Ajax app. You may say “#$@%, what are you talking about: Google Maps are all about the async stuff!”. No, Google Maps are about drag-and-drop. If you needed to click the stupid “left arrow” button to get the map of what’s to the west, you wouldn’t care that only a part of the screen is being refreshed.

One of my friends frequently makes me repeat the following phrase: “Ajax is not a silver bullet”. He’s so damn right.

What is the silver bullet? Beautiful user models. Intuitive interactions. Visual effects on-par with rich client applications (fade in/out, cinematic expand/collapse, sexy highlight, animated movements). And if you want to get those in your Web 2.0 app, you need to do some major JavaScript work.

Yes, there are platforms out there that will help. I won’t talk about those here. Instead, I’ll focus on two categories of tools that I found to be really helpful in my JavaScript development efforts lately; these two aren’t talked about much, to my sadness.

JavaScript Validation

JavaScript is a beautiful language, too bad it’s abused too often. An inexperienced developer can write something (== vs ===, or “var vs no var”), not even know what the difference is, and have it work most of the time. And that’s the problem. JS is a language without a standard compiler, without type checking, without normal IDE’s. This means that JS developers have to be really careful; but humans make mistakes, and they need tools.

I found one great tool to do JS validation; I use it to do automatic code review of my JavaScript files before I check in. The tool is called JSLint; here’s the DOS batch file snippet I use:

echo Verifying Javascript files using JSLint:

FOR %%f IN (*.js) DO ( 
   echo Processing %%f..
   cscript //B //E:jscript ../tools/jslint.js < %%f

If something’s wrong with any of your js files, JSLint will complain, pointing out the exact line and type of issue, for example:

Lint at line 298 character 23: Use '!==' to compare with 'null'.
if(el.currentStyle != null)

JavaScript Compression

When you have lots of JavaScript (and man do I do), you need some way to compress it. The less bytes each client has to download, the snappier the load performance is; this also helps to scale your server to more users.

There are a few ways to compress your JS: the easiest one involves removing all comments and whitespaces (browsers don’t usually execute those :-)). More complex ones involve renaming variables and methods to shorter names: this helps if you’re like me, who loves using really descriptive variable names, even in JS. Others do obfuscation, which compresses the JS further, and prohibits humans from easily parsing your client-side code.

I’ve used several tools in the last few months: one is called JSMin; I believe it’s written by the same person who wrote JSLint. This tool does everything except obfuscation. I’ve had good experience with it, but have come to use a different tool as it turned out to be a tad more powerful: a tool called Packer from http://dean.edwards.name/packer/.

Here’s the way I use it:

1) I have many .js files, a few of which were written by third parties (those are in separate .js files to keep me sane). Before compression, I append all of these files together into a single temp.js file to achieve higher compression ratio.
2) Compress the temp.js file using Packer.

Here’s a snippet from my build script (DOS batch file):

echo Merging all javascript files into one...
SET MERGED_FILE_NAME="compressed\\core.js"


FOR %%f IN (*.js) DO (
   echo. >> %TEMP_FILE_NAME%
   type %%f >> %TEMP_FILE_NAME%
   echo. >> %TEMP_FILE_NAME%

echo Compressing Javascript files using Packer..

CScript /nologo ..\devtools\packer\pack.wsf %TEMP_FILE_NAME% >> %MERGED_FILE_NAME%


Some stats for your mathematical pleasures: uncompressed JavaScript for the cocktail builder takes up 91.1KB. Compressed, it’s 33.2KB. I’m sure you can calculate the square root of the rate of change of flow of these kilobytes, if transmission is happening on the equator at the top of a 1500ft mountain.

Whatever. Enjoy your JavaScript :-)

Link | Leave a comment |

SQL Injection Defense

Feb. 8th, 2007 | 06:31 am

We all heard of it. Sites going down because of SQL injections. Embarrassed admins. Credibility loss among users. Really simple concept – if your server-side script takes user input verbatim, and issues SQL statements based on it, you will be screwed. There are great scripts out there that will screw you automatically, without personal involvement of someone evil.

SQL injection is scary. Let’s talk about ways you can (and must!) protect your web app against it.

1. Verify data types of all non-string user inputs
Here’s what I do: create a separate class that does all communications to the database. I call it DataRetrieval. In that class, every method just gets you some data from the DB (i.e. executes a SQL query given some parameters), for example,

class DataModification { 
   public static function saveCocktailFeedback 
     ($cocktailID, $starRating, $userInfo)
     { ... }
  public static function saveSiteFeedback($feedbackText, $userInfo)
     { ... }
  public static function registerUserSession($userToken) 
     { ... }

Make it a habit to verify each and every non-string parameter in the method body before doing any SQL commands, for example:

public static function saveCocktailFeedback
   ($cocktailID, $starRating, $userInfo) {

   if (!is_numeric($cocktailID) || !is_numeric($starRating)) {
      // someone is trying to hack us; do something (more below)


2. Escape all string inputs

On MySQL, this is as simple as

$str = mysql_escape_string(stripslashes($str));

I usually create a wrapper class for working with the database (even when I’m using a prepackaged DB class as excellent as ezSQL), and define a function called DB::escape(), that does exactly that.

3. Keep track of security violation attempts

If someone’s trying to hack you, it would be nice to know (1) when it happens and (2) who was it (3) how hard did they try

Remember the “someone’s trying to hack us, do something” comment in suggestion 1? Set up a function similar to the following

public static function log_security_event() {
   // get stack trace into the $trace variable

   $sql = "VALUES INSERT INTO securityevent (Trace, IP)
VALUES ('" .DB::escape($trace) . "', '"

   // execute the statement

This way, when someone tries to hack you, you know exactly what they tried to use, and you can see patterns in their search; you could then determine whether you’re dealing with an input error (oops, someone typed in a quote in a field accidentally) or an intentionally malicious user. You can even gauge the experience level of the hacker by the queries he tries to issue.

This function would fit very nicely into the DB wrapper class that I mentioned earlier.

4. Defense-in-depth: limit SQL user rights

Your PHP application issues a ton of queries to the production database. Poor database, I tell you. If I had to do all those queries, I’d just spit out junk and see if the app server can handle it :-).

Now, seriously: if the attacker was somehow able to run queries on your MySQL server, you want to limit the impact. For example, in the cocktail builder, PHP code should be able to query the known-and-sexy list of cocktails (read only! not modify!); report security violations (write only – only insert, not even modify; no reading allowed, either), and track and report on user feedback around cocktails (read and write).

So, as a part of your build script, you need to set the most-restrictive-possible permissions for the MySQL user that your PHP script will be running as. Here’s what mine looks like, abridged:

/* By default, all tables can be read, unless revoked below */
grant SELECT on cocktailbuilder.* to cbuser@localhost identified by 'xxxxxxx’;

/* Only the following tables can be modified, and only by performing
an INSERT (not UPDATE!). Note that some tables (i.e. securityevent)
will have INSERT, but not SELECT rights.
grant INSERT on cocktailbuilder.rating to cbuser@localhost;
grant INSERT on cocktailbuilder.securityevent to cbuser@localhost;
grant INSERT on cocktailbuilder.userfeedback tocbuser@localhost;

/* Some tables aren’t meant to be read by the app, only the DBA */
revoke SELECT on cocktailbuilder.securityevent from cbuser@localhost;
revoke SELECT on cocktailbuilder.userfeedback from cbuser@localhost;
flush privileges;

Unfortunately, this script can’t be generated automatically (mysqldump can’t dump permissions – at least I don’t know of a good portable way to do this), so you just have to keep this file updated by hand. Good news is that this shouldn’t be too hard: you don’t add tables to your DB that often, do you?

Oh, and one more thing: if you have several applications running on a single MySQL box, make sure that each app gets its own login (like cbuser in my sample script above), and that the logins have zero permissions outside of the designated database. This way, if a hacker screws up one of your apps, others stand a chance.

5. Defense-in-depth: production database backups

This is good against hackers, fires, and your cat chewing the network cable while database sync is happening. Do the (full, not incremental!) database backups regularly, and keep the dumps on a different box (or, even better, at a remote site).


Link | Leave a comment {3} |

AJAX Podcast

Feb. 1st, 2007 | 05:06 pm

I've been quite hooked to podcasting in the last few weeks - mainly because I've been taking the bus to work (Seattle's weather is, ahem, weird this winter), and the best thing to do on the bus is to try to create an interesting enough shell around yourself that you don't notice the 150 people squeezed in a little box around you, and don't hear the banging of the chains against the nearby cars.

Funny thing is, I found a very cool podcasting series that's been open for over a year now; it's hosted by folks running ajaxian.com. They are fun, very knowledgeable, and their sole focus is AJAX and a tiny bit of JavaScript. The guys bring in influential people from the industry, and interview them; other of their podcasts are about "the state of AJAX" - new tools and best practices. Here's their podcast URL: http://media.ajaxian.com

Link | Leave a comment |

Developing Against a Live Database

Jan. 24th, 2007 | 08:58 am

All right, friends, it’s time for some serious development talk. You and a couple of your friends are building the next it, the Web 2.0 app of the future. You have a source control system set up, you even have a bug tracking system, you even have a spec and someone (not you!) who tests your software (may I remind you of the famous Joel Test?). You’re heading for success.

Your web app has a database back end, and you already have a few users out there, playing with your system; their data lives in your “production” database. You also have local dev boxes, with complete copies of the production environment, with development versions of the database. Anyone can mess with the development databases without causing havoc to real users’ data.

If you don’t have any of that, don’t read this article. Instead, go read Joel Spolsky’s blog or book “Joel on Software”. He’ll explain why all of these pieces are necessary.

If you do, however, you’re on the right track. But.

There’s always a “but” (or a butt?).

How do you propagate the changes to the database schema, if there are any, to the production version of the database? You’re still doing development on the system, schema and data are changing with each build, you can’t afford a DBA to do crazy DIFF’s between your dev database and production database…

Simple, and flawed, approach

You already have a version control system. Let’s assume you’re using MySQL as your database (I am). You can just use mysqldump to dump the schema and data of your development database, and check in those .sql files into source control. Then, your build script just drops the production database, and recreates it from scratch using the .sql files from the source tree.

Note that you definitely want two dumps: one for the schema, and one for the data. This way it’s much easier to track schema changes in your source control system.

Only one, but very big problem: user-generated content gets lost.

You may say “copy over user generated content to the dev box before checking stuff in”. But the build isn’t instantaneous, and everyone who’s been doing something on your site during the time between your “sync” and the build will lose their data. Not good.

You may say “screw user content”. No, you didn’t say that. I just heard it from some weirdo in the back of the room. Screwing your users is a safe way to lose early adopters, who are the only way for you to deliver a quality app fast. If you don’t get feedback from those enthusiasts (and you won’t if you piss them off by continuously throwing away the data that they’ve put in), you’ll lose.

Complex, and successful, approach

Identify which tables in your database will contain user-generated content. For the cocktail builder site, these are the mainly the tables with user ratings and user actions, and table for reporting crashes.

Your build script will no longer drop the entire database and rebuild it from scratch; it will drop all tables except for tables with user content. That is, as a part of the check-in process, split your output .sql files as following:

schema.sql: DDL for all tables except those with user-generated content
schema-extra.sql: DDL for tables with user-generated content
data.sql: actual INSERT statements for all tables other than those with user-generated content
data-extra.sql (optional): you get the drill. This is optional because there’s not always a point in including user-generated data into your build – just back it up on the production database side, and sync it to the dev boxes once in a while.

Your script to dump the dev databases may look like this file below (this is a batch script – I do my development on a Windows box; please don’t flame me, I’m built of a fire-retardant):

@echo off

REM Full path to the MYSQL utilities. Required.
SET MYSQLDUMP="C:\xampp\xampp\mysql\bin\mysqldump"
SET MYSQL="C:\xampp\xampp\mysql\bin\mysql"

REM Database settings

cd ..\setup_sql

REM Producing schema and data files for the build. They are
REM called "schema.sql" and "data.sql".
REM Using MySQLDump. Path is specified in the variable above.

REM ----- Delete old versions of the files.
del schema.sql
del data.sql

echo Generating core schema and data files...
REM ----- Using settings specified in the configuration file, ignore
REM ----- a few tables that shouldn't go into the incremental build
REM ----- (user-generated content should not be discarded on upgrade).
REM ----- The schemas for those tables will go into a separate file
REM ----- that can be used to initiate a "full from scratch" build.
REM ----- Content for those is not a part of the build.
SET ResultSkipString=
SET ResultIncludeString=--tables
echo The following tables will not be included in the core build:
FOR /F %%t IN (userContentTables.conf) DO (
   echo - %%t
   SET ResultSkipString=!ResultSkipString! --ignore-table=%DBNAME%.%%t
   SET ResultIncludeString=!ResultIncludeString! %%t

REM ----- Dump the actual SQL schema/data creation scripts...
SET CommonDumpSettings=--comments=false --user=%DBUSER% --single-transaction --no-set-names --skip-add-locks -B %DBNAME%
"%MYSQLDUMP%" --result-file=schema.sql --no-data %ResultSkipString% %CommonDumpSettings%

echo USE `%DBNAME%` > schema-extra.sql
"%MYSQLDUMP%" --no-data %CommonDumpSettings% %ResultIncludeString% >> schema-extra.sql

"%MYSQLDUMP%" --result-file=data.sql --no-create-db --no-create-info --skip-extended-insert --skip-disable-keys %ResultSkipString% %CommonDumpSettings%

echo Done

And the mentioned file userContentTables.conf is just a newline-separated list of table names that contain user content.

Have fun!

Link | Leave a comment {2} |

AutoComplete: Which items to show?

Jan. 21st, 2007 | 05:17 am

Some time ago, I wrote how freaking cool the auto-complete system is for ingredient entry. If you don’t believe me, go take a look at its implementation – I actually have it done. For the lazy ones – folks like me - it ended up looking something like this:

Sexy, huh?

Yea, whatever. I know what you’re gonna say – I’ve seen this a billion times.

You have, you’re right. But one thing you most likely never thought about when you used a similar sexy-auto-complete textbox: which matches show up?

Obviously, you want to show ingredient names that have the stuff that the user typed in as a substring. So, user types “vo”, you show “vodka” and “absolute vodka”. But you don’t want to show “Cuervo tequila”, do you? So, rule #1: only show stuff that has user’s substring as the beginning of the word.

Now, the more interesting question: how do you want the suggestions ordered? Few options here:

  1. Alphabetically. Simple to implement, easy to understand, cool. One problem: if there are 50 items that match (as there are with the substring “vo”), your auto-complete text box is not really useful.
  2. Depending on how frequently the item is used in cocktails. This is a pretty good measure of popularity; if there’s 100 cocktails that use orange juice, it’s probably because lots of people have orange juice in their bars, so you can infer that orange juice should show up before Stoli Orange for the query “ora”. However, this has drawbacks: what if your cocktail database is skewed? 
  3. Depending on the number of previous users of the system that have added this ingredient. This means that the system is getting smarter and smarter as the number of its users increases; this is definitely a good thing. Drawback: chicken-and-egg problem, you need initial users to make the system smart, and those initial users will suffer.

So far, I went with approach 2 – just because the site doesn’t have too many users yet (do let your friends know about it if you like it!), but I instrumented the tools necessary to know what ingredients users are adding, so I’ll switch to 3 at some point.


Link | Leave a comment {1} |

Generating Semi-Dynamic Content

Jan. 19th, 2007 | 10:33 pm

Remember how we talked about using a static list of ingredients for auto-complete, and downloading that static list to the client during first load? Well, that static list needs to come from somewhere; besides, it’s not really static.

Sometimes, a recipe will come along, quoting an ingredient that the system doesn’t yet have; all ingredients are stored in a database, and it’s relatively easy to add an ingredient there. But hey, we have that static lookup list now, how do we get that to work?

The “static” list is just a JavaScript file that looks similar to the following:

ingredientOptions = [ 
   new Ingredient("7-up", 8),
   new Ingredient("absinthe", 5),
   new Ingredient("absolut citron", 3),

Basically, it’s just a list of 500 ingredient names and their popularities (for sorting). This list changes relatively rarely (when new ingredients are added – I’d say once a week at most). However, it does change, so we have to have a way to generate it from the database (copy-paste is a bad idea :-)).

I just used a PHP script that talks to the database, asks for the ingredient list, and generates the entire JS file from scratch. Roughly, the code looks like this:

// Get the list of ingredients from the database
$data = DataRetrieval::getAllIngredients();

// Create the output string
$output = "ingredientOptions = [\r\n";
foreach ($data as $item) {
   $output .= "new Ingredient(\"" . strtolower($item->Name) . "\"" . $item->popularity . ")"; 
   if ($item != $data[sizeof($data) - 1]) {
      $output .= ",\r\n";
$output .= "\r\n];";

// Actually write the JavaScript file
$OUTPUT_FILE_NAME = ‘ingredients.js';
if (!$file_handle = fopen($OUTPUT_FILE_NAME,"w")) {
   die("Cannot open file");
if (!fwrite($file_handle, $output)) {
   die("Cannot write to file");

echo "Success. Output saved to " . $OUTPUT_FILE_NAME . "\n";

There are many other places in the cocktail builder where I’m using this “js generation” technique. For example, it’s known to be good to have all application UI strings separately from application logic and even presentation. It helps with several things:

1) Localization
2) Strings consistency

One more thing to note is that UI strings may be references in two completely different contexts: in .php files, where strings are used to generate server-side presentation piece, and in .js files, where strings are most frequently error messages or tiny things like the “no matches” string for auto-complete. Some weird strings may appear in either context. But, as you might have guessed, it would be really nice to have all UI strings live in one place, no matter where they are referenced – in client code or in server code.

So I created a table in MySQL, called UIStrings, where all strings will be stored an edited:

Then, I created two scripts – one to generate a server side include file strings.php.inc, and another one for client side strings: strings.js. These scripts are trivially different from the one quoted above.

Here’s a part of the listing of the generated strings.php.inc:

class UIText {
   const COCKTAIL_DETAILS_ACTION_ADD_TO_FAVORITES = "add to favorites"; 

In the actual PHP code, if I need to reference a UI string, I’ll just write


Similarly, the listing of the generated strings.js file looks like:

var UIText = [];
UIText.INPUT_BOX_HINT_TEXT = "add ingredient...";
UIText.INPUT_BOX_NO_MATCHES = "No matches";
UIText.INPUT_FEEDBACK_HINT_TEXT = "give feedback..";

And if I need to reference a UI string in JavaScript code, I’ll just write


One last thing to note is that these generation steps need to be completely automated – they need to be a part of your build or checkin processes.


P.S. I use the same “generated” .js for client-side usage statistics gathering, but that’s a different story. Let me know if you want me to share it ;-)

Link | Leave a comment |

Persisting User Data Across Visits

Jan. 15th, 2007 | 11:30 am

Almost every non-trivial website out there is facing a classical problem: persisting user data across visits (also knows as sessions). For your MySpace account, it's a list of your friends, and that clever joke you put into the profile to help replicate the DNA. For CocktailBuilder, it's a list of your ingredients and your favorite cocktails. You don't want to type them in every time, do you?

Now comes the big problem. How do you remember these in a seamless, non-intrusive fashion?

The obvious answer is a login system. Heck, go to myspace, the first thing you see is a logon box. You can't access almost any of the site's features without being logged in. Why is that? Because much majority of the site's information is private (others would get pissed if you saw their mailbox without their permission), so it's an absolute must to know who the user is before showing them their mailbox.

So, the plus of this approach is relative security of user data. Another plus: you can access your account from any computer. What are the minuses? You know already - you have to register for yet another darn site. Uhhhh. Pick a login. What's your email. Check that email. Click the link. Your account is now active, please log in. Next time, type the password again. Uhhhhhhh.

What's the alternative?

Modern, and not-so-modern, browsers provide a feature called "cookies". Essentially, it's a way for a website to store a piece of information on a user's computer, and access it later. All of this can be done seamlessly, without any explicit user action. For example, many websites store tracking cookies on your computer so that they can later on identify you and offer you customized services based on your previous actions.

What's good? Transparency. Plus it's reeeally easy to get the desired effect of "hey, they remembered!". Minuses: cookies are not secure. If you're a bank, don't do this. If you're a bank IT guy, you read this, and you were surprised, please realize that the sky is blue. Cookies can be stolen by other sites, if the browser has a security issue (and browsers have security issues in this area very frequently). Another minus: cookies are an ancient technology. It's not very pleasant to work with them from JavaScript (you have to serialize your data into a string less than 4KB; I'll talk about this at a different time). Another minus: cookie data is not accessible on other computers, so you wouldn't be able to share data between your home and work computers.

There are many websites that go with the third approach - combination of a login system with cookies, so that you only need to log in once a day, and throughout that day, your cookie serves as your identity. Think Amazon: when you come in, you see "Welcome, Bob (or whatever)". How do they know? The obviously don't store your IP address, that would be silly (IP addresses change for many consumers relatively frequently). They store a cookie.

So far, for CocktailBuilder, I decided to go with a plain cookies approach. There isn't any secure data in there - just a list of ingredients and favorite cocktails. Simplicity is key here. Yes, the "travel with my data" feature is missing. I'm not sure of the best way to cover for it.

One idea I had was "send my settings to another computer", which would generate an email with a hyperlink that has all of the cookie data. User would be able to send this to themself (or a friend), and get all of their settings on another computer. I haven't thought about this much, do tell me what you think.

Another idea - non-mandatory login system. You want to save your settings - register for an account. You want to load your settings from another computer - log in. Otherwise, everything else is done through cookies.


Link | Leave a comment |

New Build: 90 new cocktails, 110 new ingredients

Jan. 11th, 2007 | 10:04 pm

The title says it all. It's the first time I'm doing a "release announcement" like this, so I think I'll share the new things I added to the database this time around.

By the way, you may wonder, how did I prioritize the ingredients to add? There's been more folks visiting the site in the last few days - just look at the screenshot from Google Analytics below:

So I was able to analyze the things that they were looking for and didn't find, and added them to the database. Here's the list of new stuff:  

Cocktails Ingredients

1960s Black Devil
Absecon Fruit Punch
Absinthe Cocktail
Absolute Pineapple Martini
Admiral Nelsons Brew
Adonis Cocktail
Alabama Cocktail
Alaska Cocktail
Angels Kiss
Angels Kiss 2
Angels Wing
Bahama Mama 2
Barracuda Bite
Basil Haydens Mint Julep
Bermuda Triangle
Bo Diddley Shooter
Bourbon Sidecar
Brandy Alexander
Brandy Alexander Classic, 1930
Brandy Strudel
Brisk Blue Blazer
Bronx Cocktail No2 Dry
Candy Apple
Captains French Kiss
Chocolate Chip Cookie
Colorado Prarie Fire
Deach Desire
Dutch Orange
Exterminator Coffee
Green Monster
Irish Flag
Jamaican Me Crazy
Jealous Love
Keoke Coffee
Latino Last Scream
Lauries Favorite
Martini Cocktail, Dry
Martini Cocktail, Sweet
Midori Margarita
Mollys Milk
Mountain Altitude
Mountain Grog
Mountain Madness
Mrs. Greens LMC
Original Bushwacker
Original Manhattan
Pass It On
Perfect Manhattan
Perrier Lemonade
Pineapple Smooch
Piss In The Snow
Pousse Cafe
Santa Monica Splash
Sapphire Sensation
Sex On Mountain
Sex on the Slopes
Sexy Lexy
Sidecar Cocktail
Sidecar Special
Ski Tip Coffee
Stairway To Heaven
Stars And Stripes
Stroll Through The Woods
Sugar and Spice
Sun Burn
Sweet Kisses
Tic Tac
Tokyo Tea
Triple Kiss
Trouble In Paradise
Vitamine Maidas
Wake Me In An Hour
Ward Eight
Westin St. Francis Chocolate Martini
White Christmas
Wicky Wacky Woo
Wooly Mamouth

151 rum
7-UP or Sprite
Angostura bitters
anise liqueur
Applejack Brandy
Bacardi 151
Bacardi 151 rum
Bacardi Breezer Orange
Bacardi white rum
black olive
black tea
Bols Blue Curacao
brut Champagne
burgundy wine
California brandy
cham pagne
Chandon sparkling wine
chilled champagne
chopped strawberries
Coca Cola
coconut creme
coconut liqueur
concentrated simple syrup
creme de banana
creme de bananes
creme de cassis
creme de coconut
creme de monte
creme de yvette
creme yvette
crushed pineapple
DeKuyper Tropical Pineapple Liqueur
dr pepper
dry sherry wine
dry white wine
egg white
French roast vanilla bean coffee
Godiva white chocolate liqueur
green chartreuse
green olive
guava juice
heavy sweet cream
Hennessy Cognac
iced tea
Jack Daniels
Johnny Walker Black Label Scotch
juice from one lemon
Ketel One Vodka
lemon slice
lime cordial
lime zest
Malibu coconut liqueur
margarita mix
margarita mix (non-alcoholic)
Margarita mixture ( lemon lime mix )
Moet Rose NV (Non Vintage/house blend) Champagne
Opal Nera
overproof rum
parfait amour
parfait damour
Pina Colada Mix
pineapple chunks
pink grapfruit juice
pink lemonade
Pisang Ambon
pomace brandy
Que Pentas silver tequila
red dinner wine
Red vodka
rum, 151
Rumple Minze peppermint schnapps
Sauza Tres Generations Tequila
sparkling white wine
Sprite or 7Up
squeezed lime juice
Stolichnaya Gold vodka
Stolichnaya Limonaya
tabasco pepper sauce
Tanqueray gin
Ursus vodka
versinthe absinthe
white dinner wine
worchestershire sauce
yellow chartreuse
Yellow Galliano

My goal is to make a site that's useful for the community - and the only way to do that is to look into what folks are trying to do with it and failing, and enabling those things.


Link | |

Which Cocktails to Show?

Jan. 10th, 2007 | 08:48 am

All right, we’re trying to show the user what they can make from what they got in the bar. Cool.

We know what they got in the bar (see previous post); now, we better show them some useful cocktail recipes, or they’ll sue the hell out of poor me for false advertising :-). And hey, I gotta pay for hosting, no dollars to waste on legal fees, sorry.

[Side note: cocktailbuilder.com is currently hosted on my home server, an ancient Linux box - PIII 500MHz. 40KB/s upload rate. $30/month for cable internet with Millennium Digital Media – btw, wonderful ISP.]

All right, jokes aside. We need some alcohol here.

I (my second self), the semi drunk college student Jane, am holding a party. I go to San Jose State, so I managed to sneak out some lemons out of my auntie’s back yard, so these lemons end up on my ingredient list. Now back to cb, the poor site owner that’s about to be sued; he’s thinking whether to show 1940s blue blazer, a cocktail that involves lemon peel, to our tipsy Jane. One more piece of data: CB doesn’t know that Jane is so drunk that she wouldn’t be able to peel the lemon without converting the lemon into lemon juice.

So: do we show a cocktail that has lemon peel to a user that has lemons in the ingredient list?

Before you answer, let me ask you a different question. A cocktail called Purple Passion calls for Absolut vodka, but the user has Stoli. Do we show the cocktail?

Another one: a cocktail calls for a raspberry vodka, but the user has an orange-flavored vodka. Do we show the cocktail? What if the user has just “vodka”, and they didn’t specify the type of vodka they have?

Yeah. It gets complicated, doesn’t it.

For the first two questions (lemon peel vs lemon, Absolut vs Stoli), I’m guessing you answered “yes”, show the cocktail, let ‘em have some booze. For the last one (raspberry vodka vs orange vodka), you probably said “no, it will taste different”.

Crap. And I had such a good theory here. Now I’m totally gonna get sued, cause I don’t know how to build this :-)

Fine-fine, no more theatre, let’s talk about solutions.

Option 1: Define “homogeneous groups”: all cocktails that contain any light rum will show up if the user has any other light rum. That is, all members of the group are considered “synonyms” for cocktail-making purposes.

Option 2: Define a hierarchy of ingredients that looks something like the following:

If we let the user enter any one of the LEAF nodes as one of their ingredients (i.e. Stoli Orange, but not Orange Vodka), cocktail matches will become relatively simple. If a user holds a particular node, show all cocktails that include that node’s siblings or ancestors. For example, if the user has Stoli Orange, we’ll show recipes that include the following highlighted nodes:

However, this may become a bit constraining:

1) Users can’t just type in that they have some vodka in the bar. They gotta say exactly what kind it is. Make users suffer (i.e. stand up from their computer), and they will leave.
2) What do you do with recipes that call for stuff that doesn’t quite fit in one category, for example, what if a recipe says “use raspberry or orange vodka” (bear with me, I can’t imagine what the consequences of this substitution would be)? Our hierarchy would then have to become sort-of unclean, involving “virtual” nodes; these virtual nodes break the lovely tree structure, converting the whole thing into a directed graph:

Uhh, decisions, decisions… What’s your take?

Link | Leave a comment {8} |

Give me your ingredients..

Jan. 8th, 2007 | 09:50 pm

All right, we know what we want to build: a system that tells you the cocktails that you can make from what you got in your bar. I won’t get tired repeating this mantra until the website comes live, or I come dead, whichever comes first :-).

So, the first task: we need to find out what the user has in their bar. How do we do that?

The problem, really, is another incarnation of a very well-known issue: how do we let the user select several items from a predefined list? One major tweak: the list is freaking huge. Gi-normous. You know how many possible types of vodkas are out there? My database currently has 60, and that’s just vodkas! Altogether, we’re looking at something like 500 valid entries.

But there are also invalid entries! What if one of our users thinks that Chartreuse is spelled with an SH instead of CH?

So, taking all of these requirements into account, let’s look into possible solutions.

Option 1
The obvious solution: drop-down list box, also known as an HTML <SELECT> tag. Looks like this:

I’ll allow myself to remind my faithful readers that drop-down list boxes are generally known to be good with average-sized lists (for example, they are good for a list of states). However, they have quite a few limitations:

1) You need several dropdown controls to let the user pick several items. Each dropdown needs to have quite a bit of HTML associated with it – each entry needs an <OPTION> tag.
2) It’s pretty hard to navigate the dropdown that has more than, say, 100 items, by using a mouse: the little scroll bar is not quite usable: it’s too small.
3) Navigating entries using the keyboard is hard, too: up until Internet Explorer 7, typing “ora” with the keyboard focus on the above dropdown would take you to an entry that starts with “a”, instead of the desired orange juice.
4) Think of “Stoli Raspberry” vodka. Some users may refer to it as “Raspberry Stoli”, others – as “Stoli Raspberry”. Does this mean that we need two different items for the same thing in our dropdown?

Option 2
Multiple-selection list box, or just a regular list box with multi-select enabled. These two look like this:

Other cocktail sites that I looked through (for example, idrink.com, or cocktail.uk.com), use this kind of user interface.

You know that option 1 stinks – but there’s someone out there, at least cocktail.uk.com, that doesn’t understand that this option is almost just as bad. And that exact thing is one of the reasons why I feel obligated to make the life of all casual alcoholics better by creating the cocktail builder website :-).

Now, why is this option good from afar, but far from good?

1) If you’re using a list box with multiple-selection enabled: do you really think people know that CTRL-clicking will select several items? Plus, when you’re doing this CTRL-clicking voodoo, you’ll inevitably click outside of the control, which will lose all items you worked so hard on.
2) For the multiple checkboxes: a much better user model, but hey, I still need to scroll through an immensely long list, and there’s no good way for me to quickly go to an item (besides using the browser’s CTRL+F, but how is this better than just a text box for entry?)
3) Still no good answer for “Raspberry Stoli” vs “Stoli Raspberry”

Option 3
The holy grail of Web 2.0, Google AutoSuggest. Looks like this:

You’re probably expecting me to start screaming about how freaking cool this is, but I’ll let you decide this yourself :-). One of my favorite sayings goes: “Diplomacy is an art of letting others have it your way”.

Why is this cool?

1) Super-fast entry: type in “vodka”, and see all vodkas out there. Even if vodka is not the first word in the ingredient name, you can still show that option in your auto-suggest box (because you’re the one controlling the logic for what to show)
2) Almost no issues with scalability and real estate on the screen: it doesn’t matter how many items the user has to choose from; they only see options that fit the context of what they typed in.
3) Keyboard navigation: unlike previous options, you can enable up/down arrow navigation through the list of choices, offering enter/tab to actually pick an option. This makes the site much more accessible to people who need it. Recall Joel Spolsky, and design for users that can’t use do precise mouse movements, that don’t have time, that have their cats chewing the mouse cord and babies screaming in the room nearby. Users of my website will very likely be pretty drunk :)
4) Multiple entries require multiple auto-complete text boxes, but c’est-la-vie.

There are a few options as to how to implement this auto-complete text box – one big design decision is “get the list live” or “download the static list”.

The Internet, including Google Suggest in particular, is full of examples as to how to implement a “live lookup” auto-complete. For example, user types “vod”, and your program goes back to the server, asks what are the matches for “vod”, and then returns the results to the browser for display. Not a bad idea if you have many, many items to search through.

However, in our case, the list is reasonably small (something like 500 items?), so maybe we can store the entire list on the client? Maybe, as the user downloads the initial webpage, we’ll also pump in all possible options of ingredients they might enter, and then as they type in “vod”, we’ll just go through 500 items on the client side, in JavaScript?

This second approach is what I chose for cocktail builder; there is one big tradeoff here: initial download time of the 500 item list vs on-demand, lazy download of the results. Several outcomes of this tradeoff:

- the price of downloading the list is paid only once per user (remember, your stuff gets cached by the browser); no matter how many ingredients they add – there will be no roundtrips. The users will, I betcha, have 5-10 ingredients in their bar, which will translate to 5-10 queries with the first approach, and only one payoff with the second approach.
- users that come back won’t have the roundtrips.
- server load: less roundtrips means that I don’t need to buy a 10-rack super-powerful server :-). Which means no annoying ads for the users to cover the costs.
- showing matches for user input: clear performance win. As soon as the users types in “vod”, I do snappy JavaScript processing, all on the client side, to run through an ordered list of ingredients (how long does it take a 3GHz computer to do 500 string matches?). If I went with a lazy approach, users would have to wait for a roundtrip, which takes time.

What exactly is the cost of downloading the full ingredient list at load time? My current list of ingredients has 250 items; it’s 13KB, when not compressed (we’ll talk about JavaScript compression at a different time). You can make the judgment yourself, I’m sure.


Link | Leave a comment |