Partnership with GarnishBar.com
Jan. 11th, 2008 | 12:16 pm
As you know, Cocktail Builder is a creation of a techie - minimalist design, clean functionality, and no editorials. I know that the site can benefit greatly from someone with a creative side; Ben is just the right person here. Plus, the articles he writes will nicely complement the mean-and-lean 'what you can make out of what you got' functionality, to round out a full, one-stop-shop cocktail website that will make visitors return time after time.
GarnishBar and Cocktail Builder are announcing a partnership; it's gonna be great! Expect growth and new content! I'm super-excited about the opportunities this brings - the sites perfectly complement each other today - and in this case, 1+1 = 3!
Link | Leave a comment {1} | Add to Memories | Tell a Friend
The Query to Power It All
Nov. 27th, 2007 | 06:41 pm
This article is only meant for major, major SQL geeks. You'll be bored out of your mind if you're not one; don't tell me I didn't warn you.
As you might have guessed, there's one, only one query powering the basic cocktail builder functionality. It's the query that answers the question "if I have ingredients A, B, and C, give me a list of all cocktails that I can make". Also:
- give me cocktails that I can almost make, ordered by the number of missing ingredients (i.e. those that I can make right away go first, those that are missing a single ingredient second, etc).
- consider ingredient substitutions (a concept I refer to as "normalized ingredients"): if you have Sky Vodka, and the recipe calls for Stoli, you can make the cocktail just fine.
- calculate cocktail ratings from the user feedback
As you might guess, the SQL is rather involved. Here's the statement that gets executed when the user has two ingredients - items with ID's 73 and 76:
SELECTInnerCountsTable.CocktailID,InnerCountsTable.CocktailName,InnerCountsTable.CocktailShortName,CountPresent,CountMissing,FORMAT(round((AVG(rating.Rating)*2))/2, 1) AS Rating,(CountPresent - CountMissing) as DifferenceFROM(SELECTcocktail.ID as CocktailID,cocktail.Name as CocktailName,cocktail.ShortName as CocktailShortName,cocktail.Instructions as CocktailInstructions,SUM(IF(userHas.NormalizedIngredientID IS NULL, 1, 0))as CountMissing,SUM(IF(userHas.NormalizedIngredientID IS NULL, 0, 1))as CountPresentFROMmix INNER JOIN ingredientON mix.IngredientID = ingredient.IDLEFT JOIN (SELECT NormalizedIngredientIDFROM ingredientWHERE ID=76UNIONSELECT NormalizedIngredientIDFROM ingredientWHERE ID=73) AS userHasON userHas.NormalizedIngredientID =ingredient.NormalizedIngredientIDINNER JOIN cocktail ON mix.CocktailID = cocktail.IDWHERE NOT (cocktail.Status = 'Cut')GROUP BY mix.CocktailID) AS InnerCountsTableLEFT JOIN ratingON InnerCountsTable.CocktailID = rating.CocktailIDWHERE CountPresent > 0GROUP BY InnerCountsTable.CocktailIDORDER BYCountMissing ASC,Difference DESC,CountPresent DESC LIMIT 30
A weird part of the query is the "UNION" part inside (the one that's creating the userHas part). Basically, I don't believe there's any other way to create an in-memory virtual table in MySQL to do a JOIN with. Anyone got better ideas?
Or, maybe you can offer a completely different approach that will work faster?
Link | Leave a comment | Add to Memories | Tell a Friend
The first place!
May. 18th, 2007 | 07:42 am
So, here I am, looking at the Google Analytics for CocktailBuilder last week, and the graph looks approximately like this:
Whaaaaat? was exactly my reaction. Here I am, having about 30 visitors on a usual day for about a month or so, and suddenly, a jump to 300, and then 700 visitors a day!
The reason was really easy to find. SeoMoz.org ranked Web 2.0 sites, and CocktailBuilder won the first place in the Fun Stuff category!
Here's their full review, for historical purposes.
Woooo hoo! :-)
Link | Leave a comment {10} | Add to Memories | Tell a Friend
Cross-Domain Ajax
Apr. 19th, 2007 | 08:05 pm
<rants>
Oh, friends, friends. Whoever thought of setting up the so-called Same Origin Policy was, ahem, not very forward-looking.
A bit of history: Microsoft. Internet Explorer 5.0 (released in 1999). Outlook Web Access (OWA) team comes to Trident (IE) and asks for a way for the web page to talk to the Exchange server, asynchronously. They get the XMLHTTPRequest object, currently more famously known as XHR. Seven years later, the industry catches up and understands that XHR can be, indeed, used for Ajax.
That XHR object has a very interesting property: you can't initiate requests to pages other than the one that the web page is originating from. For example: your site is cocktailbuilder.com, but you want to asynchronously communicate with maps.google.com through their REST API to get to something useful.
Tough luck, not gonna happen. Your XHR object has a security restriction: talk only to cocktailbuilder.com. Why? Well, because you could, theoretically, sniff the local network, then send the results to evilhacker.com.
BUT WHY IN THE WORLD DO YOU HAVE TO USE XHR FOR THAT??? Who in his right mind was setting up that restriction??? If I'm running JavaScript on your browser, I can just freaking insert an IMG tag, point it to evilhacker.com, and through clever parameters, pass any info you want.
There's a huge hole, and they put a tiny little plug over it.
Results:
1) It's much harder to code legitimate applications - mashups - and any kind of JavaScript-based services.
2) Security hole is as glaring as it ever was.
</rants>
OK, now to the useful stuff. There are a couple ways to set up two-way asynchronous communications across domains.
1) Outright ugly methods: Flash and iframes.
2) Proxy. You set up a service on your domain that relays all requests to the target domain. Your XHR is talking to your box, your box is being the middleman between the other box and the browser.
+ You can still use XHR and related frameworks.
- Load on your server goes up; plus, if you're thinking about providing a service (I am - ajaxmetrics.com), this means that your clients will need to install some PHP/ASP/JSP on their server. That's a very bad thing; this implies lots of trust between the service provider and the client (if I have my PHP code running on your server, I own your box). That's why guys from instacomment.com are not going to make it big any time soon.
3) Script tag monkeying, also known as on-demand JavaScript. You all know that you can insert a <script> tag into your HTML, and have that tag point to an external server. But wait, you can insert that script tag dynamically! What does that mean? That you can dynamically inject JS code into your app, and that JS is coming asynchronously from the server.
Here's the code to make that "fake Ajax" call:
function hackyAjaxCall (op, params, callbackFunctionName) {
// Inject the script tag into the document and pass parameters
// through the URL. Get and execute the JS that the server .
// returns. No, there are no security risks: the server is mine.
var head = document.getElementsByTagName("head")[0];
var script = document.createElement('script');
script.type = 'text/javascript';
if (op.length < 1) {
alert("Error: Unspecified op parameter for the Ajax call");
}
var opUrlParam = "op=" + op + "&";
var callbackUrlParam = callbackFunctionName ? "callback=" +
callbackFunctionName + "&": "";
var UrlParams = "p=" + escape(Object.toJSON(params));
script.src = "http://server/endpoint.php?"+ opUrlParam +
callbackUrlParam + UrlParams;
head.appendChild(script);
}
+ True, asynchronous communication between the browser and a web service on a different domain.
- This method is not as pleasant to work with as XHR, and there are no nice frameworks like Prototype, YUI, and Dojo. There was a project called "Ajax Extended", which added things to Prototype to do this, but the website is now dead. I'd love to see this feature in core Prototype.
- Security implications: if you don't trust the other end, don't do this. They can inject any JS they want into your browser-side app, and steal your cookies.
Further reading:
1) XML.com article of both mechanisms mentioned above.
2) Good article on the subject from snook.ca.
Cheers!
-cb
Link | Leave a comment {1} | Add to Memories | Tell a Friend
AjaxMetrics
Apr. 8th, 2007 | 02:10 am
You've seen my recent post about the importance of knowing what the users do on your site. My friend David gave me great feedback to actually productize the idea, considering there really aren't any good solutions out there - so I put together a site where I discuss the problem in detail, and where eventually the tool will be available.
Check it out; it's called AjaxMetrics.com.
Link | Leave a comment {3} | Add to Memories | Tell a Friend
Homework: Analyzing Competition
Apr. 6th, 2007 | 01:34 am
It's good to know who else is trying to do the same thing. Why?
Obvious answers - so that you don't reinvent the wheel; so that you know what they are doing; so that you can differentiate yourself properly.
Non-obvious answers - presence of competition is actually a good thing, it means that the market you're about to enter in fact exists, and you're not delusional. If you're planning to provide Internet services using toilets as hotspots, and noone else is doing anything similar, you are probably a bit off base (crap, how did they - CityNet - get the VC funding with that idea??). On the other hand, if other [sucky] solutions to the same problem exist, you're on the right track.
Well, of course, there are revolutionary things out there, too, but hey, if you're Einstein, you better realize it yourself. As one of my favorite bloggers Guy Kawasaki says, toilet paper is radically better than leaves.
But back to the topic. Competitive analysis. Let's assume for a moment that you decided to believe the crazy author of this blog, and actually do it. Well, I admire your trust in me - even I don't have that kind of trust in me :-).
All right, all right, back on track. Here's what you may want to cover:
1) What's the problem that you're trying to solve? For cocktail builder: find out cocktails you can make from ingredients in your bar.
2) What are adjacent problems? Where to buy liquor; history of cocktails; cocktail pictures; etc.
3) What are the Google search terms for the problem you're trying to solve? For me, it's cocktail, bar, ingredient, etc.
4) Who comes up on top in search results for these keywords? These are likely your top current competitors, if you're doing an Internet business.
5) Who does things right nearby? For cocktail builder, it's food sites that let you put in ingredients you have and find out the meals you can make from 'em. These folks may come in to your market at any moment.
To inspire you a little bit, I'm attaching the competitive analysis doc I used when doing the initial homework for my site. Take a look if you'd like.
Cheers!
-cb
Link | Leave a comment | Add to Memories | Tell a Friend
Vision vNext: Organizing a Party
Mar. 24th, 2007 | 04:16 am
I'm thinking about the next version of the CocktailBuilder, and I'd like to share an idea that a friend of mine gave me. Basically, it's about one good end-to-end scenario for the website that I'm almost achieving, but not quite.
What's the most obvious use case for the site today? "Enter what you have in your bar, and you'll see the cocktails you can make from it". Great.
We want more.
We want to think end-to-end. What is the user trying to do? Probably, organize a party. A good party host wants to post a bar menu somewhere, to disincline the cute and pleasantly un-intelligent freshmen from the nearby dorm from asking "hey, can you make a sex on the beach?" RTFM, baby.
To make that scenario work, you need to help the savvy sysadmin of the party print out that M, I mean the manual, oops, the bar menu. And you need to help him manage that bar menu. Really, the bar menu is just a list of favorite cocktails; and printing of the menu is just a matter of converting the pretty HTML page to a pretty PDF file.
I'm half-way there on the implementation - favorites are done, but I'm still thinking about whether doing a PDF exporter is the right way to go .
What do you think about this scenario? Anything I'm missing? Any nearby scenario that we can get for cheap?
Cheers!
-cb
Link | Leave a comment {4} | Add to Memories | Tell a Friend
What are users doing on your site?
Mar. 24th, 2007 | 03:44 am
So, you have a great-looking AJAX website. You're Zillow, or Google Maps, or whatever. You have a billion visitors, but as a great entrepreneur, you understand that the only way to keep afloat is to give users the features that they want.
How do you do this?
1) Ask your users. Focus groups and similar methods still apply.
2) Talk to your users. Open a blog, get the comments. Put a comment box right there on the website. Have you noticed the "rate this content" concept on MSDN?
Do it.
3) Gather statistics on what users are doing on your site. We'll be talking about this method in depth here.
You surely noticed that the first two methods require either a major money investment, or proactive users; the vast majority of your users, however, won't be this "early adopters" crowd, willing to debug your JavaScript and bring you coffee just for making the coolest Ajax app around.
Average users want their problems solved. They don't want to help you. They want to help themselves. You want to help them, too, but how do you help, if they don't say what they want?? You track their behavior. You use tracking cookies. You record every single click they make within your application, look through the session information, and research interesting sessions. You personally, the site owner, need to read through these logs to understand what users do in your app.
Let me give you a more concrete feeling of what I'm talking about. Here's how a sample log excerpt may look like in CocktailBuilder:
- Added ingredient "absolut vodka"
- Tried to add an ingredient "absinte", but failed, because there was no match in the database of ingredients.
- Added ingredient "orange juice"
- Expanded the details for the "melon ball" cocktail
- Expanded the details for the "mandarin passion" cocktail
- Added the "mandarin passion" cocktail to favorites
This is just like conducting a customer visit: you're observing the user do their thing. Except that it's not happening in real-time, and you're not really seeing a video, but you know your app enough to make sense of it.
Plus, it's all quantitative, unlike videos: you can make a query such as "how many times in the last month folks tried to add an ingredient, but failed". Or you can draw conclusions over time, determining whether your efforts to improve the ingredient database is paying off. This, my friends, really is business intelligence for the Ajax world.
How do you set this kind of logging system up? You need to do this by hand; I haven't seen any good frameworks for this just yet, unfortunately. I told you about my love for Google Analytics - it's a great product to measure your site-wide performance (i.e. how many visitors came to your site in the last month? which country were they from?), but it's utterly useless for any kind application usage analysis for Ajax apps (since everything is on the same page, likely, with the same URL). So old-school metrics tools won't help you, and Apache log files won't, either.
You need a custom solution. At it's basis, there are 2 components:
1) A database table that defines the "datapoints" that you want to collect. Sample:
2) Client-side data collection mechanism. Really, just a backlog of things that happened on the client side. Things that may happen are defined in the first step. The backlog will be sent to the server every 5-10 seconds, and when that's done, the backlog is back to empty.
// submit user actions to the server every X milliseconds
window.setInterval(submitUserActions, 5000);function registerUserAction(id, data) {
userActions[userActions.length] = [id, data];
}// Send the usage data back to the server to track what
// the user was doing on the site
function submitUserActions() {
if (userActions.length > 0) {
// Ajax call sending userActions to the server
...// We've sent the data to the server,
// so let's clear the buffer
userActions = [];
}
}
Every time something interesting happens in the app, I'd just call registerUserAction; for example, when the user is adding an ingredient, I call
registerUserAction(DataPoint.ADD_INGREDI
ENT_BY_NAME,
inputTextBox.value);
Simple, isn't it?
One question remains: where is the DataPoint array coming from? At some point in the past, I mentioned a technique for generating semi-dynamic content. You'd use just that here: at build time, your script will dump the contents of the datapoints table into a .js file (it would have to do some minimal formatting to be syntactically correct). Then, at runtime, you just include that .js file as you would any other JavaScript resource. Several good things about this approach:
- Your code remains readable. You can easily tell that in the code fragment above I was trying to capture the fact that someone entered an ingredient. That fact could have been captured by using the datapoint id (1), not it's name (ADD_INGREDIENT_BY_NAME). This would have saved us some JS size, but I'll trade maintainability for download size any day.
- It's really easy to add new datapoints. Just add them to your database table, and code a single registerUserAction() call. Then, at build-time, everything will start working automagically. Again, ease of maintenance.
For dessert, take a look at some reports that I was able to generate from CocktailBuilder usage data. Answering the "how did you make this??" question: these were done using Rico/Prototype and the LiveGridPlus extension. 200 lines of code (PHP, JS, HTML, SQL, all intermingled - yeah, I don't drink my own Koolaid when I build quick-and-dirty things).
Link | Leave a comment {1} | Add to Memories | Tell a Friend
JavaScript Validation and Compression
Feb. 11th, 2007 | 06:38 pm
I know it, you’re an AJAX geek. Just like me. You think it’s the next Win32, or GTK, or .NET, .ABC, .DEF, or .whatever. There’s one problem, though: Web 2.0 sites are not just about AJAX. They are about JavaScript and client-side code, as much as they are about client-server interaction.
Look at Google Analytics. Their sexiness comes from their gorgeous client-side Flash, not because they get stuff asynchronously from the server. Look at Google Maps, the original Ajax app. You may say “#$@%, what are you talking about: Google Maps are all about the async stuff!”. No, Google Maps are about drag-and-drop. If you needed to click the stupid “left arrow” button to get the map of what’s to the west, you wouldn’t care that only a part of the screen is being refreshed.
One of my friends frequently makes me repeat the following phrase: “Ajax is not a silver bullet”. He’s so damn right.
What is the silver bullet? Beautiful user models. Intuitive interactions. Visual effects on-par with rich client applications (fade in/out, cinematic expand/collapse, sexy highlight, animated movements). And if you want to get those in your Web 2.0 app, you need to do some major JavaScript work.
Yes, there are platforms out there that will help. I won’t talk about those here. Instead, I’ll focus on two categories of tools that I found to be really helpful in my JavaScript development efforts lately; these two aren’t talked about much, to my sadness.
JavaScript Validation
JavaScript is a beautiful language, too bad it’s abused too often. An inexperienced developer can write something (== vs ===, or “var vs no var”), not even know what the difference is, and have it work most of the time. And that’s the problem. JS is a language without a standard compiler, without type checking, without normal IDE’s. This means that JS developers have to be really careful; but humans make mistakes, and they need tools.
I found one great tool to do JS validation; I use it to do automatic code review of my JavaScript files before I check in. The tool is called JSLint; here’s the DOS batch file snippet I use:
echo Verifying Javascript files using JSLint:
FOR %%f IN (*.js) DO (
echo Processing %%f..
cscript //B //E:jscript ../tools/jslint.js < %%f
)
If something’s wrong with any of your js files, JSLint will complain, pointing out the exact line and type of issue, for example:
Lint at line 298 character 23: Use '!==' to compare with 'null'.
if(el.currentStyle != null)
JavaScript Compression
When you have lots of JavaScript (and man do I do), you need some way to compress it. The less bytes each client has to download, the snappier the load performance is; this also helps to scale your server to more users.
There are a few ways to compress your JS: the easiest one involves removing all comments and whitespaces (browsers don’t usually execute those :-)). More complex ones involve renaming variables and methods to shorter names: this helps if you’re like me, who loves using really descriptive variable names, even in JS. Others do obfuscation, which compresses the JS further, and prohibits humans from easily parsing your client-side code.
I’ve used several tools in the last few months: one is called JSMin; I believe it’s written by the same person who wrote JSLint. This tool does everything except obfuscation. I’ve had good experience with it, but have come to use a different tool as it turned out to be a tad more powerful: a tool called Packer from http://dean.edwards.name/packer/.
Here’s the way I use it:
1) I have many .js files, a few of which were written by third parties (those are in separate .js files to keep me sane). Before compression, I append all of these files together into a single temp.js file to achieve higher compression ratio.
2) Compress the temp.js file using Packer.
Here’s a snippet from my build script (DOS batch file):
echo Merging all javascript files into one...
SET MERGED_FILE_NAME="compressed\\core.js"
SET TEMP_FILE_NAME="temp.txt"IF EXIST %TEMP_FILE_NAME% del %TEMP_FILE_NAME%
FOR %%f IN (*.js) DO (
echo. >> %TEMP_FILE_NAME%
type %%f >> %TEMP_FILE_NAME%
echo. >> %TEMP_FILE_NAME%
)echo Compressing Javascript files using Packer..
IF EXIST %MERGED_FILE_NAME% del %MERGED_FILE_NAME%CScript /nologo ..\devtools\packer\pack.wsf %TEMP_FILE_NAME% >> %MERGED_FILE_NAME%
del %TEMP_FILE_NAME%
Some stats for your mathematical pleasures: uncompressed JavaScript for the cocktail builder takes up 91.1KB. Compressed, it’s 33.2KB. I’m sure you can calculate the square root of the rate of change of flow of these kilobytes, if transmission is happening on the equator at the top of a 1500ft mountain.
Whatever. Enjoy your JavaScript :-)
cb
Link | Leave a comment | Add to Memories | Tell a Friend
SQL Injection Defense
Feb. 8th, 2007 | 06:31 am
We all heard of it. Sites going down because of SQL injections. Embarrassed admins. Credibility loss among users. Really simple concept – if your server-side script takes user input verbatim, and issues SQL statements based on it, you will be screwed. There are great scripts out there that will screw you automatically, without personal involvement of someone evil.
SQL injection is scary. Let’s talk about ways you can (and must!) protect your web app against it.
1. Verify data types of all non-string user inputs
Here’s what I do: create a separate class that does all communications to the database. I call it DataRetrieval. In that class, every method just gets you some data from the DB (i.e. executes a SQL query given some parameters), for example,
class DataModification {
public static function saveCocktailFeedback
($cocktailID, $starRating, $userInfo)
{ ... }
public static function saveSiteFeedback($feedbackText, $userInfo)
{ ... }
public static function registerUserSession($userToken)
{ ... }
...
}
Make it a habit to verify each and every non-string parameter in the method body before doing any SQL commands, for example:
public static function saveCocktailFeedback
($cocktailID, $starRating, $userInfo) {if (!is_numeric($cocktailID) || !is_numeric($starRating)) {
// someone is trying to hack us; do something (more below)
}...
}
2. Escape all string inputs
On MySQL, this is as simple as
$str = mysql_escape_string(stripslashes($str));
I usually create a wrapper class for working with the database (even when I’m using a prepackaged DB class as excellent as ezSQL), and define a function called DB::escape(), that does exactly that.
3. Keep track of security violation attempts
If someone’s trying to hack you, it would be nice to know (1) when it happens and (2) who was it (3) how hard did they try
Remember the “someone’s trying to hack us, do something” comment in suggestion 1? Set up a function similar to the following
public static function log_security_event() {
// get stack trace into the $trace variable$sql = "VALUES INSERT INTO securityevent (Trace, IP)
VALUES ('" .DB::escape($trace) . "', '".
$_SERVER['REMOTE_ADDR'] . "')";// execute the statement
}
This way, when someone tries to hack you, you know exactly what they tried to use, and you can see patterns in their search; you could then determine whether you’re dealing with an input error (oops, someone typed in a quote in a field accidentally) or an intentionally malicious user. You can even gauge the experience level of the hacker by the queries he tries to issue.
This function would fit very nicely into the DB wrapper class that I mentioned earlier.
4. Defense-in-depth: limit SQL user rights
Your PHP application issues a ton of queries to the production database. Poor database, I tell you. If I had to do all those queries, I’d just spit out junk and see if the app server can handle it :-).
Now, seriously: if the attacker was somehow able to run queries on your MySQL server, you want to limit the impact. For example, in the cocktail builder, PHP code should be able to query the known-and-sexy list of cocktails (read only! not modify!); report security violations (write only – only insert, not even modify; no reading allowed, either), and track and report on user feedback around cocktails (read and write).
So, as a part of your build script, you need to set the most-restrictive-possible permissions for the MySQL user that your PHP script will be running as. Here’s what mine looks like, abridged:
Unfortunately, this script can’t be generated automatically (mysqldump can’t dump permissions – at least I don’t know of a good portable way to do this), so you just have to keep this file updated by hand. Good news is that this shouldn’t be too hard: you don’t add tables to your DB that often, do you?
Oh, and one more thing: if you have several applications running on a single MySQL box, make sure that each app gets its own login (like cbuser in my sample script above), and that the logins have zero permissions outside of the designated database. This way, if a hacker screws up one of your apps, others stand a chance.
5. Defense-in-depth: production database backups
This is good against hackers, fires, and your cat chewing the network cable while database sync is happening. Do the (full, not incremental!) database backups regularly, and keep the dumps on a different box (or, even better, at a remote site).
cb
Link | Leave a comment {3} | Add to Memories | Tell a Friend
AJAX Podcast
Feb. 1st, 2007 | 05:06 pm
Funny thing is, I found a very cool podcasting series that's been open for over a year now; it's hosted by folks running ajaxian.com. They are fun, very knowledgeable, and their sole focus is AJAX and a tiny bit of JavaScript. The guys bring in influential people from the industry, and interview them; other of their podcasts are about "the state of AJAX" - new tools and best practices. Here's their podcast URL: http://media.ajaxian.com
Link | Leave a comment | Add to Memories | Tell a Friend
Developing Against a Live Database
Jan. 24th, 2007 | 08:58 am
All right, friends, it’s time for some serious development talk. You and a couple of your friends are building the next it, the Web 2.0 app of the future. You have a source control system set up, you even have a bug tracking system, you even have a spec and someone (not you!) who tests your software (may I remind you of the famous Joel Test?). You’re heading for success.
Your web app has a database back end, and you already have a few users out there, playing with your system; their data lives in your “production” database. You also have local dev boxes, with complete copies of the production environment, with development versions of the database. Anyone can mess with the development databases without causing havoc to real users’ data.
If you don’t have any of that, don’t read this article. Instead, go read Joel Spolsky’s blog or book “Joel on Software”. He’ll explain why all of these pieces are necessary.
If you do, however, you’re on the right track. But.
There’s always a “but” (or a butt?).
How do you propagate the changes to the database schema, if there are any, to the production version of the database? You’re still doing development on the system, schema and data are changing with each build, you can’t afford a DBA to do crazy DIFF’s between your dev database and production database…
Simple, and flawed, approach
You already have a version control system. Let’s assume you’re using MySQL as your database (I am). You can just use mysqldump to dump the schema and data of your development database, and check in those .sql files into source control. Then, your build script just drops the production database, and recreates it from scratch using the .sql files from the source tree.
Note that you definitely want two dumps: one for the schema, and one for the data. This way it’s much easier to track schema changes in your source control system.
Only one, but very big problem: user-generated content gets lost.
You may say “copy over user generated content to the dev box before checking stuff in”. But the build isn’t instantaneous, and everyone who’s been doing something on your site during the time between your “sync” and the build will lose their data. Not good.
You may say “screw user content”. No, you didn’t say that. I just heard it from some weirdo in the back of the room. Screwing your users is a safe way to lose early adopters, who are the only way for you to deliver a quality app fast. If you don’t get feedback from those enthusiasts (and you won’t if you piss them off by continuously throwing away the data that they’ve put in), you’ll lose.
Complex, and successful, approach
Identify which tables in your database will contain user-generated content. For the cocktail builder site, these are the mainly the tables with user ratings and user actions, and table for reporting crashes.
Your build script will no longer drop the entire database and rebuild it from scratch; it will drop all tables except for tables with user content. That is, as a part of the check-in process, split your output .sql files as following:
schema.sql: DDL for all tables except those with user-generated content
schema-extra.sql: DDL for tables with user-generated content
data.sql: actual INSERT statements for all tables other than those with user-generated content
data-extra.sql (optional): you get the drill. This is optional because there’s not always a point in including user-generated data into your build – just back it up on the production database side, and sync it to the dev boxes once in a while.
Your script to dump the dev databases may look like this file below (this is a batch script – I do my development on a Windows box; please don’t flame me, I’m built of a fire-retardant):
@echo off
REM Full path to the MYSQL utilities. Required.
SET MYSQLDUMP="C:\xampp\xampp\mysql\bin\mysqldump"
SET MYSQL="C:\xampp\xampp\mysql\bin\mysql"REM Database settings
SET DBNAME=mydb
SET DBUSER=myusercd ..\setup_sql
REM Producing schema and data files for the build. They are
REM called "schema.sql" and "data.sql".
REM Using MySQLDump. Path is specified in the variable above.REM ----- Delete old versions of the files.
del schema.sql
del data.sqlecho Generating core schema and data files...
REM ----- Using settings specified in the configuration file, ignore
REM ----- a few tables that shouldn't go into the incremental build
REM ----- (user-generated content should not be discarded on upgrade).
REM ----- The schemas for those tables will go into a separate file
REM ----- that can be used to initiate a "full from scratch" build.
REM ----- Content for those is not a part of the build.
SET ResultSkipString=
SET ResultIncludeString=--tables
echo The following tables will not be included in the core build:
FOR /F %%t IN (userContentTables.conf) DO (
echo - %%t
SET ResultSkipString=!ResultSkipString! --ignore-table=%DBNAME%.%%t
SET ResultIncludeString=!ResultIncludeString!%%t
)REM ----- Dump the actual SQL schema/data creation scripts...
SET CommonDumpSettings=--comments=false --user=%DBUSER% --single-transaction --no-set-names --skip-add-locks -B %DBNAME%
"%MYSQLDUMP%" --result-file=schema.sql --no-data %ResultSkipString% %CommonDumpSettings%echo USE `%DBNAME%` > schema-extra.sql
"%MYSQLDUMP%" --no-data %CommonDumpSettings% %ResultIncludeString% >> schema-extra.sql"%MYSQLDUMP%" --result-file=data.sql --no-create-db --no-create-info --skip-extended-insert --skip-disable-keys %ResultSkipString% %CommonDumpSettings%
echo Done
And the mentioned file userContentTables.conf is just a newline-separated list of table names that contain user content.
Have fun!
cb
Link | Leave a comment {1} | Add to Memories | Tell a Friend
AutoComplete: Which items to show?
Jan. 21st, 2007 | 05:17 am
Some time ago, I wrote how freaking cool the auto-complete system is for ingredient entry. If you don’t believe me, go take a look at its implementation – I actually have it done. For the lazy ones – folks like me - it ended up looking something like this:
Sexy, huh?
Yea, whatever. I know what you’re gonna say – I’ve seen this a billion times.
You have, you’re right. But one thing you most likely never thought about when you used a similar sexy-auto-complete textbox: which matches show up?
Obviously, you want to show ingredient names that have the stuff that the user typed in as a substring. So, user types “vo”, you show “vodka” and “absolute vodka”. But you don’t want to show “Cuervo tequila”, do you? So, rule #1: only show stuff that has user’s substring as the beginning of the word.
Now, the more interesting question: how do you want the suggestions ordered? Few options here:
- Alphabetically. Simple to implement, easy to understand, cool. One problem: if there are 50 items that match (as there are with the substring “vo”), your auto-complete text box is not really useful.
- Depending on how frequently the item is used in cocktails. This is a pretty good measure of popularity; if there’s 100 cocktails that use orange juice, it’s probably because lots of people have orange juice in their bars, so you can infer that orange juice should show up before Stoli Orange for the query “ora”. However, this has drawbacks: what if your cocktail database is skewed?
- Depending on the number of previous users of the system that have added this ingredient. This means that the system is getting smarter and smarter as the number of its users increases; this is definitely a good thing. Drawback: chicken-and-egg problem, you need initial users to make the system smart, and those initial users will suffer.
So far, I went with approach 2 – just because the site doesn’t have too many users yet (do let your friends know about it if you like it!), but I instrumented the tools necessary to know what ingredients users are adding, so I’ll switch to 3 at some point.
Cheers!
cb
Link | Leave a comment {1} | Add to Memories | Tell a Friend
Generating Semi-Dynamic Content
Jan. 19th, 2007 | 10:33 pm
Remember how we talked about using a static list of ingredients for auto-complete, and downloading that static list to the client during first load? Well, that static list needs to come from somewhere; besides, it’s not really static.
Sometimes, a recipe will come along, quoting an ingredient that the system doesn’t yet have; all ingredients are stored in a database, and it’s relatively easy to add an ingredient there. But hey, we have that static lookup list now, how do we get that to work?
The “static” list is just a JavaScript file that looks similar to the following:
ingredientOptions = [
new Ingredient("7-up", 8),
new Ingredient("absinthe", 5),
new Ingredient("absolut citron", 3),
...
]
Basically, it’s just a list of 500 ingredient names and their popularities (for sorting). This list changes relatively rarely (when new ingredients are added – I’d say once a week at most). However, it does change, so we have to have a way to generate it from the database (copy-paste is a bad idea :-)).
I just used a PHP script that talks to the database, asks for the ingredient list, and generates the entire JS file from scratch. Roughly, the code looks like this:
There are many other places in the cocktail builder where I’m using this “js generation” technique. For example, it’s known to be good to have all application UI strings separately from application logic and even presentation. It helps with several things:
1) Localization
2) Strings consistency
One more thing to note is that UI strings may be references in two completely different contexts: in .php files, where strings are used to generate server-side presentation piece, and in .js files, where strings are most frequently error messages or tiny things like the “no matches” string for auto-complete. Some weird strings may appear in either context. But, as you might have guessed, it would be really nice to have all UI strings live in one place, no matter where they are referenced – in client code or in server code.
So I created a table in MySQL, called UIStrings, where all strings will be stored an edited:
Then, I created two scripts – one to generate a server side include file strings.php.inc, and another one for client side strings: strings.js. These scripts are trivially different from the one quoted above.
Here’s a part of the listing of the generated strings.php.inc:
In the actual PHP code, if I need to reference a UI string, I’ll just write
UIText:: COCKTAIL_DETAILS_ACTION_EMAIL
Similarly, the listing of the generated strings.js file looks like:
var UIText = [];
UIText.INPUT_BOX_HINT_TEXT = "add ingredient...";
UIText.INPUT_BOX_NO_MATCHES = "No matches";
UIText.INPUT_FEEDBACK_HINT_TEXT = "give feedback..";
And if I need to reference a UI string in JavaScript code, I’ll just write
UIText.INPUT_BOX_HINT_TEXT
One last thing to note is that these generation steps need to be completely automated – they need to be a part of your build or checkin processes.
cb
P.S. I use the same “generated” .js for client-side usage statistics gathering, but that’s a different story. Let me know if you want me to share it ;-)
Link | Leave a comment | Add to Memories | Tell a Friend
Persisting User Data Across Visits
Jan. 15th, 2007 | 11:30 am
Almost every non-trivial website out there is facing a classical problem: persisting user data across visits (also knows as sessions). For your MySpace account, it's a list of your friends, and that clever joke you put into the profile to help replicate the DNA. For CocktailBuilder, it's a list of your ingredients and your favorite cocktails. You don't want to type them in every time, do you?
Now comes the big problem. How do you remember these in a seamless, non-intrusive fashion?
The obvious answer is a login system. Heck, go to myspace, the first thing you see is a logon box. You can't access almost any of the site's features without being logged in. Why is that? Because much majority of the site's information is private (others would get pissed if you saw their mailbox without their permission), so it's an absolute must to know who the user is before showing them their mailbox.
So, the plus of this approach is relative security of user data. Another plus: you can access your account from any computer. What are the minuses? You know already - you have to register for yet another darn site. Uhhhh. Pick a login. What's your email. Check that email. Click the link. Your account is now active, please log in. Next time, type the password again. Uhhhhhhh.
What's the alternative?
Modern, and not-so-modern, browsers provide a feature called "cookies". Essentially, it's a way for a website to store a piece of information on a user's computer, and access it later. All of this can be done seamlessly, without any explicit user action. For example, many websites store tracking cookies on your computer so that they can later on identify you and offer you customized services based on your previous actions.
What's good? Transparency. Plus it's reeeally easy to get the desired effect of "hey, they remembered!". Minuses: cookies are not secure. If you're a bank, don't do this. If you're a bank IT guy, you read this, and you were surprised, please realize that the sky is blue. Cookies can be stolen by other sites, if the browser has a security issue (and browsers have security issues in this area very frequently). Another minus: cookies are an ancient technology. It's not very pleasant to work with them from JavaScript (you have to serialize your data into a string less than 4KB; I'll talk about this at a different time). Another minus: cookie data is not accessible on other computers, so you wouldn't be able to share data between your home and work computers.
There are many websites that go with the third approach - combination of a login system with cookies, so that you only need to log in once a day, and throughout that day, your cookie serves as your identity. Think Amazon: when you come in, you see "Welcome, Bob (or whatever)". How do they know? The obviously don't store your IP address, that would be silly (IP addresses change for many consumers relatively frequently). They store a cookie.
So far, for CocktailBuilder, I decided to go with a plain cookies approach. There isn't any secure data in there - just a list of ingredients and favorite cocktails. Simplicity is key here. Yes, the "travel with my data" feature is missing. I'm not sure of the best way to cover for it.
One idea I had was "send my settings to another computer", which would generate an email with a hyperlink that has all of the cookie data. User would be able to send this to themself (or a friend), and get all of their settings on another computer. I haven't thought about this much, do tell me what you think.
Another idea - non-mandatory login system. You want to save your settings - register for an account. You want to load your settings from another computer - log in. Otherwise, everything else is done through cookies.
cb
Link | Leave a comment | Add to Memories | Tell a Friend
New Build: 90 new cocktails, 110 new ingredients
Jan. 11th, 2007 | 10:04 pm
The title says it all. It's the first time I'm doing a "release announcement" like this, so I think I'll share the new things I added to the database this time around.
By the way, you may wonder, how did I prioritize the ingredients to add? There's been more folks visiting the site in the last few days - just look at the screenshot from Google Analytics below:
So I was able to analyze the things that they were looking for and didn't find, and added them to the database. Here's the list of new stuff:
| Cocktails | Ingredients |
| 1960s Black Devil | 151 rum |
My goal is to make a site that's useful for the community - and the only way to do that is to look into what folks are trying to do with it and failing, and enabling those things.
Cheers!
-cb
Link | | Add to Memories | Tell a Friend
Which Cocktails to Show?
Jan. 10th, 2007 | 08:48 am
All right, we’re trying to show the user what they can make from what they got in the bar. Cool.
We know what they got in the bar (see previous post); now, we better show them some useful cocktail recipes, or they’ll sue the hell out of poor me for false advertising :-). And hey, I gotta pay for hosting, no dollars to waste on legal fees, sorry.
[Side note: cocktailbuilder.com is currently hosted on my home server, an ancient Linux box - PIII 500MHz. 40KB/s upload rate. $30/month for cable internet with Millennium Digital Media – btw, wonderful ISP.]
All right, jokes aside. We need some alcohol here.
I (my second self), the semi drunk college student Jane, am holding a party. I go to San Jose State, so I managed to sneak out some lemons out of my auntie’s back yard, so these lemons end up on my ingredient list. Now back to cb, the poor site owner that’s about to be sued; he’s thinking whether to show 1940s blue blazer, a cocktail that involves lemon peel, to our tipsy Jane. One more piece of data: CB doesn’t know that Jane is so drunk that she wouldn’t be able to peel the lemon without converting the lemon into lemon juice.
So: do we show a cocktail that has lemon peel to a user that has lemons in the ingredient list?
Before you answer, let me ask you a different question. A cocktail called Purple Passion calls for Absolut vodka, but the user has Stoli. Do we show the cocktail?
Another one: a cocktail calls for a raspberry vodka, but the user has an orange-flavored vodka. Do we show the cocktail? What if the user has just “vodka”, and they didn’t specify the type of vodka they have?
Yeah. It gets complicated, doesn’t it.
For the first two questions (lemon peel vs lemon, Absolut vs Stoli), I’m guessing you answered “yes”, show the cocktail, let ‘em have some booze. For the last one (raspberry vodka vs orange vodka), you probably said “no, it will taste different”.
Crap. And I had such a good theory here. Now I’m totally gonna get sued, cause I don’t know how to build this :-)
Fine-fine, no more theatre, let’s talk about solutions.
Option 1: Define “homogeneous groups”: all cocktails that contain any light rum will show up if the user has any other light rum. That is, all members of the group are considered “synonyms” for cocktail-making purposes.
Option 2: Define a hierarchy of ingredients that looks something like the following:
If we let the user enter any one of the LEAF nodes as one of their ingredients (i.e. Stoli Orange, but not Orange Vodka), cocktail matches will become relatively simple. If a user holds a particular node, show all cocktails that include that node’s siblings or ancestors. For example, if the user has Stoli Orange, we’ll show recipes that include the following highlighted nodes:
However, this may become a bit constraining:
1) Users can’t just type in that they have some vodka in the bar. They gotta say exactly what kind it is. Make users suffer (i.e. stand up from their computer), and they will leave.
2) What do you do with recipes that call for stuff that doesn’t quite fit in one category, for example, what if a recipe says “use raspberry or orange vodka” (bear with me, I can’t imagine what the consequences of this substitution would be)? Our hierarchy would then have to become sort-of unclean, involving “virtual” nodes; these virtual nodes break the lovely tree structure, converting the whole thing into a directed graph:
Uhh, decisions, decisions… What’s your take?
cb
Link | Leave a comment {8} | Add to Memories | Tell a Friend
Give me your ingredients..
Jan. 8th, 2007 | 09:50 pm
All right, we know what we want to build: a system that tells you the cocktails that you can make from what you got in your bar. I won’t get tired repeating this mantra until the website comes live, or I come dead, whichever comes first :-).
So, the first task: we need to find out what the user has in their bar. How do we do that?
The problem, really, is another incarnation of a very well-known issue: how do we let the user select several items from a predefined list? One major tweak: the list is freaking huge. Gi-normous. You know how many possible types of vodkas are out there? My database currently has 60, and that’s just vodkas! Altogether, we’re looking at something like 500 valid entries.
But there are also invalid entries! What if one of our users thinks that Chartreuse is spelled with an SH instead of CH?
So, taking all of these requirements into account, let’s look into possible solutions.
Option 1
The obvious solution: drop-down list box, also known as an HTML <SELECT> tag. Looks like this:
I’ll allow myself to remind my faithful readers that drop-down list boxes are generally known to be good with average-sized lists (for example, they are good for a list of states). However, they have quite a few limitations:
1) You need several dropdown controls to let the user pick several items. Each dropdown needs to have quite a bit of HTML associated with it – each entry needs an <OPTION> tag.
2) It’s pretty hard to navigate the dropdown that has more than, say, 100 items, by using a mouse: the little scroll bar is not quite usable: it’s too small.
3) Navigating entries using the keyboard is hard, too: up until Internet Explorer 7, typing “ora” with the keyboard focus on the above dropdown would take you to an entry that starts with “a”, instead of the desired orange juice.
4) Think of “Stoli Raspberry” vodka. Some users may refer to it as “Raspberry Stoli”, others – as “Stoli Raspberry”. Does this mean that we need two different items for the same thing in our dropdown?
Option 2
Multiple-selection list box, or just a regular list box with multi-select enabled. These two look like this:
Other cocktail sites that I looked through (for example, idrink.com, or cocktail.uk.com), use this kind of user interface.
You know that option 1 stinks – but there’s someone out there, at least cocktail.uk.com, that doesn’t understand that this option is almost just as bad. And that exact thing is one of the reasons why I feel obligated to make the life of all casual alcoholics better by creating the cocktail builder website :-).
Now, why is this option good from afar, but far from good?
1) If you’re using a list box with multiple-selection enabled: do you really think people know that CTRL-clicking will select several items? Plus, when you’re doing this CTRL-clicking voodoo, you’ll inevitably click outside of the control, which will lose all items you worked so hard on.
2) For the multiple checkboxes: a much better user model, but hey, I still need to scroll through an immensely long list, and there’s no good way for me to quickly go to an item (besides using the browser’s CTRL+F, but how is this better than just a text box for entry?)
3) Still no good answer for “Raspberry Stoli” vs “Stoli Raspberry”
Option 3
The holy grail of Web 2.0, Google AutoSuggest. Looks like this:
You’re probably expecting me to start screaming about how freaking cool this is, but I’ll let you decide this yourself :-). One of my favorite sayings goes: “Diplomacy is an art of letting others have it your way”.
Why is this cool?
1) Super-fast entry: type in “vodka”, and see all vodkas out there. Even if vodka is not the first word in the ingredient name, you can still show that option in your auto-suggest box (because you’re the one controlling the logic for what to show)
2) Almost no issues with scalability and real estate on the screen: it doesn’t matter how many items the user has to choose from; they only see options that fit the context of what they typed in.
3) Keyboard navigation: unlike previous options, you can enable up/down arrow navigation through the list of choices, offering enter/tab to actually pick an option. This makes the site much more accessible to people who need it. Recall Joel Spolsky, and design for users that can’t use do precise mouse movements, that don’t have time, that have their cats chewing the mouse cord and babies screaming in the room nearby. Users of my website will very likely be pretty drunk :)
4) Multiple entries require multiple auto-complete text boxes, but c’est-la-vie.
There are a few options as to how to implement this auto-complete text box – one big design decision is “get the list live” or “download the static list”.
The Internet, including Google Suggest in particular, is full of examples as to how to implement a “live lookup” auto-complete. For example, user types “vod”, and your program goes back to the server, asks what are the matches for “vod”, and then returns the results to the browser for display. Not a bad idea if you have many, many items to search through.
However, in our case, the list is reasonably small (something like 500 items?), so maybe we can store the entire list on the client? Maybe, as the user downloads the initial webpage, we’ll also pump in all possible options of ingredients they might enter, and then as they type in “vod”, we’ll just go through 500 items on the client side, in JavaScript?
This second approach is what I chose for cocktail builder; there is one big tradeoff here: initial download time of the 500 item list vs on-demand, lazy download of the results. Several outcomes of this tradeoff:
- the price of downloading the list is paid only once per user (remember, your stuff gets cached by the browser); no matter how many ingredients they add – there will be no roundtrips. The users will, I betcha, have 5-10 ingredients in their bar, which will translate to 5-10 queries with the first approach, and only one payoff with the second approach.
- users that come back won’t have the roundtrips.
- server load: less roundtrips means that I don’t need to buy a 10-rack super-powerful server :-). Which means no annoying ads for the users to cover the costs.
- showing matches for user input: clear performance win. As soon as the users types in “vod”, I do snappy JavaScript processing, all on the client side, to run through an ordered list of ingredients (how long does it take a 3GHz computer to do 500 string matches?). If I went with a lazy approach, users would have to wait for a roundtrip, which takes time.
What exactly is the cost of downloading the full ingredient list at load time? My current list of ingredients has 250 items; it’s 13KB, when not compressed (we’ll talk about JavaScript compression at a different time). You can make the judgment yourself, I’m sure.
Cheers!
cb
Link | Leave a comment | Add to Memories | Tell a Friend
Who is this?
Jan. 6th, 2007 | 09:21 pm
You’re probably wondering: who in the world is doing this super-altruistic free cocktail site thing, hoping to make millions?
You’d be wrong on just one count: the millions. You’d be almost right on the altruistic part.
Let’s go slowly: what do you already know about me from the last post? I have a girlfriend; this doesn’t, however, imply that I’m a guy or a girl; you know that my girlfriend takes showers (important piece of quality of life!); you can guess that I like probably cocktails, and that the narration is happening somewhere in Seattle. You also know that I – or someone I know – or something – built some website.
Not enough for your voyeuristic pleasure. Well, I admit, it wouldn’t be enough for mine, either.
I’m just a girl…
Sorry, can’t get that stupid song out of my head now.
I’m not a girl, I’m a guy. I’m reasonably young, reasonably geeky, reasonably fashionable. Seattle-ite. Live in downtown.
Love coding. I don’t write code for living, just for pleasure.
Love making people happy. Biggest fault I’m willing to admit: watching someone use the system I built.
Love creating usable systems. Love Joel Spolsky. Love PHP and MySQL.
Would love to create a cocktails website that will make you happy with your bar. Tell me what you’d like in it, and we’ll discuss in comments.
Cheers,
cb
Link | Leave a comment | Add to Memories | Tell a Friend
Well, HI!
Jan. 5th, 2007 | 07:26 pm
Hello there! I proclaim the blog of the cocktail builder officially open.
“Cocktail what?”, you might say, and you’d be right. It’s a brand new project that you most likely never heard of; this post aims to shed some light on it.
In one sentence: website that helps you find out what cocktails you can make from the ingredients you have in your bar.
Imagine: you come home from class; there was a party at your house yesterday. Damn, I’d want a drink too. But the only thing you got in your bar is tiny bit of vodka, some champagne, and leftovers of a couple of juices in the fridge – pineapple and cranberry… You’re wondering whether mixing all that stuff together will make a delicious drink of gods or a disgusting bleah… Nope, you’re not wondering. You know where to go: www.cocktailbuilder.com, where you type up the stuff you got and find the stuff you can make – Puck’s Potion and Bay Breeze, each with detailed instructions and user reviews. Now your quickly put-together cocktail is much more likely to impress you – and your possible guest.
You may ask – damn, why do I need yet another cocktail site? There’s the cocktail.com, and webtender, and a bunch of others… I thought so too. But then I changed my mind; you will too, I’m sure, but first – here’s a story.
One rainy Seattle morning, I was brushing my teeth, and my girlfriend was taking a shower. I hear a earth-shattering question from behind the shower curtain: “hey, we have a party tomorrow; do you know what cocktails we’ll be serving”? I - the procrastinator of all times - of course say yes, but we both knew I didn’t. So she asks: “Hey, is there a website where I can put in the stuff we have in our bar, and find out what we can make”?
The answer, at that time, was no. Soon, it will be “yes”.
Now back to your original question – why not cocktail.com and others? Because all existing sites will help you make a drink if you know it by name, but are notoriously useless if you’re like me – want to experiment, or just want to use your bar to its fullest, and don’t feel like throwing money around for a special type of raspberry-based orange-flavored vodka that will only help you make one fancy drink.
Why am I starting a blog? I have a few reasons – first off, I love to write. Second, I want to hear the feedback of enthusiasts, readers like yourself, - so that the website really helps you with your bar. Lastly, I want to promote the cocktail builder project – which, of course, will be completely free (with a bare minimum of ads), and won’t require any registration.
Cheers!
Alex Weinstein, aka cb (cocktail builder)



