A lot of posts on the interent take knowing the ins-and-outs of whatever Linux distribution for granted.  For those of us who are developers/programmers first, database adminsitrators second, and system administrators third it may not be so obvious!  So that can be very frustrating for me when Googling for answers.

So I found what I need to do to get this going and wanted to post it, both for any others but also for me later.  It was actually way easier than I thought once I actually found it...

apt-get install lighttpd php5-cgi    (download and install lighttpd and php 5 for fast-cgi)

lighttpd-enable-mod fastcgi  (enable fast-cgi module in lighttpd)

Download and install some other PHP modules:

apt-get install php5-gd

apt-get install php-memcache

apt-get install php-mysql

apt-get install php-curl

And finally restart lighttpd...

/etc/init.d/lighttpd restart

I had heard nightmares about getting PHP and Fast-CGI configured on Lighttpd.  Maybe they've just made it easier or maybe just Ubuntu's fabulous apt-get tool automates it for you.  Not sure but that was all!!  Didn't even break a sweat!

Just create your php file with init phpinfo() and pull it up in a browser and you should see all of your PHP options with the modules configured.  So easy.

I am running this on an EC2 instance, so I intend to turn this into a bootstrap script so that it will configure the server, copy over my configuration and php files from S3, and install all of this automatically.

Thanks to these posts for pointing me in the right direction:

 

After reading about the IndexedDB stuff (here, here and here) that is coming and looking at its code, I wanted to abstract my local database access a bit more.  And I decided to make it more closely match what they are doing.  That way when that launches then I can easily just update this object to pull from that new local storage.

Right now I am developing for the Mobile Safari platform, so I am not very concerned about other browsers that don't support this local SQLite engine.

First the jQuery Plugin Database Object

 


(function($) {

$.Database = function(dbname) {
	// Initialize
	var db = null;
	var me = this;
	this.isSupported = false;
	if (window.openDatabase) {
		try {
			db = window.openDatabase(dbname);
			me.isSupported = true;
		} catch (err) {
			db = null;
		}
	}
	// Public functions
	this.execute = function(sql, onSuccess, onError) {
		if (!onSuccess) onSuccess = me.nullHandler;
		if (!onError) onError = me.nullHandler;
		db.transaction(function(tx) { tx.executeSql(sql, [], onSuccess, onError) });
	};
	this.executeBatch = function(arr, onError) {
		if (!onError) onError = this.nullHandler ;
		db.transaction(function(tx) { 
			for (var i=0; i < arr.length; i++) {
				tx.executeSql(arr[i], [], this.nullHandler , onError);
			}
		});
	};
	this.nullHandler = function(tx, e) {  };
	this.listTables = function(s, e) {
		db.execute('SELECT tbl_name FROM sqlite_master', s, e);
	};
	this.openTable = function(tableName) {
		return new Table(me, tableName);
	};
	// Table object
	var Table = function(db, tableName) {
		// Private variables
		var me = this;
		// Internal default handlers
		var onError = function(tx, e) {
			this.lastTransaction = tx;
			this.lastResponse = e; 
		};
		var onSuccess = function(tx, e) {
			this.lastTransaction = tx;
			this.lastResponse = e; 
		};
		var onRowsReceived = function(rows) {
			this.lastTransaction = tx;
			this.lastResponse = e; 
		};
		// Public variables
		this.lastTransaction = null;
		this.lastResponse = null;
		this.lastCommand = null;
		// Public methods
		this.create = function() {
			var sql = 'CREATE TABLE ' + tableName;
			var bump=0;
			if (arguments.length > 0) {
				var col;
				sql += ' (';
				for (var i=0; i < arguments.length; i++) {
					col = arguments[i];
					if ($.isPlainObject(col)) {
						sql += col.name + ' ';
						if (col.type) sql += col.type;
						else sql += ' TEXT';
						if (col.length > 0) sql += '(' + col.len + ')';
						if (col.ext) sql += ' ' + col.ext;
					} else {
						sql += col + ' TEXT';
					}
					if (i+1 < arguments.length) sql += ', ';
					bump++;
				}
				sql += ')'
			} else {
				sql += ' (ID INTEGER PRIMARY KEY)';
			}
			// Responders
			var s = me.onSuccess;
			var e = me.onError;
			if ($.isFunction(arguments[bump])) {
				// Set success
				s = arguments[bump];
				// Increment
				bump++;
				// Look for fail handler
				if ($.isFunction(arguments[bump])) {
					// Set error handler
					e = arguments[bump];
					// bump
					bump++;
				}
			}
			// Execute
			this.lastCommand = sql;
			db.execute(sql, s, e);
			// Return self
			return me;
		};
		this.get = function() {
			if (arguments.length === 0) {
				db.execute('SELECT * FROM ' + tableName, me.onSuccess, me.onError);
			} else {
				// SELECT
				var sql = 'SELECT ';
				var bump = 0;
				if ($.isArray(arguments[bump])) {
					$.each(arguments[bump], function(key, value) {
						sql += value + ', ';
					});
					// Remove last comma
					sql = sql.slice(0, -2) + ' ';
					// Increment
					bump++;
				} else {
					sql += '* ';
					
				}
				// FROM
				sql += ' FROM ' + tableName;
				// WHERE
				if ($.isPlainObject(arguments[bump])) {
					sql += ' WHERE ';
					$.each(arguments[bump], function(key, value) {
						sql += key + '=' + "'" + value.replace(/'/g,"''") + "' AND "
					});
					// Remove last and
					sql = sql.slice(0, -4);
					// Increment
					bump++;
				}
				// ORDER BY
				if ($.isArray(arguments[bump])) {
					sql += ' ORDER BY ';
					$.each(arguments[bump], function(key, value) {
						sql += value + ', ';
					});
					// Remove last comma
					sql = sql.slice(0, -2) + ' ';
					// Increment
					bump++;
				}
				// Responders
				var s = me.onSuccess;
				var e = me.onError;
				if ($.isFunction(arguments[bump])) {
					// Set success
					s = arguments[bump];
					// Increment
					bump++;
					// Look for fail handler
					if ($.isFunction(arguments[bump])) {
						// Set error handler
						e = arguments[bump];
						// bump
						bump++;
					}
				}
				// Execute
				this.lastCommand = sql;
				db.execute(sql, function(tx, e) { this.lastTransaction = tx; this.lastResponse = e; s(e.rows); }, e);
			}
			// Return self
			return me;
		};
		this.insert = function(obj, suc, err) {
			var col;
			var sql = 'REPLACE INTO ' + tableName + ' (';
			// Add fields
			$.each(obj, function(key, value) {
				sql += key + ', ';
			});
			// Remove last comma
			sql = sql.slice(0, -2);
			// add values
			sql += ') VALUES (';
			$.each(obj, function(key, value) {
				sql += "'" + value.replace(/'/g,"''") + "', ";
			});
			// Remove last comma
			sql = sql.slice(0, -2);
			sql += ')';
			// Responders
			var s = me.onSuccess;
			var e = me.onError;
			if ($.isFunction(suc)) s = suc;
			if ($.isFunction(err)) e = err;
			// Execute
			this.lastCommand = sql;
			db.execute(sql, s, e);
			// Return self
			return me;
		};
		this.drop = function(suc, err) {
			// Responders
			var s = me.onSuccess;
			var e = me.onError;
			if ($.isFunction(suc)) s = suc;
			if ($.isFunction(err)) e = err;
			// Execute
			this.lastCommand = sql;
			db.execute('DROP TABLE ' + tableName, s, e);
			// Return self
			return me;
		}
		this.exists = function(does_exist, not_exists) {
			// Responders
			var yep = function() { if ($.isFunction(does_exist)) does_exist(me); };
			var nope = function() {	if ($.isFunction(not_exists)) not_exists(me); };
			// Check exists
			db.execute("SELECT tbl_name FROM sqlite_master WHERE tbl_name='" + tableName.replace(/'/g,"''") + "'", function(tx, e) { if (e.rows.length > 0) { yep(); } else { nope(); } }, nope);
			// Return self
			return me;
		}
		// Event handlers
		this.success = function(f) { me.onSuccess = f; return me; };
		this.error = function(f) { me.onError = f; return me; };
		this.received = function(f) { me.onRowsReceived = f; return me; };
	};
	// return me
	return this;
}

})(jQuery);

 

And this is how you'd use it...

 

// Create Database Connection
var db = $.Database('testing');
// Initialize Table Object
var settings = db.openTable('app_settings');
// Create Table
settings.create({ name:'ID', type:'INTEGER', ext:'PRIMARY KEY' }, { name:'Name', type:'TEXT', ext:'UNIQUE' }, { name:'Value', type:'TEXT' });
// Insert a value
settings.insert({ Name:'Today', Value:'Tuesday' });
// Get the result
settings.get({ Name:'Today' }, function(rows) { alert('returned ' + rows.length + ' rows'); });

// You can also chain it and put inline handlers
var settings2 = db.openTable('app_settings').insert({ Name:'Foo2', Value:'Bar' }).get({ Name:'Foo2' }, function(rows) { alert(rows.item(0)['Value']); });

 

I'm going to continue to iterate it and make it a little prettier and more flexible, but it's a pretty good start. Hope you like it!

Cross-Domain Ajax Posting

At MileSplit, we use a lot of different domains.  Each subdomain means something to us and this structure is an integral part of the application.  Further with our new API, we want to direct everything to the API's own domain.  Browser security rules often get in the way though when you try do AJAX calls from one domain to another. We can easily get around this with GET requests with JSON-P techniques, which technically sidestep AJAX altogether.  However, we run into a wall when it comes to POSTs.  What to do??  You can submit the data to a hidden IFRAME and then cross your fingers that it went through, but you have no actual confirmation that it was successful since you can't read or receive any data from it. I have read several of the work arounds.  One involves listening for the load event on the iframe and then doing another call to the server via the JSON-P technique to get the data.  That's not a horrible solution, but it's an extra ste more...

jQuery Ad Frame Plugin

#jqueryblogpost code { display: block; border: solid 1px #cdcdcd; padding: 10px; } We are in the middle of a huge upgrade project over at MileSplit:  rewriting significant portions of code, some redesign, rolling out some new features and more...

On MileSplit we make extensive use of subdomains.  They regionalize our content or otherwise determine exactly what we're after.  Additionally, we are starting to make more extensive use of caching servers to store our static content--including javascript files.  These also have their own domains.

Cross domain AJAX calls were becoming a real problem for me.  I thought about doing some overly complex stuff like dynamically creating hidden iframes with sub-iframes, but I with my current major site overhaul and redesign in progress I am trying to focus on cleaner code.  So I started looking at JSON-P methods.

Basically every tutorial you find online about this just says use jQuery.  While jQuery is well and good, it's not really my coding style.  For now I am using Prototype, but I have the desire to get away from any of these frameworks and just utilize my own custom framework.  These Javascript frameworks get way too bloated and I never use 75% of their features, so why have to load them?  Anyway... back on subject...

I already had an API created for MileSplit and it had the option to return data in either XML or JSON.  So I needed to modify that API to have the option to return that JSON wrapped in a callback function.  This was incredibly easy to do. 

 My original PHP code looked something like this:

$json = '{';
$json .= '"version": "'.$version.'",';
$json .= '"request": "'.$call_id.'",';
$json .= '"code": "'.$code.'",';
$json .= '"message": "'.$msg.'",';
$json .= '"content": '.$content;
$json .= '}';
echo $json;

 Alll I need to do was edit the last line and change it to this...

if ($callback) echo "$callback($json)";
else echo $json;

Note:  I'm not showing you above any of the rest of the code that sets those variables, but they'd be set however your API is structured. The only thing worth mentioning is that $callback is set from $_GET['callback'] passed as a URL parameter.

Back on the javascript side, I just created a very quick and easy object called Request.  We create a function to receive the response.  And then we instantiate a new instance of that Response object, telling it the URL of the API method we want to call and passing (as text) the name of the function that will receive the answer.  So easy!

Here's the javascript...

var Request = function(url, callback) {
    var s = document.createElement("script");
    s.src = url + '&callback=' + callback;
    document.getElementsByTagName('head')[0].appendChild(s);
};
function received_json(response) {
    alert(response.message);
}
new Request('http://api.mydomain.com/get/object?format=json', 'received_json');

Of course, I guess technically this is no longer AJAX since its not using XMLHttpRequest, but who cares... it works.  Tested in Firefox 3.6, Chrome 4.1, and Internet Explorer 8.

 I hope this helps someone out.

With a sample size of 1,724,087 visits and over 17,428,039 page views over the last 30 days we have a large enough base for these numbers to actually mean something.  While we perhaps have a slightly larger concentration of the younger demographic than most sites, this should be a pretty good idea of the picture of the larger internet in general.  About 97% of our traffic is US based (with the Canadian traffic starting to build slightly), so this is not an international picture but rather a domestic one.

Mobile Traffic

I will start here because it's such a growing movement that will be more and more relevant.  It is an area we are going to be putting increasing resources into.

Mobile traffic represented 5.44% of our overall traffic.  What is more impressive is that this is a month-over-month increase of over 0.7% (4.71%) and a whopping 2.5% increase from just six months ago.  Clearly the mobile revolution is in full effect and it is rapidly changing the internet culture.

Of our 94,777 visits from mobile devices, here is the breakdown by device.  In parenthesis is where the device was 6 months ago.

  • iPhone 35% (48%)
  • iPod 30% (33%)
  • Android 16% (4.0%)
  • Blackberry 9.5% (5.7%)
  • Sidekick 2.3% (2.8%)
  • Windows 2.15% (2.4%)
  • Samsung 1.71% (1.4%)
  • iPad 1.02% (n/a)
  • LG 0.9% (0.98%)
  • Palm 0.5% (0.93%)
  • Symbian 0.3% (0.18%)
  • Moto 0.15% (0.17%)
  • Sony 0.04% (0.17%)
  • Nokia 0.03%

Obviously this shows that Apple has a huge lead over the competition having a collective percentage of over 66% of the mobile traffic.  However, another interesting tidbit is to look at 6 months ago to see where the momentum lies.  While overall its number of visits have definitely gone up tremendously, Apple's share of the pie has decreased 15% in that time period.  Most of that share has gone directly to Android devices, which increased 12% during the period (that is a 4x increase!!!).  Blackberry also saw a healthy 4% gain.  Most of the other devices stayed mostly the same with Windows and Sidekick (both Microsoft owned) seeing some loss and Samsung building support with its newer devices. 

But poor Palm.  Everyone is talking about its unavoidable demise and these numbers show that Palm's share was about cut in half.  What is likely is that the actual number of Palm users stayed the same during that time... just everyone else added and no one else bought a Palm.  I'm not trying to hate... just saying.

One other interesting point that I noticed.  While our average user on a computer will look at over 10 pages per visit on average, the average mobile user is looking at an average of 4.66 pages/visit.  Perhaps that's our failure to create a more mobile friendly interface or perhaps it's the nature of mobile browsing that it's quick and to the point instead of a longer engagement when you're in front of a larger device.  If you're wondering... iPad users did not view much more despite having a larger screen size... 5.12 pages/visit on average for those users.  Here's the break down by device, which seems to near directly correlate with the richness of the expience on each device...

  • iPod 5.44
  • iPad 5.12
  • Android 5.07
  • iPhone 4.67
  • Windows 4.36
  • Samsung 3.79
  • LG 3.64
  • Palm 3.61
  • Blackberry 2.81

Browsers

Because our God is a good and merciful god and loves web developers, Internet Explorer usage continues to decline. It's more of a slow bleeding out, but I'm thankful for it and hope it will accelerate or at least IE9 will come out and kill off remaining IE6 users.  Unfortunately, IE is still holding on to the majority, but barely.

  • Internet Explorer 57.9%  (was 61.6% six months ago)
    Breakdown: IE8 54.4%, IE7 34.7%, IE6 11.9% (please upgrade!!!!!!!!!!!!!!), IE5.5  0.02% (think this is my grandma)
  • Firefox 17.75%  (was 19.2% six months ago)
    Breakdown: 3.6 52%, 3.5 29%, 3.0 14%, 2.0 1.5% (thankfully most stay on top of upgrades)
  • Safari 16.8% (was 15.44% six months ago.... about 5% of this is iPhone/iPod/iPad and Android who all use Mobile Safari)
  • Chrome 5% (was 2.34% six months ago)
  • Various Mozilla 0.65%  (was 0.37% six months ago)
  • Opera 0.26% (was 0.20% six months ago... not sure why I always bother testing for Opera compatability of my site with it being this low, but I always do!)
  • Netscape 0.24% (was 0.03% six months ago what's up with that increase?!? and wow! almost as many netscape users as Opera users! nuts!! the nostalgia!!!)
    Breakdown:  4.0 95% (old school for real!), the other 5% are version 7 or 8.... no early 90s throw backs apparently!  I started with Netscape version 1.0!!!!!!!!  Before upgrading to MSIE version 2.0 yeah baby!

 

Operating Systems

Always fun to keep an eye on this...

  • Windows 79% 
    Breakdown:  XP 57.7%, Vista 29.8%, Windows 7 11.4%, Win2000 0.42%, Server 2003 0.42%, CE 0.13%, Win98 0.09%, ME 0.01%, and a few still ringing in with NT or even Windows 95!!  Kickin' it old school!
  • Mac 14.25%
    92% are on the Intel platform and only 8% still on the PowerPC platform
  • iPhone/iPod/iPad 3.61%
  • Android 0.87%
  • Blackberry 0.52%
  • Linux 0.32%
  • various others left off because they are other mobile devices already covered
  • Playstation-3 0.08%
  • Playstation Portable 0.04%
  • Nintendo Wii 0.02%
  • Other interesting ones:  AIX, SunOS, FreeBSD, SoftBank all with less 0.01% combined but still a few users!

 Screen Resolution

It's always nice to get justification for no longer supporting 800x600 or smaller screens!  Here is a look at the continued adoption of larger screens...

  • 1024x768 - 26.75% (so we have to stick with 960 pixels wide for now!)
  • 1280x800 - 21.8%
  • 1280x1024 - 9.0%
  • 1440x900 - 8.4%
  • 1366x768 - 5.7%
  • 1680x1060 - 4.6%  (congrats on your 20" monitor!)
  • 320x396 - 3.55% (hello iPhone/iPod users!)
  • 1152x864 - 2.0%
  • 800x600 - 1.94% (sorry... horizontal scroll bars for you!)
  • 1600x900 - 1.4%
  • 1920x1200 - 1.4% (congrats on your 23" monitor!)
  • 1280x768 - 1.39%
  • 1920x1080 - 1.26%
  • 1024x600 - 0.9%
  • 1280x960 - 0.77%
  • ... plenty of other variations

Features

Various stats...

  • Flash - 90+% of users could run Flash.... 5.4% of those who can't remember are mobile users.  So 95% of computer users can view Flash.  The vast majority are on version 10.
  • Java - 86.5% of users have Java installed.. again 5.4% of those who don't are mobile users... so over 90% have it on computers.
  • Dial-Up - Only 1.3% are still using dial-up connections.  So broadband of at least lower levels is pretty much a given.

 

 

Hope you enjoyede the numbers.  Helps to check in with these now and again to see where we stand and who we should develop for.

 

I am currently in the midst of a major redesign, so I am using this opportunity create it using the most semantic and standards based code as feasibly possible.  One of the things I wanted to do is layout some of my content into columns.  Now I don't mean having content on one side and a sidebar on the other.  I mean columns as in auto-following content into multiple columns.

This is an extremely common practice in print.  While I think that it is extremely important to implement on the web (and very very long over-due in the CSS standards), I believe its uses are very limited.  For any long content article the web is very different from print.  I don't think that it works for lots of paragraph content because that requires the user to scroll up and down to read it... which is more problemsome in my opinion than just reading long horizontal lines of text.

Anyway... but in certain uses I think it's very useful.  I have found one of those and wanted to use it.  I am listing out links to our latest content.  For same of example, let's just say it's the latest news.  So I want to list out the headlines and would like to do so in an un-ordered list (<ul> element) which each headline as a line item (<li> element).  It is very good visually to break these into multiple columns.  It makes efficient use of the space and is easy to read.

What a perfect opportunity to use this new format!  (or so I thought)  I realize that it is not supported in Internet Explorer (which still makes up 57% of our users), but you know what.... I kinda felt like punishing them for not "upgrading" anyway!

It worked pretty well and was pretty easy.  Here is a code example with inline styles just to be compact...

<ul style="width: 300px; margin: 0; padding: 0; list-style: none; column-count: 2; -moz-column-count: 2; -webkit-column-count: 2">
<li>Headline 1: Vel delenit vulputate iustum luctus</li>
<li>Headline 2: suscipit esse quia iriure. Ulciscor elit macto gilvus ludus augue neo odio capto conventio gemino, sed.</li>
<li>Headline 3: Accumsan tation haero o</li>
<li>Headline 4: hendrerit, commodo tum nimis, luptatum eros jugis</li>
<li>Headline 5: molior metuo</li>
<li>Headline 6: aletudo consectetuer immitto consequat utrum</li>
</ul>

Which looks like this...

 

  • Headline 1: Vel delenit vulputate iustum luctus
  • Headline 2: suscipit esse quia iriure. Ulciscor elit macto gilvus ludus augue neo odio capto conventio gemino, sed.
  • Headline 3: Accumsan tation haero o
  • Headline 4: hendrerit, commodo tum nimis, luptatum eros jugis
  • Headline 5: molior metuo
  • Headline 6: aletudo consectetuer immitto consequat utrum

 

It's a little hard to see for lack of other styling, but notice the problem is that the third line item breaks half in one column and half in the other.  It really is confusing to the user and just looks bad for that to happen.  And it can look much worse in other examples.

What we really need to make this scenario at all feasible is to control column breaking.  I first thought that if I just tried to enforce the "display: block"-edness of my list items it would work, but no dice.  And that makes sense because paragraphs are block elements too and the whole point is to break them.

After much searching and asking other developers, I found that the standard that should solve this is the CSS property column-break-inside.  Setting the value of this property to "avoid" on the list items should solve it.  Unfortunately, it is not supported in any major browser.  Safari/Chrome/Webkit claims to support it in their documentation with their proprietary property -webkit-column-break-inside... however it does absolutely nothing.

It seems at this point there is no solution to do this for real.  The best I came up with is this (again inline styles for compactness)...

<div style="display: table">
<ul style="display: table-cell; width: 150px; list-style: none; margin: 0; padding: 0">
<li>Headline 1: Vel delenit vulputate iustum luctus</li>
<li>Headline 2: suscipit esse quia iriure. Ulciscor elit macto gilvus ludus augue neo odio capto conventio gemino, sed.</li>
<li>Headline 3: Accumsan tation haero o</li>
</ul>
<ul style="display: table-cell; width: 150px; list-style: none; margin: 0; padding: 0">
<li>Headline 4: hendrerit, commodo tum nimis, luptatum eros jugis</li>
<li>Headline 5: molior metuo</li>
<li>Headline 6: aletudo consectetuer immitto consequat utrum</li>
</ul>
</div>

So essentially it's implementing tables without being quite so evil.  The positive is that it is decent looking code and it is supported by all major browsers (including Internet Explorer) unlike the other solution would have been at this point.  The big disadvantage is it is not true flowing and you have to hard break it in the code, potentially creating uneven columns and losing some of the clean code advantages of it in the first place.  Oh well.  For now it works and it's not bad.

 

  • Headline 1: Vel delenit vulputate iustum luctus
  • Headline 2: suscipit esse quia iriure. Ulciscor elit macto gilvus ludus augue neo odio capto conventio gemino, sed.
  • Headline 3: Accumsan tation haero o
  • Headline 4: hendrerit, commodo tum nimis, luptatum eros jugis
  • Headline 5: molior metuo
  • Headline 6: aletudo consectetuer immitto consequat utrum

 

I ran into (yet another) ridiculous Internet Explorer bug tonight that made me pull my hair out for about two hours.  I created a javascript object that powers an article scroller... you know one of those things that rotates through top stories on a news site with thumbnails and the larger cover story photo.

Anyway... so I had it working great in Firefox and Chrome.  Then I started Internet Explorer.  I noticed that it was updating the image fine, but the headline and summary of the cover story was not changing with it.  What the heck??  So I made a few code tweaks and console.log call-outs and fired up Firebug Lite.  It worked!  So I took out my logging and then it didn't work.  What in the world?!?

So after going round and round for a while, I noticed something.... I could run the page and it doesn't work.... without reloading the page, start Firebug then it works.  So then I scrapped the console.log calls and just did an alert call.... WORKS.  So I thought okay it must have to do with writing the variable to a string so I tried temporary variables doing nothing but holding the string... tried toString().   Nothing worked.

After doing a little Googling I found some others with similar (but a little different) issues.  It seemed that it was just that (for some reason) Internet Explorer was not redrawing the div even when you set new content to it.  Throwing an alert window and apparently console.log with Firebug Lite made it redraw.  Obviously don't want to do that in production though!

So the final solution was simple.... hide the div (set style.display to none), then change the content, and then show the div again.

I really hate Internet Explorer.

Sphinx is My Anti-SQL

With all of this talk about NoSQL (which while I think it definitely has its use cases, but believe is quite overblown), I thought I'd mention my own recent experience with overcoming weaknesses of the traditional relational database. SPHINX!

Sphinx is dubbed as a full-text search application. It of course does that very very well, but it is so much more than that!

The full-text search first drew me to it. We had gotten sufficiently large where MyISAM just was not cutting it for us anymore. As our traffic and table-size expanded the table locks were becoming more and more of a problem and the site had began to crash on a regular basis. I knew that I needed to switch to InnoDB (which doesn't support full-text search), so I had to find an alternative quick because several of our tables needed this feature. Sphinx to the rescue! Worked great.

It didn't take me long after implementing it to realize that it could be leveraged to be so much more than just full-text search. It can be used to offload a significant portion of your intensive or frequent queries (with the help of some from memcache).

With advanced filters, sorting, group bys, and (of course) text search you can fashion a Sphinx query to do incredible things that would grind MySQL to a halt—and they're all very fast!

On one particular table I had gone to extensive denomalization of the data (adding way more columns and redundant data than I'd like), about eight or nine indexes, and I was still having issues with performance. I am migrating all of my queries on this table (slowly) to Sphinx and will be able to probably cut a third of the table size and eventually maybe have it down to one or two indexes. With the strategic use of keyword text combined with intelligent filters you essentially have an index on every possible combination without all of the overhead. And with a single Sphinx query you can do things that would take multiple queries or subqueries in MySQL with calculations like summing, grouping, and sorting. No more filesorts or temporary tables!

I won't give out all of my secrets, but thing outside of the box a little bit (not just about your full-text search problems) and I'll be you can solve at least a few of your headaches with Sphinx.

This article is a sort of series I'm writing with my experiences of learning the EC2 cloud while learning Sphinx search.  The more I learn about Sphinx and use it, the more impressed I am.  And also the more possibilities that I see.  I believe this is going to solve A LOT of problems for us.

Initially the reason for going to Sphinx was because MyISAM tables were not doing it for us anymore and we need to convert to InnoDB, which of course does not support fulltext searches.  Being pretty tapped out on our single server as its 12GB of RAM and already planning for a future migration to the Amazon Cloud anyway... I decided to launch this on EC2 as my first step in that direction.

I have been pouring over the Sphinx documentation, articles, and forum posts over the last couple of days.  And now using it.  I am amazed.  And while laying in bed after getting it running last night, I gained some additional inspiration of how to leverage it in some non-traditional, non-fulltext ways.  But I'll leave that for another day...

So the first thing to do is copy the distribution sample configuration file.

cp /usr/local/etc/sphinx-min.conf /usr/local/etc/sphinx.conf

And edit it...

nano /usr/local/etc/sphinx.conf

Below is a basic sample subset of what I am using...

source articles
{
        type                             = mysql
        sql_host                     = YOUR_IP
        sql_user                     = YOUR_MYSQL_USER
        sql_pass                    = YOUR_MYSQL_PASSWORD
        sql_db                         = YOUR_MYSQL_DATABASE
        sql_port                       = 3306  # optional, default is 3306
        # Here is the actual query from the database, the ID has to be first
        # The $start and $end variables come from sql_query_range and sql_range_step below
        sql_query                    = SELECT id, title, body, type_id, date_published FROM articles
                                                  WHERE id BETWEEN $start AND $end
        # Attributes.... these will not be in the fulltext index, but can be filtered on
        sql_attr_uint                   = type_id
        sql_attr_timestamp              = date_published
        # This will allow us to only index 1000 rows at a time and
        #  and it will wait two seconds between each query
        sql_query_range                 = SELECT MIN(id), MAX(id) FROM articles
        sql_range_step                  = 1000
        sql_ranged_throttle          = 2000
}
index articles_index
{
        source                                  = articles
       # storing the source on the EBS volume... IMPORTANT!
        path                                    = /ebs/data/articles 
        docinfo                                 = extern
        charset_type                            = sbcs
       min_stemming_len                        = 4
       min_word_length                         = 2
       html_strip                              = 1
        enable_star                             = 1
}

indexer
{
       # Default is 32M, you may want to increase more depending on your dataset
        mem_limit           = 128M   
}

searchd
{
        # Note this is the new default port for Sphinx
        listen                                  = 9312   
       # Log and query log on EBS volume also
        log                                     = /ebs/log/searchd.log 
        query_log                               = /ebs/log/query.log
        read_timeout                            = 5
        max_children                            = 30
        pid_file                                = /ebs/log/searchd.pid
        max_matches                             = 1000
        seamless_rotate                         = 1
        preopen_indexes                         = 0
        unlink_old                              = 1
}


Now it's time to actually index for the first time.  This is the amazing part because it is so very very fast!  And because of the steps of 1000 and throttling that we put in above, we can run it on the live database and it shouldn't have a huge impact on performance.  To manually run it at the command line do this...

/usr/local/bin/indexer articles_index

Of course after you really get this all configured, you'll want to add this to a cron job.  And there are all kinds of more complexities to think of like how often you want it to re-index and if you need to set up things like delta tables so that it only queries and re-indexes a smaller set of data.

One very important thing to keep in mind when setting these up and making your plans... whenever indexer runs it is RE-INDEXING the entire dataset.  It is NOT just appending new data.  You do have the ability to merge datasets and search multiple indexes at once.  So the most common solution for large datasets that are frequently updated would be to have a main index with archived data and then another index that will run frequently maybe on just that day's data.  And then once a day a cron job can merge the two indexes... and in the mean time your queries will just search both.

So now that we've done the first indexing, let's search for something at the command line...

/usr/local/bin/search obama

So if you run that query, you should see all of the articles that referenced obama.  Notice that it will return the primary key... not the entire row data.  So on your application side you will need to take those primary keys and look up the full data you need (a very fast query).  Otherwise, you could store more meta data like the article title in another column as an attribute.

Finally, we need to implement this on our application.  My language of choice is PHP.  I had all sorts of problems with "autoconf" trying to compile the Sphinx code into PHP until I finally gave up on the chore after an hour and a half of getting no where but frustrated.  I found it's not even necessary though for most people.  You can just include the API class.

You can download sphinxapi.php here from Google Code.

Here is some very basic PHP code to get and process the results.  It has absolutely no debugging in it, so I'll leave that to you to figure out the rest.  It will search for whatever is in the ?q= argument to the URL.  Be sure to do a print_r of $results to see what is inside of your returned array.

include(SPHINX_API_PATH);
$sc = new SphinxClient();
$sc->SetServer(SPHINX_IP, 9312);
$sc->SetArrayResult(true);
$results = $sc->Query($_GET['q'], 'articles_index');
if ($results) {
    // Process results
    if (is_array($results) && count($results) > 0) {
        foreach ($results['matches'] as $r) {
            $ids[] = $r['id'];
        }
    }
}

Well that's all for now.  I will post more as I progress and optimize it more. Hopefully this helps someone, as I didn't find this information consolidated in one place as I was setting this up.  Enjoy! 

Previous Posts in Series:

Day 2: Installing Sphinx on EC2 and EBS

Day 1: Launching an AMI and Mounting an EBS Volume

Learn more about Sphinx from the source:

Sphinx Fulltext Search Homepage