Recently I spent a lot of time messing about with indexes. Not because I like it - but because there was a little bit of a fuckup at work with scalability issues on a DB. After messing around with SQLProfiler (real life-saver in this case) I gained a bit more of understanding about indexes as I never got the chance to do before.
What I learned about indexes and scalability can be summarized in 2 points:
1) You don't wanna have a clustered index on the primary key on a table once it gets huge (just make it nonclustered) if you plan on running loads of inserts
This is not a problem when the table is small - and since it's good for fast selects everyone's cool with it. Once the table gets populated with LOADS of records or it gets HUGE in size this WILL KILL YOU on any insert or update operation.
Just to give a few figures:
- A clustered index on a table with 45 million records was killing my SQL server - every insert from code was resulting in nasty timeouts (it was taking more than the default 30 secs for standard inserts - Increasing your timeout tolerance won't solve your scalability problems!)
- A clustered index on a HUGE table (15 gigs) with a few million records was slowing down inserts to the point it was taking 15 times as much as it was taking on an empty db (we're talking mSecs but on big batches it's gonna kill you)
2) You wanna have a non-clustered index for frequently selected fields on any table you plan on running extensive selects on (add the fields you select often to a single index)
- On a medium size table (40k records) select time on a given set of fields was doubling in time compared to the same table with a lot less data before adding a triple-index on the set of given fields.
Following the points above I was able to solve my scalability issues (the process I was running is now taking the same time on a huge db compared to an empty one). Obviosuly these rules are not applicable in every case - but understanding the points above might help people when 'scalability' hits the fan.