Full Text Search (Part 3)

Yesterday, I covered some of the more interesting parts of a prototype implementation of Full-Text Search (FTS) on the iPhone; today I want to start to address one of the performance problems in that prototype. To begin with, it takes a ridiculously long time to execute a search on the index. This turns out to have a simple remedy.

Debugging

It’s pretty clear that our performance problems involve Core Data, so the first order of business is to get some insight into what Core Data is doing behind the scenes. Happily, there’s built-in support for this. In Xcode, go to the “Executables” group, right-click your application’s executable, and select “Get Info”. Then add the following argument to the list of “Arguments to be passed on launch”:

-com.apple.CoreData.SQLDebug 1

This will dump all sorts of useful information to stderr (viewable in the GDB console).

When I run the FTS app on my device and perform an FTS search (for “destiny”, which returns 3 hits), I see a lot of data dumped to the console, but the last 6 entries seem to be the most relevant:

2010-02-05 07:54:33.929 FTS[26314:207] CoreData: sql: SELECT DISTINCT 0, t0.Z_PK FROM ZTEXT t0 LEFT OUTER JOIN Z_1HITS t1 ON t0.Z_PK = t1.Z_2HITS LEFT OUTER JOIN ZKEYWORD t2 ON t1.Z_1WORDS = t2.Z_PK WHERE  t2.ZKEYWORD IN  (?)  ORDER BY t0.ZTEXT COLLATE NSCollateLocaleSensitive 
2010-02-05 07:54:39.873 FTS[26314:207] CoreData: annotation: sql connection fetch time: 5.9427s
2010-02-05 07:54:39.885 FTS[26314:207] CoreData: annotation: total fetch execution time: 5.9557s for 3 rows.
2010-02-05 07:54:39.982 FTS[26314:207] CoreData: sql: SELECT 0, t0.Z_PK, t0.Z_OPT, t0.ZTEXT, t0.ZSORTPREFIX FROM ZTEXT t0 WHERE  t0.Z_PK IN  (?,?,?)  ORDER BY t0.ZTEXT COLLATE NSCollateLocaleSensitive  LIMIT 20
2010-02-05 07:54:40.011 FTS[26314:207] CoreData: annotation: sql connection fetch time: 0.0224s
2010-02-05 07:54:40.023 FTS[26314:207] CoreData: annotation: total fetch execution time: 0.0411s for 3 rows.

It appears that the first query retrieves the IDs of the relevant Text objects, and that the second retrieves the data associated with those IDs. The first query is taking up all the time. Let’s look at it more closely.

SQL Tuning

If we format the problematic query for readability, it looks like this:

Select Distinct 0, t0.Z_PK
From ZTEXT t0
Left Outer Join Z_1HITS t1
On t0.Z_PK = t1.Z_2HITS
Left Outer Join ZKEYWORD t2
On t1.Z_1WORDS = t2.Z_PK
Where  t2.ZKEYWORD IN  (?)
Order By t0.ZTEXT Collate NSCollateLocaleSensitive

Put colloquially, this query takes every row in the ZTEXT table (which holds the text blocks themselves) and joins each of them to all the rows of ZKEYWORD (via the many-to-many mapping table Z_1HITS) which represent any of their keywords, and then filters out the result rows which don’t have the proper keywords.

Now, it’s a bit of an open question how the SQLite DB will actually process this query. There are (as we’ll soon see) several ways to execute it, and some are faster than others. For instance, if the DB executes it along the naive lines of the colloquial description given above, it will create and scan an intermediate table equal in size to the Z_1HITS mapping table. (That’s 10,299 rows in my prototype.)

We could re-write the preceding query to suggest a more efficient implementation; if we begin with the keyword rows we’re interested in, we wouldn’t need to discard any rows from (or scan!) an intermediate table. Something like this would work:

Select Distinct 0, t2.Z_PK
From ZKEYWORD t0
Join Z_1HITS t1
On t1.Z_1WORDS = t0.Z_PK
Join ZTEXT t2
On t2.Z_PK = t1.Z_2HITS
Where  t0.ZKEYWORD IN  (?)
Order By t2.ZTEXT Collate NSCollateLocaleSensitive

The question is, does the second version run faster? Is SQLite smart enough to optimize the first query into the second?

We can check by using the command-line SQLite client. To keep things simple, I installed my app in the simulator, and ran my tests on my desktop. This means that absolute timings won’t be comparable to those we saw earlier, but we’re only interested in the relative performance of the two queries at this point. To get started, I fired up a console, executed a cd to ~/Library, and ran this command:

find . -name "FTS.sqlite" -exec sqlite3 {} \;

At the sqlite3 prompt, I enabled timing (with “.timer ON“), and ran my queries. (I replaced the “?” with the string “‘destiny'”, and omitted the Collation clause.) The Core Data query ran in 0.191927s, and my re-written (but equivalent) query ran in 0.003253s. This is a factor of 60x, which tells us that:

  • SQLite is not clever enough to optimize the Core Data query properly
  • What we’re doing is not inherently slow; it’s just a matter of “tricking” Core Data/SQLite into running the “right” query

NSPredicate

One of the facts of life when working with Core Data is that one cannot write one’s own SQL, or (easily) access the underlying database directly. Therefore, persuading Core Data to issue a query that SQLite can process more efficiently is going to require a bit of luck.

The predicate which causes all this trouble is created with this code:

[NSPredicate predicateWithFormat:@"ANY words.keyword IN %@",self.keywords]

The first thing that comes to mind is that a lot of the query’s complexity comes from the 3-table join, required because we’re filtering on keyword strings, and not on Keyword objects. As it happens, it’s easy to change to the latter approach, as I can arrange for the self.keywords iterable to contain Keyword objects instead of strings. If we run with this predicate:

[NSPredicate predicateWithFormat:@"ANY words IN %@",self.keywords]

and re-time on the device, we see the following:

2010-02-05 08:49:22.480 FTS[26384:207] CoreData: sql: SELECT DISTINCT 0, t0.Z_PK FROM ZTEXT t0 LEFT OUTER JOIN Z_1HITS t1 ON t0.Z_PK = t1.Z_2HITS WHERE  t1.Z_1WORDS IN  (?)  ORDER BY t0.ZTEXT COLLATE NSCollateLocaleSensitive 
2010-02-05 08:49:22.606 FTS[26384:207] CoreData: annotation: sql connection fetch time: 0.1247s
2010-02-05 08:49:22.611 FTS[26384:207] CoreData: annotation: total fetch execution time: 0.1317s for 3 rows.
2010-02-05 08:49:22.698 FTS[26384:207] CoreData: sql: SELECT 0, t0.Z_PK, t0.Z_OPT, t0.ZTEXT, t0.ZSORTPREFIX FROM ZTEXT t0 WHERE  t0.Z_PK IN  (?,?,?)  ORDER BY t0.ZTEXT COLLATE NSCollateLocaleSensitive  LIMIT 20
2010-02-05 08:49:22.728 FTS[26384:207] CoreData: annotation: sql connection fetch time: 0.0228s
2010-02-05 08:49:22.742 FTS[26384:207] CoreData: annotation: total fetch execution time: 0.0441s for 3 rows.

The first query now takes 0.1317s, down from 5.9557s: This is a 45-fold improvement, and respectable by any measure. Our total query time is also under .2s; I’d be happier with something in the .05s range, but this is definitely usable.

Addenda

Actually, it seems that the first few queries run slower than normal: If we re-time after using the app for a bit, we can get better numbers:

2010-02-05 09:31:14.136 FTS[26459:207] CoreData: sql: SELECT DISTINCT 0, t0.Z_PK FROM ZTEXT t0 LEFT OUTER JOIN Z_1HITS t1 ON t0.Z_PK = t1.Z_2HITS WHERE  t1.Z_1WORDS IN  (?)  ORDER BY t0.ZTEXT COLLATE NSCollateLocaleSensitive 
2010-02-05 09:31:14.177 FTS[26459:207] CoreData: annotation: sql connection fetch time: 0.0398s
2010-02-05 09:31:14.186 FTS[26459:207] CoreData: annotation: total fetch execution time: 0.0497s for 3 rows.
2010-02-05 09:31:14.270 FTS[26459:207] CoreData: sql: SELECT 0, t0.Z_PK, t0.Z_OPT, t0.ZTEXT, t0.ZSORTPREFIX FROM ZTEXT t0 WHERE  t0.Z_PK IN  (?,?,?)  ORDER BY t0.ZTEXT COLLATE NSCollateLocaleSensitive  LIMIT 20
2010-02-05 09:31:14.292 FTS[26459:207] CoreData: annotation: sql connection fetch time: 0.0170s
2010-02-05 09:31:14.298 FTS[26459:207] CoreData: annotation: total fetch execution time: 0.0276s for 3 rows.

The 0.0497 seen here is a 120x improvement over our starting point, and the total query time is under .08s. I’m prepared to call that good.

And, if you’re wondering, the original numbers don’t meaningfully change over repeated trials.

Share and Enjoy:
  • Twitter
  • Facebook
  • Digg
  • Reddit
  • HackerNews
  • del.icio.us
  • Google Bookmarks
  • Slashdot
This entry was posted in iPhone. Bookmark the permalink.

Comments are closed.