Today I’m posting minor revisions to my earlier Core Data Full-Text Search (FTS) code. The changes affect how the index is built.
Text Objects
My revisions affect my Text class, which is the managed object subclass I use to model searchable text elements.
Here is my Text.h file:
#import <CoreData/CoreData.h>
@class Keyword;
@interface Text : NSManagedObject
{
    BOOL computedKeywords;   // set at the end of willSave, cleared in didSave
}
@property (nonatomic, retain) NSString* text;
@property (nonatomic, retain) NSSet* words;
@end
@interface Text (CoreDataGeneratedAccessors)
- (void)addWordsObject:(Keyword*)value;
- (void)removeWordsObject:(Keyword*)value;
- (void)addWords:(NSSet*)value;
- (void)removeWords:(NSSet*)value;
@end
And here is my Text.m file:
#import "Text.h"
#import "Keyword.h"
#import "Tokenizer.h"
@interface Text (OtherAccessors)
@property (nonatomic, retain) NSString* sortPrefix;
- (void)setPrimitiveText:(NSString*)newText;
@end
@implementation Text
@dynamic text;
@dynamic words;
- (void)setText:(NSString*)newText
{
    [self willChangeValueForKey:@"text"];
    [self setPrimitiveText:newText];
    [self didChangeValueForKey:@"text"];
    // Keep the first 20 characters as a sort key
    NSUInteger l = [newText length];
    self.sortPrefix = [newText substringToIndex:MIN(l, 20)];
}
- (void)willSave
{
    // Superclass
    [super willSave];
    // Skip keyword processing for deleted objects, and for objects already indexed during this save
    if (self.isDeleted || computedKeywords) return;
    // Process this object for keywords
    NSMutableSet* tokens = [[Tokenizer sharedTokenizer] tokenize:self.text];
    // Create and perform a request for existing keyword records for these tokens
    NSError* err = nil;
    NSFetchRequest* request = [[NSFetchRequest alloc] init];
    request.entity = [NSEntityDescription entityForName:@"Keyword" inManagedObjectContext:self.managedObjectContext];
    request.predicate = [NSPredicate predicateWithFormat:@"keyword IN %@", tokens];
    NSMutableSet* keywords = [NSMutableSet setWithArray:[self.managedObjectContext executeFetchRequest:request error:&err]];
    [request release];
    // Find nonexistent keywords (by removing existing keywords from tokens) ...
    [tokens minusSet:[keywords valueForKey:@"keyword"]];
    // ... then create them
    for (id k in tokens)
    {
        NSManagedObject* newManagedObject = [NSEntityDescription insertNewObjectForEntityForName:@"Keyword" inManagedObjectContext:self.managedObjectContext];
        [newManagedObject setValue:k forKey:@"keyword"];
        [keywords addObject:newManagedObject];
    }
    // Make changes: add keywords the index lacks, remove keywords the text no longer contains
    NSMutableSet* keywordsToAdd = [[keywords mutableCopy] autorelease];
    NSMutableSet* keywordsToRemove = [[self.words mutableCopy] autorelease];
    [keywordsToAdd minusSet:self.words];
    [keywordsToRemove minusSet:keywords];
    if ([keywordsToAdd count]) [self addWords:keywordsToAdd];
    if ([keywordsToRemove count]) [self removeWords:keywordsToRemove];
    // Flag object as ready for saving
    // Note that this assumes the object WILL NOT BE CHANGED between this point and the commit to DB
    computedKeywords = YES;
}
- (void)didSave
{
    [super didSave];
    // Allow the next save to re-index this object
    computedKeywords = NO;
}
@end
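The "Make changes" step at the end of willSave is a two-way set difference: keywords the text now needs but the words relationship lacks are added, and keywords the relationship holds but the text no longer needs are removed. Below is a minimal Foundation-only sketch of that step, plus the sortPrefix truncation from setText:. It assumes ARC (`clang -fobjc-arc -framework Foundation`), uses NSString values in place of Keyword managed objects, and the helper names `KeywordChanges` and `SortPrefix` are hypothetical. `SortPrefix` also adds one step the original does not have: it backs off to a composed-character boundary with `rangeOfComposedCharacterSequenceAtIndex:`, so the 20-character cut never splits a surrogate pair such as an emoji.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the "Make changes" step of -[Text willSave] in isolation.
// `wanted` plays the role of `keywords` (what the text needs now) and `current`
// the role of `self.words` (what the index holds). NSStrings stand in for Keywords.
static void KeywordChanges(NSSet *wanted, NSSet *current,
                           NSSet **toAdd, NSSet **toRemove)
{
    NSMutableSet *additions = [wanted mutableCopy];   // wanted minus current
    [additions minusSet:current];
    NSMutableSet *removals = [current mutableCopy];   // current minus wanted
    [removals minusSet:wanted];
    *toAdd = additions;
    *toRemove = removals;
}

// Hypothetical helper: the sortPrefix computation from -[Text setText:],
// plus one extra step: if the cut would land inside a composed character
// sequence (e.g. the two UTF-16 units of an emoji), back off to its start.
static NSString *SortPrefix(NSString *text, NSUInteger maxLength)
{
    NSUInteger length = [text length];
    NSUInteger end = MIN(length, maxLength);
    if (end < length) {
        end = [text rangeOfComposedCharacterSequenceAtIndex:end].location;
    }
    return [text substringToIndex:end];
}

int main(void)
{
    @autoreleasepool {
        NSSet *toAdd = nil;
        NSSet *toRemove = nil;
        KeywordChanges([NSSet setWithObjects:@"core", @"data", @"search", nil],
                       [NSSet setWithObjects:@"core", @"text", nil],
                       &toAdd, &toRemove);
        NSLog(@"add: %@, remove: %@", toAdd, toRemove);
        NSLog(@"prefix: %@", SortPrefix(@"a\U0001F600", 2));
    }
    return 0;
}
```

With these inputs, toAdd ends up holding "data" and "search" and toRemove holds "text", while "core" is left alone. Computing both differences before touching the relationship means willSave only calls addWords: or removeWords: when something actually changed, which is what lets the save loop settle.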
Changes
The relevant differences between this code and the version presented earlier prevent needless or destructive re-indexing. First of all, this line:
if (self.isDeleted || computedKeywords) return;
makes willSave return without computing or updating the keyword index if the Text object is being deleted. Without this check, FTS index entries would be created that point to the soon-to-be-deleted object, and those entries cause Core Data validation to fail.
The same line also skips indexing if the Text object’s computedKeywords flag is set. This is a bit of a hack (the flag is set at the end of willSave and cleared in didSave) designed to avoid needless re-indexing. When an object is changed during a willSave call, Core Data calls willSave again, and keeps doing so until a willSave call leaves the object unchanged. Updating the Text object’s index (i.e., its words relationship) is such a change, so Core Data re-invokes willSave; without the flag, that second call would recompute the index. Because the index comes out the same, the second call changes nothing and the save succeeds, but the second index calculation is wasted work.
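The re-invocation behavior is easier to see in a toy model. The sketch below is not Core Data: `FakeText` and `IndexRunsForSave` are hypothetical stand-ins, written for ARC with Foundation only, that mimic the loop described above (call willSave again for as long as the previous call changed the object) and count how often the index is computed with and without the computedKeywords guard.

```objc
#import <Foundation/Foundation.h>

// Toy model (hypothetical, not Core Data): FakeText mimics the parts of the
// Text class that matter here: an index (`words`), the computedKeywords
// guard, and a willSave that only touches `words` when the index differs.
@interface FakeText : NSObject
@property (nonatomic, copy) NSSet *words;     // stands in for the words relationship
@property (nonatomic) BOOL useGuard;          // enable the computedKeywords check?
@property (nonatomic) BOOL changedInWillSave; // did the last willSave modify the object?
@property (nonatomic) NSUInteger indexRuns;   // how many times the index was computed
- (void)willSave;
- (void)didSave;
@end

@implementation FakeText {
    BOOL computedKeywords;
}

- (void)willSave
{
    if (self.useGuard && computedKeywords) return;
    self.indexRuns += 1;
    // Pretend tokenizer output: identical on every pass, as in the real case.
    NSSet *computed = [NSSet setWithObjects:@"full", @"text", @"search", nil];
    // Messaging a nil `words` returns NO, so the first pass always updates.
    if (![self.words isEqualToSet:computed]) {
        self.words = computed;
        self.changedInWillSave = YES;
    }
    computedKeywords = YES;
}

- (void)didSave
{
    computedKeywords = NO;
}
@end

// Toy save loop: call willSave again for as long as the previous call
// changed the object, then call didSave. Returns the index computation count.
static NSUInteger IndexRunsForSave(BOOL useGuard)
{
    FakeText *text = [[FakeText alloc] init];
    text.useGuard = useGuard;
    do {
        text.changedInWillSave = NO;
        [text willSave];
    } while (text.changedInWillSave);
    [text didSave];
    return text.indexRuns;
}

int main(void)
{
    @autoreleasepool {
        NSLog(@"without guard: %lu index runs", (unsigned long)IndexRunsForSave(NO));
        NSLog(@"with guard:    %lu index runs", (unsigned long)IndexRunsForSave(YES));
    }
    return 0;
}
```

Without the guard the index is computed twice per save; with it, once. The cost is the assumption spelled out in the willSave comment: while the flag is set, a change to the text would not be re-indexed until the next save.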
Indexing Speed
These changes increase indexing speed by about 50%, to ~3000 characters per second on the iPhone 3G. For my app, I think this is just on the right side of good enough. That is lucky, because, aside from some slightly suspicious Objective-C in the preceding class, I can’t see any easy way to make Core Data FTS indexing run much faster than this code does now.
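As a rough check on what these figures mean: a 50% speed-up that ends at ~3000 characters per second implies the earlier code managed roughly 2000. For a hypothetical 60,000-character body of text, that works out to:

```latex
\begin{aligned}
v_{\text{old}} &\approx \frac{3000\ \text{chars/s}}{1.5} = 2000\ \text{chars/s} \\
t_{\text{old}} &\approx \frac{60\,000\ \text{chars}}{2000\ \text{chars/s}} = 30\ \text{s} \\
t_{\text{new}} &\approx \frac{60\,000\ \text{chars}}{3000\ \text{chars/s}} = 20\ \text{s}
\end{aligned}
```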