Full Text Search (Part 5)

Today I’m going to post minor revisions to some earlier Full-Text Search (FTS) code. These changes address index construction.

Text Objects

My revisions affect my Text class, which is the managed object subclass I use to model searchable text elements.

Here is my Text.h file:

#import <CoreData/CoreData.h>

@class Keyword;

@interface Text :  NSManagedObject  
{
	BOOL	computedKeywords;
}

@property (nonatomic, retain) NSString* text;
@property (nonatomic, retain) NSSet* words;

@end


@interface Text (CoreDataGeneratedAccessors)
- (void)addWordsObject:(Keyword*)value;
- (void)removeWordsObject:(Keyword*)value;
- (void)addWords:(NSSet*)value;
- (void)removeWords:(NSSet*)value;

@end

and here is my Text.m file:

#import "Text.h"
#import "Keyword.h"
#import "Tokenizer.h"


@interface Text (OtherAccessors)

@property (nonatomic, retain) NSString* sortPrefix;

- (void)setPrimitiveText:(NSString*)newText;

@end


@implementation Text 

@dynamic text;
@dynamic words;


- (void)setText:(NSString*)newText
{
	[self willChangeValueForKey:@"text"];
	[self setPrimitiveText:newText];
	[self didChangeValueForKey:@"text"];
	
	NSUInteger l = [newText length];
	self.sortPrefix = [newText substringToIndex:MIN(l, 20)];
}


- (void)willSave
{
	// Superclass
	[super willSave];

	// Skip keyword processing for deleted objects, and for objects already indexed during this save
	if (self.isDeleted || computedKeywords) return;

	// Process this object for keywords
	NSMutableSet* tokens = [[Tokenizer sharedTokenizer] tokenize:self.text];

	// Create and perform a request for existing keyword records for these tokens
	NSError* err = nil;
	NSFetchRequest* request = [[NSFetchRequest alloc] init];
	request.entity = [NSEntityDescription entityForName:@"Keyword" inManagedObjectContext:self.managedObjectContext];
	request.predicate = [NSPredicate predicateWithFormat:@"keyword IN %@",tokens];
	NSArray* existing = [self.managedObjectContext executeFetchRequest:request error:&err];
	[request release];
	if (existing == nil)
	{
		// Fetch failed: leave the index untouched rather than creating duplicate keywords
		NSLog(@"Keyword fetch failed: %@", err);
		return;
	}
	NSMutableSet* keywords = [NSMutableSet setWithArray:existing];

	// Find nonexistent keywords (by removing existing keywords from tokens) ...
	[tokens minusSet:[keywords valueForKey:@"keyword"]];
	// ... then create them
	for (id k in tokens)
	{
		NSManagedObject* newManagedObject = [NSEntityDescription insertNewObjectForEntityForName:@"Keyword" inManagedObjectContext:self.managedObjectContext];
		[newManagedObject setValue:k forKey:@"keyword"];
		[keywords addObject:newManagedObject];
	}
	
	// Make changes
	NSMutableSet* keywordsToAdd = [[keywords mutableCopy] autorelease];
	NSMutableSet* keywordsToRemove = [[self.words mutableCopy] autorelease];
	[keywordsToAdd minusSet:self.words];
	[keywordsToRemove minusSet:keywords];
	if ([keywordsToAdd count]) [self addWords:keywordsToAdd];
	if ([keywordsToRemove count]) [self removeWords:keywordsToRemove];

	// Flag object as ready for saving
	// Note that this assumes the object WILL NOT BE CHANGED between this point and the commit to DB
	computedKeywords = YES;
}


- (void)didSave
{
	// Superclass
	[super didSave];

	// Allow re-indexing on the next save
	computedKeywords = NO;
}

@end
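
The "Make changes" step in willSave comes down to two set differences. Here is a minimal Foundation-only sketch of that step, using plain strings in place of Keyword objects. The helper names KeywordsToAdd and KeywordsToRemove are hypothetical and do not appear in the class above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: keywords present in the new token set but not yet
// attached to the text object.
static NSSet *KeywordsToAdd(NSSet *current, NSSet *wanted)
{
	NSMutableSet *result = [NSMutableSet setWithSet:wanted];
	[result minusSet:current];
	return result;
}

// Hypothetical helper: keywords attached to the text object that no longer
// occur in the new token set.
static NSSet *KeywordsToRemove(NSSet *current, NSSet *wanted)
{
	NSMutableSet *result = [NSMutableSet setWithSet:current];
	[result minusSet:wanted];
	return result;
}
```

When the two sets are already equal, both results are empty. In willSave that is the case where neither addWords: nor removeWords: is called, so the object is left unchanged.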

Changes

The relevant differences between this code and that presented earlier have to do with precluding needless or destructive re-indexing. First of all, this line:

if (self.isDeleted || computedKeywords) return;

makes willSave return without computing or updating the keyword index if the Text object is being deleted. Without this check, FTS index entries would be created that point to this (soon-to-be-deleted) object, and those entries would cause Core Data validation to fail.

The same line also skips indexing when the Text object’s computedKeywords flag is set. The flag is a bit of a hack (it is set at the end of willSave and cleared in didSave) designed to avoid needless re-indexing. When an object is changed during a willSave call, Core Data calls willSave again, and keeps doing so until a call leaves the object unchanged. Updating the Text object’s index (i.e., its words relationship) counts as such a change, so without the flag Core Data would re-invoke willSave and the index would be recomputed. The second pass would produce the same index, so the save would still succeed, but the second calculation would be wasted time.
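
To make the effect of the flag concrete, here is a toy model of the behaviour described above. It is only a sketch and uses no Core Data at all: FakeText, useGuard, changedDuringLastPass, indexingPasses and SimulateSave are all hypothetical names, and the do/while loop merely imitates "call willSave until nothing changes". The sketch assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the Text object; NOT an NSManagedObject.
@interface FakeText : NSObject
@property (nonatomic, assign) BOOL useGuard;              // enable the computedKeywords check
@property (nonatomic, assign) BOOL computedKeywords;      // same role as the ivar in Text
@property (nonatomic, assign) BOOL changedDuringLastPass; // did the last willSave modify the object?
@property (nonatomic, assign) NSUInteger indexingPasses;  // how often the index was computed
@property (nonatomic, copy) NSString *text;
@property (nonatomic, copy) NSSet *words;
- (void)willSave;
- (void)didSave;
@end

@implementation FakeText

- (void)willSave
{
	self.changedDuringLastPass = NO;

	// The guard under discussion: skip work already done during this save
	if (self.useGuard && self.computedKeywords) return;

	// "Index" the text (a naive whitespace split stands in for the tokenizer)
	self.indexingPasses += 1;
	NSSet *tokens = [NSSet setWithArray:[self.text componentsSeparatedByString:@" "]];
	if (![tokens isEqualToSet:self.words])
	{
		self.words = tokens;
		self.changedDuringLastPass = YES;
	}
	self.computedKeywords = YES;
}

- (void)didSave
{
	self.computedKeywords = NO;
}

@end

// Imitates one save: willSave is repeated until a pass leaves the object unchanged.
// Returns the number of times the index was computed.
static NSUInteger SimulateSave(BOOL useGuard)
{
	@autoreleasepool
	{
		FakeText *t = [[FakeText alloc] init];
		t.useGuard = useGuard;
		t.text = @"full text search";
		do { [t willSave]; } while (t.changedDuringLastPass);
		[t didSave];
		return t.indexingPasses;
	}
}
```

In this model, SimulateSave(NO) computes the index twice for a single save, while SimulateSave(YES) computes it once: the second willSave call returns immediately because the flag is still set.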

Indexing Speed

These changes increase indexing speed by about 50%, to ~3000 characters per second on the iPhone 3G. For my app, I think this is just on the right side of good enough, which is lucky: aside from some slightly suspicious Objective-C in the preceding class, I can’t see any easy way to make Core Data FTS indexing run much faster than this code does now.
