SQLite FTS contains and suffix matches

SQLite is used byDash to search throughdocset indexes. Originally, Dash used LIKE
queries which were fast enough, but became increasingly slower as more docsets were added.

SQLite FTS
is amazingly fast, but allows only prefix (e.g. query*
) matches by default. For Dash, I needed to persuade it to also perform contains matches (e.g. *query*
) or suffix matches (e.g. *query
).

How it works

It’s simple, for each term I want to be able to search, I store all of its suffixes.

First of all, the table structure:

CREATE VIRTUAL TABLE searchIndex USING FTS4(suffixes)

Add the term NSString
:

INSERT INTO searchIndex(suffixes) VALUES("NSString SString String tring ring ing ng g")

Search using suffix queries:

SELECT * FROM searchIndex WHERE suffixes MATCH 'string';

Or contains queries:

SELECT * FROM searchIndex WHERE suffixes MATCH 'str*';

Downsides

The only downside I could find was that the database got too large. To avoid this, I compress
the data into its actual term.

The compress and uncompress functions behave in this way:

compress("NSString SString String tring ring ing ng g")
-> NSString

uncompress("NSString")
-> NSString SString String tring ring ing ng g

This compression reduces the database size to what it would be if only the actual terms were added (without all the suffixes).

Speed results

Searching over 1,110,381 terms (in 102 docsets) using contains queries:

Search for "string" using LIKE:    3.22 seconds
Search for "string" using FTS:     0.18 seconds

Search for "s" using LIKE:         6.87 seconds
Search for "s" using FTS:          0.22 seconds

Alternatives

I chose SQLite FTS because I was already familiar with SQLite and I also needed to work around some Dash-specific edge cases (e.g. how symbols are treated).

Depending on your project, these may be suitable alternatives:

  1. PostgreSQL’s wildspeed module
  2. For OS X or iOS apps: Search Kit
稿源:Kapeli Blog (源链) | 关于 | 阅读提示

本站遵循[CC BY-NC-SA 4.0]。如您有版权、意见投诉等问题,请通过eMail联系我们处理。
酷辣虫 » 移动互联 » SQLite FTS contains and suffix matches

喜欢 (0)or分享给?

专业 x 专注 x 聚合 x 分享 CC BY-NC-SA 4.0

使用声明 | 英豪名录
切换注册

登录

忘记密码 ?

您也可以使用第三方帐号快捷登录

Q Q 登 录
微 博 登 录
切换登录

注册