At Mapado, as you may know, we want to let you search for “things to do close to you”. By close we mean culturally, and geographically too.

For this second point, we did not find a better solution than an auto-complete search field with a city list in a dropdown (as you can see in the featured image).

Here is the solution we implemented.

What we have looked at

Technically, we have a “city” table in our database with the following columns: [id, name] (in fact, there are a few more, but I shortened the list for the example 🙂 ).

We did not spend much time on this part before the launch, so we just ran a SQL “LIKE” query on the city name, without a leading “%”.

This was quite awful for three reasons. Let’s assume we want to find “Saint-Genis-Laval”:

  1. If you search “St-Genis” or even “Saint Genis” (without the “-”), you will not find it,
  2. If you search “Genis” or “Laval”, you will not find it either (because we did not search with a leading “%”),
  3. SQL LIKE queries are sloooooow
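
To make the first two problems concrete, here is a minimal sketch in Python with an in-memory SQLite database. It is not our production code: the table content, the `like_search` helper and the use of SQLite are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical, simplified "city" table (id, name), for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO city (name) VALUES (?)",
    [("Saint-Genis-Laval",), ("Saint-Etienne",), ("Lyon",)],
)


def like_search(prefix):
    """Prefix-only LIKE search: the pattern has no leading '%'."""
    rows = conn.execute(
        "SELECT name FROM city WHERE name LIKE ? ORDER BY name",
        (prefix + "%",),
    )
    return [name for (name,) in rows]


# Only the exact prefix finds the city; every variation comes back empty.
for query in ["Saint-Genis", "Saint Genis", "St-Genis", "Genis", "Laval"]:
    print(query, "->", like_search(query))
```

Only the exact prefix “Saint-Genis” returns the city. The abbreviated, space-separated and mid-string queries all return an empty list.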

As awful as this solution was, we did know about it; we just had to find the right tool to fix it. To do that, we looked at a couple of tools:

  • Algolia seems to be the best of them but it is quite expensive.
  • Elasticsearch, but the “full text search” was not really what we were looking for. We wanted a “record search”, which tolerates typos, swapped letters, etc.

What we have found

After some hours of digging through the Internet in general and GitHub in particular, Jerry came back with the perfect piece of software: Simstring, developed by Naoaki Okazaki.

The software is a C++ program with a command-line utility and adapters for several languages (Python, Ruby, Java, Perl, and I added the PHP one 😉 )

The command-line install instructions are on the project webpage. If you only want to install the PHP plugin (or another language plugin), you can skip this step, but it is quite useful for understanding how the tool works.

Example

I put a list of 10,000 French first names in a CSV file and created a new Simstring database:

As you can see, the database creation is quite fast.

Let’s search for my first name now!

Yeah, I often make a typo in my first name, but you can see that Simstring finds “Julien”! (and in 0 seconds, which seems quite fast too 🙂 )
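
To give an intuition of why a typo still matches, here is a rough Python sketch of the general idea behind this kind of approximate search: split each string into character trigrams and compare the sets with a cosine similarity. This is not Simstring's actual implementation, which is far more optimized. The function names, the name list, the typo “Jluien” and the 0.4 threshold are all assumptions made for illustration.

```python
import math


def ngrams(text, n=3):
    """Return the set of character n-grams of a lower-cased, space-padded string."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def cosine(a, b):
    """Set-based cosine similarity: size of the intersection / sqrt(|a| * |b|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def search(query, names, threshold=0.4):
    """Return (name, score) pairs scoring at least `threshold`, best first."""
    q = ngrams(query)
    scored = [(name, cosine(q, ngrams(name))) for name in names]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical first-name list and typo, for illustration only.
names = ["Julien", "Julie", "Juliette", "Jules", "Lucien"]
print(search("Jluien", names))
```

With this toy list, the swapped letters in “Jluien” still share enough trigrams with “Julien” for it to come out as the best hit, while an unrelated query such as “Xavier” returns nothing.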

Now that we had adopted it, we just had to make it work in a web environment.

What we have done with it

The developer already provides some language-specific wrappers. The ones for Python and Ruby are packaged in the Simstring sources that you can download from the webpage. Java and Perl are available on GitHub. I could have used the Python adapter, but as I am more of a PHP guy, I challenged myself to make this work as a PHP extension.

Simstring uses SWIG, a tool that generates wrappers around C++ code for other languages, and after a couple of hours, I finally managed to make it work.

As we use the Symfony framework on our website, I also wanted to write a bundle to keep the code as simple as it should be.

If you just want to play with simstring, you can skip the Symfony Bundle part. If you want the bundle, you have to install the PHP extension first, and the bundle after that.

PHP Extension installation

Now this is the hard part 🙂

You should now have a “simstring.so” file. If you have a problem at this point, please comment on this post and I will do my best to help you!

Now you have to move your simstring.so file into your PHP extension directory, and activate your extension:

If you want to use the PHP extension only, you will have to copy/include the “simstring.php” file, which is your API to Simstring.

If you want to use the Symfony Bundle, you do not have to do this, as the file is included in it and I think my API is a little bit better.

PHP example:

Symfony Bundle

The Symfony Bundle can be found on GitHub, and its installation and usage are well documented. You can search text or objects from your database (with a little configuration).

Here is a little example after installation:

You now have a fast and nice record search engine in your application.