13. Full Text Search
How should a customer find the product he desires in a more or less unstructured collection of countless products? Hierarchical navigation often doesn’t work and takes too much time. Thanks to the way we use the Internet today, most site visitors expect one central search field in the main navigation bar of a site.
13.1. Search Engine API
In Django the most popular API for full-text search is Haystack. While other indexing backends, such as Solr and Whoosh, might work as well, the best results have been achieved with Elasticsearch. Therefore this documentation focuses exclusively on Elasticsearch. And since in djangoSHOP every programming interface uses REST, search is no exception here. Fortunately there is a project named drf-haystack, which “restifies” our search results, if we use special serializers.
In this document we assume that the merchant only wants to index his products, but not arbitrary content, such as the terms and conditions, which live outside djangoSHOP but inside djangoCMS.
13.1.1. Configuration
Install the Elasticsearch binary. Currently Haystack only supports versions lower than 2. Then start the service in daemon mode:
./path/to/elasticsearch-version/bin/elasticsearch -d
Check whether the server answers HTTP requests. Pointing a browser to http://localhost:9200/, or querying that URL with curl, should return something similar to this:
$ curl http://localhost:9200/
{
  "status" : 200,
  "name" : "Ape-X",
  "cluster_name" : "elasticsearch",
  "version" : {
    ...
  },
}
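The same check can be scripted. The following is a minimal sketch using only the Python standard library. The helper names parse_status and elasticsearch_is_up are hypothetical, and the sketch assumes that the response body carries a top-level "status" field, as in the sample output above.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def parse_status(body):
    """Return True if the JSON body reports "status": 200 at its top level."""
    try:
        return json.loads(body).get("status") == 200
    except (ValueError, AttributeError):
        # Not valid JSON, or valid JSON that is not an object.
        return False


def elasticsearch_is_up(url="http://localhost:9200/", timeout=2.0):
    """Hypothetical helper: True if the server at `url` answers with status 200."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return parse_status(response.read().decode("utf-8"))
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure and similar errors.
        return False
```

Calling elasticsearch_is_up() before rebuilding the index gives an early, readable failure if the service has not been started.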
In settings.py, check that 'haystack' has been added to INSTALLED_APPS and that HAYSTACK_CONNECTIONS connects the application server with the Elasticsearch database:
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://localhost:9200/',
        'INDEX_NAME': 'myshop-default',
    },
}
In case we need indices for different natural languages on our site, we add the non-default languages to this Python dictionary, using a different INDEX_NAME for each of them.
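As an illustration, a configuration with one additional German index might look like the sketch below. It rests on two assumptions that are not stated in this document: that the extra connection is keyed by its language code ('de'), and that 'myshop-de' is an acceptable index name. Both are hypothetical and should be adapted to the project.

```python
# Sketch of the relevant part of settings.py with a second, German index.
# Assumption: the extra connection is keyed by its language code ('de').
# Assumption: 'myshop-de' is a hypothetical index name.
ELASTICSEARCH_ENGINE = 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine'

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': ELASTICSEARCH_ENGINE,
        'URL': 'http://localhost:9200/',
        'INDEX_NAME': 'myshop-default',
    },
    'de': {
        'ENGINE': ELASTICSEARCH_ENGINE,
        'URL': 'http://localhost:9200/',
        'INDEX_NAME': 'myshop-de',
    },
}
```

Every connection points to the same Elasticsearch server. Only the index name differs, so the documents of each language stay separated.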
Finally, configure the site so that search queries are routed to the correct index, using the currently active natural language:
HAYSTACK_ROUTERS = ('shop.search.routers.LanguageRouter',)
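To make the idea of language-based routing concrete, here is a minimal sketch in plain Python. It is not the implementation of shop.search.routers.LanguageRouter. The function route_for_language is hypothetical, and it assumes that connections are keyed by language code, with 'default' as the fallback.

```python
def route_for_language(language, connections):
    """Return the connection alias for `language`, or 'default' if none is configured."""
    return language if language in connections else 'default'


# Hypothetical connections, reduced to the index name for brevity.
connections = {
    'default': {'INDEX_NAME': 'myshop-default'},
    'de': {'INDEX_NAME': 'myshop-de'},
}

print(route_for_language('de', connections))  # de
print(route_for_language('fr', connections))  # default
```

Falling back to 'default' means that a visitor using a language without its own index still gets search results instead of an error.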
13.2. Indexing the Products
Before we can search for anything, we must first populate the search indices. In Haystack one can create more than one kind of index for each item being added to the search database.
Each product type requires its own indexing class. Note that Haystack does some autodiscovery, therefore this class must be added to a file named search_indexes.py. For our product model SmartCard, this indexing class may then look like:
from shop.search.indexes import ProductIndex
from haystack import indexes
class SmartCardIndex(ProductIndex, indexes.Indexable):
    catalog_media = indexes.CharField(stored=True,
        indexed=False, null=True)
    search_media = indexes.CharField(stored=True,
        indexed=False, null=True)
    def get_model(self):
        return SmartCard
    # more methods ...
While building the index, Haystack performs some preparatory steps:
13.2.1. Populate the reverse index database
The base class for our search index declares two fields for holding the reverse indexes and a few additional fields to store information about the indexed product entity:
class ProductIndex(indexes.SearchIndex):
    text = indexes.CharField(document=True,
        indexed=True, use_template=True)
    autocomplete = indexes.EdgeNgramField(indexed=True,
        use_template=True)
    product_name = indexes.CharField(stored=True,
        indexed=False, model_attr='product_name')
    product_url = indexes.CharField(stored=True,
        indexed=False, model_attr='get_absolute_url')
The first two index fields require a template which renders plain text, which is then used to build a reverse index in the search database. The indexes.CharField is used for a classic reverse text index, whereas the indexes.EdgeNgramField is used for autocompletion.
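Why an edge n-gram field suits autocompletion can be shown with a few lines of plain Python. This is only a sketch of the concept: the function edge_ngrams is hypothetical, and the gram sizes used here are illustrative, not necessarily the values the search backend applies.

```python
def edge_ngrams(word, min_gram=2, max_gram=15):
    """Return the prefixes of `word` that are min_gram to max_gram characters long."""
    upper = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, upper + 1)]


print(edge_ngrams('smartcard', min_gram=2, max_gram=5))
# ['sm', 'sma', 'smar', 'smart']
```

Because every prefix is stored as its own term, a partially typed query such as "sma" already matches the product, whereas a classic text index would only match the complete word.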
Each of these index fields requires its own template. The templates must be named according to the following rules:
search/indexes/myshop/<product-type>_text.txt
and
search/indexes/myshop/<product-type>_autocomplete.txt
and be located inside the project’s template folder. The <product-type> is the class name of the given product model in lowercase. Create two individual templates for each product type, one for text search and one for autocompletion.
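The naming rule can be expressed as a small helper, which is handy for checking that no template is missing. The function index_template_names is hypothetical and merely restates the rule given above.

```python
def index_template_names(app_label, model_name):
    """Return the (text, autocomplete) template paths for a product model."""
    product_type = model_name.lower()
    base = 'search/indexes/{}/{}'.format(app_label, product_type)
    return (base + '_text.txt', base + '_autocomplete.txt')


print(index_template_names('myshop', 'SmartCard'))
# ('search/indexes/myshop/smartcard_text.txt', 'search/indexes/myshop/smartcard_autocomplete.txt')
```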
An example:
{{ object.product_name }}
{{ object.product_code }}
{{ object.manufacturer }}
{{ object.description|striptags }}
{% for page in object.cms_pages.all %}
{{ page.get_title }}{% endfor %}
The last two fields of ProductIndex, product_name and product_url, are used to store information about the product’s content, side by side with the indexed entities. That’s a huge performance booster, since this information would otherwise have to be fetched from the relational database, item by item, and then rendered while preparing the search query result.
We can also add fields to our index class which store pre-rendered HTML. In the above example, this is done by the fields catalog_media and search_media. Since we do not provide a model attribute, we must provide two methods which create this content:
class SmartCardIndex(ProductIndex, indexes.Indexable):
    # other fields and methods ...
    def prepare_catalog_media(self, product):
        return self.render_html('catalog', product, 'media')
    def prepare_search_media(self, product):
        return self.render_html('search', product, 'media')
These methods invoke render_html, which takes the product and renders it using a template named catalog-product-media.html or search-product-media.html respectively. These templates are looked for in the folder myshop/products or, if not found there, in the folder shop/products. The HTML snippets for catalog-media are used for autocompletion search, whereas search-media is used for a normal full-text search invocation.
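The lookup order described above can be sketched as follows. This is not the code of render_html itself. The function candidate_templates is hypothetical and only reproduces the naming and fallback rule stated in this section, assuming the project is named myshop.

```python
def candidate_templates(prefix, postfix, app_label='myshop'):
    """List template paths in lookup order: project folder first, then the shop fallback."""
    filename = '{}-product-{}.html'.format(prefix, postfix)
    return ['{}/products/{}'.format(folder, filename) for folder in (app_label, 'shop')]


print(candidate_templates('catalog', 'media'))
# ['myshop/products/catalog-product-media.html', 'shop/products/catalog-product-media.html']
```

Placing a file at the first path overrides the default snippet without touching the second one.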
13.2.2. Building the Index
To build the index in Elasticsearch, invoke:
./manage.py rebuild_index --noinput
Depending on the number of products in the database, this may take some time.
13.3. Search Serializers
Haystack for Django REST Framework (drf-haystack) is a small library aiming to simplify using Haystack with Django REST Framework. It takes the search results returned by Haystack and treats them similarly to Django database models when serializing their fields. The serializer used to render the content for this demo site may look like:
from rest_framework import serializers
from shop.search.serializers import ProductSearchSerializer as ProductSearchSerializerBase
from .search_indexes import SmartCardIndex, SmartPhoneIndex
class ProductSearchSerializer(ProductSearchSerializerBase):
    media = serializers.SerializerMethodField()
    class Meta(ProductSearchSerializerBase.Meta):
        fields = ProductSearchSerializerBase.Meta.fields + ('media',)
        index_classes = (SmartCardIndex, SmartPhoneIndex)
    def get_media(self, search_result):
        return search_result.search_media
This serializer is part of the project, since we must adapt it to whatever content we want to display on our site whenever a visitor enters some text into the search field.
13.4. Search View
In the Search View we link the serializer together with a djangoCMS apphook. This ProductSearchApp can be added to the same file we already used to declare the ProductsListApp, which renders the catalog view:
from django.utils.translation import ugettext_lazy as _
from cms.app_base import CMSApp
from cms.apphook_pool import apphook_pool
class ProductSearchApp(CMSApp):
    name = _("Search")
    urls = ['myshop.urls.search']
apphook_pool.register(ProductSearchApp)
As with all apphooks, it requires a file defining its urlpatterns:
from django.conf.urls import patterns, url
from shop.search.views import SearchView
from myshop.serializers import ProductSearchSerializer
urlpatterns = patterns('',
    url(r'^', SearchView.as_view(
        serializer_class=ProductSearchSerializer,
    )),
)
13.4.1. Search Results
As with all other pages in djangoSHOP, the page displaying our search results is a normal CMS page too. It is suggested to create this page on the root level of the page tree.
As the page title use “Search” or whatever is appropriate in our natural language. Then switch to the advanced settings.
As a template use one with a big placeholder, since it must display our search results.
In the page Id field, use “shop-search-product”. Some prepared default templates use this hard coded string.
Set the input field Soft root to checked. This hides this special page from our menu list.
As Application, select “Search”. This selects the apphook we created in the previous section.
Then save the page, change into Structure mode and locate the Main Content Container. Add a container with a Row and Column. As the child of this column choose the Search Results plugin from section Shop.
Finally publish the page and enter some text into the search field. It should render a list of found products.
13.5. Autocompletion in Catalog List View
As we have seen in the previous example, the Product Search View is suitable for searching for any item in the product database. However, sometimes the site visitor might just want to refine the list of items shown in the catalog’s list view. Here, loading a new page which uses a completely different layout may be inappropriate.
Instead, when someone types query terms into the search field, djangoSHOP starts to narrow down the list of items in the Catalog List View. This is especially useful in situations where hundreds of products are displayed together on the same page and the customer needs to pick out the correct one by entering some search terms.
To extend the existing Catalog List View for autocompletion, locate the file containing the urlpatterns which are used by the apphook ProductsListApp. If in doubt, consult the file myshop/cms_app.py.
Add the following entry to these urlpatterns:
from django.conf.urls import patterns, url
from shop.search.views import SearchView
from myshop.serializers import CatalogSearchSerializer
urlpatterns = patterns('',
    # previous patterns
    url(r'^search-catalog$', SearchView.as_view(
        serializer_class=CatalogSearchSerializer,
    )),
    # other patterns
)
Note
Make sure that the regular expression ^search-catalog$ is matched before the one for the product’s detail view, which usually looks for patterns matching ^(?P<slug>[\w-]+)$.
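The reason for this ordering can be demonstrated with the standard re module alone. The sketch below imitates a first-match-wins resolver. The names PATTERNS and resolve are hypothetical and stand in for the URL resolution done by Django.

```python
import re

# Patterns in the order they are tried; the first match wins.
PATTERNS = [
    ('search', re.compile(r'^search-catalog$')),
    ('detail', re.compile(r'^(?P<slug>[\w-]+)$')),
]


def resolve(path, patterns=PATTERNS):
    """Return the name of the first pattern matching `path`, or None."""
    for name, regex in patterns:
        if regex.search(path):
            return name
    return None


print(resolve('search-catalog'))                            # search
print(resolve('search-catalog', list(reversed(PATTERNS))))  # detail
```

Since the slug pattern also matches the string search-catalog, listing the detail view first would send search requests to the product detail view.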
The CatalogSearchSerializer used here is very similar to the ProductSearchSerializer we have seen in the previous section. The only difference is that instead of the search_media field it uses the catalog_media field, which renders the result item’s media in a layout appropriate for the catalog’s list view.