<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <id>https://blog.readthedocs.com</id>
  <title>Read the Docs Blog - Posts by Safwan Rahman</title>
  <updated>2023-07-08T10:50:29.810421+00:00</updated>
  <link href="https://blog.readthedocs.com"/>
  <link href="https://blog.readthedocs.com/archive/author/safwan-rahman/atom.xml" rel="self"/>
  <generator uri="https://ablog.readthedocs.org/" version="0.10.26">ABlog</generator>
  <entry>
    <id>https://blog.readthedocs.com/search-improvements/</id>
    <title>Improved Search</title>
    <updated>2019-02-11T00:00:00+00:00</updated>
    <author>
      <name>Safwan Rahman</name>
    </author>
    <content type="html">&lt;section id="improved-search"&gt;

&lt;p&gt;Have you ever struggled with a poorly documented software project?
What about a well documented project but you can’t find the right section inside the docs?
The Read the Docs &lt;a class="reference external" href="https://docs.readthedocs.io/en/latest/team.html#development-team"&gt;core team&lt;/a&gt; has realized the importance of good search for documentation
and got me to take the challenge as a Google Summer of Code student.
The main goal of my GSoC project was to refactor the search code together with upgrading the backend
search engine, as well as adding more features to our search functionality like exact match search,
case insensitive search, search as you type, suggestions and more.&lt;/p&gt;
&lt;section id="google-summer-of-code"&gt;
&lt;h2&gt;Google Summer of Code&lt;/h2&gt;
&lt;p&gt;Google Summer of Code is a global program where students work with an open source organization
on a 3 month programming project. The core team of Read the Docs proposed some &lt;a class="reference external" href="https://git.io/fN9GK"&gt;Project Ideas&lt;/a&gt;,
one of them was &lt;strong&gt;Refactor &amp;amp; improve our search code&lt;/strong&gt;. I (&lt;a class="reference external" href="https://github.com/safwanrahman"&gt;Safwan Rahman&lt;/a&gt;) was keen to get my hands dirty with
&lt;a class="reference external" href="https://www.elastic.co/products/elasticsearch"&gt;Elasticsearch&lt;/a&gt; and grasped the opportunity to do so by applying
for this project and I got accepted.&lt;/p&gt;
&lt;p&gt;I have worked full time for from April to August to upgrade the whole codebase
to compatible with &lt;a class="reference external" href="https://www.elastic.co/guide/en/elasticsearch/reference/6.3/index.html"&gt;Elasticsearch 6.x&lt;/a&gt; and also implemented various features like:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rtfd/readthedocs.org/issues/2457"&gt;Exact Matching Search&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rtfd/readthedocs.org/issues/2328"&gt;Case Insensitive Search&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rtfd/readthedocs.org/pull/4292"&gt;Improved Result Order&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rtfd/readthedocs.org/pull/4368"&gt;Zero Downtime Indexing&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Together with this, we are planning more features like the following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rtfd/readthedocs.org/issues/504"&gt;Search as You Type and Autocomplete&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rtfd/readthedocs.org/issues/4289"&gt;Code Search&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All of my search related work can be seen in the &lt;a class="reference external" href="https://github.com/orgs/rtfd/projects/3"&gt;Search Project Board&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="background"&gt;
&lt;h2&gt;Background&lt;/h2&gt;
&lt;p&gt;Search is a vital part of any documentation hosting platform, so people can get the
information they need. As a documentation hosting platform, the same rules apply to
Read the Docs. Because of having a small core team, the search functionality
of Read the Docs has lagged behind for quite a while now. Initially the search code
was voluntarily contributed by &lt;a class="reference external" href="https://github.com/robhudson"&gt;Rob Hudson&lt;/a&gt;,  &lt;a class="reference external" href="https://github.com/rtfd/readthedocs.org/pull/493"&gt;back in 2013&lt;/a&gt; and then improved by other
contributors. The search infrastructure was already outdated as
Read the Docs were using &lt;a class="reference external" href="https://www.elastic.co/guide/en/elasticsearch/reference/1.3/index.html"&gt;Elasticsearch 1.3.x&lt;/a&gt; which was already reached its
End of Life in 2016. Therefore, Upgrading the search infrastructure was badly needed.&lt;/p&gt;
&lt;section id="built-in-search-vs-read-the-docs-search"&gt;
&lt;h3&gt;Built in Search vs Read the Docs Search&lt;/h3&gt;
&lt;p&gt;Both Sphinx and MkDocs already have built in search functionality. But the features are very limited.
At Read the Docs, we have felt the limitations and therefore we index the documentations in our
&lt;a class="reference external" href="https://www.elastic.co/products/elasticsearch"&gt;Elasticsearch&lt;/a&gt; index so that we can provide better search experience like:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;Search across multiple projects&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Advanced query syntax&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Search inside subprojects&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improved search result order&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Public Search API (Documentation pending)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Currently, we do not index MkDocs documents with&lt;/strong&gt; &lt;a class="reference external" href="https://www.elastic.co/products/elasticsearch"&gt;Elasticsearch&lt;/a&gt;,
but &lt;a class="reference external" href="https://github.com/rtfd/readthedocs.org/issues/1088"&gt;any kind of help is welcome&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="new-features"&gt;
&lt;h2&gt;New Features&lt;/h2&gt;
&lt;p&gt;In the 4 months of full time work, I have implemented these features along with
many bug fixes. Some of the major features are as following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rtfd/readthedocs.org/issues/2457"&gt;Exact Matching Search&lt;/a&gt;: Exact matching is one of our most highly requested
features for Read the Docs. Now you can search for exact word in documentation
by having your query inside quotation (&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;/code&gt;). So if you search
for &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;&amp;quot;Here&lt;/span&gt; &lt;span class="pre"&gt;is&lt;/span&gt; &lt;span class="pre"&gt;foo&amp;quot;&lt;/span&gt;&lt;/code&gt; (with the quotation), you will get all the documentation where the full
&lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Here&lt;/span&gt; &lt;span class="pre"&gt;is&lt;/span&gt; &lt;span class="pre"&gt;foo&lt;/span&gt;&lt;/code&gt; phrase exists.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Simple Query String Syntax support: Now you can use &lt;a class="reference external" href="https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html#_simple_query_string_syntax"&gt;Simple Query String Syntax&lt;/a&gt; for
searching. For example, if you search with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Mozilla&lt;/span&gt; &lt;span class="pre"&gt;+-Firefox&lt;/span&gt;&lt;/code&gt;, you will get documentation
where the &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Mozilla&lt;/span&gt;&lt;/code&gt; word is present, but &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Firefox&lt;/span&gt;&lt;/code&gt; word is not present.
For more information, you can look at the &lt;a class="reference external" href="https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html#_simple_query_string_syntax"&gt;Simple Query String Syntax&lt;/a&gt; documentation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rtfd/readthedocs.org/issues/2328"&gt;Case Insensitive Search&lt;/a&gt;: The search is now case insensitive. So if you search for &lt;cite&gt;Foo&lt;/cite&gt;,
you will get all the documentation which have one of either &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Foo&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;FOO&lt;/span&gt;&lt;/code&gt;, &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;foo&lt;/span&gt;&lt;/code&gt; or &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;fOO&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rtfd/readthedocs.org/pull/4292"&gt;Improved Result order&lt;/a&gt;: The result order is improved dramatically. Therefore if you Search
with &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Foo&lt;/span&gt; &lt;span class="pre"&gt;Bar&lt;/span&gt;&lt;/code&gt;, you will first see the documentation which have both &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Foo&lt;/span&gt; &lt;span class="pre"&gt;Bar&lt;/span&gt;&lt;/code&gt;, then
you will see other documentation which have either &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Foo&lt;/span&gt;&lt;/code&gt; or &lt;code class="docutils literal notranslate"&gt;&lt;span class="pre"&gt;Bar&lt;/span&gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rtfd/readthedocs.org/issues/2013"&gt;Auto Removing from Search Index&lt;/a&gt;: In the past, if any page got removed from documentation,
the page was still available in documentation. But from now on, the page will be removed
automatically from the search index as soon as its no longer in the documentation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://github.com/rtfd/readthedocs.org/pull/4368"&gt;Zero Downtime Indexing&lt;/a&gt;: As search is an important part of documentation, there will be no
downtime while we reindex our Elasticsearch Index. So the search will be much more reliable
than before.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="code-improvements"&gt;
&lt;h2&gt;Code Improvements&lt;/h2&gt;
&lt;p&gt;Code quality is very important in development world, specially in open source.
As I have rewritten the search functionality from scratch, the code quality
is improved in many ways like test coverage and documentation. So its easy for
any contributor to start working on the search functionality&lt;/p&gt;
&lt;/section&gt;
&lt;section id="contributors-wanted"&gt;
&lt;h2&gt;Contributors Wanted&lt;/h2&gt;
&lt;p&gt;As Read the Docs is an open source project backed by a small team of developers,
most of them are busy to keep things up and running only. Therefore, its quite
hard for them to take time to implement new features. If you know some bit of
Django or Python and Elasticsearch, you can contribute into the search functionality
of Read the Docs. If you need any support to start contributing, you can get in touch with
me or any member of  Read the Docs team. You can find all of us at &lt;cite&gt;#readthedocs&lt;/cite&gt; freenode
IRC channel or &lt;a class="reference external" href="https://gitter.im/rtfd/readthedocs.org"&gt;readthedocs gitter&lt;/a&gt; channel. I am &lt;cite&gt;safwan&lt;/cite&gt; at IRC and &lt;cite&gt;&amp;#64;safwanrahman&lt;/cite&gt;
at gitter.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;To conclude, I must say that the Search improvement in Read the Docs was very
necessary and I’m glad I could improve it in such a short amount of time.
There are an infinite number of ways it can be improved and I believe we can compete
with major search engines in terms of documentation searching.
Due to the constraints of only working for three months,
a number of compelling features were left out such as &lt;a class="reference external" href="https://github.com/rtfd/readthedocs.org/issues/504"&gt;Search as You Type and Autocomplete&lt;/a&gt; and
&lt;a class="reference external" href="https://github.com/rtfd/readthedocs.org/issues/4289"&gt;Code Search&lt;/a&gt; functionality. Moreover, proper documentation is needed for the search
architecture. I have tried to write test cases for most of the scenario, but because of
time constrains, a lot of code is out of test coverage.&lt;/p&gt;
&lt;p&gt;I strongly hope that we will get the left behind work done within a short amount of time.
This can be done easily if we get more contributors donate their time for improving Read the Docs.
We don’t need superhero or coding guru, just need people who understand Python, Django and
Elasticsearch and have some time to write some code for us. You are a &lt;strong&gt;Superhero&lt;/strong&gt; to us
if you can lend your time and effort to improve Read the Docs.&lt;/p&gt;
&lt;/section&gt;
&lt;/section&gt;
</content>
    <link href="https://blog.readthedocs.com/search-improvements/" rel="alternate"/>
    <summary>Have you ever struggled with a poorly documented software project?
What about a well documented project but you can’t find the right section inside the docs?
The Read the Docs core team has realized the importance of good search for documentation
and got me to take the challenge as a Google Summer of Code student.
The main goal of my GSoC project was to refactor the search code together with upgrading the backend
search engine, as well as adding more features to our search functionality like exact match search,
case insensitive search, search as you type, suggestions and more.Google Summer of Code is a global program where students work with an open source organization
on a 3 month programming project. The core team of Read the Docs proposed some Project Ideas,
one of them was Refactor &amp; improve our search code. I (Safwan Rahman) was keen to get my hands dirty with
Elasticsearch and grasped the opportunity to do so by applying
for this project and I got accepted.</summary>
    <category term="gsoc" label="gsoc"/>
    <category term="search" label="search"/>
    <category term="feature" label="feature"/>
    <published>2019-02-11T00:00:00+00:00</published>
  </entry>
</feed>
