The WHY and the HOW of Elasticsearch for Polaris OS

Putting technology at your fingertips every day is our main goal. Let’s take a simple look together at one of the tools we use to empower your institutional research. Today, the focus is on Elasticsearch, with Manuel.

Who is Manuel Guzman?

Manuel has been our Research and Development Director for 4 years now. He has been working in software development for more than 20 years (!). Beyond his professional role, he is passionate about sports and about technologies that make life easier.

Manuel, what exactly is Elasticsearch?

It’s a search engine that can store and analyze large quantities of documents and search them in real time. Its query language, coupled with text analyzers derived from research in artificial intelligence, retrieves and aggregates information quickly and relevantly. For example, it is possible to extract statistics from billions of documents in real time.

To summarize: Elasticsearch is an ideal search engine (like the well-known Google) that can be configured to fit our needs.
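To make "retrieve and aggregate" a little more concrete, here is a minimal sketch of what a search request body could look like. It combines a full-text `match` query with a `terms` aggregation, both standard parts of the Elasticsearch query language. The field names (`title`, `publication_year`), the aggregation name and the helper function are hypothetical; they are not taken from the Polaris OS data model.

```python
import json


def build_search_body(text: str,
                      text_field: str = "title",
                      bucket_field: str = "publication_year") -> dict:
    """Build a request body: full-text search plus a count per bucket value.

    Field names are illustrative assumptions, not the Polaris OS schema.
    """
    return {
        # "match" runs the text through the field's analyzer before searching
        "query": {"match": {text_field: text}},
        # "terms" groups the matching documents by the values of one field
        "aggs": {"per_year": {"terms": {"field": bucket_field}}},
        # return at most 10 matching documents alongside the statistics
        "size": 10,
    }


body = build_search_body("machine learning")
print(json.dumps(body, indent=2))
```

A body like this would be sent to the search endpoint of an index; the response would then carry both the matching documents and the per-year counts in a single round trip.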

And concretely, what does it do in Polaris OS?

Elasticsearch is the essential tool that indexes your data and lets you extract the information that matters. Concrete examples? Scientific publications, patents, chapters, theses, research projects, clinical trials, and so on, from which the key information is extracted and analyzed to best meet your needs: last name and first name of the researchers and collaborators involved, affiliations, research areas, keywords and concepts related to the publication, and any other metadata associated with your publications. It can also search across the full text of publications.

If your institution uses a data analysis and data management system as powerful as Polaris OS, the results of your queries will be relevant AND fast, won’t they?!

For research laboratories and institutions, what is the real advantage of using Elasticsearch?

The primary advantage is the immediate availability of their research work in their repository. As soon as a work is approved, it becomes findable in the database, because all indexing happens at the moment the information enters the database.

Why do we use it not only as a search engine but also as a database?

The technical complexity of a system determines the infrastructure needed to support it. In a classic installation, another database engine would sit next to Elasticsearch to store the data, which would later be transferred to the search engine. In the case of Polaris OS, we decided to use Elasticsearch not only as a search engine but also as the main and only database, in order to:

  • Make data immediately available for search. As soon as a publication is validated, users can find it, with no need for additional indexing processes.

  • Ensure high availability, which is provided out of the box by configuring Elasticsearch to hold multiple copies of each piece of data in different places.

  • Handle growth through the simple operation of adding a node to the cluster. Redistribution of the data across the available nodes is provided as standard functionality.

  • Reduce the technical complexity of the system and, as a consequence, limit infrastructure needs.
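The "multiple copies" and "adding a node" points above map onto two standard Elasticsearch index settings, `number_of_shards` and `number_of_replicas`. The sketch below builds such a settings body and counts how many shard copies the cluster would have to place across its nodes. The values 3 and 2 and the helper names are assumptions chosen for illustration; the actual Polaris OS configuration is not described here.

```python
import json


def build_index_settings(primary_shards: int, replicas: int) -> dict:
    """Build an index-creation body with the given shard layout."""
    if primary_shards < 1 or replicas < 0:
        raise ValueError("need at least 1 primary shard and 0 or more replicas")
    return {
        "settings": {
            "index": {
                # how many pieces the index is split into
                "number_of_shards": primary_shards,
                # how many extra copies of each piece are kept
                "number_of_replicas": replicas,
            }
        }
    }


def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Each primary shard exists once, plus once per replica."""
    return primary_shards * (1 + replicas)


settings = build_index_settings(primary_shards=3, replicas=2)
copies = total_shard_copies(3, 2)
print(json.dumps(settings, indent=2))
print(f"shard copies to distribute across nodes: {copies}")
```

With replicas in place, losing one node does not lose data, and when a node is added the cluster can move some of these shard copies onto it.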

What is its position in relation to a relational database?

Each database technology provides services adapted to different needs. Relational databases remain a good choice for highly structured data and two-dimensional (tabular) representations. Document-oriented databases such as Elasticsearch, on the other hand, can handle loosely structured data with multiple levels of nesting, without the need to create new tables or indexes for each level. These are also known as NoSQL databases, for Not Only SQL.
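As a rough illustration of the difference, the sketch below takes one nested publication record, of the kind a document store can hold as a single unit, and flattens it into the separate row sets a relational schema would typically need. The record, its field names and the `flatten_to_rows` helper are all hypothetical.

```python
# One nested document: authors and their affiliations live inside the record.
publication = {
    "title": "A study of something",
    "authors": [
        {"last_name": "Doe", "first_name": "Jane",
         "affiliations": [{"name": "University A"}, {"name": "Institute B"}]},
        {"last_name": "Roe", "first_name": "John",
         "affiliations": [{"name": "University A"}]},
    ],
}


def flatten_to_rows(doc: dict, pub_id: int = 1):
    """Split a nested document into one row set per nesting level."""
    publication_row = {"id": pub_id, "title": doc["title"]}
    author_rows, affiliation_rows = [], []
    for author_id, author in enumerate(doc["authors"], start=1):
        author_rows.append({
            "id": author_id,
            "publication_id": pub_id,  # foreign key back to the publication
            "last_name": author["last_name"],
            "first_name": author["first_name"],
        })
        for affiliation in author["affiliations"]:
            affiliation_rows.append({
                "author_id": author_id,  # foreign key back to the author
                "name": affiliation["name"],
            })
    return publication_row, author_rows, affiliation_rows


pub_row, author_rows, affiliation_rows = flatten_to_rows(publication)
print(len(author_rows), "author rows,", len(affiliation_rows), "affiliation rows")
```

Each extra level of nesting means another table and another foreign key on the relational side, while the document stays a single record.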

Why use a free and open-to-use technology?

Elasticsearch is distributed under the Elastic License, which allows free use of the product and ensures its continuity by granting access to every new version released to the market. Much discussion has revolved around whether Elasticsearch is still open source software since it changed its license in 2021. Whatever the merits of the arguments on each side, the bottom line is that the software can be used and freely distributed in a system like Polaris OS.

As a reminder, software is open source when its license allows the source code to be used, modified and/or shared according to defined terms and conditions, often freely. We have an example at hand: Polaris OS, created by MyScienceWork, is also available on GitHub.