@CloudExpo: Blog Post

Solr vs Azure Search

Search-as-a-service from Microsoft Azure

Microsoft Azure, a cloud platform, is rapidly expanding its scope to include newer enterprise-class services. Some of the significant new additions are:

  • Azure Search: Azure Search Service is a fully managed, cloud-based service that allows developers to build rich search applications using REST APIs. It includes full-text search scoped over your content, plus advanced search behaviors similar to those found in commercial web search engines, such as type-ahead, suggested queries based on near matches, and faceted navigation.
  • Azure Machine Learning: Azure Machine Learning makes it possible for people without deep data science backgrounds to start mining data for predictions. ML Studio, an integrated development environment, uses drag-and-drop gestures and simple data flow graphs to set up experiments. For many tasks, you don't have to write a single line of code.
  • Azure Stream Analytics: Azure Stream Analytics is a fully managed service providing low latency, highly available, scalable complex event processing over streaming data in the cloud.

These new services, together with a road map for more, will position Azure as a leading platform for enterprise adoption of PaaS.

In the following notes, I compare the open source search platform Solr against the capabilities of Azure Search services and note some advantages enterprises may derive by adopting the PaaS implementation of search.

Solr Features Compared with Azure Search
Solr is a fast open source enterprise search platform from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search.

The following compares some aspects of using Solr in enterprises with using Azure Search. Open source versus commercial software is a religious debate, and the intent here is not to join that argument: most enterprises define their own IT policies for choosing between open source and commercial products, and the same judgment will prevail here. The notes below are meant to help in understanding the new Azure service in the light of an existing, proven search platform.


For each aspect below, usage in Solr is described first, followed by usage in Azure Search.

Installation & Setup

While Solr can be installed as a self-contained engine using Jetty, most sites use Tomcat as the container for the Solr web application.

As is typical of many open source products, a few more dependencies, such as Apache Commons, SLF4J, and a JDK, need to be installed as part of setup.

Being a PaaS offering, Azure Search is a fully managed, readily available service, and all of its internal dependencies are managed by the Azure platform.


Schema

Solr works on a predefined schema, and every Solr instance requires a schema.xml file, which describes the structure of the documents that will be stored in that instance.

As is typical of any database schema, it consists of two major sections:

  • Types section: definitions for all types.
  • Fields section: definitions of document structures using those types.

Solr also supports a schemaless mode, and Solr's dynamic field capability reduces up-front configuration requirements for fields with predictable naming patterns. For example, a dynamic field definition with the name pattern "*_i" and the type "int" maps any field name with the suffix "_i" to the "int" field type.
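To make the dynamic field idea concrete, here is a minimal Python sketch of suffix-based type resolution. It is only an illustration of the concept, not Solr's implementation: the `DYNAMIC_FIELDS` list, the `resolve_field_type` helper, and the "*_s" rule are made up for this example, and I am assuming that explicitly defined fields take precedence over pattern matches.

```python
from fnmatch import fnmatchcase

# Hypothetical dynamic-field rules as (name pattern, field type) pairs.
# Only the "*_i" -> "int" rule comes from the text; "*_s" is invented here.
DYNAMIC_FIELDS = [
    ("*_i", "int"),
    ("*_s", "string"),
]


def resolve_field_type(field_name, explicit_fields=None):
    """Return the type for a field: explicit definitions first, then patterns."""
    explicit_fields = explicit_fields or {}
    if field_name in explicit_fields:
        return explicit_fields[field_name]
    for pattern, field_type in DYNAMIC_FIELDS:
        if fnmatchcase(field_name, pattern):
            return field_type
    return None  # no explicit field and no matching pattern


print(resolve_field_type("quantity_i"))
```

A field such as "quantity_i" never has to be declared up front; its type follows from the naming convention alone.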

In Azure Search, a JSON schema that defines the index is needed. The schema specifies the field-attribute combinations supported in your search application. Fields contain searchable data, such as product names, descriptions, customer comments, brands, prices, promotional notifications, and so forth. Attributes inform the types of operations that can be performed. Examples of the more commonly used attributes include whether a field supports full-text search (searchable=true), filters (filterable=true), or facets (facetable=true).

Azure Search supports the typical enterprise data types, such as Edm.String, Collection(Edm.String), Edm.DateTimeOffset, and Edm.Int32.
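As a rough illustration of such an index definition, the sketch below builds a schema as a Python dictionary and serializes it to JSON. The index name "products" and all field names are hypothetical, and the "key" attribute marking the document key is my assumption beyond the attributes named above. The `fields_with` helper treats an omitted attribute as switched off, which may not match the service's own defaults.

```python
import json

# Hypothetical "products" index; field names are illustrative only.
index_schema = {
    "name": "products",
    "fields": [
        {"name": "productId", "type": "Edm.String", "key": True, "searchable": False},
        {"name": "productName", "type": "Edm.String", "searchable": True},
        {"name": "brand", "type": "Edm.String", "filterable": True, "facetable": True},
        {"name": "tags", "type": "Collection(Edm.String)", "searchable": True, "facetable": True},
        {"name": "price", "type": "Edm.Int32", "filterable": True, "facetable": True},
        {"name": "listedOn", "type": "Edm.DateTimeOffset", "filterable": True},
    ],
}


def fields_with(schema, attribute):
    """Names of the fields for which the given attribute is switched on."""
    return [f["name"] for f in schema["fields"] if f.get(attribute)]


schema_json = json.dumps(index_schema, indent=2)
print(schema_json)
print("Facetable fields:", fields_with(index_schema, "facetable"))
```

Keeping the schema as plain data makes it easy to review which fields are searchable, filterable, or facetable before the index is created.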

At this time there is no clear-cut documentation on schemaless operation in Azure Search, but in most cases the gap can be worked around with appropriate field naming conventions.

Document Ingestion (Loading)

Solr provides command-line utilities that help in loading documents.

There is also a web service API that can be invoked to update and delete specific documents.

The Solr schema defines a primary key for the document collection, which is used for update decisions.

In Azure Search, we can upload, merge, or delete documents in a specified index using HTTP POST. For large numbers of updates, batching of documents (up to 1000 documents per batch, or about 16 MB per batch) is recommended.

Much like Solr, the request payload contains a "key_field_name" that uniquely identifies a document for update requests.

Azure Search supports an upload action, which is similar to an "upsert": the document is inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case.
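The batching advice can be sketched in Python as follows. This only builds the request bodies and sends nothing over the network. The `build_upload_batches` helper and the sample documents are hypothetical, the "value" / "@search.action" payload shape reflects my understanding of the REST API rather than anything stated above, and the sketch enforces only the document-count limit, not the 16 MB size limit.

```python
import json

MAX_DOCS_PER_BATCH = 1000  # per-batch document limit quoted in the text


def build_upload_batches(documents, batch_size=MAX_DOCS_PER_BATCH):
    """Split documents into JSON request bodies, each tagged with the upload action."""
    batches = []
    for start in range(0, len(documents), batch_size):
        chunk = documents[start:start + batch_size]
        body = {"value": [{"@search.action": "upload", **doc} for doc in chunk]}
        batches.append(json.dumps(body))
    return batches


# Hypothetical documents keyed by "productId".
docs = [{"productId": str(i), "productName": f"Item {i}"} for i in range(2500)]
batches = build_upload_batches(docs)
print([len(json.loads(b)["value"]) for b in batches])  # [1000, 1000, 500]
```

Each body would then be sent with an HTTP POST to the index; because upload replaces all fields, every document in a batch should carry its complete set of field values.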

Searching Documents

Solr is built for searching and hence has a rich set of features to support search:

  • Faceted searching based on unique field values, explicit queries, date ranges, numeric ranges, or pivot
  • Spelling suggestions for user queries
  • Auto-suggest functionality for completing user queries
  • Simple join capability between two document types
  • Numeric field statistics such as min, max, average, and standard deviation
  • Function queries, which influence the score through user-specified complex functions of numeric fields or query relevancy scores
  • More Like This suggestions for a given document

To query your search data, your application sends a request that includes the service URL and an api-key for authenticating the request, along with a search query formulated from either OData syntax or a simple query syntax that provides the same functionality. When a query is sent to the Search API, the search engine in Azure Search processes the query and returns the results in a JSON document which can then be parsed and added to the presentation layer of your application.

Azure Search uses a simple query syntax for search text. This syntax is designed to be end-user friendly and is processed in a way that is tolerant to errors.

Azure Search supports a subset of the OData expression syntax for $filter.
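A query request along these lines can be assembled as in the sketch below. Treat it as an illustration under assumptions: the service name, index name, and `build_search_request` helper are hypothetical, the URL layout is my understanding of the REST API, and the api-version value is a placeholder for whatever version your service supports. Nothing is sent; the function only returns the URL and headers.

```python
from urllib.parse import urlencode


def build_search_request(service, index, api_key, search_text, odata_filter=None, top=10):
    """Return (url, headers) for a GET query against one index."""
    params = {
        "search": search_text,                # simple query syntax text
        "$top": top,                          # number of hits to return
        "api-version": "2014-07-31-Preview",  # assumed; use your service's version
    }
    if odata_filter:
        params["$filter"] = odata_filter      # OData filter expression
    # Keep "$" readable in parameter names; spaces in values become "+".
    query = urlencode(params, safe="$")
    url = f"https://{service}.search.windows.net/indexes/{index}/docs?{query}"
    headers = {"api-key": api_key, "Accept": "application/json"}
    return url, headers


url, headers = build_search_request(
    "my-service", "products", "<your-api-key>", "wireless mouse", odata_filter="price lt 100"
)
print(url)
```

The search text and the $filter expression travel as separate parameters, which is why the end-user friendly query syntax and the stricter OData filter syntax can coexist in one request.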

Some of the salient features of Solr are also fully supported in Azure Search.

  • Full-text search
  • Scoring profiles
  • Faceted navigation
  • Suggestions for type-ahead or autocomplete
  • Count of the search hits returned for a query
  • Highlighted hits

Value Proposition for Azure Search
As we see from the above, Azure Search tries to match the features of Solr in most respects. However, Solr is a seasoned search engine while Azure Search is still in its preview stage, so some small gaps may show up in understanding and properly applying Azure Search. There is one area, though, where Azure Search may be a real winner for enterprises: scalability and availability.

A Solr installation requires a highly competent administrator to ensure that it scales to tens of thousands of documents while searches stay load balanced across multiple nodes and performance is not affected.

Solr adopts a number of features to support this level of massive scalability.

When your data is too large for one node, you can break it up and store it in sections by creating one or more shards. Each is a portion of the logical index, or core, and it's the set of all nodes containing that section of the index.
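The idea of splitting one logical index into shards can be illustrated with a small Python sketch. This is not how Solr routes documents; it is a generic hash-and-modulo scheme chosen for brevity, and the `shard_for` and `partition` helpers are hypothetical.

```python
import hashlib


def shard_for(doc_id, num_shards):
    """Map a document id to a shard index using a stable hash."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def partition(doc_ids, num_shards):
    """Group document ids by the shard they would be stored on."""
    shards = {n: [] for n in range(num_shards)}
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


shards = partition([f"doc-{i}" for i in range(10)], num_shards=3)
print({n: len(ids) for n, ids in shards.items()})
```

Because the hash is stable, the same document id always lands on the same shard, so an update can be routed to the node that already holds the document.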

SolrCloud is the name of a set of new distributed capabilities in Solr. Passing parameters to enable these capabilities will enable you to set up a highly available, fault tolerant cluster of Solr servers. Use SolrCloud when you want high scale, fault tolerant, distributed indexing and search capabilities.

Implementing SolrCloud and maintaining it afterwards require solid knowledge on the part of administrators.

Azure Search, however, makes scalability much simpler. When we provision a new Azure Search service, the following building blocks are managed automatically. A Standard search service is allocated in user-defined bundles of partitions (storage) and replicas (service workloads). You can scale partitions and replicas independently, adding more of whichever resource is needed.

Every search service starts with a minimum of one replica and one partition. If you signed up for dedicated resources using the Standard pricing tier, you can click the SCALE tile in the service dashboard to readjust the number of partitions and replicas used by your service. When you add either resource, the service uses them automatically. No further action is required on your part.

Increasing queries per second (QPS) or achieving high availability is done by adding replicas. Each replica has one copy of an index, so adding one more replica translates to one more index that can be used to service query requests. Currently, the rule of thumb is that you need at least 3 replicas for high availability.

Most service applications have a built-in need for more replicas rather than partitions, as most applications that utilize search can fit easily into a single partition that can support up to 15 million documents. For those cases where an increased document count is required, you can add partitions.
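The sizing rules of thumb above reduce to simple arithmetic, sketched here in Python. The `plan_capacity` helper is hypothetical and gives only a starting point: it uses the 15 million documents per partition and 3 replicas for high availability quoted above, and it ignores query load as well as any service limits on allowed partition and replica combinations.

```python
import math

DOCS_PER_PARTITION = 15_000_000  # per-partition capacity quoted in the text
MIN_REPLICAS_FOR_HA = 3          # rule of thumb quoted in the text


def plan_capacity(document_count, high_availability=True):
    """Return (partitions, replicas) as a rough starting point for sizing."""
    partitions = max(1, math.ceil(document_count / DOCS_PER_PARTITION))
    replicas = MIN_REPLICAS_FOR_HA if high_availability else 1
    return partitions, replicas


print(plan_capacity(40_000_000))        # (3, 3)
print(plan_capacity(2_000_000, False))  # (1, 1)
```

For the common case of a few million documents, the result is a single partition, and the only real decision is how many replicas the query load and availability target call for.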

As always, a commercial PaaS option comes at a price, but enterprises do weigh that cost against the ease of maintenance and quicker time to market of a managed platform compared with self-maintained products. Also, Azure Search is currently in beta, so we may have to wait before deploying mission-critical production applications on it. It is still worth getting started with pilot projects, and it is in Microsoft's best interest to bring the service up to mission-critical standards quickly.

More Stories By Srinivasan Sundara Rajan

Highly passionate about utilizing Digital Technologies to enable next generation enterprise. Believes in enterprise transformation through the Natives (Cloud Native & Mobile Native).
