Solr vs Azure Search

Search-as-a-service from Microsoft Azure

Microsoft Azure, a cloud platform, is rapidly expanding its scope to include newer enterprise-class services. Some of the significant new additions are:

  • Azure Search: Azure Search Service is a fully managed, cloud-based service that allows developers to build rich search applications using REST APIs. It includes full-text search scoped over your content, plus advanced search behaviors similar to those found in commercial web search engines, such as type-ahead, suggested queries based on near matches, and faceted navigation.
  • Azure Machine Learning: Azure Machine Learning makes it possible for people without deep data science backgrounds to start mining data for predictions. ML Studio, an integrated development environment, uses drag-and-drop gestures and simple data flow graphs to set up experiments. For many tasks, you don't have to write a single line of code.
  • Azure Stream Analytics: Azure Stream Analytics is a fully managed service providing low latency, highly available, scalable complex event processing over streaming data in the cloud.

All these new services, together with a road map for further ones, will position Azure as a leading platform for enterprise adoption of PaaS.

In the following notes, I compare the open source search platform Solr against the capabilities of Azure Search services and note some advantages enterprises may derive by adopting the PaaS implementation of search.

Solr Features Compared with Azure Search
Solr is a fast open source enterprise search platform from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search.

The following are some aspects of using Solr in enterprises, compared with using Azure Search. Open source versus commercial software is a religious debate, and these notes are not meant to settle it: most enterprises define their own IT policies on that choice, and the same will apply here. The notes below are meant for understanding the new Azure service in the light of an existing, proven search platform.


Each aspect below is described first for Solr and then for Azure Search.

Installation & Setup

While Solr can be installed as a self-contained engine using Jetty, most sites use Tomcat as the container for the Solr web application.

As is typical of many open source products, a few more dependencies, such as Apache Commons, SLF4J and the JDK, need to be installed as part of the setup.

Being a PaaS offering, Azure Search is a fully managed, readily available service, and all internal dependencies are managed by the Azure platform.


Schema

Solr works on a predefined schema, and every Solr instance requires a schema.xml file, which provides the structure of the documents that will be stored as part of that instance.

As is typical of any database schema, it consists of two major sections:

  • Types section: definitions for all field types.
  • Fields section: definitions of document structures using those types.

Solr also supports a schemaless mode. In addition, Solr's dynamic field capability reduces up-front configuration requirements for fields with predictable naming patterns. For example, a dynamic field definition can map any field name with the suffix "_i" to the "int" field type.
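To make the dynamic field idea concrete, here is a minimal sketch in Python of how a field name could be resolved against explicit and dynamic field rules. It is only a model of the behavior, not Solr's own code. The schema fragment is an assumption written for illustration; only the "*_i" to "int" rule comes from the text above, and `resolve_field_type` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase

# Hypothetical schema fragment for illustration. Only the "*_i" -> "int"
# rule is taken from the article; the "id" field is an assumption.
SCHEMA_FRAGMENT = """
<fields>
  <field name="id" type="string" indexed="true" stored="true"/>
  <dynamicField name="*_i" type="int" indexed="true" stored="true"/>
</fields>
"""


def resolve_field_type(field_name, schema_xml=SCHEMA_FRAGMENT):
    """Return the type assigned to field_name, or None if no rule matches."""
    root = ET.fromstring(schema_xml)
    # Explicitly declared fields are checked before dynamic patterns.
    for field in root.findall("field"):
        if field.get("name") == field_name:
            return field.get("type")
    # Dynamic rules use a simple wildcard pattern such as "*_i".
    for rule in root.findall("dynamicField"):
        if fnmatchcase(field_name, rule.get("name")):
            return rule.get("type")
    return None


for name in ("id", "price_i", "title_s"):
    print(name, "->", resolve_field_type(name))
```

With a rule like this in place, new integer fields can be indexed without editing the schema, as long as their names follow the "_i" convention.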

In Azure Search, a JSON schema that defines the index is needed. The schema specifies the field-attribute combinations supported in your search application. Fields contain searchable data, such as product names, descriptions, customer comments, brands, prices, promotional notifications, and so forth. Attributes inform the types of operations that can be performed. Examples of the more commonly used attributes include whether a field supports full-text search (searchable=true), filters (filterable=true), or facets (facetable=true).

Azure Search supports typical enterprise data types such as Edm.String, Collection(Edm.String), Edm.DateTimeOffset and Edm.Int32.
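As a rough illustration of such a JSON schema, the following Python sketch builds an index definition as a dictionary and serializes it to JSON. The index name "products" and all field names are assumptions made up for this example, as is the use of a "key" attribute to mark the key field; the data types and the searchable, filterable and facetable attributes are the ones described above.

```python
import json

# Hypothetical "products" index. Field names are illustrative only.
index_definition = {
    "name": "products",
    "fields": [
        {"name": "productId", "type": "Edm.String", "key": True},
        {"name": "productName", "type": "Edm.String", "searchable": True},
        {"name": "brand", "type": "Edm.String",
         "filterable": True, "facetable": True},
        {"name": "tags", "type": "Collection(Edm.String)",
         "searchable": True, "facetable": True},
        {"name": "price", "type": "Edm.Int32", "filterable": True},
        {"name": "listedOn", "type": "Edm.DateTimeOffset", "filterable": True},
    ],
}


def fields_with(attribute, definition=index_definition):
    """List the names of fields that have the given attribute enabled."""
    return [f["name"] for f in definition["fields"] if f.get(attribute)]


# The JSON text is what would be sent when creating the index.
schema_json = json.dumps(index_definition, indent=2)
print(schema_json)
print("Facetable fields:", fields_with("facetable"))
```

Listing fields by attribute, as `fields_with` does, is a quick way to check which operations each field will support before the index is created.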

At this time there is no clear-cut documentation on schemaless operation in Azure Search, but the need can mostly be worked around with appropriate field naming conventions.

Document Ingestion (Loading)

Solr provides command-line utilities that help in loading documents.

There is also a web service API that can be invoked to update and delete specific documents.

The Solr schema defines a primary key for the document collection, which is used for update decisions.

In Azure Search, we can upload, merge or delete documents in a specified index using HTTP POST. For large numbers of updates, batching of documents (up to 1000 documents per batch, or about 16 MB per batch) is recommended.

Much like Solr, the request payload contains a key field ("key_field_name") that uniquely identifies a document for update requests.

The upload action in Azure Search is similar to an "upsert": the document is inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case.
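The batching recommendation can be sketched as follows. This is a minimal Python example that only builds the JSON request bodies and sends nothing over the network. The `build_batches` helper and the sample documents are hypothetical, and the payload shape (a "value" array whose items carry an "@search.action" property) is my understanding of the REST API rather than something stated in these notes, so check it against the service documentation. The sketch enforces the 1000-document limit but not the 16 MB size limit.

```python
import json

MAX_DOCS_PER_BATCH = 1000  # per-batch document limit quoted in the text


def build_batches(documents, action="upload", batch_size=MAX_DOCS_PER_BATCH):
    """Yield JSON request bodies holding at most batch_size documents each."""
    if not 1 <= batch_size <= MAX_DOCS_PER_BATCH:
        raise ValueError("batch_size must be between 1 and 1000")
    for start in range(0, len(documents), batch_size):
        chunk = documents[start:start + batch_size]
        # Assumed payload shape: each document is tagged with the action
        # ("upload", "merge" or "delete") the service should apply to it.
        body = {"value": [{**doc, "@search.action": action} for doc in chunk]}
        yield json.dumps(body)


# Hypothetical documents keyed by "productId".
docs = [{"productId": str(i), "productName": f"Item {i}"} for i in range(2500)]
for number, payload in enumerate(build_batches(docs), start=1):
    count = len(json.loads(payload)["value"])
    print(f"batch {number}: {count} documents")
```

Each yielded body would be sent in its own HTTP POST, so a failure in one batch does not require resending the others.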

Searching Documents

Solr is built for searching and hence has a rich set of features to support search.

  • Faceted searching based on unique field values, explicit queries, date ranges, numeric ranges or pivot
  • Spelling suggestions for user queries
  • Auto-suggest functionality for completing user queries
  • Simple join capability between two document types
  • Numeric field statistics such as min, max, average, standard deviation
  • Function Query: influence the score with user-specified complex functions of numeric fields or query relevancy scores
  • More Like This suggestions for a given document
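As an illustration of the first feature in the list, a faceted query against Solr's select handler can be sketched as a URL builder in Python. The host, the core name "products" and the field names are assumptions for this example; `solr_select_url` is a hypothetical helper, and nothing is actually sent to a server.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, query, facet_fields=(), rows=10):
    """Build a select URL that optionally asks for facet counts."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        # One facet.field parameter per field to be faceted on.
        params.extend(("facet.field", name) for name in facet_fields)
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical local Solr core named "products".
url = solr_select_url(
    "http://localhost:8983/solr/products",
    "name:laptop",
    facet_fields=["brand", "category"],
)
print(url)
```

The response to such a request would carry the matching documents together with value counts for each requested facet field, which is what drives faceted navigation in the user interface.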

In Azure Search, to query your search data, your application sends a request that includes the service URL and an api-key for authenticating the request, along with a search query formulated in either OData syntax or a simple query syntax that provides the same functionality. When a query is sent to the Search API, the search engine in Azure Search processes the query and returns the results in a JSON document, which can then be parsed and added to the presentation layer of your application.

Azure Search uses a simple query syntax for search text. This syntax is designed to be end-user friendly and is processed in a way that is tolerant to errors.

Azure Search supports a subset of the OData expression syntax for $filter.
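Putting the pieces of the query description together, here is a hedged Python sketch that assembles the request URL and headers for a search with an OData filter. The service name, index name, key and field names are placeholders, `azure_search_request` is a hypothetical helper, and the api-version value is deliberately left as a placeholder because it depends on the release of the service in use. The URL layout is my understanding of the REST API, and the sketch does not perform the HTTP call.

```python
from urllib.parse import urlencode


def azure_search_request(service, index, api_key, search_text,
                         odata_filter=None, api_version="YOUR-API-VERSION"):
    """Return (url, headers) for a search request; nothing is sent."""
    params = {"search": search_text, "api-version": api_version}
    if odata_filter:
        # $filter takes an OData expression, for example "price lt 1000".
        params["$filter"] = odata_filter
    url = (f"https://{service}.search.windows.net"
           f"/indexes/{index}/docs?{urlencode(params)}")
    # The api-key header authenticates the request.
    headers = {"api-key": api_key, "Accept": "application/json"}
    return url, headers


# Placeholder service, index and key values.
url, headers = azure_search_request(
    "myservice", "products", "MY-QUERY-KEY", "laptop",
    odata_filter="price lt 1000",
)
print(url)
print(headers)
```

The returned pair can be handed to any HTTP client; the JSON response would then be parsed and passed to the presentation layer as described above.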

Several of Solr's salient features are also fully supported in Azure Search:

  • Full-text search
  • Scoring profiles
  • Faceted navigation
  • Suggestions for type-ahead or autocomplete
  • Count of the search hits returned for a query
  • Highlighted hits

Value Proposition for Azure Search
As we see from the above, Azure Search tries to match the features of Solr in most aspects. However, Solr is a seasoned search engine while Azure Search is still in its preview stage, so there may be some small gaps in understanding and properly applying Azure Search. There is, however, one area where Azure Search may be a real winner for enterprises: scalability and availability.

A Solr installation requires a highly competent administrator to ensure that it scales to tens of thousands of documents while searches are load balanced across multiple nodes and performance is not affected.

Solr adopts a number of features to support this level of massive scalability.

When your data is too large for one node, you can break it up and store it in sections by creating one or more shards. Each shard is a portion of the logical index, or core, and comprises the set of all nodes containing that section of the index.
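The idea of splitting one logical index into shards can be illustrated with a small Python sketch that assigns each document to a shard by hashing its ID. This is a generic hash-partitioning scheme chosen for illustration, not Solr's actual document routing; `shard_for` and `partition` are hypothetical names.

```python
import hashlib


def shard_for(doc_id, num_shards):
    """Map a document ID to a shard number in the range [0, num_shards)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    # Use the first four bytes of the hash as an unsigned integer.
    return int.from_bytes(digest[:4], "big") % num_shards


def partition(doc_ids, num_shards):
    """Group document IDs into one list per shard."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


ids = [f"doc-{i}" for i in range(1000)]
for number, members in enumerate(partition(ids, 4)):
    print(f"shard {number}: {len(members)} documents")
```

Because the shard is derived from the ID alone, the same document always lands on the same shard, which is what allows an update to find and replace the earlier version.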

SolrCloud is the name of a set of new distributed capabilities in Solr. Passing parameters to enable these capabilities will enable you to set up a highly available, fault tolerant cluster of Solr servers. Use SolrCloud when you want high scale, fault tolerant, distributed indexing and search capabilities.

Implementing and maintaining SolrCloud requires good knowledge on the part of administrators.

Azure Search, however, makes scalability much simpler. When we provision a new Azure Search service, the following building blocks are automatically managed. A Standard search service is allocated in user-defined bundles of partitions (storage) and replicas (service workloads). You can scale partitions and replicas independently, adding more of whichever resource is needed.

Every search service starts with a minimum of one replica and one partition. If you signed up for dedicated resources using the Standard pricing tier, you can click the SCALE tile in the service dashboard to readjust the number of partitions and replicas used by your service. When you add either resource, the service uses them automatically. No further action is required on your part.

Increasing queries per second (QPS) or achieving high availability is done by adding replicas. Each replica has one copy of an index, so adding one more replica translates to one more index that can be used to service query requests. Currently, the rule of thumb is that you need at least 3 replicas for high availability.

Most service applications have a built-in need for more replicas rather than partitions, as most applications that utilize search can fit easily into a single partition that can support up to 15 million documents. For those cases where an increased document count is required, you can add partitions.
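The sizing rules above lend themselves to a quick back-of-the-envelope calculation. The Python sketch below uses only the two figures quoted in these notes (about 15 million documents per partition, and at least 3 replicas for high availability); `plan_capacity` and its parameters are hypothetical, and real limits may differ by pricing tier and service release.

```python
import math

DOCS_PER_PARTITION = 15_000_000  # per-partition figure quoted in the text
MIN_REPLICAS_FOR_HA = 3          # rule of thumb quoted in the text


def plan_capacity(document_count, high_availability=True,
                  extra_query_replicas=0):
    """Estimate how many partitions and replicas a service would need."""
    # Every service has at least one partition, even when empty.
    partitions = max(1, math.ceil(document_count / DOCS_PER_PARTITION))
    base_replicas = MIN_REPLICAS_FOR_HA if high_availability else 1
    # Extra replicas add query capacity (QPS) on top of the base count.
    replicas = base_replicas + extra_query_replicas
    return {"partitions": partitions, "replicas": replicas}


for count in (2_000_000, 15_000_000, 40_000_000):
    print(count, plan_capacity(count))
```

This matches the observation above: a typical application fits in one partition, so the number that usually grows is the replica count.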

As always, using a commercial PaaS option comes at a price, but enterprises weigh that cost against the ease of maintenance and quicker time to market of a managed platform compared with self-maintained products. Also, Azure Search is currently in beta, so we may have to wait before deploying mission-critical production applications. It is nevertheless worth getting started with pilot projects, and it is in Microsoft's best interest to bring the service up to mission-critical standards quickly.

More Stories By Srinivasan Sundara Rajan

Highly passionate about utilizing Digital Technologies to enable next generation enterprise. Believes in enterprise transformation through the Natives (Cloud Native & Mobile Native).
