@CloudExpo: Blog Post

Solr vs Azure Search

Search-as-a-service from Microsoft Azure

Microsoft Azure, a cloud platform, is rapidly expanding its scope to include newer enterprise-class services. Some of the significant new additions are:

  • Azure Search: Azure Search Service is a fully managed, cloud-based service that allows developers to build rich search applications using REST APIs. It includes full-text search scoped over your content, plus advanced search behaviors similar to those found in commercial web search engines, such as type-ahead, suggested queries based on near matches, and faceted navigation.
  • Azure Machine Learning: Azure Machine Learning makes it possible for people without deep data science backgrounds to start mining data for predictions. ML Studio, an integrated development environment, uses drag-and-drop gestures and simple data flow graphs to set up experiments. For many tasks, you don't have to write a single line of code.
  • Azure Stream Analytics: Azure Stream Analytics is a fully managed service providing low latency, highly available, scalable complex event processing over streaming data in the cloud.

Together with a road map for further additions, these new services will position Azure as a leading platform for enterprise adoption of PaaS.

In the following notes, I compare the open source search platform Solr against the capabilities of Azure Search services and note some advantages enterprises may derive by adopting the PaaS implementation of search.

Solr Features Compared with Azure Search
Solr is a fast open source enterprise search platform from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search.

The following notes contrast some aspects of using Solr in enterprises with the equivalent usage in Azure Search. Open source versus commercial software is a religious debate, and the intent here is not to take a side in that argument: most enterprises define their own IT policies for choosing between open source and commercial products, and the same judgment will prevail here. These notes are meant to help understand the new Azure service in the light of an existing, proven search platform.

The comparison is organized by feature. Under each feature, the usage in Solr is described first, followed by the usage in Azure Search.

Installation & Setup

Solr can be installed as a self-contained engine by using Jetty, but most sites utilize Tomcat as the container for the Solr web application.

As is typical of many open source products, a few more dependencies, such as Apache Commons, SLF4J, and the JDK, need to be installed as part of the setup.

Being a PaaS offering, Azure Search is a fully managed and readily available service, and all of its internal dependencies are managed by the Azure platform.

Schema

Solr works on a predefined schema: every instance of Solr requires a schema.xml file, which describes the structure of the documents that will be stored in that instance.

As is typical of any database schema, it consists of two major sections.

Types section - Definition for all types.

Fields section - Definition of document structures using types.

Solr also supports a schemaless mode. In addition, Solr's dynamic field capability reduces up-front configuration requirements for fields with predictable naming patterns. For example, a dynamic field definition with the pattern "*_i" maps any field name with the suffix "_i" to the "int" field type.
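The suffix-based mapping described above can be sketched in a few lines. This is only an illustration of the idea, not Solr's implementation: the "*_i" to "int" rule comes from the text, while the "*_s" rule, the explicit "id" field, and the function name are hypothetical.

```python
from fnmatch import fnmatchcase

# Explicitly declared fields (hypothetical example entry).
EXPLICIT_FIELDS = {"id": "string"}

# Dynamic field rules as (name pattern, field type) pairs.
# "*_i" -> "int" is the rule from the text; "*_s" is an assumed extra rule.
DYNAMIC_FIELDS = [("*_i", "int"), ("*_s", "string")]


def resolve_field_type(field_name):
    """Return the type for a field: explicit fields first, then dynamic rules."""
    if field_name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[field_name]
    for pattern, field_type in DYNAMIC_FIELDS:
        if fnmatchcase(field_name, pattern):
            return field_type
    return None  # no explicit or dynamic definition matches


quantity_type = resolve_field_type("quantity_i")
unknown_type = resolve_field_type("title")
```

A field such as "quantity_i" never has to be declared up front; it picks up its type from the naming convention alone.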

In Azure Search, a JSON schema that defines the index is needed. The schema specifies the field-attribute combinations supported in your search application. Fields contain searchable data, such as product names, descriptions, customer comments, brands, prices, promotional notifications, and so forth. Attributes inform the types of operations that can be performed. Examples of the more commonly used attributes include whether a field supports full-text search (searchable=true), filters (filterable=true), or facets (facetable=true).

Azure Search supports the data types typical of enterprise data, such as Edm.String, Collection(Edm.String), Edm.DateTimeOffset, and Edm.Int32.
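As a rough sketch, an index definition of this kind could be assembled as follows before being sent to the service as JSON. The index name and all field names are hypothetical; the attribute names and data types are the ones mentioned above, and the "key" attribute marking the document key is an assumption that should be checked against the service documentation.

```python
import json

# Hypothetical index definition: names are illustrative only.
index_definition = {
    "name": "products",
    "fields": [
        # "key" is assumed to mark the field that uniquely identifies a document.
        {"name": "productId", "type": "Edm.String", "key": True},
        {"name": "productName", "type": "Edm.String", "searchable": True},
        {"name": "brand", "type": "Edm.String", "searchable": True,
         "filterable": True, "facetable": True},
        {"name": "tags", "type": "Collection(Edm.String)", "searchable": True,
         "facetable": True},
        {"name": "price", "type": "Edm.Int32", "filterable": True,
         "facetable": True},
        {"name": "lastUpdated", "type": "Edm.DateTimeOffset", "filterable": True},
    ],
}


def fields_with(attribute, definition=index_definition):
    """Return the names of fields for which the given attribute is enabled."""
    return [f["name"] for f in definition["fields"] if f.get(attribute)]


# The JSON text that would form the body of the index creation request.
schema_json = json.dumps(index_definition, indent=2)
facet_fields = fields_with("facetable")
```

Listing the fields per attribute, as fields_with does, is a quick way to review which operations the index will actually allow.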

At this time there is no clear-cut documentation on schemaless operation in Azure Search, but in most cases the need can be worked around with appropriate field naming conventions.

Document Ingestion (Loading)

Solr provides command-line utilities that help in loading documents.

There is also a web service API that can be invoked for updating and deleting specific documents.

The Solr schema defines a primary key for the document collection, which is used for update decisions.

In Azure Search, we can upload, merge, or delete documents in a specified index using HTTP POST. For large numbers of updates, batching of documents (up to 1000 documents per batch, or about 16 MB per batch) is recommended.

Much like Solr, the request payload contains a key field ("key_field_name") that uniquely identifies a document for update requests.

Azure Search supports an upload action, which is similar to an "upsert": the document is inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case.
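The batching recommendation can be sketched as below. The payload shape (a "value" array whose items carry an "@search.action" property) reflects my understanding of the REST API and should be verified against the service documentation; the document fields and the function name are hypothetical. The sketch enforces only the 1000-document limit, not the approximate 16 MB size limit.

```python
MAX_DOCS_PER_BATCH = 1000  # per-batch document limit quoted in the text


def build_upload_batches(documents, batch_size=MAX_DOCS_PER_BATCH):
    """Split documents into request payloads, tagging each one as an upload."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = []
    for start in range(0, len(documents), batch_size):
        chunk = documents[start:start + batch_size]
        # Each payload would be the JSON body of one HTTP POST to the index.
        batches.append(
            {"value": [{"@search.action": "upload", **doc} for doc in chunk]}
        )
    return batches


# Hypothetical documents keyed by "productId".
docs = [{"productId": str(i), "productName": f"Item {i}"} for i in range(2500)]
batches = build_upload_batches(docs)
batch_sizes = [len(batch["value"]) for batch in batches]
```

Because an upload replaces every field of an existing document, each entry should carry the full document rather than only the changed fields.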

Searching Documents

Solr is built for searching and hence has a rich set of features to support search.

  • Faceted searching based on unique field values, explicit queries, date ranges, numeric ranges, or pivot
  • Spelling suggestions for user queries
  • Auto-suggest functionality for completing user queries
  • Simple join capability between two document types
  • Numeric field statistics such as min, max, average, and standard deviation
  • Function query: influence the score with user-specified complex functions of numeric fields or query relevancy scores
  • More Like This suggestions for a given document

In Azure Search, to query your search data, your application sends a request that includes the service URL and an api-key for authenticating the request, along with a search query formulated from either OData syntax or a simple query syntax that provides the same functionality. When a query is sent to the Search API, the search engine in Azure Search processes the query and returns the results in a JSON document, which can then be parsed and added to the presentation layer of your application.

Azure Search uses a simple query syntax for search text. This syntax is designed to be end-user friendly and is processed in a way that is tolerant to errors.

Azure Search supports a subset of the OData expression syntax for $filter.
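The request described above can be sketched as follows. The code only builds the URL and headers and does not call the service. The service URL, index name, key, and query values are placeholders, and the path layout and the default api-version value are assumptions that should be checked against the current API reference.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def build_search_request(service_url, index_name, api_key, search_text,
                         odata_filter=None, api_version="2014-07-31-Preview"):
    """Return (url, headers) for a search query; nothing is sent."""
    params = {"search": search_text, "api-version": api_version}
    if odata_filter:
        params["$filter"] = odata_filter  # subset of OData expression syntax
    query = urlencode(params, quote_via=quote)
    path = f"/indexes/{quote(index_name, safe='')}/docs"
    url = f"{service_url.rstrip('/')}{path}?{query}"
    # The api-key header authenticates the request.
    headers = {"api-key": api_key, "Accept": "application/json"}
    return url, headers


# Placeholder values for illustration only.
url, headers = build_search_request(
    "https://example.search.windows.net", "products", "SECRET-KEY",
    "blue jacket", odata_filter="price lt 100",
)
parsed_query = parse_qs(urlsplit(url).query)
```

The simple search text and the $filter expression travel as separate parameters, so free-text matching and structured filtering can be combined in one request.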

Some of the salient features of Solr are also fully supported in Azure Search.

  • Full-text search
  • Scoring profiles
  • Faceted navigation
  • Suggestions for type-ahead or autocomplete
  • Count of the search hits returned for a query
  • Highlighted hits

Value Proposition for Azure Search
As we see from the above, Azure Search tries to match the features of Solr in most aspects. Solr, however, is a seasoned search engine while Azure Search is still in its preview stage, so some small deficiencies may occur in the understanding and proper application of Azure Search. Even so, there is one area where Azure Search may be a real winner for enterprises: scalability and availability.

A Solr installation requires a highly competent administrator to ensure that it scales to tens of thousands of documents while searches are load balanced across multiple nodes and performance is not affected.

Solr adopts a number of features to support this level of massive scalability.

When your data is too large for one node, you can break it up and store it in sections by creating one or more shards. Each shard is a portion of the logical index, or core, and comprises the set of all nodes containing that section of the index.
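To make the idea of splitting an index into shards concrete, here is a minimal sketch of hash-based routing, in which each document ID is deterministically assigned to one shard. It is a generic illustration under assumed names, not Solr's actual routing logic, which uses its own hashing scheme and configuration.

```python
import hashlib
from collections import defaultdict


def shard_for(doc_id, num_shards):
    """Map a document ID to a shard number in the range [0, num_shards)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # A stable digest keeps the assignment identical across processes and runs.
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition_documents(doc_ids, num_shards):
    """Group document IDs by the shard they are routed to."""
    shards = defaultdict(list)
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return dict(shards)


doc_ids = [f"doc-{i}" for i in range(1000)]
layout = partition_documents(doc_ids, num_shards=4)
```

Because the assignment depends only on the document ID, an update to an existing document always reaches the shard that already holds it.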

SolrCloud is the name of a set of new distributed capabilities in Solr. Passing parameters to enable these capabilities will enable you to set up a highly available, fault tolerant cluster of Solr servers. Use SolrCloud when you want high scale, fault tolerant, distributed indexing and search capabilities.

Implementing and maintaining SolrCloud requires solid knowledge on the part of administrators.

Azure Search, however, makes scalability much simpler. When we provision a new Azure Search service, the following building blocks are managed automatically. A Standard search service is allocated in user-defined bundles of partitions (storage) and replicas (service workloads). You can scale up on partitions or replicas independently, adding more of whatever resource is needed.

Every search service starts with a minimum of one replica and one partition. If you signed up for dedicated resources using the Standard pricing tier, you can click the SCALE tile in the service dashboard to readjust the number of partitions and replicas used by your service. When you add either resource, the service uses them automatically. No further action is required on your part.

Increasing queries per second (QPS) or achieving high availability is done by adding replicas. Each replica has one copy of an index, so adding one more replica translates to one more index that can be used to service query requests. Currently, the rule of thumb is that you need at least 3 replicas for high availability.

Most service applications have a built-in need for more replicas rather than partitions, as most applications that utilize search can fit easily into a single partition that can support up to 15 million documents. For those cases where an increased document count is required, you can add partitions.
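The sizing rules quoted above (one partition for up to 15 million documents, a minimum of one replica and one partition, and at least 3 replicas for high availability) can be turned into a rough estimate. The function name and return shape are hypothetical, and the sketch ignores query volume, which in practice also drives the replica count.

```python
import math

DOCS_PER_PARTITION = 15_000_000  # per-partition capacity quoted in the text
MIN_REPLICAS_FOR_HA = 3          # rule of thumb quoted in the text


def estimate_capacity(document_count, high_availability=False):
    """Estimate partitions and replicas for a given document count."""
    if document_count < 0:
        raise ValueError("document_count must not be negative")
    # Every service starts with at least one partition and one replica.
    partitions = max(1, math.ceil(document_count / DOCS_PER_PARTITION))
    replicas = MIN_REPLICAS_FOR_HA if high_availability else 1
    return {"partitions": partitions, "replicas": replicas}


small_catalog = estimate_capacity(2_000_000)
large_catalog = estimate_capacity(40_000_000, high_availability=True)
```

For most applications the first result is the typical one: a single partition suffices, and replicas are the resource that grows.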

Summary
As always, utilizing a commercial PaaS option comes at a price, but enterprises do weigh the ease of maintenance and quick time to market of a managed platform against self-maintained products. Azure Search is also currently in beta, so we may have to wait before deploying mission-critical production applications, but it is worth getting started with pilot projects, and it is in Microsoft's best interest to quickly bring the service up to mission-critical standards.

More Stories By Srinivasan Sundara Rajan

Highly passionate about utilizing Digital Technologies to enable next generation enterprise. Believes in enterprise transformation through the Natives (Cloud Native & Mobile Native).
