Click here to close now.

Welcome!

Open Source Cloud Authors: Bart Copeland, Carmen Gonzalez, Pat Romanski, Liz McMillan, Larry Dragich

Blog Feed Post

Quick History: glm()

by Joseph Rickert I recently wrote about some R resources that are available for generalized linear models (GLMs). Looking over the material, I was amazed by the amount of effort that is continuing to go into GLMs, both with with respect to new theoretical developments and also in response to practical problems such as the need to deal with very large data sets. (See packages biglm, ff, ffbase, RevoScaleR for example.) This led me to wonder about the history of the GLM and its implementations. An adequate exploration of this topic would occupy a serious science historian (which I am definitely not) for a considerable amount of time. However, I think even a brief look at what apears to be the main line of the development of the GLM in R provides some insight into how good software influences statistical practice. A convenient place to start is with the 1972 paper Generalized Linear Models by Nelder and Wedderburn This seems to be the first paper  to give the GLM a life of its own.  The authors pulled things together by: grouping the Normal, Poisson, Binomial (probit) and gamma distributions together as members of the exponential family applying maximum likelihood estimation via the iteratively reweighted least squares algorithm to the family introducing the terminology “generalized linear models” suggesting  that this unification would be a pedagogic improvement that would “simplify the teaching of the subject to both specialists and non-specialists” It is clear that the GLM was not “invented” in 1972. But, Nelder and Wedderburn were able to package up statistical knowledge and a tradition of analysis going pretty far back in a way that will forever shape how statisticians think about generalizations of linear models. For a brief, but fairly detailed account of the history of the major developments in the in categorical data analysis, logistic regression and loglinear models in the early 20th century leading up to the GLM see Chapter 10 of Agresti 1996. (One very interesting fact highlighted by Agresti is that the iteratively reweighted least squares algorithm that Nelder and Weddergurn used to fit GLMs is the method that R.A. Fisher introduced in 1935 to for fitting probit models by means of maximum likelihood.) The first generally available software to implement a wide range of GLMs seems to have been the Fortran based GLIM system which was developed by the Royal Statistical Society’s Working Party on Statistical Computing, released in 1974 and developed through 1993. My guess is that GLIM dominated the field for nearly 20 years until it was eclipsed by the growing popularity of the 1991 version of S, and the introduction of PROC GENMOD in version 6.09 of SAS that was released in the 1993 timeframe. (Note that the first edition of the manual for the MatLab Statistics Toolbox also dates from 1993.) In any event, in the 1980s, the GLM became the “go to” statistical tool that it is today. In the chapter on Generalized Linear Models that they contributed to Chambers and Hastie’s landmark 1992 book, Hastie and Pregibon write that “GLMS have become popular over the past 10 years, partly due to the computer package GLIM …” It is dangerous temptation to attribute more to a quotation like this than the authors intended. Nevertheless, I think it does offer some support for the idea that in a field such as statistics, theory shapes the tools and then the shape of the tools exerts some influence on how the theory develops. R’s glm() function was, of course,  modeled on the S implementation, The stats package documentation states: The original R implementation of glm was written by Simon Davies working for Ross Ihaka at the University of Auckland, but has since been extensively re-written by members of the R Core team.The design was inspired by the S function of the same name described in Hastie & Pregibon (1992). I take this to mean that the R implementation of glm() was much more than just a direct port of the S code. glm() has come a long way. It is very likely that only the SAS PROC GENMOD implementation of the GLM has matched R’s glm()in popularity over the past decade. However, SAS’s closed environment has failed to match open-source R’s ability to foster growth and stimulate creativity. The performance, stability and rock solid reliability of glm() has contributed to making GLMs a basic tool both for statisticians and for the new generation of data scientists as well.   How GLM implementations will develop outside of R in the future is not clear at all. Python’s evolving glm implementation appears to be in the GLIM tradition. (The Python documentation references the paper by Green (1984) which, in-turn, references GLIM.) Going back to first principles is always a good idea, however Python's GLM function apparently only supports one parameter exponential families. The Python developers have a long way to go before they can match R's rich functionality.The Julia glm function is clearly being modeled after R and shows much promise. However, recent threads on the julia-stats google group forum indicate that the Julia developers are just now beginning to work on basic glm() functionality. ReferencesAgresti, Alan, An Introduction to Categorical Data Analysis: John Wiley and Sons (1996)Chambers, John M. and Trevor J. Hastie (ed.), Statistical Models In S: Wadsworth & Brooks /Cole (1992)Green, P.J., Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives: Journal of the Royal Statistical Society, Series (1984)McCullagh, P. and J. A. Nelder. Generalized Linear Models: Chapman & Hall (1990)Nelder, J.A and R.W.M. Wedderburn, Generalized Linear Models: K. R. Statist Soc A (1972), 135, part 3, p. 370

Read the original blog entry...

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire) overseeing the product management of S-PLUS and other statistical and data mining products.<

David smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on twitter at @RevoDavid

@ThingsExpo Stories
"In the IoT space we are helping customers, mostly enterprises and industry verticals where time-to-value is critical, and we help them with the ability to do faster insights and actions using our platform so they can transform their business operations," explained Venkat Eswara, VP of Marketing at Vitria, in this SYS-CON.tv interview at @ThingsExpo, held June 9-11, 2015, at the Javits Center in New York City.
SYS-CON Events announced today that WHOA.com, an ISO 27001 Certified secure cloud computing company, participated as “Bronze Sponsor” of SYS-CON's 16th International Cloud Expo® New York, which took place June 9-11, 2015, at the Javits Center in New York City, NY. WHOA.com is a leader in next-generation, ISO 27001 Certified secure cloud solutions. WHOA.com offers a comprehensive portfolio of best-in-class cloud services for business including Infrastructure as a Service (IaaS), Secure Cloud Desktop, Cloud Storage, Disaster Recovery, Integrated Applications and Security.
SYS-CON Events announced today that Intelligent Systems Services will exhibit at the 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. Established in 1994, Intelligent Systems Services Inc. is located near Washington, DC, with representatives and partners nationwide. ISS’s well-established track record is based on the continuous pursuit of excellence in designing, implementing and supporting nationwide clients’ mission-critical systems. ISS has completed many successful projects in Healthcare, Commercial, Manu...
The Internet of Things is not only adding billions of sensors and billions of terabytes to the Internet. It is also forcing a fundamental change in the way we envision Information Technology. For the first time, more data is being created by devices at the edge of the Internet rather than from centralized systems. What does this mean for today's IT professional? In this Power Panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, panelists addressed this very serious issue of profound change in the industry.
With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo in Silicon Valley. Learn what is going on, contribute to the discussions, and ensure that your enterprise is as "IoT-Ready" as it can be! Internet of @ThingsExpo, taking place Nov 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 17th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound change in personal an...
SYS-CON Events announced today that kintone has been named “Bronze Sponsor” of SYS-CON's 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. kintone promotes cloud-based workgroup productivity, transparency and profitability with a seamless collaboration space, build your own business application (BYOA) platform, and workflow automation system.
Discussions about cloud computing are evolving into discussions about enterprise IT in general. As enterprises increasingly migrate toward their own unique clouds, new issues such as the use of containers and microservices emerge to keep things interesting. In this Power Panel at 16th Cloud Expo, moderated by Conference Chair Roger Strukhoff, panelists addressed the state of cloud computing today, and what enterprise IT professionals need to know about how the latest topics and trends affect their organization.
SYS-CON Events announced today that SoftLayer, an IBM company, has been named “Gold Sponsor” of SYS-CON's 17th International Cloud Expo®, which will take place November 3–5, 2015 at the Santa Clara Convention Center in Santa Clara, CA. SoftLayer operates a global cloud infrastructure platform built for Internet scale. With a global footprint of data centers and network points of presence, SoftLayer provides infrastructure as a service to leading-edge customers ranging from Web startups to global enterprises. SoftLayer’s modular architecture, full-featured API, and sophisticated automation pro...
The enterprise market will drive IoT device adoption over the next five years. In his session at @ThingsExpo, John Greenough, an analyst at BI Intelligence, division of Business Insider, analyzed how companies will adopt IoT products and the associated cost of adopting those products. John Greenough is the lead analyst covering the Internet of Things for BI Intelligence- Business Insider’s paid research service. Numerous IoT companies have cited his analysis of the IoT. Prior to joining BI Intelligence, he worked analyzing bank technology for Corporate Insight and The Clearing House Payment...
17th Cloud Expo, taking place Nov 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy. Meanwhile, 94% of enterprises are using some form of XaaS – software, platform, and infrastructure as a service.
SYS-CON Events announced today that CommVault has been named “Bronze Sponsor” of SYS-CON's 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. A singular vision – a belief in a better way to address current and future data management needs – guides CommVault in the development of Singular Information Management® solutions for high-performance data protection, universal availability and simplified management of data on complex storage networks. CommVault's exclusive single-platform architecture gives companies unp...
"ciqada is a combined platform of hardware modules and server products that lets people take their existing devices or new devices and lets them be accessible over the Internet for their users," noted Geoff Engelstein of ciqada, a division of Mars International, in this SYS-CON.tv interview at @ThingsExpo, held June 9-11, 2015, at the Javits Center in New York City.
Buzzword alert: Microservices and IoT at a DevOps conference? What could possibly go wrong? In this Power Panel at DevOps Summit, moderated by Jason Bloomberg, the leading expert on architecting agility for the enterprise and president of Intellyx, panelists peeled away the buzz and discuss the important architectural principles behind implementing IoT solutions for the enterprise. As remote IoT devices and sensors become increasingly intelligent, they become part of our distributed cloud environment, and we must architect and code accordingly. At the very least, you'll have no problem fillin...
The 17th International Cloud Expo has announced that its Call for Papers is open. 17th International Cloud Expo, to be held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, brings together Cloud Computing, APM, APIs, Microservices, Security, Big Data, Internet of Things, DevOps and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding business opportunity. Submit your speaking proposal today!
The 5th International DevOps Summit, co-located with 17th International Cloud Expo – being held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA – announces that its Call for Papers is open. Born out of proven success in agile development, cloud computing, and process automation, DevOps is a macro trend you cannot afford to miss. From showcase success stories from early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the world's largest enterprises – and delivering real results. Among the proven benefits, DevOps is corr...
Internet of Things (IoT) will be a hybrid ecosystem of diverse devices and sensors collaborating with operational and enterprise systems to create the next big application. In their session at @ThingsExpo, Bramh Gupta, founder and CEO of robomq.io, and Fred Yatzeck, principal architect leading product development at robomq.io, discussed how choosing the right middleware and integration strategy from the get-go will enable IoT solution developers to adapt and grow with the industry, while at the same time reduce Time to Market (TTM) by using plug and play capabilities offered by a robust IoT ...
SYS-CON Events announced today that Secure Infrastructure & Services will exhibit at SYS-CON's 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. Secure Infrastructure & Services (SIAS) is a managed services provider of cloud computing solutions for the IBM Power Systems market. The company helps mid-market firms built on IBM hardware platforms to deploy new levels of reliable and cost-effective computing and high availability solutions, leveraging the cloud and the benefits of Infrastructure-as-a-Service (IaaS...
Today air travel is a minefield of delays, hassles and customer disappointment. Airlines struggle to revitalize the experience. GE and M2Mi will demonstrate practical examples of how IoT solutions are helping airlines bring back personalization, reduce trip time and improve reliability. In their session at @ThingsExpo, Shyam Varan Nath, Principal Architect with GE, and Dr. Sarah Cooper, M2Mi’s VP Business Development and Engineering, will explore the IoT cloud-based platform technologies driving this change including privacy controls, data transparency and integration of real time context wi...
The basic integration architecture, as defined by ESBs, hasn’t changed for more than a decade. Most cloud integration providers still rely on an ESB architecture and their proprietary connectors. As a result, enterprise integration projects suffer from constraints of availability and reliability of these connectors that are not re-usable across other integration vendors. However, the rapid adoption of APIs and almost ubiquitous availability of APIs amongst most SaaS and Cloud applications are rapidly redefining traditional integration approaches and their reliance on proprietary connectors. ...
SYS-CON Events announced today that Dyn, the worldwide leader in Internet Performance, will exhibit at SYS-CON's 17th International Cloud Expo®, which will take place on November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. Dyn is a cloud-based Internet Performance company. Dyn helps companies monitor, control, and optimize online infrastructure for an exceptional end-user experience. Through a world-class network and unrivaled, objective intelligence into Internet conditions, Dyn ensures traffic gets delivered faster, safer, and more reliably than ever.