Quick History: glm()

by Joseph Rickert

I recently wrote about some R resources that are available for generalized linear models (GLMs). Looking over the material, I was amazed by the amount of effort that continues to go into GLMs, both with respect to new theoretical developments and in response to practical problems such as the need to deal with very large data sets. (See the packages biglm, ff, ffbase and RevoScaleR, for example.) This led me to wonder about the history of the GLM and its implementations. An adequate exploration of this topic would occupy a serious science historian (which I am definitely not) for a considerable amount of time. However, I think even a brief look at what appears to be the main line of the development of the GLM in R provides some insight into how good software influences statistical practice.

A convenient place to start is with the 1972 paper Generalized Linear Models by Nelder and Wedderburn. This seems to be the first paper to give the GLM a life of its own. The authors pulled things together by:

- grouping the Normal, Poisson, Binomial (probit) and gamma distributions together as members of the exponential family
- applying maximum likelihood estimation via the iteratively reweighted least squares algorithm to the family
- introducing the terminology “generalized linear models”
- suggesting that this unification would be a pedagogic improvement that would “simplify the teaching of the subject to both specialists and non-specialists”

It is clear that the GLM was not “invented” in 1972. But Nelder and Wedderburn were able to package up statistical knowledge and a tradition of analysis going pretty far back in a way that will forever shape how statisticians think about generalizations of linear models. For a brief but fairly detailed account of the history of the major developments in categorical data analysis, logistic regression and loglinear models in the early 20th century leading up to the GLM, see Chapter 10 of Agresti (1996).
(One very interesting fact highlighted by Agresti is that the iteratively reweighted least squares algorithm that Nelder and Wedderburn used to fit GLMs is the method that R.A. Fisher introduced in 1935 for fitting probit models by maximum likelihood.)

The first generally available software to implement a wide range of GLMs seems to have been the Fortran-based GLIM system, which was developed by the Royal Statistical Society’s Working Party on Statistical Computing, released in 1974 and developed through 1993. My guess is that GLIM dominated the field for nearly 20 years until it was eclipsed by the growing popularity of the 1991 version of S and the introduction of PROC GENMOD in version 6.09 of SAS, released around 1993. (Note that the first edition of the manual for the MATLAB Statistics Toolbox also dates from 1993.) In any event, in the 1980s the GLM became the “go to” statistical tool that it is today. In the chapter on Generalized Linear Models that they contributed to Chambers and Hastie’s landmark 1992 book, Hastie and Pregibon write that “GLMS have become popular over the past 10 years, partly due to the computer package GLIM …” It is a dangerous temptation to read more into a quotation like this than the authors intended. Nevertheless, I think it does offer some support for the idea that in a field such as statistics, theory shapes the tools and then the shape of the tools exerts some influence on how the theory develops.

R’s glm() function was, of course, modeled on the S implementation. The stats package documentation states: “The original R implementation of glm was written by Simon Davies working for Ross Ihaka at the University of Auckland, but has since been extensively re-written by members of the R Core team. The design was inspired by the S function of the same name described in Hastie & Pregibon (1992).” I take this to mean that the R implementation of glm() was much more than just a direct port of the S code.
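The iteratively reweighted least squares (IRLS) idea at the heart of this story is compact enough to sketch. R's glm() still fits models this way by default (its glm.fit method uses IRLS). Below is a minimal, illustrative sketch in Python, assuming a single covariate (intercept plus slope) and just two families: Poisson with a log link and binomial (0/1 response) with a logit link. The names FAMILIES and irls() are hypothetical, not R's or any library's API. Each iteration forms the working response z = η + (y − μ)/(dμ/dη) and working weight w = (dμ/dη)²/V(μ), then solves a weighted least-squares problem. That one loop serving several distributions is exactly the unification Nelder and Wedderburn described. For simplicity it starts from zero coefficients rather than from fitted-value starting values as glm() does.

```python
import math


def _logistic(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical family table: each family supplies its inverse link,
# the derivative d(mu)/d(eta), and its variance function V(mu).
FAMILIES = {
    "poisson": {  # log link
        "linkinv": math.exp,
        "mu_eta": math.exp,
        "variance": lambda mu: mu,
    },
    "binomial": {  # logit link, 0/1 responses
        "linkinv": _logistic,
        "mu_eta": lambda eta: _logistic(eta) * (1.0 - _logistic(eta)),
        "variance": lambda mu: mu * (1.0 - mu),
    },
}


def irls(x, y, family, tol=1e-10, max_iter=50):
    """Fit y ~ b0 + b1*x by iteratively reweighted least squares (sketch)."""
    fam = FAMILIES[family]
    b0, b1 = 0.0, 0.0  # simple starting values (an assumption of this sketch)
    for _ in range(max_iter):
        # Accumulate the 2x2 weighted normal equations (X'WX) b = X'Wz.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = fam["linkinv"](eta)
            d = fam["mu_eta"](eta)
            z = eta + (yi - mu) / d          # working response
            w = d * d / fam["variance"](mu)  # working weight
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 system in closed form.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            return b0, b1
    raise RuntimeError("IRLS did not converge")
```

As a check, with a binary covariate x = [0, 0, 0, 0, 1, 1, 1, 1] and Poisson counts y = [1, 2, 3, 2, 5, 6, 4, 5], the maximum likelihood fit reproduces the two group means (2 and 5), so irls(x, y, "poisson") should return an intercept of log 2 and a slope of log 2.5. Swapping the family entry is all it takes to fit a logistic regression with the same loop.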
glm() has come a long way. It is very likely that only the SAS PROC GENMOD implementation of the GLM has matched R’s glm() in popularity over the past decade. However, SAS’s closed environment has failed to match open-source R’s ability to foster growth and stimulate creativity. The performance, stability and rock-solid reliability of glm() have helped make GLMs a basic tool both for statisticians and for the new generation of data scientists.

How GLM implementations will develop outside of R in the future is not clear at all. Python’s evolving glm implementation appears to be in the GLIM tradition. (The Python documentation references the paper by Green (1984), which, in turn, references GLIM.) Going back to first principles is always a good idea; however, Python’s GLM function apparently supports only one-parameter exponential families. The Python developers have a long way to go before they can match R’s rich functionality. The Julia glm function is clearly being modeled after R and shows much promise. However, recent threads on the julia-stats Google group indicate that the Julia developers are just now beginning to work on basic glm() functionality.

References

Agresti, Alan. An Introduction to Categorical Data Analysis. John Wiley and Sons (1996).
Chambers, John M. and Trevor J. Hastie (eds.). Statistical Models in S. Wadsworth & Brooks/Cole (1992).
Green, P.J. Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives. Journal of the Royal Statistical Society, Series (1984).
McCullagh, P. and J.A. Nelder. Generalized Linear Models. Chapman & Hall (1990).
Nelder, J.A. and R.W.M. Wedderburn. Generalized Linear Models. J. R. Statist. Soc. A (1972), 135, part 3, p. 370.

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire), overseeing the product management of S-PLUS and other statistical and data mining products.

David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.
