Click here to close now.

Welcome!

Open Source Authors: Roger Strukhoff, Elizabeth White, Carmen Gonzalez, Liz McMillan, Pat Romanski

Blog Feed Post

Quick History: glm()

by Joseph Rickert I recently wrote about some R resources that are available for generalized linear models (GLMs). Looking over the material, I was amazed by the amount of effort that is continuing to go into GLMs, both with with respect to new theoretical developments and also in response to practical problems such as the need to deal with very large data sets. (See packages biglm, ff, ffbase, RevoScaleR for example.) This led me to wonder about the history of the GLM and its implementations. An adequate exploration of this topic would occupy a serious science historian (which I am definitely not) for a considerable amount of time. However, I think even a brief look at what apears to be the main line of the development of the GLM in R provides some insight into how good software influences statistical practice. A convenient place to start is with the 1972 paper Generalized Linear Models by Nelder and Wedderburn This seems to be the first paper  to give the GLM a life of its own.  The authors pulled things together by: grouping the Normal, Poisson, Binomial (probit) and gamma distributions together as members of the exponential family applying maximum likelihood estimation via the iteratively reweighted least squares algorithm to the family introducing the terminology “generalized linear models” suggesting  that this unification would be a pedagogic improvement that would “simplify the teaching of the subject to both specialists and non-specialists” It is clear that the GLM was not “invented” in 1972. But, Nelder and Wedderburn were able to package up statistical knowledge and a tradition of analysis going pretty far back in a way that will forever shape how statisticians think about generalizations of linear models. For a brief, but fairly detailed account of the history of the major developments in the in categorical data analysis, logistic regression and loglinear models in the early 20th century leading up to the GLM see Chapter 10 of Agresti 1996. (One very interesting fact highlighted by Agresti is that the iteratively reweighted least squares algorithm that Nelder and Weddergurn used to fit GLMs is the method that R.A. Fisher introduced in 1935 to for fitting probit models by means of maximum likelihood.) The first generally available software to implement a wide range of GLMs seems to have been the Fortran based GLIM system which was developed by the Royal Statistical Society’s Working Party on Statistical Computing, released in 1974 and developed through 1993. My guess is that GLIM dominated the field for nearly 20 years until it was eclipsed by the growing popularity of the 1991 version of S, and the introduction of PROC GENMOD in version 6.09 of SAS that was released in the 1993 timeframe. (Note that the first edition of the manual for the MatLab Statistics Toolbox also dates from 1993.) In any event, in the 1980s, the GLM became the “go to” statistical tool that it is today. In the chapter on Generalized Linear Models that they contributed to Chambers and Hastie’s landmark 1992 book, Hastie and Pregibon write that “GLMS have become popular over the past 10 years, partly due to the computer package GLIM …” It is dangerous temptation to attribute more to a quotation like this than the authors intended. Nevertheless, I think it does offer some support for the idea that in a field such as statistics, theory shapes the tools and then the shape of the tools exerts some influence on how the theory develops. R’s glm() function was, of course,  modeled on the S implementation, The stats package documentation states: The original R implementation of glm was written by Simon Davies working for Ross Ihaka at the University of Auckland, but has since been extensively re-written by members of the R Core team.The design was inspired by the S function of the same name described in Hastie & Pregibon (1992). I take this to mean that the R implementation of glm() was much more than just a direct port of the S code. glm() has come a long way. It is very likely that only the SAS PROC GENMOD implementation of the GLM has matched R’s glm()in popularity over the past decade. However, SAS’s closed environment has failed to match open-source R’s ability to foster growth and stimulate creativity. The performance, stability and rock solid reliability of glm() has contributed to making GLMs a basic tool both for statisticians and for the new generation of data scientists as well.   How GLM implementations will develop outside of R in the future is not clear at all. Python’s evolving glm implementation appears to be in the GLIM tradition. (The Python documentation references the paper by Green (1984) which, in-turn, references GLIM.) Going back to first principles is always a good idea, however Python's GLM function apparently only supports one parameter exponential families. The Python developers have a long way to go before they can match R's rich functionality.The Julia glm function is clearly being modeled after R and shows much promise. However, recent threads on the julia-stats google group forum indicate that the Julia developers are just now beginning to work on basic glm() functionality. ReferencesAgresti, Alan, An Introduction to Categorical Data Analysis: John Wiley and Sons (1996)Chambers, John M. and Trevor J. Hastie (ed.), Statistical Models In S: Wadsworth & Brooks /Cole (1992)Green, P.J., Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives: Journal of the Royal Statistical Society, Series (1984)McCullagh, P. and J. A. Nelder. Generalized Linear Models: Chapman & Hall (1990)Nelder, J.A and R.W.M. Wedderburn, Generalized Linear Models: K. R. Statist Soc A (1972), 135, part 3, p. 370

Read the original blog entry...

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire) overseeing the product management of S-PLUS and other statistical and data mining products.<

David smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on twitter at @RevoDavid

@ThingsExpo Stories
The Open Compute Project is a collective effort by Facebook and a number of players in the datacenter industry to bring lessons learned from the social media giant's giant IT deployment to the rest of the world. Datacenters account for 3% of global electricity consumption – about the same as all of Switzerland or the Czech Republic -- according to people I met at the recent Open Compute Summit in San Jose. With increasing mobility at the edge of the cloud and vast new dataflows being predicted with the growth of the Internet of Things (and The Coming Age of Many Zettabytes) in the near...
GENBAND has announced that SageNet is leveraging the Nuvia platform to deliver Unified Communications as a Service (UCaaS) to its large base of retail and enterprise customers. Nuvia’s cloud-based solution provides SageNet’s customers with a full suite of business communications and collaboration tools. Two large national SageNet retail customers have recently signed up to deploy the Nuvia platform and the company will continue to sell the service to new and existing customers. Nuvia’s capabilities include HD voice, video, multimedia messaging, mobility, conferencing, Web collaboration, deskt...
The list of ‘new paradigm’ technologies that now surrounds us appears to be at an all time high. From cloud computing and Big Data analytics to Bring Your Own Device (BYOD) and the Internet of Things (IoT), today we have to deal with what the industry likes to call ‘paradigm shifts’ at every level of IT. This is disruption; of course, we understand that – change is almost always disruptive.
SYS-CON Events announced today that Cisco, the worldwide leader in IT that transforms how people connect, communicate and collaborate, has been named “Gold Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Cisco makes amazing things happen by connecting the unconnected. Cisco has shaped the future of the Internet by becoming the worldwide leader in transforming how people connect, communicate and collaborate. Cisco and our partners are building the platform for the Internet of Everything by connecting the...
SYS-CON Events announced today that robomq.io will exhibit at SYS-CON's @ThingsExpo, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. robomq.io is an interoperable and composable platform that connects any device to any application. It helps systems integrators and the solution providers build new and innovative products and service for industries requiring monitoring or intelligence from devices and sensors.
SYS-CON Media announced today that @WebRTCSummit Blog, the largest WebRTC resource in the world, has been launched. @WebRTCSummit Blog offers top articles, news stories, and blog posts from the world's well-known experts and guarantees better exposure for its authors than any other publication. @WebRTCSummit Blog can be bookmarked ▸ Here @WebRTCSummit conference site can be bookmarked ▸ Here
Temasys has announced senior management additions to its team. Joining are David Holloway as Vice President of Commercial and Nadine Yap as Vice President of Product. Over the past 12 months Temasys has doubled in size as it adds new customers and expands the development of its Skylink platform. Skylink leads the charge to move WebRTC, traditionally seen as a desktop, browser based technology, to become a ubiquitous web communications technology on web and mobile, as well as Internet of Things compatible devices.
Docker is an excellent platform for organizations interested in running microservices. It offers portability and consistency between development and production environments, quick provisioning times, and a simple way to isolate services. In his session at DevOps Summit at 16th Cloud Expo, Shannon Williams, co-founder of Rancher Labs, will walk through these and other benefits of using Docker to run microservices, and provide an overview of RancherOS, a minimalist distribution of Linux designed expressly to run Docker. He will also discuss Rancher, an orchestration and service discovery platf...
Sonus Networks introduced the Sonus WebRTC Services Solution, a virtualized Web Real-Time Communications (WebRTC) offer, purpose-built for the Cloud. The WebRTC Services Solution provides signaling from WebRTC-to-WebRTC applications and interworking from WebRTC-to-Session Initiation Protocol (SIP), delivering advanced real-time communications capabilities on mobile applications and on websites, which are accessible via a browser.
SYS-CON Events announced today that Aria Systems, the leading innovator in recurring revenue, has been named “Bronze Sponsor” of SYS-CON's @ThingsExpo, which will take place on June 9–11, 2015, at the Javits Center in New York, NY. Proven by the world’s most demanding enterprises, including AAA NCNU, Constant Contact, Falck, Hootsuite, Pitney Bowes, Telekom Denmark, and VMware, Aria helps enterprises grow their recurring revenue businesses. With Aria’s end-to-end active monetization platform, global brands can get to market faster with a wider variety of products and services, while maximizin...
SYS-CON Events announced today that Vitria Technology, Inc. will exhibit at SYS-CON’s @ThingsExpo, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Vitria will showcase the company’s new IoT Analytics Platform through live demonstrations at booth #330. Vitria’s IoT Analytics Platform, fully integrated and powered by an operational intelligence engine, enables customers to rapidly build and operationalize advanced analytics to deliver timely business outcomes for use cases across the industrial, enterprise, and consumer segments.
SYS-CON Events announced today that Akana, formerly SOA Software, has been named “Bronze Sponsor” of SYS-CON's 16th International Cloud Expo® New York, which will take place June 9-11, 2015, at the Javits Center in New York City, NY. Akana’s comprehensive suite of API Management, API Security, Integrated SOA Governance, and Cloud Integration solutions helps businesses accelerate digital transformation by securely extending their reach across multiple channels – mobile, cloud and Internet of Things. Akana enables enterprises to share data as APIs, connect and integrate applications, drive part...
After making a doctor’s appointment via your mobile device, you receive a calendar invite. The day of your appointment, you get a reminder with the doctor’s location and contact information. As you enter the doctor’s exam room, the medical team is equipped with the latest tablet containing your medical history – he or she makes real time updates to your medical file. At the end of your visit, you receive an electronic prescription to your preferred pharmacy and can schedule your next appointment.
The WebRTC Summit 2014 New York, to be held June 9-11, 2015, at the Javits Center in New York, NY, announces that its Call for Papers is open. Topics include all aspects of improving IT delivery by eliminating waste through automated business models leveraging cloud technologies. WebRTC Summit is co-located with 16th International Cloud Expo, @ThingsExpo, Big Data Expo, and DevOps Summit.
SYS-CON Events announced today that Solgenia will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY, and the 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. Solgenia is the global market leader in Cloud Collaboration and Cloud Infrastructure software solutions. Designed to “Bridge the Gap” between Personal and Professional Social, Mobile and Cloud user experiences, our solutions help large and medium-sized organizations dr...
SYS-CON Events announced today that Liaison Technologies, a leading provider of data management and integration cloud services and solutions, has been named "Silver Sponsor" of SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York, NY. Liaison Technologies is a recognized market leader in providing cloud-enabled data integration and data management solutions to break down complex information barriers, enabling enterprises to make smarter decisions, faster.
The 3rd International Internet of @ThingsExpo, co-located with the 16th International Cloud Expo - to be held June 9-11, 2015, at the Javits Center in New York City, NY - announces that its Call for Papers is open. The Internet of Things (IoT) is the biggest idea since the creation of the Worldwide Web more than 20 years ago.
SYS-CON Events announced today that CommVault has been named “Bronze Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY, and the 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. A singular vision – a belief in a better way to address current and future data management needs – guides CommVault in the development of Singular Information Management® solutions for high-performance data protection, universal availability and sim...
Cloud is not a commodity. And no matter what you call it, computing doesn’t come out of the sky. It comes from physical hardware inside brick and mortar facilities connected by hundreds of miles of networking cable. And no two clouds are built the same way. SoftLayer gives you the highest performing cloud infrastructure available. One platform that takes data centers around the world that are full of the widest range of cloud computing options, and then integrates and automates everything. Join SoftLayer on June 9 at 16th Cloud Expo to learn about IBM Cloud's SoftLayer platform, explore se...
SYS-CON Media announced today that 9 out of 10 " most read" DevOps articles are published by @DevOpsSummit Blog. Launched in October 2014, @DevOpsSummit Blog offers top articles, news stories, and blog posts from the world's well-known experts and guarantees better exposure for its authors than any other publication. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to wait for long development cycles that produce softw...