Leveraging Statistical Analytical Software to Predict Presidential Polling Results


In past elections, there have been several instances in which well-known forecasting companies in the United States predicted inaccurate results. Such predictions can strongly influence the perceptions of voters across the states, and many people and organizations rely on them when deciding how to vote. Given current technological developments, it is important to apply state-of-the-art techniques to the analysis of electoral data so that poll predictions are better informed. Accurate predictions also protect the reputation of a forecasting company and hence its continuity. One of the tools adopted for this purpose is the Statistical Analysis Software (SAS). This research tests the hypothesis that SAS can improve both the accuracy and the authenticity of presidential poll predictions. The research finds that this modern technique supports in-depth analysis and clear presentation of the data in question. The three main SAS representations examined are the G-bar graph, the G-map, and the tree map. Together, these representations offer complementary perspectives on the data and give a fuller understanding of presidential poll analysis.

Historically, presidential election polling has shown discrepancies between predicted and actual results, and these discrepancies undermine the credibility of polling companies. Inaccuracies have been observed across a range of companies that predict election results. Some of the reasons for these inaccuracies are political influence, improper collection of sample data, and poorly informed analysis of the polling sample. The credibility of every opinion poll company rests on the credibility of its results. These companies therefore need to devise the best way to improve their poll predictions if they are to stay in business. Accuracy is essential for every company because it provides a basis for credibility and thereby improves the company's reputation. This research paper looks into how modern statistical analysis software has helped ensure that election polls are predicted more accurately. It analyzes the ways in which these models have been used to produce the most accurate election predictions.

The Thesis Statement

Historically, presidential election polling systems have at times proven to be grossly inaccurate. This raises the question of how expert forecasters can leverage modern Statistical Analysis Software to improve election forecasts. There is a need to streamline the forecasting standards for these elections so as to maintain the credibility and reputation of poll forecasting companies. This research examines the key features of the statistical analysis software models that these companies use to determine poll outcomes. The central research question of this paper is how the accuracy of predictions for the country's elections can be improved using modern statistical techniques.

The Research Objectives

The aims and goals of this research are as outlined below:

  • To investigate the main causes of inaccuracy in the prediction of poll results during presidential campaigns

  • To provide insight into how SAS models could be used to reach accurate results when forecasting opinion polls

  • To provide a background study on the successes and failures of modern statistical techniques in ensuring the accuracy of presidential election forecasts


Over the years, there has been a large gap between predicted election results and actual poll results. This discrepancy is driven by many factors that lead to inaccurate predictions.

1936 Presidential Election: "Alf Landon Beats FDR"

Considered one of the leading and most prestigious publishers of its time, The Literary Digest (TLD) had existed for nearly 50 years and had earned the prestige of accurately predicting every presidential election from 1920 to 1932. Unsurprisingly, its presidential election polling techniques were widely considered the authority or "gold standard" for how all premium publishers should conduct surveys (Sheu, Chen, Su, & Wang, 2005). SAS modeling would have saved TLD significant embarrassment and much more. Such software could have supported a thorough investigation of the raw data from past presidential results and given a much wider view of the statistical picture. A model of this kind could have scrutinized the areas in which probabilistic comparison would or would not work. The reputation of any forecasting and research company rests on the credibility of its results. Had TLD used SAS modeling to predict the results, it could have preserved its reputation and might not have gone out of business. SAS modeling is therefore seen as a useful lever for improving the prediction of presidential results.

Unrivaled at its time, The Literary Digest boasted a sample of over 2.4 million respondents out of 10 million people polled. All surveyed individuals were drawn from its subscriber base or from lists of registered automobile owners and telephone users. The results predicted that Landon would win in a landslide with over 57% of the popular vote (Sheu, Chen, Su, & Wang, 2005). Had TLD been accurate, the impact on US and world history would have been prodigious: New Deal programs such as Social Security and the SEC's oversight of Wall Street might have been rolled back, and Alf Landon rather than FDR might have led the country toward World War II. Instead, a paradigm shift came from an unlikely source. That shift had a severe negative impact on TLD because it contradicted the methods the company stood for, and as a result most people lost faith in the company as a forecaster. This example shows that credibility, authenticity, and accuracy are the rules of the game in this business. State-of-the-art analysis of election data is essential because a high level of accuracy in the reported results is what sustains a company's reputation.

A two-person start-up research and polling LLC with less than a year's worth of experience decisively outsmarted The Literary Digest in the 1936 election. It accurately predicted FDR's landmark landslide victory (actual results: FDR 60%, Landon 36.5% of the popular vote). More surprisingly, with significantly less capital, the small LLC (now named Gallup, Inc.) polled only 50,000 people compared to the 10 million people polled by TLD. Three major consequences followed from this momentous upset: 1) The Literary Digest folded its business the following year; 2) it debunked the once often cited polling adage that sample size is everything; and 3) it ushered in a new age of public opinion polls backed by more scientific methodologies, eventually leading to the advent of SAS models (Sheu, Chen, Su, & Wang, 2005). A SAS model can integrate even seemingly negligible data into the analysis so that an informed decision can be reached. It ensures that little of the data is left out of the analysis. As a result, the company gains not only a sound basis for decision making but also more accurate results across the whole forecasting process.
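The point that sample size is not everything can be illustrated with a small simulation. The sketch below is not a reconstruction of either 1936 poll: the support rates, sample sizes, and the function name `simulate_polls` are assumptions chosen only to show that a very large sample drawn from a skewed sampling frame can miss the true figure by more than a small random sample does.

```python
import random


def simulate_polls(seed=1936):
    """Compare a large biased poll with a small random poll (illustrative only)."""
    rng = random.Random(seed)

    # Assumed share of the whole electorate backing candidate A.
    true_support = 0.61
    # Assumed share backing A inside a skewed frame (e.g. car and phone owners).
    frame_support = 0.43

    big_n, small_n = 200_000, 5_000

    # Large poll, but every respondent comes from the skewed frame.
    big_biased = sum(rng.random() < frame_support for _ in range(big_n)) / big_n
    # Small poll, but respondents are drawn at random from the electorate.
    small_random = sum(rng.random() < true_support for _ in range(small_n)) / small_n

    return true_support, big_biased, small_random


if __name__ == "__main__":
    truth, big, small = simulate_polls()
    print(f"true support:            {truth:.3f}")
    print(f"large biased estimate:   {big:.3f}")
    print(f"small random estimate:   {small:.3f}")
```

Under these assumptions, the large poll converges very precisely on the wrong number, because extra respondents reduce random error but do nothing about the bias in the sampling frame.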

Anecdotally, the Gallup Poll suffered its own major embarrassment when it famously and incorrectly predicted that Dewey would triumph over Truman in the 1948 election. One of the main reasons for Gallup's inaccurate prediction was that the company had not adopted better statistical methods for informed analysis. The analysis was based on older statistical methods with limited data capture and limited capacity for simultaneous analysis. SAS has an interactive platform fitted with decision support systems, which makes end-user interaction quite intuitive. This interaction allows several possible outcomes to be analyzed for the same decision, which helps in making informed decisions throughout the prediction process.

The Empirical Model

2012 Presidential Election: Romney vs Obama

Modern research institutions and professional polling firms have certainly improved their techniques since the days of the FDR and Truman elections. However, biased polling, poor scientific methodologies, and lack of transparency have still led to election polling inaccuracies. The consequences of these inaccuracies should not be underestimated, because they affect the reputation of the prediction companies and hence compromise their profitability. It is therefore important to devise informed ways of minimizing these inaccuracies, since doing so allows these companies to stay in business. The hypothesis of this research is that SAS modeling will materially diminish these inaccuracies. This research describes how SAS modeling helps produce more accurate predictions of presidential elections. The methodology section that follows discusses the top three ways in which SAS is employed and gives their merits in detail. The research uses data from the 2012 presidential election, in which the main contestants were the incumbent president, Barack Obama, and his opponent, Mitt Romney.



This research puts SAS modeling into perspective and examines how it can help produce more accurate predictions of presidential elections. Statistical Analysis Software (SAS) is software capable of mining, managing, retrieving, and transforming data into the required form in order to obtain well-founded results from the whole data set. The software can integrate various datasets into one and analyze them according to the specifications of the end user (Sheu, Chen, Su, & Wang, 2005). Its intuitive, interactive platform helps users obtain accurate results. The software can also perform advanced statistical techniques, which are important for reaching informed conclusions from the statistical analysis. SAS also provides a point-and-click graphical user interface, which allows non-technical users to understand and interpret the results. This ability to make high volumes of data look simple helps the user quickly understand the data under examination.

A SAS program has a DATA step, which is responsible for retrieving information from the various sources. It then carries out the manipulation of the data. This manipulation can include the removal of outliers that might otherwise introduce discrepancies into the dataset and its interpretation. In most cases, the DATA step results in the creation of a SAS dataset. Lastly, the PROC step analyzes the data (Sheu, Chen, Su, & Wang, 2005). The role of SAS here is to ensure that all the important facets of the data are addressed in detail. This matters because most datasets are closely related to one another, and those relationships inform the analysis. Other statistical analysis software packages include SPSS, Stata, and MaxStat, among others. These packages have made statistical analysis considerably easier, which in turn lends credibility to the results produced.
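The two-stage flow described above, a cleaning stage followed by an analysis stage, can be sketched outside SAS as well. The Python below is only an analogy and not SAS syntax: the function names `data_step` and `proc_step`, the 1.5 x IQR outlier rule, and the sample figures are all assumptions made for illustration, since the text does not say which outlier rule is applied.

```python
import statistics


def data_step(raw_values):
    """Cleaning stage: drop missing entries, then drop IQR outliers (assumed rule)."""
    values = [float(v) for v in raw_values if v is not None]
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if low <= v <= high]


def proc_step(clean_values):
    """Analysis stage: summary statistics on the cleaned dataset."""
    return {
        "n": len(clean_values),
        "mean": statistics.mean(clean_values),
        "stdev": statistics.stdev(clean_values),
    }


if __name__ == "__main__":
    # Hypothetical poll percentages; None marks a missing response.
    raw = [48, 51, None, 49, 52, 50, 47, 95, 53]
    print(proc_step(data_step(raw)))
```

Keeping the cleaning rule in its own function makes it explicit which observations were excluded before any statistic is reported.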

The statistical analysis software package used in this research is SPSS. The Excel data attached to this document contain the election results from 1940 to 2012. The main aim of the data analysis is to understand the extent to which the votes received by the political parties can be used to predict the number of people who do not turn out to vote. Three sets of data are used: voters who did not turn out to vote (NV), Republican voters (RV), and Democratic voters (DV). A regression model is used to understand the relationship between voter turnout and those who do not turn out to vote. The analysis also aims to understand the extent to which the projection sample could be reduced while still capturing the true opinion of citizens on the vote. The nonvoters are the dependent variable, while the Democratic and Republican voters are the independent variables. The regression model is as shown below.

NV = a + b1*RV + b2*DV + e


NV = nonvoters (dependent variable)

a = intercept (autonomous term)

b1, b2 = regression coefficients on RV and DV

RV = Republican voters

DV = Democratic voters

e = stochastic error term
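The regression model above can be estimated by ordinary least squares. The sketch below is a minimal illustration in Python rather than the SPSS procedure used in the research. It writes the slope coefficients explicitly as b1 and b2 (notation assumed here), the function names are hypothetical, and the figures in the usage example are invented placeholders, not the 1940 to 2012 dataset.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; coefficients not identified")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_nonvoter_model(rv, dv, nv):
    """Least-squares estimates [a, b1, b2] for NV = a + b1*RV + b2*DV + e."""
    rows = [(1.0, float(r), float(d)) for r, d in zip(rv, dv)]
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * float(y) for row, y in zip(rows, nv)) for i in range(3)]
    return solve_linear_system(xtx, xty)


if __name__ == "__main__":
    # Placeholder values in millions of people, for illustration only.
    rv = [22.0, 27.0, 34.0, 35.0, 47.0, 54.0, 60.0]
    dv = [27.0, 24.0, 26.0, 34.0, 29.0, 41.0, 66.0]
    nv = [30.0, 33.0, 41.0, 40.0, 52.0, 58.0, 71.0]
    a, b1, b2 = fit_nonvoter_model(rv, dv, nv)
    print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

The normal equations are used here only to keep the example dependency-free. A statistical package would normally also report standard errors and goodness of fit, which are needed before drawing any conclusion from the estimated coefficients.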

The trend in voter turnout by political party from 1940 to 2012 is as shown below.

Data Analysis and Representation

The main data used here are past presidential election predictions as published by well-known forecasters such as the Huffington Post and CNN. Three data analysis and representation techniques are used. The first is the SAS/GRAPH G-map and G-chart bar chart, which are combined on the same page to show both the numerical distribution and the geographical distribution of the possible votes from the various regions of the United States. This combination allows in-depth analysis through close scrutiny of the data. The second type of modeling done with the SAS software is the representation of possible election results through regional and geographical coloring. Lastly, the data are represented by regional boxes sized according to the electoral votes of each state. The next section presents an in-depth analysis of these three data analysis techniques using the SAS software.

SAS/GRAPH, G-Bar Graph and G-Map Combined

This SAS representation combines a bar graph of the election parameters with a geographical map showing how they are distributed across the United States. The representation is valuable because it shows both the numbers and the geographical picture of the presidential election prediction. The data analyzed here come from various sources, which are combined to produce both a regional and a national analysis of the election for the presidential contestants. Below is the bar and map analysis for President Obama and his then rival, Romney.

From the bar graph and map above, we can derive a lot of information about the 2012 presidential election. First, Romney, the Republican, captured most of the middle part of the United States, while Obama, the Democrat, took much of the periphery. It is also evident that most of the voting bloc leaned more to Obama than to Romney. This SAS representation allows interactive exploration of the strongholds of each presidential aspirant and supports deeper analysis of presidential elections and their prediction. The trend it reveals makes the data analysis easier and makes the reasons for the failure of some presidential aspirants clearer. The analysis also shows clearly why Obama won more electoral votes than Romney. One of the reasons was that the Democrats captured more than half of the regions with the most electoral college votes, while the Republicans concentrated on the middle part of the United States without paying close attention to the electoral college votes of the peripheral states. One weakness of this representation is that it shows only the geographical extent of each region and does not give the actual numbers of popular votes obtained in these regions. This creates a bias toward land area as opposed to the votes cast in each region.

Conventional Map Representation

In this analysis, the election winner in each state is represented on a map and demarcated by color in each region. The image of each presidential aspirant is attached as the legend and framed with the color that represents him. This representation is useful because it helps one clearly understand the regions where each aspirant won more electoral college votes. For this research, the data for the 2000 election results are put into perspective and presented in the form of a traditional colored map as shown below.

From the above data analysis and representation, it is evident that George Bush took most of the electoral votes, as suggested by the fact that the whole map is more red than blue in the G-map. However, it is important to note that this representation is biased by the mismatch between area size and electoral votes. For example, some regions have a small population but a large geographical expanse, while other areas, especially near major towns and cities, have more votes in a smaller area, which makes the representation quite biased. Matching the color of the states won with the color of the presidential aspirant's frame makes it easy to identify the person with the highest number of votes.

Use of Rectangular Tree Map

Another useful way of presenting electoral data to predict elections is the tree map. One advantage of this representation is that it sizes each region according to the number of electoral votes that comes from it. The size of each box is therefore proportional to that region's electoral college votes. However, one disadvantage is that the geographical size of each region is lost, which again makes the representation incomplete on its own. The manipulation of these data by SAS is important because it helps the end user see the full picture of the data and the true position of the election across the whole country. The tree map diagram is as shown below.
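The proportionality rule behind the tree map can be made concrete with a short sketch. The code below is not the SAS tree map layout: it uses a simplified single-strip layout (an assumption made for brevity, whereas real tree maps usually arrange boxes in nested rows and columns), and the function name `treemap_strip` and the canvas size are hypothetical. The electoral vote counts are those of five states under the 2012 apportionment.

```python
def treemap_strip(weights, width=100.0, height=50.0):
    """Place boxes side by side in one strip; each box area is proportional to its weight.

    Returns {name: (x, y, box_width, box_height)}.
    """
    total = sum(weights.values())
    boxes = {}
    x = 0.0
    # Largest regions first, as tree maps conventionally order them.
    for name, weight in sorted(weights.items(), key=lambda item: -item[1]):
        box_width = width * weight / total
        boxes[name] = (x, 0.0, box_width, height)
        x += box_width
    return boxes


if __name__ == "__main__":
    electoral_votes = {
        "California": 55,
        "Texas": 38,
        "Florida": 29,
        "New York": 29,
        "Wyoming": 3,
    }
    for state, (x, y, w, h) in treemap_strip(electoral_votes).items():
        print(f"{state:<11} x={x:6.2f} width={w:6.2f} area={w * h:8.2f}")
```

In this layout a state's land area plays no role at all, which is exactly the trade-off noted above: electoral weight is shown faithfully, while geography is lost.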


In summary, the SAS software makes it possible to analyze presidential elections with a high level of accuracy. Integrating both geographical data and electoral college vote data into the analysis provides a basis for credible results. The software encourages comprehensive collection of data so that the analysis reflects the whole picture. Three SAS techniques were considered. The first is the combined G-map and G-bar graph, which allows extensive capture of the data for analysis. The second is the geographical representation of the electoral votes cast for each presidential aspirant. The last is the tree map, which sizes each region according to its number of electoral votes. In general, the adoption of SAS has enabled more informed prediction of presidential election results.


Sheu, C. F., Chen, C. T., Su, Y. H., & Wang, W. C. (2005). Using SAS PROC NLMIXED to fit item response theory models. Behavior Research Methods, 37(2), 202-218.