{
"$type": "site.standard.document",
"canonicalUrl": "https://frankhecker.com/2010/11/16/exploring-howard-county-election-data-with-r-part-3/",
"path": "/2010/11/16/exploring-howard-county-election-data-with-r-part-3/",
"publishedAt": "2010-11-16T05:37:58.000Z",
"site": "at://did:plc:77mn3ult3b72tpvtqqva6tat/site.standard.publication/3mpfmfpu4u72n",
"tags": [
"howardcounty",
"politics"
],
"textContent": "In [part 1][] of this series I discussed downloading and installing the [R statistical package][R stati] and loading it with Howard County election data, and then in [part 2][] we began to explore how to use that data to estimate the percentages of voters in the 2010 general election who are Democrats, Republicans, or unaffiliated or members of other parties. In our initial explorations we discovered that the percentage of those voting who were Republicans seemed to be relatively static over the years.\n\nNow it’s time to continue our exploration, this time looking at the historical data for the percentage of voters who were Democrats or unaffiliated or other. Let’s repeat what we did for the Republican data, plotting the percentage of Democratic voters hgg$PctVotersD over the years:\n\nThe resulting graph shows a clear downward trend over the years:\n\n[](/assets/images/hoco-gub-gen-pct-voters-d-vs-years1.png)\n\nThis might seem surprising in combination with the graph in the previous post showing that the Republican share of total voters has remained relatively stable over the years. Given that Democratic registration in Howard County has supposedly been outpacing Republican registration by a considerable margin, shouldn’t the percentage of Democratic voters be trending upward over the years, and the percentage of Republican voters trending downward?\n\nPart of the answer may lie in the difference between registering voters and having those voters actually turn out for elections. However another part of the answer lies in the role of unaffiliated and other voters. 
Let’s plot the percentage of unaffiliated and other voters hgg$PctVotersOther for comparison:\n\nThe resulting graph shows a clear and (at first glance) almost perfectly linear upward trend in the percentage of people voting who are unaffiliated or belong to other parties.\n\n[](/assets/images/hoco-gub-gen-pct-voters-other-vs-years1.png)\n\nSo possibly what’s happening is that the rising percentage of unaffiliated and other voters is cutting into the Democratic fraction of voters more than into the Republican fraction.\n\nBut that’s a question for another day. For now, let’s continue with trying to estimate the various percentages of voters for each party and for independents. To help us do that, let’s plot all the values in one graph. We’ll start with a plot like the one we did for Republican voters in the previous post, and then add to it the values for Democratic voters and for unaffiliated and other voters:\n\nNote that, as in the original plot, we set the vertical or “y” axis to go from 0 to 60%. In this new plot we also use the xlim parameter to set the horizontal (“x”) axis to go from 1990 to 2010, in order to help us envision how the historical trends might project forward to this year. To the graph produced by plot() we then add points for hgg$PctVotersD and hgg$PctVotersOther, both plotted against hgg$Year. (Note that the points() function does not start a brand-new graph, but simply overlays new data points on the graph already being displayed.)\n\n[](/assets/images/hoco-gub-gen-pct-voters-vs-years.png)\n\nFrom the above graph we can do a quick eyeball estimate of where the percentages of voters might end up in 2010, assuming historical trends continue. The percentage of unaffiliated voters looks like it might be around 17-18%, and the percentage of Democrats around 47-48%; that would leave the percentage of Republican voters around 35% or so.\n\nHowever, with R we can produce a more exact prediction by creating a “linear model” of the data. 
In a linear relationship, a change in one variable is associated with a proportional change in another variable. For example, based on the data for hgg$PctVotersOther, an increase of four years (i.e., between elections) appears to be associated with an increase of over 1% in the percentage of unaffiliated and other voters, or about a quarter to a third of a percent per year. To get a more exact estimate we use the lm() function:\n\nHere lm() tries to find a linear relationship between hgg$PctVotersOther and hgg$Year, such that given a value of hgg$Year we can predict a corresponding value for hgg$PctVotersOther. This produces two numbers of interest. The first number, 0.3522, is the estimated increase per year in the percentage of unaffiliated and other voters. (This is known as the “slope” of the line.)\n\nThe second number, -691.6035 (known as the “intercept”), is the value that hgg$PctVotersOther would have if we projected back to hgg$Year having a value of zero. Of course, this doesn’t make sense in real life, but simply serves to help calculate estimated values of hgg$PctVotersOther. For example, if hgg$Year has the value 1990 then we calculate the estimated value of hgg$PctVotersOther in that year by multiplying the slope value (0.3522) by 1990 and then adding the intercept value (-691.6035), which gives about 9.27. For comparison, the first recorded value in the data is:\n\n```r\n> hgg$PctVotersOther[1]\n[1] 9.39\n>\n```\n\nThe abline() function draws the fitted line on the existing graph:\n\n```r\n> abline(-691.6035, 0.3522)\n>\n```\n\nProjecting the same line forward to 2010 gives:\n\n```r\n> 0.3522 * 2010 - 691.6035\n[1] 16.3185\n>\n```\n\nSo our first prediction is that unaffiliated and other voters will be 16.3% of those voting in Howard County in the 2010 general election. 
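\n\nAs a rough cross-check of the arithmetic above, the following sketch reuses the slope and intercept quoted in the text to compute the fitted values for 1990 and 2010. The names slope, intercept, and estimate_other are illustrative assumptions, not part of the original analysis; the commented lines show how the same estimate might be obtained from a fitted model, assuming the hgg data frame from part 1 is loaded.\n\n```r\n# Coefficients as quoted in the text (rounded, so results are approximate)\nslope <- 0.3522\nintercept <- -691.6035\n\n# Hypothetical helper: fitted percentage of unaffiliated and other voters\nestimate_other <- function(year) slope * year + intercept\n\nestimate_other(c(1990, 2010))\n\n# Assuming hgg is loaded, a fitted model could give the same estimate:\n# fit <- lm(PctVotersOther ~ Year, data = hgg)\n# predict(fit, newdata = data.frame(Year = 2010))\n```\n\n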
I’ll continue this analysis in the [next post][next po], in which we’ll find an estimate for the proportion of Democratic voters.\n\n\n The abline() function gets its name from the traditional mathematical equation for a line, $y = a + bx$, in which $x$ is a variable on which $y$ depends, $a$ is a constant value giving the intercept, and $b$ is a second constant value giving the slope.\n\n[part 1]: {{< relref \"2010-11-07-exploring-howard-county-election-data-with-r-part-1.md\" >}}\n[R stati]: http://en.wikipedia.org/wiki/R_%28programming_language%29\n[part 2]: {{< relref \"2010-11-13-exploring-howard-county-election-data-with-r-part-2.md\" >}}\n[next po]: {{< relref \"2010-11-17-exploring-howard-county-election-data-with-r-part-4.md\" >}}\n[least s]: http://en.wikipedia.org/wiki/Least_squares\n\n *\n\nTrevor - 2010-11-16 14:49{#c8c1d956-002}\n\nThis is great stuff. You ever thought about a second career in polling?\n\nhecker - 2010-11-16 15:02{#c8c1d956-003}\n\nI don't think I'm cut out to be a pollster, to be honest. This is just a hobby, and I plan to keep it that way.",
"title": "Exploring Howard County election data with R, part 3"
}