
Logistic multilevel model in Voting Assignment

Added on: 2023-01-14 11:28:20

QUESTION B: Estimating Constituency-Level Results from the EU Referendum [25 points]

 In the 2016 UK referendum on leaving the EU, the results of the vote were not released for individual electoral constituencies. However, many scholars would like to know why people voted to leave the EU, and how support for leaving differed across constituencies. One previous study has already estimated constituency-level support for ‘leave’ in an authoritative way. Your tasks in this question are (i) to produce estimates of the percentage of voters that voted ‘leave’ in every constituency using multilevel modeling and post-stratification that are as close as possible to this existing set of estimates, as measured by the Mean Absolute Error (MAE), and (ii) to use your results to explain why people voted to leave.

 You need to:

  1. Estimate an appropriate logistic multilevel model explaining voting for leave, using the predictors in the dataset (see the note on “nAGQ=0” below).
  2. Present the multilevel model results and interpret how the variables affect voting to leave the EU (Note: you do not need to discuss statistical significance).
  3. Produce post-stratified estimates of the percentage of people who voted ‘leave’ in all 631 constituencies in England, Scotland and Wales.
  4. Compare your results to the existing estimates using the Mean Absolute Error.

 You should present and explain your approach and results in a brief report, explaining why your estimates do or do not perform well compared to the existing estimates. Note: if you cannot get very close to the existing results, do not worry. Your grade depends on the quality of your analysis, presentation and interpretation, not on how close your results are to the existing estimates.

 The survey data is called “e” and is in the file “eusurvey.Rda”. It comes from the 2017 British Election Study and it contains the following variables:


Variable name         Variable description

cname                     constituency name

ccode                      constituency code

leave                       dependent variable: =1 if respondent voted to leave the EU, 0 if respondent voted to remain in the EU

votecon                   =1 if respondent voted Conservative in the 2015 election, 0 otherwise

voteukip                  =1 if respondent voted UKIP in the 2015 election, 0 otherwise [note: UKIP is the United Kingdom Independence Party, which campaigns in favour of the UK leaving the EU]

female                     =1 if female, 0 otherwise

age                          in years

highed                     =1 if respondent is educated to degree level or higher, 0 otherwise

lowed                      =1 if respondent has no educational qualifications, 0 otherwise

c_con15                   percent vote for Conservative party in the constituency, 2015 election

c_ukip15                  percent vote for UKIP in the constituency, 2015 election

c_unemployed          constituency unemployment rate, percent

c_whitebritish          percent of constituency population who are white British

c_deprived              percent of constituency population living in poverty

  Note on task 1: as in the practical exercise, use the option “nAGQ=0” when estimating the model to avoid estimation errors.
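A minimal sketch of a random-intercept logistic model fitted with “nAGQ=0”, assuming the lme4 package (whose glmer function accepts this option). The data frame `toy` is simulated purely for illustration and only borrows a few variable names from “e”; the choice of predictors and random effects for the real model is up to you.

```r
library(lme4)

# Simulated stand-in for the survey data (hypothetical values)
set.seed(1)
n_const <- 40
n <- 2000
toy <- data.frame(
  ccode  = sample(seq_len(n_const), n, replace = TRUE),
  female = rbinom(n, 1, 0.5),
  highed = rbinom(n, 1, 0.3)
)

# Constituency-level random intercepts on the log-odds scale
u <- rnorm(n_const, sd = 0.5)
eta <- -0.2 - 0.8 * toy$highed + 0.1 * toy$female + u[toy$ccode]
toy$leave <- rbinom(n, 1, plogis(eta))
toy$ccode <- factor(toy$ccode)

# Logistic multilevel model with a random intercept per constituency
fit <- glmer(leave ~ female + highed + (1 | ccode),
             data = toy,
             family = binomial(link = "logit"),
             nAGQ = 0)

# Fixed effects are on the log-odds scale
print(fixef(fit))
```

Exponentiating a fixed effect gives an odds ratio, which is often easier to interpret in a report than the raw log-odds coefficient.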

Post-stratification data for the 631 constituencies is called “post” and is contained in the file “eupoststrat.Rda”. Each row contains one particular demographic group in one constituency. In addition to the variables in “e”, it also contains these variables:

Variable name         Variable description

c_count                   Number of people in the demographic group

c_total                     Number of people in the constituency

percent                    percent of constituency represented by the demographic group
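The post-stratification step can be sketched in base R as a population-weighted average of predicted probabilities within each constituency. The data frame `post_toy` and the column `p_leave` are hypothetical; in practice the predicted probabilities would come from the fitted multilevel model applied to “post”. The sketch weights by c_count rather than percent, because the scale of percent (0 to 1 or 0 to 100) is not stated here.

```r
# Toy post-stratification frame: one row per demographic group per constituency
post_toy <- data.frame(
  ccode   = c("A", "A", "B", "B"),
  c_count = c(300, 700, 500, 500),
  p_leave = c(0.40, 0.60, 0.30, 0.50)  # hypothetical model predictions
)

# Weighted sum of predicted probabilities, and total population, by constituency
num <- tapply(post_toy$p_leave * post_toy$c_count, post_toy$ccode, sum)
den <- tapply(post_toy$c_count, post_toy$ccode, sum)

# Post-stratified leave share in percent
leave_pct <- 100 * num / den
print(leave_pct)
```

Each constituency's estimate is the sum of (group prediction × group size) divided by the constituency's total size, so large demographic groups pull the estimate towards their predicted probability.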

 Finally, the comparison data containing the existing estimates by constituency is called “est” and is in the file “existing_estimates.Rda”. In addition to the constituency name and code, it contains the existing estimate of the leave vote share for each constituency (called estimate).
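The Mean Absolute Error is the average of the absolute differences between your estimate and the existing estimate across constituencies. A minimal sketch in base R, using made-up numbers and hypothetical data frame names (`mine`, `existing`); it assumes both sets of estimates are on the same scale, which should be checked before comparing.

```r
# Hypothetical estimates for three constituencies
mine <- data.frame(ccode = c("A", "B", "C"),
                   my_estimate = c(54, 40, 61))
existing <- data.frame(ccode = c("C", "A", "B"),
                       estimate = c(58, 52, 43))

# Match rows on constituency code rather than relying on row order
both <- merge(mine, existing, by = "ccode")

# Mean Absolute Error
mae <- mean(abs(both$my_estimate - both$estimate))
print(mae)
```

Merging on the constituency code guards against the two data sets being sorted differently.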


This part of the final essay contains one question. It is worth 40 points. Again, 5 points are reserved for clarity of presentation, especially tables and figures. See Q+A session 5 for guidelines on presentation.

 The question requires you to write a brief report. It is up to you how you structure the report, but it is advisable to keep introductory material to a minimum, given the word limit. Your report should discuss your methods, your results and the conclusions that you draw from them.

 QUESTION C: Describing and Classifying Tweets [40 points]

 Many companies monitor social media posts in order to gauge how customers feel about their company and their competitors. For this question, imagine that you have been hired as a consultant by one of the major American airline companies to analyse tweets about airlines. They want to find out how people talk about airlines on Twitter, and then build a predictive tool that can classify tweets in future into ‘negative’ or ‘positive’ sentiment toward airlines, to help them respond better to their customers in real time. They have provided you with a dataset of 11,541 tweets about airlines that have been labelled as ‘negative’ or ‘positive’ by their staff. The dataset also identifies which airline each tweet is talking about.

 Your task is to prepare a brief report that describes the tweets, and recommends a classification method for future tweets. You need to:

  i. Use appropriate tools to describe the tweets. In particular, what words are associated with negative or positive sentiment? How does word usage differ across the different airlines?
  ii. Use your analysis from i) to build a short dictionary of negative and positive words describing airlines, then use it to classify tweets as ‘negative’ if they contain more negative than positive language, and ‘positive’ otherwise [code for creating your own dictionary is provided below].
  iii. Use the lasso logit method to classify the tweets into ‘negative’ and ‘positive’.
  iv. Compare the performance of your classifiers from ii) and iii), and use this analysis to decide which one would be the better classifier for the company to use for future tweets.
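The lasso logit step and the held-out performance check can be sketched as follows, assuming the glmnet package (the brief does not name a package, so this is one possible choice). The word-indicator matrix `x` and labels `y` are simulated; with real data, `x` would be a document-feature matrix built from the tweets and `y` the sentiment variable.

```r
library(glmnet)

# Simulated word-presence matrix: 400 "tweets", 30 "words" (hypothetical)
set.seed(42)
n <- 400
p <- 30
x <- matrix(rbinom(n * p, 1, 0.2), nrow = n, ncol = p)
colnames(x) <- paste0("word", seq_len(p))

# Only the first two words carry signal in this toy example
eta <- -0.5 + 2 * x[, 1] - 2 * x[, 2]
y <- rbinom(n, 1, plogis(eta))

# Hold out a test set so performance is measured on unseen data
train <- sample(seq_len(n), size = 300)

# alpha = 1 selects the lasso penalty; lambda is chosen by cross-validation
cvfit <- cv.glmnet(x[train, ], y[train], family = "binomial", alpha = 1)

# Predicted classes for the held-out rows
pred <- predict(cvfit, newx = x[-train, ], s = "lambda.min", type = "class")

# Confusion matrix and accuracy on the test set
print(table(predicted = as.numeric(pred), actual = y[-train]))
acc <- mean(as.numeric(pred) == y[-train])
print(acc)
```

Evaluating the dictionary classifier on the same held-out tweets makes the two accuracy figures directly comparable.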

The dataset for this question is called “tweets” and is contained in the file “tweets.Rda”. It contains the following variables:

Variable name         Variable description

text                          The text of each tweet

sentiment                 Labeled sentiment of each tweet: 1=negative, 0=positive

airline                     The airline company featured in the tweet: United, JetBlue, American Airlines, US Airways, Virgin America or Southwest

You should first create a corpus of tweets using the following code:

 tweetCorpus <- corpus(tweets$text, docvars = tweets)

Here is some advice for part ii):

  • Your dictionary should contain a minimum of 5 words and a maximum of 15 words in each category
  • You are not expected to exhaustively compare the performance of different dictionaries. Instead, simply choose one dictionary based on your analysis from i), explaining how you chose the words.

 Code for creating a dictionary:

You can create a dictionary called “mydict” in R that contains two categories (‘negative’ and ‘positive’) using the following code:

 neg.words <- c()
 pos.words <- c()

 mydict <- dictionary(list(negative = neg.words,
                           positive = pos.words))

 You need to insert your chosen sets of negative and positive words in ‘neg.words’ and ‘pos.words’. This dictionary can then be used with quanteda in exactly the same way as any of the existing built-in dictionaries.
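A minimal sketch of the dictionary classification rule applied with quanteda, using three made-up tweets. The word lists are illustrative only and are not a recommended dictionary; `toy_text` and `pred` are hypothetical names. Tweets are coded 1 (negative) when they contain more negative than positive dictionary words, and 0 (positive) otherwise, so ties count as positive.

```r
library(quanteda)

# Made-up tweets for illustration
toy_text <- c("flight delayed again, worst service",
              "thanks for the great crew",
              "lost my bag, awesome")

# Illustrative word lists only
neg.words <- c("delayed", "worst", "lost")
pos.words <- c("thanks", "great", "awesome")
mydict <- dictionary(list(negative = neg.words,
                          positive = pos.words))

# Tokenise, build a document-feature matrix, then count dictionary matches
toy_dfm <- dfm(tokens(toy_text, remove_punct = TRUE))
counts <- as.matrix(dfm_lookup(toy_dfm, dictionary = mydict))

# 1 = negative if more negative than positive words, 0 = positive otherwise
pred <- ifelse(counts[, "negative"] > counts[, "positive"], 1, 0)
print(pred)
```

The third tweet contains one word from each category, so under the stated rule it is classified as positive.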

  • Uploaded By : Katthy Wills
  • Posted on : January 14th, 2023