Analyze the frequency distributions of common function words

Reference no: EM132023794

Question: In this homework assignment, you are going to use clustering methods to solve a mystery in history: who wrote the disputed essays, Hamilton or Madison?

1. About the Federalist Papers (quoted from the Library of Congress)

The Federalist Papers were a series of eighty-five essays urging the citizens of New York to ratify the new United States Constitution. Written by Alexander Hamilton, James Madison, and John Jay, the essays originally appeared anonymously in New York newspapers in 1787 and 1788 under the pen name "Publius." A bound edition of the essays was first published in 1788, but it was not until the 1818 edition published by the printer Jacob Gideon that the authors of each essay were identified by name. The Federalist Papers are considered one of the most important sources for interpreting and understanding the original intent of the Constitution.

2. About the disputed authorship

The original essays can be downloaded from the Library of Congress.

In the author column, you will find 74 essays with identified authors: 51 written by Hamilton, 15 by Madison, 3 jointly by Hamilton and Madison, and 5 by Jay. The remaining 11 essays, however, are attributed to "Hamilton or Madison". These are the famous essays with disputed authorship. Hamilton claimed authorship in writing before he was killed in a duel. Madison later claimed authorship as well. Historians have since tried to determine which of the two was the real author.

3. Computational approach for authorship attribution

In the 1960s, statisticians Mosteller and Wallace analyzed the frequency distributions of common function words in the Federalist Papers and drew their conclusions from them. This was pioneering work in using mathematical approaches for authorship attribution.

Nowadays, authorship attribution has become a classic problem in the data mining field, with applications in forensics (e.g. deception detection) and information organization.

In this homework you are provided with the Federalist Papers data set. The features are a set of "function words", for example, "upon". Each feature value is the percentage of words in an essay that are that function word. For example, for the essay "Hamilton_fed_31.txt", if the function word "upon" appears 3 times and the total number of words in the essay is 1000, the feature value is 3/1000 = 0.3%.
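The feature computation described above can be sketched in a few lines of Python. This is only an illustration: the function name `function_word_percentage` and the regex-based tokenization are assumptions, and the provided data set already contains precomputed values that may have been tokenized differently.

```python
import re


def function_word_percentage(text, word):
    """Return how often `word` occurs in `text`, as a percentage of all words.

    Assumption: words are lowercase runs of letters and apostrophes; the
    tokenization behind the provided data set is not documented here.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0  # avoid dividing by zero on an empty essay
    return 100.0 * tokens.count(word.lower()) / len(tokens)


# Made-up essay of 1000 words in which "upon" appears 3 times,
# mirroring the arithmetic in the example above.
toy_essay = "upon " * 3 + "word " * 997
upon_pct = function_word_percentage(toy_essay, "upon")
print(f"upon: {upon_pct:.1f}%")
```

Applying the same function to every function word in every essay would produce one row of feature values per essay, which is the shape of the table the clustering algorithms work on.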

Now you are going to try to solve this mystery using the clustering algorithms k-Means and HAC (hierarchical agglomerative clustering). Document your analysis process and draw your conclusion on who wrote the disputed essays. Provide evidence for each method to demonstrate what patterns have been learned to predict the disputed papers. For example, visualize the clustering results and show where the disputed papers are located in relation to Hamilton's and Madison's papers. By the way, where are the papers with joint authorship located? For k-Means, analyze the centroids to explain which attributes are most useful for clustering. Hint: the centroid values on these dimensions should be far apart from each other to be able to distinguish the clusters.
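As a rough illustration of the workflow described above, the sketch below implements a plain k-Means loop, an average-linkage HAC, and the centroid comparison suggested in the hint, using only the Python standard library. Everything in it is an assumption made for illustration: the essay labels, the two feature names, and the feature values are invented toy numbers, not values from fedPapers851, and a real analysis would more likely use an established clustering library on the full feature table.

```python
from math import dist


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm; `centroids` holds the starting centers."""
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # keep the center of an empty cluster
        if updated == centroids:
            break  # converged: centers stopped moving
        centroids = updated
    return labels, centroids


def hac(points, k):
    """Agglomerative clustering with average linkage, merged down to k clusters."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise distance between the members of two clusters.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = [(linkage(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        _, a, b = min(pairs)          # closest pair of clusters
        merged = clusters.pop(b)      # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters


# Invented toy data: rows are essays, columns are percentages of two function words.
essays = ["Hamilton_a", "Hamilton_b", "Hamilton_c", "Madison_a", "Madison_b", "Disputed_a"]
features = ["upon", "of"]
points = [[0.30, 0.90], [0.28, 1.00], [0.33, 0.95],
          [0.02, 1.00], [0.03, 0.90], [0.04, 0.95]]

# Start k-Means from one known Hamilton essay and one known Madison essay.
labels, centroids = kmeans(points, [points[0], points[3]])

# Rank features by how far apart the two centroids are on each dimension.
gaps = [abs(a - b) for a, b in zip(*centroids)]
ranking = [name for _, name in sorted(zip(gaps, features), reverse=True)]

hac_clusters = hac(points, 2)

for name, lab in zip(essays, labels):
    print(name, "-> k-Means cluster", lab)
print("features ranked by centroid gap:", ranking)
print("HAC clusters:", [[essays[i] for i in c] for c in hac_clusters])
```

On the real data, the same reading applies: a disputed essay that lands in the cluster dominated by one author's known essays is evidence for that author, and the features with the largest centroid gaps are the ones doing most of the separating. Seeding k-Means from known essays is a convenience for this sketch; random or k-means++ initialization is the more common choice.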

Information related to the above question is enclosed below:

Attachment:- fedPapers851.rar

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd