Please solve the following questions and provide APA references to the material in the docs.

1. Consider the following training data set:
   • Apply the Naïve Bayesian Classifier to this data set and compute the probability score for P(y = 1 | X) for X = (1, 0, 0). Show your work. Your thoughts?

   Training Data Set
   X1  X2  X3  Y
   1   1   1   0
   1   1   0   0
   0   0   0   0
   0   1   0   1
   1   0   1   1
   0   1   1   1

2. List some prominent use cases of the Naïve Bayesian Classifier.

3. What gives the Naïve Bayesian Classifier the advantage of being computationally inexpensive?

4. Why should we use log-likelihoods rather than pure probability values in the Naïve Bayesian Classifier?

5. What is a confusion matrix, and how is it used to evaluate the effectiveness of the model?

6. Consider the following data set with two input features, temperature and season. Your thoughts? What is the Naïve Bayesian assumption? Is the Naïve Bayesian assumption satisfied for this problem?

   Temperature    Season   Electricity Usage
   -10 to 50 F    Winter   High
   50 to 70 F     Winter   Low
   70 to 85 F     Summer   Low
   85 to 110 F    Summer   High
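For reference, a minimal Python sketch of the counting calculation behind question 1 is included below. It is not part of the original assignment; the rows are the reconstruction of the table above, the function name is an illustrative choice, and the probabilities are estimated by simple counting as described in the lecture material that follows.

```python
# Minimal sketch (not part of the assignment material): a naive Bayes score
# P(y) * prod_j P(x_j | y) computed by simple counting over a small binary table.
def naive_bayes_score(rows, x, y_value):
    """Unnormalized score for class y_value given attribute vector x."""
    n_y = sum(1 for r in rows if r[-1] == y_value)
    score = n_y / len(rows)                          # prior P(y = y_value)
    for j, xj in enumerate(x):
        match = sum(1 for r in rows if r[-1] == y_value and r[j] == xj)
        score *= match / n_y                         # conditional P(x_j = xj | y = y_value)
    return score

# Rows as reconstructed in the table above (X1, X2, X3, Y) and the query X = (1, 0, 0):
rows = [(1, 1, 1, 0), (1, 1, 0, 0), (0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 1, 1), (0, 1, 1, 1)]
s1 = naive_bayes_score(rows, (1, 0, 0), 1)
s0 = naive_bayes_score(rows, (1, 0, 0), 0)
print("P(y = 1 | X) =", s1 / (s1 + s0))              # normalized over the two classes
```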

Advanced Analytics – Theory and Methods
Module 4: Analytics Theory/Methods
Copyright © 2014 EMC Corporation. All Rights Reserved.

Naïve Bayesian Classifiers
During this lesson the following topics are covered:
• Naïve Bayesian Classifier
• Theoretical foundations of the classifier
• Use cases
• Evaluating the effectiveness of the classifier
• The Reasons to Choose (+) and Cautions (-) with the use of the classifier

The topics covered in this lesson are listed.

Classifiers
Where in the catalog should I place this product listing?
Is this email spam?
Is this politician Democrat/Republican/Green?
• Classification: assign labels to objects.
• Usually supervised: training set of pre-classified examples.
• Our examples:
  - Naïve Bayesian
  - Decision Trees
  - (and Logistic Regression)

The primary task performed by classifiers is to assign labels to objects. Labels in classifiers are pre-determined, unlike in clustering, where we discover the structure and then assign labels.
Classifier problems are supervised learning methods: we start with a training set of pre-classified examples, and with the knowledge of probabilities we assign class labels.
Some use case examples are shown in the slide. Based on the voting pattern on issues, we could classify whether a politician has an affiliation to a party or a principle. Retailers use classifiers to assign proper catalog entry locations for their products. The classification of emails as spam is another widely used application of classifier methods.
Logistic regression, discussed in the previous lesson, can be viewed and used as a classifier. We
will discuss Naïve Bayesian Classifiers in this lesson and the use of Decision Trees in the next
lesson.

Naïve Bayesian Classifier
• Determine the most probable class label for each object
  - Based on the observed object attributes
  - Naïvely assumed to be conditionally independent of each other
  - Example:
    - Based on the object's attributes {shape, color, weight}
    - A given object that is {spherical, yellow, < 60 grams} may be classified (labeled) as a tennis ball
  - Class label probabilities are determined using Bayes' Law
• Input variables are discrete
• Output:
  - Probability score – proportional to the true probability
  - Class label – based on the highest probability score

The Naïve Bayesian Classifier is a probabilistic classifier based on Bayes' Law and naïve conditional independence assumptions. In simple terms, a Naïve Bayes Classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature.
For example, an object can be classified into a particular category based on its attributes such as shape, color, and weight. A reasonable classification for an object that is spherical, yellow, and less than 60 grams in weight may be a tennis ball. Even if these features depend on each other or upon the existence of the other features, a Naïve Bayesian Classifier considers all of these properties to contribute independently to the probability that the object is a tennis ball.
The input variables are generally discrete (categorical), but there are variations of the algorithm that work with continuous variables as well. For this lesson, we will consider only discrete input variables. Although weight may be considered a continuous variable, in the tennis ball example weight was grouped into intervals in order to make it a categorical variable.
The output typically returns a probability score and class membership. The output from most implementations is a log probability score for each class (we will address this later in the lesson), and we assign the class label that corresponds to the highest log probability score.

Naïve Bayesian Classifier - Use Cases
• Preferred method for many text classification problems.
  - Try this first; if it doesn't work, try something more complicated
• Use cases
  - Spam filtering, other text classification tasks
  - Fraud detection

Naïve Bayesian Classifiers are among the most successful known algorithms for learning to classify text documents. Spam filtering is the best-known use of Naïve Bayesian text classification. Bayesian spam filtering has become a popular mechanism to distinguish illegitimate spam email from legitimate email, and many modern mail clients implement it.
Naïve Bayesian Classifiers are also used to detect fraud. For example, in auto insurance, based on a training data set with attributes (such as driver's rating, vehicle age, vehicle price, whether the claim is filed by the policy holder, police report status, and whether the claim is genuine), we can classify a new claim as genuine or not.
References:
Spam filtering (http://en.wikipedia.org/wiki/Bayesian_spam_filtering)
http://www.cisjournal.org/archive/vol2no4/vol2no4_1.pdf
Hybrid Recommender System Using Naive Bayes Classifier and Collaborative Filtering (http://eprints.ecs.soton.ac.uk/18483/)
Online applications (http://www.convo.co.uk/x02/)
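To make the spam-filtering use case concrete, here is a brief sketch of a bag-of-words Naïve Bayes text classifier. It assumes the scikit-learn Python library, which is not mentioned in the course material, and uses invented toy messages; it illustrates the idea rather than any implementation referenced on the slide.

```python
# Illustrative sketch only: a tiny bag-of-words spam filter using scikit-learn
# (an assumed third-party library; the toy messages below are invented examples).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["win a free prize now", "meeting moved to 3pm",
            "claim your free reward", "lunch tomorrow?"]
labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer()                 # word counts as discrete features
X = vectorizer.fit_transform(messages)

model = MultinomialNB()                        # multinomial Naive Bayes for count data
model.fit(X, labels)

new = vectorizer.transform(["free prize meeting"])
print(model.predict(new))                      # predicted class label
print(model.predict_log_proba(new))            # log probability scores per class
```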
Building a Training Dataset to Predict Good or Bad Credit
• Predict the credit behavior of a credit card applicant from the applicant's attributes:
  - Personal status
  - Job type
  - Housing type
  - Savings amount
• These are all categorical variables and are better suited to the Naïve Bayesian Classifier than to logistic regression.

Let us look at a specific use case example. We present here the same example we worked with in Lesson 2 of this module with the Apriori algorithm. The training dataset consists of the attributes personal status, job type, housing type, and the amount of money in the savings account. They are represented as categorical variables, which are well suited for the Naïve Bayesian Classifier. With this training set we want to predict the credit behavior of a new customer.
This problem could be solved with logistic regression as well. However, if there are multiple levels for the outcome you want to predict, the Naïve Bayesian Classifier is a better solution. Next, we will go through the technical basis for Naïve Bayesian Classifiers and will revisit this credit dataset later.

Technical Description - Bayes' Law

  P(C | A) = P(A ∧ C) / P(A) = P(A | C) * P(C) / P(A)

• C is the class label:
  - C ∈ {C1, C2, …, Cn}
• A is the observed object attributes
  - A = (a1, a2, …, am)
• P(C | A) is the probability of C given A is observed
  - Called the conditional probability

Bayes' Law states: P(C | A) * P(A) = P(A | C) * P(C) = P(A ∧ C). That is, the conditional probability that C is true given that A is true, denoted P(C|A), times the probability of A is the same as the conditional probability that A is true given that C is true, denoted P(A|C), times the probability of C. Both of these terms are equal to P(A ∧ C), the probability that A and C are simultaneously true. If we divide all three terms by P(A), we get the form shown on the slide.
The reason that Bayes' Law is important is that we may not know P(C|A) (and we want to), but we do know P(A|C) and P(C) for each possible value of C from the training data. As we will see later, it is not necessary to know P(A) for the purposes of Naïve Bayes Classifiers.
An example using Bayes' Law: John flies frequently and likes to upgrade his seat to first class. He has determined that, if he checks in for his flight at least two hours early, the probability that he will get the upgrade is 0.75; otherwise, the probability that he will get the upgrade is 0.35. With his busy schedule, he checks in at least two hours before his flight only 40% of the time. Suppose John didn't receive an upgrade on his most recent attempt. What is the probability that he arrived late?
C = John arrives late
A = John did not receive an upgrade
P(C) = probability that John arrives late = 0.6
P(A) = probability that John did not receive an upgrade = 1 - (0.4 x 0.75 + 0.6 x 0.35) = 1 - 0.51 = 0.49
P(A|C) = probability that John did not receive an upgrade given that he arrived late = 1 - 0.35 = 0.65
P(C|A) = probability that John arrived late given that he did not receive his upgrade = P(A|C) * P(C) / P(A) = (0.65 x 0.6) / 0.49 ≈ 0.80

In this simple example, C can take one of two possible values {arriving early, arriving late} and there is only one attribute, which can take one of two possible values {received upgrade, did not receive upgrade}. Next, we will generalize Bayes' Law to multiple attributes and apply the naïve independence assumptions.

Apply the Naïve Assumption and Remove a Constant
• For observed attributes A = (a1, a2, …, am), we want to compute

    P(Ci | A) = P(a1, a2, …, am | Ci) * P(Ci) / P(a1, a2, …, am),   i = 1, 2, …, n

  and assign the classifier label, Ci, with the largest P(Ci | A)
• Two simplifications to the calculations
  - Apply the naïve assumption - each aj is conditionally independent of the others given Ci, so

      P(a1, a2, …, am | Ci) = P(a1 | Ci) * P(a2 | Ci) * … * P(am | Ci)

  - The denominator P(a1, a2, …, am) is a constant for all i and can be ignored

The general approach is to assign the classifier label, Ci, to the object with attributes A = (a1, a2, …, am) that corresponds to the largest value of P(Ci | A). The probability that a set of attribute values A (comprising the m variables a1 through am) should be labeled with a classification Ci equals the probability of observing the variables a1 through am given that Ci is true, times the probability of Ci, all divided by the probability of observing the attribute values a1 through am.
The conditional independence assumption is that the probability of observing the value of a particular attribute given Ci is independent of the other attributes. This naïve assumption simplifies the calculation of P(a1, a2, …, am | Ci) as shown on the slide.
Since P(a1, a2, …, am) appears in the denominator of P(Ci | A) for all values of i, removing the denominator has no impact on the relative probability scores and simplifies the calculations.
Next, these two simplifications will be applied to build the Naïve Bayesian Classifier.

Building a Naïve Bayesian Classifier
• Applying the two simplifications:

    P(Ci | a1, a2, …, am) ∝ [P(a1 | Ci) * P(a2 | Ci) * … * P(am | Ci)] * P(Ci),   i = 1, 2, …, n

• To build a Naïve Bayesian Classifier, collect the following statistics from the training data:
  - P(Ci) for all the class labels
  - P(aj | Ci) for all possible aj and Ci
• Assign the classifier label, Ci, that maximizes the value of

    [P(a1 | Ci) * P(a2 | Ci) * … * P(am | Ci)] * P(Ci),   i = 1, 2, …, n

Applying the two simplifications, P(Ci | a1, a2, …, am) is proportional to the product of the various P(aj | Ci), for j = 1, 2, …, m, times P(Ci). From a training dataset, these probabilities can be computed and stored for future classifier assignments; a minimal sketch of this bookkeeping is shown below. We now return to the credit applicant example.
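The following sketch illustrates the bookkeeping described above: collecting P(Ci) and P(aj | Ci) from a small set of labeled categorical records and scoring a new record with the product formula. The record layout, attribute names, and function names are assumptions made for the illustration; they are not taken from the course material.

```python
# Illustrative sketch: collect P(Ci) and P(aj | Ci) from labeled categorical
# records and score a new record with P(Ci) * prod_j P(aj | Ci).
# The records and attribute names below are invented for the example.
from collections import Counter, defaultdict

def train(records, labels):
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: class_counts[c] / n for c in class_counts}        # P(Ci)
    value_counts = defaultdict(Counter)         # (attribute, class) -> value counts
    for rec, c in zip(records, labels):
        for attr, val in rec.items():
            value_counts[(attr, c)][val] += 1
    conditionals = {key: {v: cnt / class_counts[key[1]] for v, cnt in counts.items()}
                    for key, counts in value_counts.items()}        # P(aj | Ci)
    return priors, conditionals

def score(rec, c, priors, conditionals):
    s = priors[c]                                                   # start with P(Ci)
    for attr, val in rec.items():
        s *= conditionals.get((attr, c), {}).get(val, 0.0)          # times P(aj | Ci)
    return s

# Hypothetical usage with two tiny records:
records = [{"housing": "own", "job": "skilled"}, {"housing": "rent", "job": "unskilled"}]
labels = ["good", "bad"]
priors, conditionals = train(records, labels)
print(score({"housing": "own", "job": "skilled"}, "good", priors, conditionals))
```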
Naïve Bayesian Classifiers for the Credit Example
• Class labels: {good, bad}
  - P(good) = 0.7
  - P(bad) = 0.3
• Conditional probabilities
  - P(own|bad) = 0.62
  - P(own|good) = 0.75
  - P(rent|bad) = 0.23
  - P(rent|good) = 0.14
  - … and so on

To build a Naïve Bayesian Classifier we need to collect the following statistics:
1. The probability of each class label - the probability of good credit and the probability of bad credit. From all the data available in the training set we determine P(good) = 0.7 and P(bad) = 0.3.
2. In the training set there are several attributes: personal_status, job, housing, and saving_status. For each attribute and its possible values, we need to compute the conditional probabilities given bad or good credit. For example, relative to the housing attribute, we need to compute P(own|bad), P(own|good), P(rent|bad), P(rent|good), etc.

Naïve Bayesian Classifier for a Particular Applicant
• Given applicant attributes of A = {female single, owns home, self-employed, savings > $1000}

  aj             Ci     P(aj | Ci)
  female single  good   0.28
  female single  bad    0.36
  own            good   0.75
  own            bad    0.62
  self emp       good   0.14
  self emp       bad    0.17
  savings>1K     good   0.06
  savings>1K     bad    0.02

P(good|A) ~ (0.28 * 0.75 * 0.14 * 0.06) * 0.7 = 0.0012
P(bad|A) ~ (0.36 * 0.62 * 0.17 * 0.02) * 0.3 = 0.0002

• Since P(good|A) > P(bad|A), assign the applicant the label "good" credit

Here we have an example of an applicant who is female, single, owns a home, is self-employed, and has more than $1000 in her savings account. How will we classify this person? Will she be scored as a person with good or bad credit?
Having built the classifier with the training set we find P(good|A) which is equal to 0.0012 (see
the computation on the slide) and P(bad|A) is 0.0002. Since P(good|A) is the maximum of the
two probability scores, we assign the label “good” credit.
The score is only proportional to the probability. It doesn’t equal the probability, because we
haven’t included the denominator. However, both formulas have the same denominator, so we
don’t need to calculate it in order to know which quantity is bigger.
Notice, though, how small in magnitude these scores are. When we are looking at problems
with a large number of attributes, or attributes with a very high number of levels, these values
can become very small in magnitude.
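
For reference, the two scores on the slide can be reproduced with a few lines of arithmetic. This is only a sanity check of the numbers already shown above; it is not part of the original material.

```python
# Reproducing the two (unnormalized) probability scores from the slide.
p_good = 0.28 * 0.75 * 0.14 * 0.06 * 0.7   # P(aj|good) terms times P(good)
p_bad  = 0.36 * 0.62 * 0.17 * 0.02 * 0.3   # P(aj|bad) terms times P(bad)
print(round(p_good, 4), round(p_bad, 4))   # ~0.0012 and ~0.0002
label = "good" if p_good > p_bad else "bad"
print("assigned label:", label)            # "good"
```
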
Naïve Bayesian Implementation Considerations
• Numerical underflow
  - Resulting from multiplying several probabilities near zero
  - Preventable by computing the logarithm of the products
• Zero probabilities due to unobserved attribute/classifier pairs
  - Resulting from rare events
  - Handled by smoothing (adjusting each probability by a small amount)
• Assign the classifier label, Ci, that maximizes the value of

    log P'(a1 | Ci) + log P'(a2 | Ci) + … + log P'(am | Ci) + log P'(Ci),   i = 1, 2, …, n

  where P' denotes the adjusted (smoothed) probabilities

Multiplying several probability values, each possibly close to zero, invariably leads to the
problem of numerical underflow. So an important implementation guideline is to compute the
logarithm of the product of the probabilities, which is equivalent to the summation of the
logarithm of the probabilities. Although the risk of underflow may increase as the number of
attributes increases, the use of logarithms should be applied regardless of the number of
attribute dimensions.
Additionally, to address the possibility of probabilities equal to zero, smoothing techniques can
be employed to adjust the probabilities to ensure non-zero values. Applying a smoothing
technique assigns a small non-zero probability to rare events not included in the training
dataset. Also, the smoothing addresses the possibility of taking the logarithm of zero.
The R implementation of Naïve Bayes incorporates the smoothing directly into the probability
tables. Essentially, the Laplace smoothing that R uses adds one (or a small value) to every
count. For example, if we have 100 “good” customers, and 20 of them rent their housing, the
“raw” P(rent | good) = 20/100 = 0.2; with Laplace smoothing adding one to the counts,
the calculation would be P(rent | good) ~ (20 + 1)/(100+3) = 0.20388, where there are 3
possible values for housing (own, rent, for free).
Fortunately, the use of the logarithms and the smoothing techniques are already implemented
in standard software packages for Naïve Bayes Classifiers. However, if for performance
reasons, the Naïve Bayes Classifier algorithm needs to be coded directly into an application,
these considerations should be implemented.
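
As an illustration of these two implementation details, the sketch below applies Laplace smoothing to a count and scores a class by summing log probabilities. It is a minimal sketch rather than the R implementation mentioned above; apart from the rent/good counts taken from the example, the numbers and function names are made up.

```python
# Illustrative sketch: Laplace smoothing of conditional probabilities and
# log-space scoring, as described above. Counts other than the rent/good
# example are invented for the illustration.
import math

def smoothed_prob(count, class_total, n_levels, alpha=1.0):
    # Laplace smoothing: add alpha to the count and alpha * n_levels to the total.
    return (count + alpha) / (class_total + alpha * n_levels)

# The slide's example: 20 of 100 "good" customers rent, with 3 housing levels.
print(smoothed_prob(20, 100, 3))            # ~0.20388

def log_score(smoothed_conditionals, prior):
    # Sum of log P'(aj | Ci) plus log P'(Ci); the largest score wins.
    return sum(math.log(p) for p in smoothed_conditionals) + math.log(prior)

# Hypothetical smoothed conditionals for one class:
print(log_score([0.20388, 0.62, 0.05], prior=0.7))
```
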
Diagnostics
• Hold-out data
  - How well does the model classify new instances?
• Cross-validation
• ROC curve/AUC

The diagnostics we used for regression can be used to validate the effectiveness of the model we built. The techniques of using hold-out data, performing N-fold cross-validation, and computing the ROC curve/Area Under the Curve (AUC) can be deployed with the Naïve Bayesian Classifier as well.
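
For illustration, the snippet below shows how hold-out evaluation, N-fold cross-validation, and an ROC/AUC score might be run; it assumes the scikit-learn Python library (not part of the course material) and uses randomly generated placeholder data rather than the credit dataset.

```python
# Illustrative sketch: hold-out scoring, N-fold cross-validation, and AUC for a
# Naive Bayes model, using scikit-learn (an assumed library) on placeholder data.
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 4))          # binary attributes (placeholder data)
y = rng.integers(0, 2, size=500)               # binary class labels (placeholder data)

# Hold-out data: train on one part, evaluate on unseen instances.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = BernoulliNB().fit(X_tr, y_tr)
print("hold-out AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

# N-fold cross-validation (here 5-fold) on the full data set.
print("CV accuracy:", cross_val_score(BernoulliNB(), X, y, cv=5).mean())
```
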
Diagnostics: Confusion Matrix

                      Prediction
  Actual Class        good        bad         Total
  good                671 (TP)    29 (FN)     700
  bad                 38 (FP)     262 (TN)    300
  Total               709         291         1000

Overall success rate (or accuracy): (TP + TN) / (TP + TN + FP + FN) = (671 + 262)/1000 ≈ 0.93
TPR: TP / (TP + FN) = 671 / (671 + 29) = 671/700 ≈ 0.96
FPR: FP / (FP + TN) = 38 / (38 + 262) = 38/300 ≈ 0.13
FNR: FN / (TP + FN) = 29 / (671 + 29) = 29/700 ≈ 0.04
Precision: TP / (TP + FP) = 671/709 ≈ 0.95
Recall (or TPR): TP / (TP + FN) ≈ 0.96

A confusion matrix is a specific table layout that allows visualization of the performance of a model. In the hypothetical confusion matrix shown:
Of 1000 credit samples, the model predicted each as good or bad credit. Of the 700 actual good credits, the model predicted 29 as bad, and similarly, 38 of the 300 actual bad credits were predicted as good. All correct guesses are located on the diagonal of the table, so it is easy to visually inspect the table for errors, as they will be represented by any non-zero values outside the diagonal.
We define the overall success rate (or accuracy) as a metric of what we got right: the ratio of the sum of the diagonal values (i.e., TP and TN) to the sum of all entries in the table. In other words, the confusion matrix of a good model has large numbers on the diagonal and small (ideally zero) numbers off the diagonal.
We saw a true positive rate (TPR) and a false positive rate (FPR) when w…
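
For reference, the metrics above follow directly from the four counts in the matrix; the short sketch below simply re-derives them from the slide's numbers.

```python
# Re-deriving the confusion-matrix metrics from the counts on the slide.
TP, FN, FP, TN = 671, 29, 38, 262

accuracy  = (TP + TN) / (TP + TN + FP + FN)   # ~0.93
tpr       = TP / (TP + FN)                    # recall, ~0.96
fpr       = FP / (FP + TN)                    # ~0.13
fnr       = FN / (TP + FN)                    # ~0.04
precision = TP / (TP + FP)                    # ~0.95

print(accuracy, tpr, fpr, fnr, precision)
```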
