# Please solve the following questions and provide APA references to the material in the docs

1. Consider the following training data set. Apply the Naïve Bayesian Classifier to this data set and compute the probability score for P(y = 1 | X) for X = (1, 0, 0). Show your work. Your thoughts?

   | X1 | X2 | X3 | Y |
   | --- | --- | --- | --- |
   | 1 | 1 | 1 | 0 |
   | 1 | 1 | 0 | 0 |
   | 0 | 0 | 0 | 0 |
   | 0 | 1 | 0 | 1 |
   | 1 | 0 | 1 | 1 |
   | 0 | 1 | 1 | 1 |

2. List some prominent use cases of the Naïve Bayesian Classifier.
3. What gives the Naïve Bayesian Classifier the advantage of being computationally inexpensive?
4. Why should we use log-likelihoods rather than pure probability values in the Naïve Bayesian Classifier?
5. What is a confusion matrix, and how is it used to evaluate the effectiveness of the model?
6. Consider the following data set with two input features, temperature and season. Your thoughts? What is the Naïve Bayesian assumption? Is the Naïve Bayesian assumption satisfied for this problem?

   | Temperature | Season | Electricity Usage |
   | --- | --- | --- |
   | -10 to 50 F | Winter | High |
   | 50 to 70 F | Winter | Low |
   | 70 to 85 F | Summer | Low |
   | 85 to 110 F | Summer | High |

Advanced Analytics – Theory and Methods
Module 4: Analytics Theory/Methods
Naïve Bayesian Classifiers
During this lesson the following topics are covered:
• Naïve Bayesian Classifier
• Theoretical foundations of the classifier
• Use cases
• Evaluating the effectiveness of the classifier
• The Reasons to Choose (+) and Cautions (-) with the use of
the classifier
The topics covered in this lesson are listed.
Classifiers
Where in the catalog should I place this product listing?
Is this email spam?
Is this politician Democrat/Republican/Green?
• Classification: assign labels to objects.
• Usually supervised: training set of pre-classified examples.
• Our examples:
 Naïve Bayesian
 Decision Trees
 (and Logistic Regression)
The primary task performed by classifiers is to assign labels to objects. Labels in classifiers are
pre-determined unlike in clustering where we discover the structure and assign labels.
Classifier problems are supervised learning methods: we start with a training set of pre-classified examples and, using probabilities estimated from that set, we assign class labels to new objects.
Some use case examples are shown in the slide. Based on voting patterns on issues, we could classify whether a politician is affiliated with a particular party or principle. Retailers use classifiers to assign products to the proper catalog entry locations. The classification of emails as spam is another widely used application of classifier methods.
Logistic regression, discussed in the previous lesson, can be viewed and used as a classifier. We
will discuss Naïve Bayesian Classifiers in this lesson and the use of Decision Trees in the next
lesson.
Naïve Bayesian Classifier
• Determine the most probable class label for each object
 Based on the observed object attributes (e.g., {shape, color, weight})
 The attributes are naïvely assumed to be conditionally independent of each other
 Example: based on an applicant's attributes, assign the label "good" credit

Conditional probabilities P(aj | Ci) estimated from the training set:

| aj | P(aj \| good) | P(aj \| bad) |
| --- | --- | --- |
| female single | 0.28 | 0.36 |
| own | 0.75 | 0.62 |
| self emp | 0.14 | 0.17 |
| savings>1K | 0.06 | 0.02 |

With class priors P(good) = 0.7 and P(bad) = 0.3:

P(good|A) ~ (0.28 * 0.75 * 0.14 * 0.06) * 0.7 = 0.0012
P(bad|A) ~ (0.36 * 0.62 * 0.17 * 0.02) * 0.3 = 0.0002
Here we have an example of an applicant who is female, single, owns a home, is self-employed, and has savings over \$1000. How will we classify this person? Will she be scored as having good or bad credit?
Having built the classifier with the training set we find P(good|A) which is equal to 0.0012 (see
the computation on the slide) and P(bad|A) is 0.0002. Since P(good|A) is the maximum of the
two probability scores, we assign the label “good” credit.
The score is only proportional to the probability. It doesn’t equal the probability, because we
haven’t included the denominator. However, both formulas have the same denominator, so we
don’t need to calculate it in order to know which quantity is bigger.
Notice, though, how small in magnitude these scores are. When we are looking at problems
with a large number of attributes, or attributes with a very high number of levels, these values
can become very small in magnitude.
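The scoring step described above can be sketched in a few lines of Python, using the slide's conditional probabilities and priors. This is a minimal illustrative sketch, not the course's actual implementation:

```python
# Sketch of the slide's scoring step: the score is proportional to the posterior
# because the shared denominator P(A) is dropped from Bayes' theorem.

# Conditional probabilities P(aj | Ci) from the slide's training set
cond_prob = {
    "good": {"female single": 0.28, "own": 0.75, "self emp": 0.14, "savings>1K": 0.06},
    "bad":  {"female single": 0.36, "own": 0.62, "self emp": 0.17, "savings>1K": 0.02},
}
prior = {"good": 0.7, "bad": 0.3}  # class priors P(Ci)

def score(label, attributes):
    """Unnormalized posterior: P(Ci) * product over j of P(aj | Ci)."""
    s = prior[label]
    for a in attributes:
        s *= cond_prob[label][a]
    return s

applicant = ["female single", "own", "self emp", "savings>1K"]
s_good = score("good", applicant)  # ~0.0012
s_bad = score("bad", applicant)    # ~0.0002
label = max(prior, key=lambda c: score(c, applicant))  # the larger score wins
```

Since both scores share the same denominator P(A), comparing them directly is enough to pick the label.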
Naïve Bayesian Implementation Considerations
• Numerical underflow
 Resulting from multiplying several probabilities near zero
 Preventable by computing the logarithm of the products
• Zero probabilities due to unobserved attribute/classifier pairs
 Resulting from rare events
 Handled by smoothing (adjusting each probability by a small amount)
• Assign the class label Ci that maximizes the value of

  [ Σ_{j=1}^{m} log P(aj | Ci) ] + log P(Ci)

  where i = 1, 2, …, n indexes the candidate class labels and m is the number of attributes
Multiplying several probability values, each possibly close to zero, invariably leads to the
problem of numerical underflow. So an important implementation guideline is to compute the
logarithm of the product of the probabilities, which is equivalent to the summation of the
logarithm of the probabilities. Although the risk of underflow increases as the number of
attributes increases, the use of logarithms should be applied regardless of the number of
attribute dimensions.
Additionally, to address the possibility of probabilities equal to zero, smoothing techniques can
be employed to adjust the probabilities to ensure non-zero values. Applying a smoothing
technique assigns a small non-zero probability to rare events not included in the training
dataset. Also, the smoothing addresses the possibility of taking the logarithm of zero.
The R implementation of Naïve Bayes incorporates the smoothing directly into the probability
tables. Essentially, the Laplace smoothing that R uses adds one (or a small value) to every
count. For example, if we have 100 “good” customers, and 20 of them rent their housing, the
“raw” P(rent | good) = 20/100 = 0.2; with Laplace smoothing adding one to each count,
the calculation becomes P(rent | good) ~ (20 + 1)/(100 + 3) = 0.20388, where 3 is added to the
denominator because the housing attribute has three possible values.
Fortunately, the use of the logarithms and the smoothing techniques are already implemented
in standard software packages for Naïve Bayes Classifiers. However, if for performance
reasons, the Naïve Bayes Classifier algorithm needs to be coded directly into an application,
these considerations should be implemented.
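Both fixes can be sketched in a few lines; the housing counts are the ones from the example above:

```python
import math

# Sketch of the two implementation fixes above: Laplace (add-one) smoothing to
# avoid zero probabilities, and log-space scoring to avoid numerical underflow.

def smoothed_prob(count, class_total, n_levels):
    """Add 1 to the observed count and the number of attribute levels to the
    denominator, so every attribute value gets a small non-zero probability."""
    return (count + 1) / (class_total + n_levels)

# The notes' housing example: 20 of 100 "good" customers rent, and the housing
# attribute has 3 levels, so P(rent | good) ~ 21/103.
p_rent_given_good = smoothed_prob(20, 100, 3)

def log_score(prior, cond_probs):
    """log P(Ci) + sum over j of log P(aj | Ci): summing logs instead of
    multiplying raw probabilities keeps the score away from numerical underflow."""
    return math.log(prior) + sum(math.log(p) for p in cond_probs)

# Log-space version of the slide's P(good|A) score
s_good = log_score(0.7, [0.28, 0.75, 0.14, 0.06])
```

Because the logarithm is monotonic, the class that maximizes the log score is the same class that maximizes the raw product.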
Diagnostics
• Hold-out data
 How well does the model classify new instances?
• Cross-validation
• ROC curve/AUC
The diagnostics we used in regression can also be used to validate the effectiveness of this
model. Techniques such as hold-out data, N-fold cross-validation, and the ROC curve/Area
Under the Curve (AUC) can be deployed with the Naïve Bayesian Classifier as well.
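The N-fold cross-validation mentioned above can be sketched as follows; the fold-index generation is the mechanical part, while the actual fitting and scoring of the classifier (elided here) would plug into the loop:

```python
import random

# Minimal sketch of N-fold cross-validation index generation, assuming the
# actual training/scoring of the classifier happens elsewhere.
def kfold_indices(n_samples, n_folds, seed=0):
    """Shuffle the sample indices and deal them into n_folds roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

folds = kfold_indices(1000, 5)
# Each fold serves once as the hold-out set while the remaining folds train the model:
for i, holdout in enumerate(folds):
    train = [j for k, fold in enumerate(folds) if k != i for j in fold]
    # ...fit the classifier on `train`, evaluate it on `holdout`...
```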
Diagnostics: Confusion Matrix

|             | Predicted good | Predicted bad | Total |
| --- | --- | --- | --- |
| Actual good | 671 (TP) | 29 (FN) | 700 |
| Actual bad  | 38 (FP) | 262 (TN) | 300 |
| Total       | 709 | 291 | 1000 |

Overall success rate (or accuracy): (TP + TN) / (TP + TN + FP + FN) = (671 + 262)/1000 ≈ 0.93
TPR: TP / (TP + FN) = 671 / (671 + 29) = 671/700 ≈ 0.96
FPR: FP / (FP + TN) = 38 / (38 + 262) = 38/300 ≈ 0.13
FNR: FN / (TP + FN) = 29 / (671 + 29) = 29/700 ≈ 0.04
Precision: TP / (TP + FP) = 671/709 ≈ 0.95
Recall (or TPR): TP / (TP + FN) ≈ 0.96
A confusion matrix is a specific table layout that allows visualization of the performance of a model. In
the hypothetical confusion matrix shown, of 1000 credit score samples, 700 were actually good credit
and 300 were bad. The model predicted 29 of the 700 good credits as bad, and 38 of the 300 bad
credits as good. All correct predictions are located on the diagonal of the table, so it is easy to visually
inspect the table for errors: they are represented by the non-zero values off the diagonal.
We define the overall success rate (or accuracy) as a metric of what we got right: the ratio of the sum
of the diagonal values (i.e., TP and TN) to the sum of the entire table. In other words, the confusion
matrix of a good model has large numbers on the diagonal and small (ideally zero) numbers off the
diagonal.
We saw a true positive rate (TPR) and a false positive rate (FPR) when w…
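The metrics on the slide can be computed directly from the four cell counts; a minimal sketch using the slide's numbers:

```python
# Metrics from the confusion matrix above, with "good" as the positive class.
TP, FN = 671, 29    # actual good: predicted good / predicted bad
FP, TN = 38, 262    # actual bad:  predicted good / predicted bad

accuracy  = (TP + TN) / (TP + TN + FP + FN)  # (671 + 262)/1000 = 0.933
tpr       = TP / (TP + FN)                   # recall, ~0.96
fpr       = FP / (FP + TN)                   # ~0.13
fnr       = FN / (TP + FN)                   # ~0.04
precision = TP / (TP + FP)                   # ~0.95
```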
