Question 1 Suppose a school collected some data on students’ preference for hot dogs (HD) vs. hamburgers (HM). We have the following 2×2 contingency table summarizing the statistics. If lift is used to measure the correlation between HD and HM, what is the value for lift(HD, HM)?
        HD     ¬HD    Σrow
HM      40     24     64
¬HM     210    126    336
Σcol    250    150    400
Question Explanation The correct answer is: "1".
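The lift can be calculated by
lift(A, B) = supp(A ∪ B) / (supp(A) × supp(B)),
where supp(A) and supp(B) refer to the relative support of A and B respectively. Thus,
lift(HD, HM) = (40/400) / ((250/400) × (64/400)) = 0.1 / (0.625 × 0.16) = 1.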
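The lift value can be checked numerically. The following is a minimal sketch that assumes the counts are read off the contingency table above; the function name `lift` and its parameter names are illustrative, not part of the original quiz.

```python
def lift(n_ab, n_a, n_b, n_total):
    """lift(A, B) = supp(A ∪ B) / (supp(A) * supp(B)), using relative supports."""
    supp_ab = n_ab / n_total  # fraction of students who like both
    supp_a = n_a / n_total    # fraction who like A
    supp_b = n_b / n_total    # fraction who like B
    return supp_ab / (supp_a * supp_b)


# Counts from the table: 40 like both, 250 like HD, 64 like HM, 400 students in total.
lift_hd_hm = lift(n_ab=40, n_a=250, n_b=64, n_total=400)
print(round(lift_hd_hm, 6))
```

A lift of 1 means that HD and HM occur together exactly as often as expected under independence.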
Question 2 Suppose a school collected some data on students’ preference for hot dogs (HD) vs. hamburgers (HM). We have the following 2×2 contingency table summarizing the statistics. If χ² is used to measure the correlation between HD and HM, what is the χ² score?
        HD     ¬HD    Σrow
HM      40     24     64
¬HM     210    126    336
Σcol    250    150    400
Question Explanation The correct answer is: "0".
The contingency table with expected values (shown in parentheses) is as follows:
        HD           ¬HD          Σrow
HM      40 (40)      24 (24)      64
¬HM     210 (210)    126 (126)    336
Σcol    250          150          400
χ² can be evaluated as follows:
χ² = Σ_i (O_i − E_i)² / E_i,
where O_i is the observed frequency and E_i is the expected frequency. Since the expected values equal the observed ones, we have χ² = 0.
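The expected values and the resulting score can be reproduced with a few lines of code. This is a sketch under the usual assumption that each expected count is (row total × column total) / grand total; the helper name `chi_square` is illustrative.

```python
def chi_square(observed):
    """Pearson chi-square for a contingency table given as a list of rows."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    score = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under independence of the row and column variables.
            e = row_sums[i] * col_sums[j] / total
            score += (o - e) ** 2 / e
    return score


# Rows are HM and ¬HM; columns are HD and ¬HD.
table = [[40, 24], [210, 126]]
chi2 = chi_square(table)
print(chi2)
```

Every observed cell equals its expected count, so each term of the sum is zero.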
Question 3 What is the value range of the χ² measure?
Question Explanation By definition, the correct answer is: "[0, +∞)".
Question 4 Which of the following measures is null invariant?
Question Explanation The correct answer is: "Kulczynski". The Kulczynski measure is not affected by the number of null transactions.
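Null invariance can be illustrated by adding null transactions (those containing neither item) and watching which measure changes. The counts below are hypothetical and chosen only for illustration; lift is used here as a contrasting measure that is not null invariant.

```python
def kulczynski(n_ab, n_a, n_b):
    """Kulc(A, B) = (P(A|B) + P(B|A)) / 2; depends only on n_ab, n_a and n_b."""
    return 0.5 * (n_ab / n_b + n_ab / n_a)


def lift(n_ab, n_a, n_b, n_total):
    """lift(A, B) = supp(A ∪ B) / (supp(A) * supp(B)); depends on the total count."""
    return (n_ab * n_total) / (n_a * n_b)


# Hypothetical counts: 100 transactions contain both items, 1000 contain each item.
n_ab, n_a, n_b = 100, 1000, 1000
kulc_values, lift_values = [], []
for n_null in (0, 100_000):  # number of transactions containing neither item
    n_total = n_a + n_b - n_ab + n_null
    kulc_values.append(kulczynski(n_ab, n_a, n_b))
    lift_values.append(lift(n_ab, n_a, n_b, n_total))

print(kulc_values, lift_values)
```

The Kulczynski value is the same in both runs because the total number of transactions never enters its formula, while lift grows with the number of null transactions.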
Question 5 Suppose we are interested in analyzing the transaction history of several supermarkets with respect to purchase of apples (A) and bananas (B). We have the following table summarizing the transactions.
Supermarket    AB         ¬AB      A¬B      ¬A¬B
S1             100,000    7,000    3,000    300
S2             100,000    7,000    3,000    90,000
Denote χ²_i as the χ² measure and c_i as the cosine measure for supermarket S_i (i = 1, 2). Which of the following is correct?
Question Explanation The correct answer is: "χ²_1 ≠ χ²_2, c_1 = c_2". S1 and S2 differ only in the number of null transactions (¬A¬B). χ² is not null invariant and is therefore sensitive to the number of null transactions, while cosine is null invariant.
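Both measures can be computed directly from the four counts of each supermarket. The sketch below assumes the closed-form χ² for a 2×2 table and the cosine definition supp(A ∪ B) / sqrt(supp(A) × supp(B)); the function and variable names are illustrative.

```python
from math import sqrt


def chi_square_2x2(n_ab, n_b_only, n_a_only, n_neither):
    """Pearson chi-square of a 2x2 table via N(ad - bc)^2 / (product of marginals)."""
    a, b, c, d = n_ab, n_b_only, n_a_only, n_neither
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def cosine(n_ab, n_b_only, n_a_only):
    """cosine(A, B) = n_ab / sqrt(n_a * n_b); the null count is not used."""
    n_a = n_ab + n_a_only
    n_b = n_ab + n_b_only
    return n_ab / sqrt(n_a * n_b)


# Column order of the table: AB, ¬AB, A¬B, ¬A¬B.
s1 = (100_000, 7_000, 3_000, 300)
s2 = (100_000, 7_000, 3_000, 90_000)

chi1, chi2 = chi_square_2x2(*s1), chi_square_2x2(*s2)
cos1, cos2 = cosine(*s1[:3]), cosine(*s2[:3])
print(chi1, chi2, cos1, cos2)
```

The cosine function never receives the ¬A¬B count, which is why it returns the same value for both supermarkets.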
Question 6 Consider the support-based and null-invariant definitions for negative patterns. For negative pattern threshold ε = 0.011, which of the following patterns would be considered a negative pattern by the null-invariant definition but not the support-based definition?
Question Explanation The correct answer is: "A media content provider has 1,000,000 users. Movie A and Movie B were viewed by 1000 users each in the last month, but only 10 users viewed both."
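Null-invariant: (P(A|B) + P(B|A)) / 2 = (10/1000 + 10/1000) / 2 = 0.01 < ε = 0.011
Support based: sup(A ∪ B) = 10/1,000,000 = 10⁻⁵, which is not smaller than sup(A) × sup(B) = (1000/1,000,000)² = 10⁻⁶
Thus, {Movie A, Movie B} is a negative pattern by the null-invariant definition but not by the support-based definition.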
Null-invariant:
Support based:
Thus, {DM, Music} is not a negative pattern by either definition.
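The movie example can be checked against both definitions in code. This sketch assumes the null-invariant test is (P(A|B) + P(B|A)) / 2 < ε and simplifies the support-based test to sup(A ∪ B) < sup(A) × sup(B); the check that both items are frequent is omitted, and the function names are illustrative.

```python
def is_negative_null_invariant(n_ab, n_a, n_b, eps):
    """Negative pattern if the Kulczynski measure falls below the threshold eps."""
    kulc = 0.5 * (n_ab / n_a + n_ab / n_b)
    return kulc < eps


def is_negative_support_based(n_ab, n_a, n_b, n_total):
    """Negative pattern if sup(A ∪ B) < sup(A) * sup(B) (simplified test).

    Both sides are multiplied by n_total**2 so the comparison uses integers only.
    """
    return n_ab * n_total < n_a * n_b


# Movie example: 1,000,000 users, 1000 viewers per movie, 10 viewers of both.
null_invariant = is_negative_null_invariant(n_ab=10, n_a=1000, n_b=1000, eps=0.011)
support_based = is_negative_support_based(n_ab=10, n_a=1000, n_b=1000, n_total=1_000_000)
print(null_invariant, support_based)
```

The two definitions disagree here because the support-based test depends on the 1,000,000 total users, most of whom watched neither movie.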
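Question 7 Consider two patterns P1 and P2 such that O(P1) ⊆ O(P2), where O(Pi) is the corresponding itemset of pattern Pi. Take a second to convince yourself that the following is true:
Dist(P1, P2) = 1 − |T(P1) ∩ T(P2)| / |T(P1) ∪ T(P2)| = 1 − |T(P2)| / |T(P1)|,
where T(Pi) is the set of transactions that contain Pi.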
Which of the following patterns in Table 1 is δ-covered by {F, A, C, E, T, S} for δ=0.4? Select all that apply.
Question Explanation The correct answers are: "{F, A, C, T, S}" and "{A, C, T, S}".
Dist({F, A, C, E, T, S}, {A, C, E, S}) = 1 − 101758/205227 = 0.504
Dist({F, A, C, E, T, S}, {F, A, C, E, S}) = 1 − 101758/205211 = 0.504
Dist({F, A, C, E, T, S}, {F, A, C, T, S}) = 1 − 101758/161563 = 0.370
Dist({F, A, C, E, T, S}, {A, C, T, S}) = 1 − 101758/161576 = 0.370
Thus, {F, A, C, T, S} and {A, C, T, S} are 0.4-covered by {F, A, C, E, T, S}.
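The four distances can be recomputed from the support counts quoted in the explanation. This is a sketch that assumes a pattern P is δ-covered by P' when O(P) ⊆ O(P') and Dist(P, P') ≤ δ; patterns are written as strings of item letters, and the helper names are illustrative.

```python
def pattern_distance(sup_sub, sup_super):
    """Dist(P1, P2) = 1 - sup(P2) / sup(P1), valid when O(P1) ⊆ O(P2)."""
    return 1 - sup_super / sup_sub


def is_delta_covered(pattern, sup_pattern, cover, sup_cover, delta):
    """True if `pattern` is a subset of `cover` and lies within distance delta of it."""
    return set(pattern) <= set(cover) and pattern_distance(sup_pattern, sup_cover) <= delta


# Support counts taken from the explanation above.
cover, sup_cover = "FACETS", 101758
candidates = {"ACES": 205227, "FACES": 205211, "FACTS": 161563, "ACTS": 161576}

covered = [
    p for p, sup in candidates.items()
    if is_delta_covered(p, sup, cover, sup_cover, delta=0.4)
]
print(covered)
```

Only the two patterns whose support is close to that of {F, A, C, E, T, S} pass the δ = 0.4 test.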
Question 8 Given the transactions in Table 2, which of the following is true? Select all that apply.
Question Explanation The correct answers are: "(abe) is a (2, 0.5)-robust pattern", "(ac) is a 0.5-core pattern of (abcef)", "(a) is a 0.5-core pattern of (abe)", and "(f) is a 0.5-core pattern of (bcf)". Since (e) is still a 0.5-core pattern of (abe), (abe) is a (2, 0.5)-robust pattern. By the same token, (abcef) is a (4, 0.5)-robust pattern, not a (3, 0.5)-robust pattern. (ac) is a 0.5-core pattern of (abcef) since only one other pattern contains (ac). (a) is a 0.5-core pattern of (abe) since it has a core ratio of 2/3. By the same token, (f) is a 0.5-core pattern of (bcf).
Question 9 A constraint is anti-monotone if, whenever an itemset S violates the constraint, so do all of its supersets. Which of the following constraints is anti-monotone?
Question Explanation The correct answer is: "range(S.price) < 10"
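Suppose X is a superset of S. Then the following relations hold for all such X, assuming non-negative prices (here "<>" means that neither direction is guaranteed):
avg(X.price) <> avg(S.price)
sum(X.price) ≥ sum(S.price)
var(X.price) <> var(S.price)
range(X.price) ≥ range(S.price)
From these relations, we can verify that range(S.price) < 10 is anti-monotone: if range(S.price) ≥ 10, then range(X.price) ≥ range(S.price) ≥ 10, so every superset X also violates the constraint.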
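The anti-monotone property can be tested exhaustively on a small item set. The sketch below uses hypothetical prices and a hypothetical contrasting constraint, avg(S.price) < 10. A passing check on one price list is only a sanity check, not a proof, whereas a single failing pair is a genuine counterexample.

```python
from itertools import combinations


def is_anti_monotone(constraint, prices):
    """Check on all non-empty subsets: if S violates, every superset of S violates."""
    subsets = [
        frozenset(c)
        for r in range(1, len(prices) + 1)
        for c in combinations(prices, r)
    ]
    for s in subsets:
        if constraint(s):
            continue  # only violating itemsets matter for this property
        for x in subsets:
            if s < x and constraint(x):  # proper superset that satisfies again
                return False
    return True


def range_lt_10(s):
    return max(s) - min(s) < 10


def avg_lt_10(s):
    return sum(s) / len(s) < 10


prices = [2, 5, 9, 14, 30]  # hypothetical, distinct item prices
range_result = is_anti_monotone(range_lt_10, prices)
avg_result = is_anti_monotone(avg_lt_10, prices)
print(range_result, avg_result)
```

For the average constraint, {14} violates avg < 10 but its superset {2, 14} satisfies it, so the check fails.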
Question 10 A constraint is monotone if, whenever an itemset S satisfies the constraint, so do all of its supersets. Which of the following constraints is monotone?
Question Explanation The correct answer is: "sum(S.price) > 20"
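Suppose X is a superset of S. Then the following relations hold for all such X, assuming non-negative prices (here "<>" means that neither direction is guaranteed):
support of X ≤ support of S
min(X.price) ≤ min(S.price)
avg(X.price) <> avg(S.price)
sum(X.price) ≥ sum(S.price)
From these relations, we can verify that sum(S.price) > 20 is monotone: if sum(S.price) > 20, then sum(X.price) ≥ sum(S.price) > 20, so every superset X also satisfies the constraint.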
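Growing an itemset one item at a time shows the difference between a monotone constraint and one that is not. The prices and the contrasting constraint avg(S.price) > 10 are hypothetical, chosen only to make the effect visible.

```python
prices = [15, 10, 1, 1]  # hypothetical, non-negative item prices

itemset = []
sum_ok = []  # does sum(S.price) > 20 hold after each added item?
avg_ok = []  # does avg(S.price) > 10 hold after each added item?
for p in prices:
    itemset.append(p)  # each step produces a superset of the previous itemset
    sum_ok.append(sum(itemset) > 20)
    avg_ok.append(sum(itemset) / len(itemset) > 10)

print(sum_ok)
print(avg_ok)
```

Once the sum constraint becomes true it stays true for every later superset, while the average constraint switches from satisfied to violated as cheap items are added.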
Question 11 A constraint is succinct if the constraint c can be enforced by directly manipulating the data. Which of the following constraints is succinct?
Question Explanation The correct answer is: "max(S.price) < 20"
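One way to see why max(S.price) < 20 is succinct: the constraint can be enforced before mining by dropping every item priced at 20 or more, after which any itemset built from the remaining items satisfies it. The item names, prices, and transactions below are hypothetical.

```python
# Hypothetical item prices and transactions.
prices = {"a": 5, "b": 12, "c": 25, "d": 19, "e": 40}
transactions = [{"a", "c", "d"}, {"b", "e"}, {"a", "b", "d"}]

# Enforce max(S.price) < 20 by manipulating the data directly:
# keep only the items whose price is below 20.
allowed = {item for item, price in prices.items() if price < 20}
filtered = [t & allowed for t in transactions]

print(sorted(allowed))
print([sorted(t) for t in filtered])
```

No constraint check is needed during mining itself, since every itemset drawn from the filtered transactions already has a maximum price below 20.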