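Help needed: data warehouse and data mining problems, part 1
Sorry, the problems are in English. If the points offered seem too few, I can add more! Experts, please help. Thanks!
10 points per question!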
I. Data warehouse design
(1) Enumerate three classes of schemas that are popularly used for modeling data warehouses.
(2) Draw a snowflake schema diagram for the Big_University data warehouse, which consists of four dimensions (student, course, semester, and instructor) and two measures (count and avg_grade). At the lowest conceptual level, avg_grade is the student's actual grade; at higher conceptual levels, avg_grade is the average grade for the given student, course, semester, and instructor.
(3) Starting with the base cuboid (student, course, semester, instructor), what specific OLAP operations (e.g., roll up from "semester" to "year") should be performed in order to list the average grade of each student who has taken "CS" courses?
(4) If each dimension contains 5 levels (including all), e.g., student < major < status < university < all, how many cuboids does this data cube contain (including the base cuboid and the apex cuboid)?
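For part (4), a minimal sketch of the usual counting rule, assuming a cuboid is formed by choosing exactly one level (possibly "all") for each dimension, so the total is the product of the level counts. The dictionary name and function name are illustrative only.

```python
from math import prod

# Levels per dimension, counting the top "all" level, as stated in part (4).
levels_including_all = {
    "student": 5,
    "course": 5,
    "semester": 5,
    "instructor": 5,
}


def count_cuboids(levels):
    """Each cuboid picks one level (possibly "all") for every dimension."""
    return prod(levels.values())


total = count_cuboids(levels_including_all)
print(total)  # 5 * 5 * 5 * 5 = 625
```

The base cuboid (lowest level everywhere) and the apex cuboid ("all" everywhere) are both included in this count.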
II. Data cube computation
Suppose a base cuboid has 3 dimensions, (A, B, C), with the number of cells shown below: |A| = 1,000,000, |B| = 100, and |C| = 1,000. Suppose each dimension is partitioned evenly into 10 portions for chunking.
(1) Assuming each dimension has only one level, draw the complete lattice of the cube.
(2) If each cube cell stores one measure with 4 bytes, what is the total size of the computed cube if the cube is dense?
(3) If the cube is very sparse, describe an effective multidimensional array structure to store the sparse cube.
(4) State the order for computing the chunks in the cube which requires the least amount of space, and compute the total amount of main memory space required for computing the 2-D planes.
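The arithmetic behind parts (1), (2), and (4) can be sketched as follows. This is only a sketch under stated assumptions: the dense cube is taken to hold (|A|+1)(|B|+1)(|C|+1) cells (one extra "all" value per dimension), and the memory model for part (4) follows the common textbook accounting for multiway array aggregation, where one 2-D plane is held whole, one is held as a single row of chunks, and one as a single chunk. The helper name `plane_cells` is hypothetical. Part (3) is not covered here.

```python
from itertools import combinations, permutations

sizes = {"A": 1_000_000, "B": 100, "C": 1_000}
PARTS = 10            # each dimension is split evenly into 10 portions
BYTES_PER_CELL = 4
chunk = {d: n // PARTS for d, n in sizes.items()}  # chunk side lengths

# (1) Lattice: every subset of {A, B, C} is one cuboid, from ABC down to ().
dims = sorted(sizes)
lattice = [c for k in range(len(dims), -1, -1) for c in combinations(dims, k)]

# (2) Dense cube: each dimension gains one extra "all" value.
dense_cells = 1
for n in sizes.values():
    dense_cells *= n + 1
dense_bytes = dense_cells * BYTES_PER_CELL


# (4) Cells needed for the three 2-D planes when chunks are scanned with
# d1 varying fastest and d3 varying slowest (assumed memory model).
def plane_cells(order):
    d1, d2, d3 = order
    return (sizes[d1] * sizes[d2]      # whole d1-d2 plane
            + sizes[d1] * chunk[d3]    # one row of the d1-d3 plane
            + chunk[d2] * chunk[d3])   # one chunk of the d2-d3 plane


best = min(permutations(dims), key=plane_cells)
best_cells = plane_cells(best)

print(len(lattice))                          # 8 cuboids
print(dense_bytes)                           # 404404404404 bytes
print(best, best_cells, best_cells * BYTES_PER_CELL)
# ('B', 'C', 'A') 20100000 80400000
```

Under this model the cheapest order scans B fastest and A slowest, so the smallest plane (BC) is the one kept whole in memory.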
III. Mining association rules
Suppose we have the following transactional data.
TID Items_bought
T100 {K, A, D, B}
T200 {D, A, C, E, B}
T300 {C, A, B, E}
T400 {B, A, D}
Assume that the minimum support and minimum confidence thresholds are 60% and 80%, respectively.
(1) Find the set of frequent itemsets using the Apriori algorithm and the FP-tree, respectively. Show the derivation of Ck and Lk for each iteration k of the Apriori algorithm, and show the "conditional pattern base, conditional FP-tree, frequent patterns" for each item in the FP-tree, as shown in Table 6-1 of the textbook.
(2) Generate strong association rules from the frequent itemsets (with support and confidence) found above.
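A minimal Apriori sketch for the four transactions above may help check the hand derivation. It covers only the Apriori half of part (1) and the rule generation of part (2), not the FP-tree. The function names (`support`, `apriori`, `strong_rules`) are illustrative, and support is assumed to be the fraction of transactions that contain the itemset.

```python
from itertools import combinations

transactions = {
    "T100": {"K", "A", "D", "B"},
    "T200": {"D", "A", "C", "E", "B"},
    "T300": {"C", "A", "B", "E"},
    "T400": {"B", "A", "D"},
}
MIN_SUP, MIN_CONF = 0.6, 0.8
n = len(transactions)


def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions.values()) / n


def apriori():
    """Return {frozenset: support} for all frequent itemsets."""
    items = sorted(set().union(*transactions.values()))
    candidates = [frozenset([i]) for i in items]            # C1
    frequent, k = {}, 1
    while candidates:
        level = {}                                          # Lk
        for c in candidates:
            s = support(c)
            if s >= MIN_SUP:
                level[c] = s
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent sets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))]
    return frequent


def strong_rules(frequent):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]
                if conf >= MIN_CONF:
                    rules.append((lhs, itemset - lhs, sup, conf))
    return rules


frequent = apriori()
rules = strong_rules(frequent)
for lhs, rhs, sup, conf in rules:
    print(f"{''.join(sorted(lhs))} -> {''.join(sorted(rhs))}"
          f"  support={sup:.0%} confidence={conf:.0%}")
```

With a 60% threshold an itemset must appear in at least 3 of the 4 transactions, so only A, B, D and their combinations survive; the sketch reports 7 frequent itemsets and 7 strong rules (for example D -> AB with 75% support and 100% confidence).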
[Solution]
Enumerate three classes of schemas that are popularly used for modeling data warehouses.
In other words: list the three kinds of schema commonly used when modeling a data warehouse.
[Solution]
1) Enumerate three classes of schemas that are popularly used for modeling data warehouses.
A:
star schema, snowflake schema
I can't remember the other one.
I suggest reading the Oracle help, specifically the Data Warehousing Guide; it's all in there.
[Solution]
Following along to learn. Keep it up!