1 Introduction
- [Fall 2008]
For each data set given below, give specific examples of classification, clustering, association rule mining, and anomaly detection tasks that can be performed on the data. For each task, state how the data matrix should be constructed (i.e., specify the rows and columns of the matrix).
(a) Ambulatory Medical Care data¹, which contains the demographic and medical visit information for each patient (e.g., gender, age, duration of visit, physician's diagnosis, symptoms, medication, etc.).
Answer:
Classification
Task: Diagnose whether a patient has a disease.
Row: Patient
Column: Patient's demographic and hospital visit information (e.g., symptoms), along with a class attribute that indicates whether the patient has the disease.
Clustering
Task: Find groups of patients with similar medical conditions
Row: A patient visit
Column: List of medical conditions of each patient
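To make the clustering matrix concrete, here is a minimal Python sketch, assuming each visit's medical conditions are available as a set of labels (the visit IDs and condition names are hypothetical). Each row is a visit, each column is a binary indicator for one condition, and Jaccard similarity, a common proximity measure for asymmetric binary attributes, scores how alike two visits are.

```python
# Hypothetical visits: visit ID -> set of diagnosed medical conditions.
visits = {
    "v1": {"asthma", "hypertension"},
    "v2": {"asthma", "hypertension", "obesity"},
    "v3": {"migraine"},
}

# Columns: one binary attribute per distinct condition (sorted for a stable order).
conditions = sorted(set().union(*visits.values()))

# Rows: one binary vector per visit (1 = condition present, 0 = absent).
matrix = {
    vid: [1 if c in conds else 0 for c in conditions]
    for vid, conds in visits.items()
}

def jaccard(a, b):
    """Jaccard similarity of two binary vectors: shared 1s / positions with any 1."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

sim_12 = jaccard(matrix["v1"], matrix["v2"])
sim_13 = jaccard(matrix["v1"], matrix["v3"])
print(conditions)
print(sim_12, sim_13)
```

Here v1 and v2 share two of their three distinct conditions (similarity 2/3), while v1 and v3 share none; a clustering algorithm such as hierarchical clustering could then work from these pairwise similarities.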
Association rule mining
Task: Identify symptoms and medical conditions that frequently co-occur
Row: A patient visit
Column: List of symptoms and diagnosed medical conditions of the patient
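The association rule task can be illustrated with a small sketch that treats each visit as a transaction of symptoms and conditions (the transactions below are hypothetical). It computes support, the fraction of visits containing an itemset, and confidence, the fraction of visits containing the rule's left-hand side that also contain its right-hand side. A full miner such as Apriori would enumerate all frequent itemsets; this sketch only scores one given rule.

```python
# Hypothetical transactions: each row is one patient visit, listing the
# symptoms and diagnosed conditions recorded for that visit.
transactions = [
    {"fever", "cough", "flu"},
    {"fever", "cough", "flu"},
    {"fever", "rash"},
    {"cough"},
]

def support(itemset, rows):
    """Fraction of rows (visits) that contain every item in itemset."""
    return sum(1 for row in rows if itemset <= row) / len(rows)

def confidence(lhs, rhs, rows):
    """Confidence of the rule lhs -> rhs: support(lhs | rhs) / support(lhs)."""
    return support(lhs | rhs, rows) / support(lhs, rows)

s = support({"fever", "cough"}, transactions)
c = confidence({"fever", "cough"}, {"flu"}, transactions)
print(s, c)
```

{fever, cough} appears in 2 of 4 visits (support 0.5), and every such visit also lists flu (confidence 1.0).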
Anomaly detection
Task: Identify healthy looking patients with rare medical disorders
Row: A patient visit
Column: List of demographic attributes, symptoms, and medical test results of the patient
¹ See, for example, the National Hospital Ambulatory Medical Care Survey: http://www.cdc.gov/nchs/about/major/ahcd/ahcd1.htm
Introduction to Data Mining 2e, Pang-Ning Tan, Michael Steinbach, Vipin Kumar
(b) Stock market data, which include the prices and volumes of various stocks on different trading days.
Answer:
Classification
Task: Predict whether the stock price will go up or down on the next trading day
Row: A trading day
Column: Trading volume and closing price of the stock over the previous 5 trading days, and a class attribute that indicates whether the stock went up or down
Clustering
Task: Identify groups of stocks with similar price fluctuations
Row: A company's stock
Column: Changes in the daily closing price of the stock over the past ten years
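A sketch of this clustering matrix, assuming closing prices are aligned on the same trading days (tickers and prices are hypothetical): each row holds one stock's day-over-day price changes, and Pearson correlation between rows serves as the similarity measure. Correlation is one reasonable choice here, not the only one; it ignores differences in scale, so two stocks whose changes are always proportional look identical.

```python
import math

# Hypothetical closing prices, one list per stock, aligned on the same trading days.
closes = {
    "AAA": [10.0, 10.5, 10.2, 10.8, 11.0],
    "BBB": [20.0, 21.0, 20.4, 21.6, 22.0],
    "CCC": [5.0, 4.8, 5.1, 4.7, 4.6],
}

# Row = one stock; columns = day-over-day change in closing price.
changes = {s: [b - a for a, b in zip(p, p[1:])] for s, p in closes.items()}

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r_ab = pearson(changes["AAA"], changes["BBB"])
r_ac = pearson(changes["AAA"], changes["CCC"])
print(r_ab, r_ac)
```

BBB's changes are twice AAA's, so their correlation is 1 (up to floating-point rounding); CCC tends to move opposite to AAA and gets a negative correlation.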
Association rule mining
Task: Identify stocks with similar fluctuation patterns (e.g., {Google-Up, Yahoo-Up})
Row: A trading day
Column: List of all stock-up and stock-down events on the given day.
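The transaction construction for stock events might look like the following sketch, assuming per-stock daily price changes are already available (stocks and values are hypothetical). Each trading day becomes a transaction of `<stock>-Up` / `<stock>-Down` items, and counting co-occurring pairs yields the support counts from which rules such as {Google-Up, Yahoo-Up} would be derived.

```python
from collections import Counter
from itertools import combinations

# Hypothetical daily price changes per stock (one entry per trading day).
daily_change = {
    "Google": [+1.2, -0.5, +0.8, +0.3],
    "Yahoo": [+0.4, -0.2, +0.1, -0.6],
    "Apple": [-0.3, +0.9, -0.1, +0.5],
}
n_days = len(next(iter(daily_change.values())))

# Row = trading day; items = "<stock>-Up" / "<stock>-Down" events on that day.
# A day with zero change contributes no item for that stock.
transactions = []
for day in range(n_days):
    events = set()
    for stock, changes in daily_change.items():
        if changes[day] > 0:
            events.add(f"{stock}-Up")
        elif changes[day] < 0:
            events.add(f"{stock}-Down")
    transactions.append(events)

# Support count: number of days on which each pair of events co-occurs.
pair_counts = Counter(
    pair
    for events in transactions
    for pair in combinations(sorted(events), 2)
)
print(pair_counts[("Google-Up", "Yahoo-Up")])
```

{Google-Up, Yahoo-Up} occurs on 2 of the 4 days, i.e., a support of 0.5.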
Anomaly detection
Task: Identify unusual trading days for a given stock (e.g., unusually high volume)
Row: A trading day
Column: Trading volume, change in daily stock price (daily high-low prices), and average price change of its competitor stocks
(c) Database of Major League Baseball (MLB).
Classification
Task: Predict the winner of a game between two MLB teams.
Row: A game.
Column: Statistics of the home and visiting teams over the past 10 games each team played (e.g., average winning percentage and hitting percentage of their players)
Clustering
Task: Identify groups of players with similar statistics
Row: A player
Column: Statistics of the player
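Player statistics are measured on very different scales (batting average near 0.3, home runs in the tens), so a distance-based clustering method would be dominated by the larger-valued columns unless they are normalized. The sketch below, with hypothetical players and statistics, standardizes each column to zero mean and unit (population) standard deviation before computing Euclidean distances; this is a common preprocessing step, not something the answer above prescribes.

```python
import math
import statistics

# Hypothetical player statistics: (batting average, home runs, stolen bases).
players = {
    "p1": (0.300, 35, 5),
    "p2": (0.295, 32, 4),
    "p3": (0.250, 5, 40),
}

names = list(players)
columns = list(zip(*players.values()))  # one tuple of values per statistic

# Standardize each column so no statistic dominates the distance because of its units.
means = [statistics.mean(col) for col in columns]
stdevs = [statistics.pstdev(col) for col in columns]
standardized = {
    name: [(v - m) / s for v, m, s in zip(players[name], means, stdevs)]
    for name in names
}

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

d12 = euclidean(standardized["p1"], standardized["p2"])
d13 = euclidean(standardized["p1"], standardized["p3"])
print(d12, d13)
```

After standardization, the two power hitters p1 and p2 are much closer to each other than either is to the base stealer p3, which is the grouping a clustering method such as k-means would be expected to recover.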
Association rule mining
Task: Identify interesting player statistics (e.g., 40% of right-handed players have a batting percentage below 20% when facing left-handed pitchers)
Row: A player
Column: Discretized statistics of the player
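Because association rule mining works on items rather than numbers, continuous statistics must be discretized first. This sketch, assuming hypothetical players with a batting hand and a batting average against left-handed pitchers, bins the average at an illustrative .200 cutoff and computes the confidence of a rule of the form given in the task.

```python
# Hypothetical per-player data: (player, batting hand, batting average vs. left-handed pitchers).
players = [
    ("p1", "R", 0.180),
    ("p2", "R", 0.260),
    ("p3", "R", 0.150),
    ("p4", "R", 0.240),
    ("p5", "R", 0.290),
    ("p6", "L", 0.190),
]

def discretize_avg(avg):
    """Map a continuous batting average to a categorical item (the .200 cutoff is illustrative)."""
    return "BA_vs_LHP<.200" if avg < 0.200 else "BA_vs_LHP>=.200"

# Row = player; items = discretized attributes of that player.
transactions = [{f"Bats={hand}", discretize_avg(avg)} for _, hand, avg in players]

# Confidence of {Bats=R} -> {BA_vs_LHP<.200}.
lhs, rhs = {"Bats=R"}, {"BA_vs_LHP<.200"}
n_lhs = sum(1 for t in transactions if lhs <= t)
n_both = sum(1 for t in transactions if (lhs | rhs) <= t)
conf = n_both / n_lhs
print(conf)
```

2 of the 5 hypothetical right-handed batters fall in the low bin, giving confidence 0.4; the values were made up only so the output mirrors the 40% in the example.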
Anomaly detection
Task: Identify players who performed considerably better than expected in a given season
Row: A (player, season) pair, e.g., (player1 in 2007)
Column: Ratio statistics of a player (e.g., ratio of average batting percentage in 2007 to career average batting percentage)
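The ratio statistic can be sketched as follows, assuming each (player, season) row carries that season's batting average and the player's career batting average (all values hypothetical). Rows whose ratio reaches an illustrative threshold are flagged; a data-driven cutoff, for example one based on the distribution of all ratios, would be a natural refinement.

```python
# Hypothetical (player, season) rows: season batting average and career batting average.
rows = [
    ("player1", 2007, 0.340, 0.260),
    ("player2", 2007, 0.275, 0.270),
    ("player3", 2007, 0.250, 0.265),
]

THRESHOLD = 1.2  # illustrative cutoff: season average at least 20% above career average

# Ratio attribute for each (player, season) pair.
ratios = {(p, yr): season / career for p, yr, season, career in rows}

# Flag pairs whose season performance is far above the player's own baseline.
flagged = [key for key, r in ratios.items() if r >= THRESHOLD]
print(flagged)
```

Only player1's 2007 season (ratio of about 1.31) exceeds the cutoff.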
2 Data
2.1 Types of Attributes
1. classify them as qualitative (nominal or ordinal) or quantitative (interval or ratio). Some cases may have more than one interpretation, so briefly indicate your reasoning if you think there may be some ambiguity.
(a)
Answer: Discrete, quantitative, ratio.
(b)
Answer: Discrete, quantitative, ratio.
(c)
Answer: Continuous, quantitative, interval or ratio. It is actually a log-ratio type (which is somewhere between interval and ratio).
(d)
Answer: Discrete, qualitative, ordinal.
(e)
Answer: Discrete, qualitative, nominal.
2. discrete or continuous; qualitative or quantitative; nominal, ordinal, interval, or ratio.
Some cases may have more than one interpretation, so briefly indicate your reasoning if you think there may be some ambiguity.
(a) Greenwich Mean Time of January 1, 4713 BC.
Answer: Continuous, quantitative, interval
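Assuming item (a) refers to a Julian date, i.e., a day count measured from noon Greenwich Mean Time on January 1, 4713 BC, the sketch below illustrates why it is interval rather than ratio: the zero point is an arbitrary epoch, so differences between two dates are meaningful but their ratio is not. It relies on Python's `date.toordinal()`, which counts days with January 1 of year 1 (proleptic Gregorian) as day 1; the offset 1721424.5 that converts that count to the Julian date at midnight UT is a known conversion constant, not something stated in the question.

```python
from datetime import date

# Offset converting a proleptic-Gregorian ordinal (date.toordinal(), where
# 0001-01-01 is day 1) into the Julian date at 00:00 UT of that day.
JD_OFFSET = 1721424.5

def julian_date(d: date) -> float:
    """Julian date (days since noon GMT, Jan 1, 4713 BC) at midnight UT of d."""
    return d.toordinal() + JD_OFFSET

jd_a = julian_date(date(1970, 1, 1))
jd_b = julian_date(date(2000, 1, 1))

# Interval attribute: the difference is meaningful (a number of days) ...
days_between = jd_b - jd_a
# ... but the ratio is not, because Julian date 0 is an arbitrary epoch,
# not an absence of time.
ratio = jd_b / jd_a
print(jd_a, days_between, ratio)
```

The difference of 10957 days is a genuine elapsed time, whereas the ratio of roughly 1.0045 says nothing like "twice as late" and would change if the epoch were moved.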
(b)
Answer: Discrete, qualitative, ordinal
(c) or frustrated).
Answer: Discrete, qualitative, nominal
(d)
Answer: Continuous, quantitative, ratio
(e)
Answer: Discrete, qualitative, nominal
(f)
Answer: Continuous, qualitative, ordinal
In terms of energy release, the difference between 0.0 and 1.0 is not the same as the difference between 1.0 and 2.0. Ordinal attributes are qualitative, yet they can be continuous.
(g)
Answer: Continuous, quantitative, interval
(h) measuring years in college.
Answer: Discrete, qualitative, ordinal
3. discrete or continuous AND qualitative or quantitative AND nominal, ordinal, interval, or ratio. Indicate your reasoning if you think there may be some ambiguity in some cases.
Example: Age in years.
Answer: Discrete, quantitative, ratio.