Data Mining
) Which of the following refers to the problem of finding abstracted patterns (or structures) in the unlabeled data?
Supervised learning
Hybrid learning
Unsupervised learning
Reinforcement learning
Which one of the following refers to querying the unstructured textual data?
Information access
Information update
Information retrieval
Information manipulation
Which of the following is an essential process in which the intelligent methods are applied to extract data patterns
Data Mining
Warehousing
Text Mining
Data Selection
What is KDD in data mining?
Knowledge Discovery Data
Knowledge Discovery Database
Knowledge Data definition
Knowledge data house
What are the functions of Data Mining?
Association and correctional analysis classification
Prediction and characterization
Cluster analysis and Evolution analysis
All of the above
In the following given diagram, which type of clustering is used?
Naive Bayes
Hierarchal
Partitional
None of the above
Which of the following statements is incorrect about the hierarchal clustering?
The choice of an appropriate metric can influence the shape of the cluster
In general, the splits and merges both are determined in a greedy manner
All of the above
The hierarchal type of clustering is also known as the HCA
Which one of the following can be considered as the final output of the hierarchal type of clustering?
Assignment of each point to clusters
Finalize estimation of cluster centroids
A tree which displays how the close thing are to each other
None of the above
Which one of the following statements about the K-means clustering is incorrect?
The goal of the k-means clustering is to partition (n) observation into (k) clusters
The nearest neighbor is the same as the K-means
K-means clustering can be defined as the method of quantization
All of the above
Which one of the clustering technique needs the merging approach?
Partitioned
Naïve Bayes
None of the above
Hierarchical
The self-organizing maps can also be considered as the instance of _________ type of learning.
Supervised learning
Unsupervised learning
Missing data imputation
None of the above
Which of the following statement is true about the classification?
It is a subdivision of a set
It is a measure of accuracy
It is the task of assigning a classification
None of the above
Which of the following statements is correct about data mining?
It can be referred to as the procedure of mining knowledge from data
Data mining can be defined as the procedure of extracting information from a set of data
The procedure of data mining also involves several other processes like data cleaning, data transformation, and data integration
All of the above
Which of the following can be considered as the classification or mapping of a set or class with some predefined group or classes?
Data set
Data Characterization
Data Sub Structure
Data Discrimination
The analysis performed to uncover the interesting statistical correlation between associated -attributes value pairs are known as the _______.
Mining of correlation
Mining of association
Mining of clusters
All of the above
Which one of the following statements is not correct about the data cleaning?
It refers to the process of data cleaning
It refers to the transformation of wrong data into correct data
It refers to correcting inconsistent data
All of the above
Which one of the following correctly defines the term cluster?
Symbolic representation of facts or ideas from which information can potentially be extracted
Group of similar objects that differ significantly from other objects
Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm
All of the above
Which of the following correctly refers the data selection?
A subject-oriented integrated time-variant non-volatile collection of data in support of management
The actual discovery phase of a knowledge discovery process
The stage of selecting the right data for a KDD process
All of the above
Which one of the following correctly refers to the task of the classification?
A measure of the accuracy, of the classification of a concept that is given by a certain theory
The task of assigning a classification to a set of examples
A subdivision of a set of examples into a number of classes
None of the above
Which of the following correctly defines the term "Hybrid"?
Approach to the design of learning algorithms that is structured along the lines of the theory of evolution.
Decision support systems that contain an information base filled with the knowledge of an expert formulated in terms of if-then rules.
None of these
Combining different types of method or information
Which one of the following can be considered as the correct application of the data mining?
Fraud detection
Corporate Analysis & Risk management
All of the above
Management and market analysis
Which of the following refers to the steps of the knowledge discovery process, in which the several data sources are combined?
Data selection
Data integration
Data cleaning
Data transformation
The term "DMQL" stands for _____
Data Mining Query Language
None of the above
DBMiner Query Language
Data Marts Query Language
Which one of the following issues must be considered before investing in data mining?
Compatibility
All of the above
Functionality
Vendor consideration
Which of the following also used as the first step in the knowledge discovery process?
Data selection
Data cleaning
Data transformation
Data integration
�…. Is a data field, representing a characteristic or feature of a data object.
Dimension
Attribute
Fact
A and B
All of the following are attribute types except:
Descriptive
Binary
Numeric
Nominal
Given two objects represented by tuples (3, 5) and (2, 0). What is the Euclidean distance between two objects?
5.1
5
3
None
Given two objects represented by tuples (3, 5) and (2, 0). What is the Manhattan distance between two objects?
5.1
5
6
None
Euclidean distance is also known as ______________
L-1 norm
L-2 norm
L-3 norm
None
�.. Attribute has only a finite or countably infinite set of values while …. Attribute has real numbers as attribute values.
Discrete, continuous
Continuous, discrete
Quantitative, qualitative
Qualitative, quantitative
Value that occurs most frequently in the data?
Mean
Mode
Median
None
Middle value if odd number of values, or average of the middle two values otherwise.
Mean
Mode
Median
None
At symmetric data distribution
Mean = Mode
Mode = Median
A and B
None

The following graph represents …… data.
Positively correlated data
Uncorrelated data
Negatively correlated data
All of the above
All of the following measure data dispersion except:
Variance
Inter-Quartile range
Standard deviation
Mode
Basic statistical descriptions can be displayed graphically by:
Boxplot
Histogram
Scatter plot
All of the above
�… Graph display of tabulated frequencies, shown as bars
Boxplot
Histogram
Scatter plot
All of the above
�… is a numerical measure of how alike two data objects are.
Similarity
Dissimilarity
Proximity
All of the above
Is a numerical measure of how different two data objects are.
Similarity
Dissimilarity
Proximity
All of the above
We use data visualization to ….
To Gain insight into an information space
Search for patterns
Provide qualitative overview of large data sets
All of the above
In Pixel-Oriented Visualization Techniques for a data set of M dimensions we create ….. Windows on the screen.
M
M + 1
2 ^ M
M – 1
Data set has more than mode
Unimodal
Bimodal
Trimodal
All of the above
Data objects are described by attributes.
True
False
Data objects are made up of data sets
True
False
Dissimilarity value is lower when objects are more alike
True
False
Manhattan distance is represented by a direct line
True
False
Similarity value is lower when objects are more alike
True
False
We can achieve the Pixel-Oriented Visualization Techniques using direct visualization
True
False
Scatterplot is one of the ways to achieve Geometric Projection Visualization Techniques.
True
False
In Pixel-Oriented Visualization Techniques the colors of the pixels reflect the corresponding dimensions.
True
False
We use data visualization because it helps find interesting regions and suitable parameters for further quantitative analysis.
True
False
In Parallel Coordinates the axes are equidistant.
True
False
...........,routines work to “clean” the data by filling in missing values, smooth- ing noisy data, identifying or removing outliers, and resolving inconsistencies.
Data integration
Data mining
Data cleaning
None of them
)All of them are methods of Missing Values except.
Ignore the tuple:
Fill in the missing value manually
Use a global constant to fill in the missing value
integration
………, data encoding schemes are applied so as to obtain a reduced or “compressed” representation of the original data.
In dimensionality reduction
Numerosity reduction
A and B
None of them
..........., the data are replaced by alternative, smaller representa-tions using parametric models or nonparametric models
In dimensionality reduction
Numerosity reduction
A and B
None of them
……… methods smooth a sorted data value by consulting its “neighbor- hood,” that is, the values around it.
Smoothing
Binning
Storing
None of them
........... Says that each value of the given attribute must be different from all other values for that attribute.
A unique rule
A consecutive rule
A null rule
None of them
........says that there can be no miss- ing values between the lowest and highest values for the attribute, and that all values must also be unique.
A unique rule
A consecutive rule
A null rule
None of them
………..specifies the use of blanks, question marks, special characters, or other strings that may indicate the null condition.
A unique rule
A consecutive rule
A null rule
None of them
...........tools use simple domain knowledge to detect errors and make corrections in the data. These tools rely on parsing and fuzzy matching techniques when cleaning data from multiple sources.
Data cleaning
Data integration
Data mining
Data scrubbing
Which of the following is a common technique used in data cleaning?
Data normalization
Data aggregation
Outlier detection
Data sampling
Which of the following is a technique used to combine data from multiple sources?
Data transformation
Data reduction
Data integration
Data cleaning
Which of the following is a technique used to remove noise from data?
Data sampling
Data discretization
Data normalization
Data smoothing
Which of the following is a technique used in data reduction?
Data sampling
Data normalization
Outlier detection
Data integration
Which of the following is a technique used to convert categorical data into numerical data?
Data normalization
Encoding
Data integration
Data transformation
Which of the following is a technique used to identify and remove duplicate records in a dataset?
Data sampling
Data integration
Data normalization
Data deduplication
Which of the following is a technique used to remove missing values from a dataset
Data imputation
Data normalization
Outlier detection
Data discretization
Which of the following is a technique used to reduce the number of variables in a dataset?
Principal Component Analysis (PCA)
Data normalization
Outlier detection
Data discretization
Which of the following is a technique used to scale data to a specific range?
Data normalization
Data discretization
Data integration
Data transformation
Which of the following is a technique used to transform data into a new representation?
Data sampling
Data integration
Data normalization
Data transformation
Which of the following is a technique used to identify and remove irrelevant or redundant variables in a dataset?
Principal Component Analysis (PCA)
Data normalization
Outlier detection
Feature selection
Which of the following is a technique used to reduce the dimensionality of a dataset?
Principal Component Analysis (PCA)
Data normalization
Outlier detection
Data discretization
Which of the following is a technique used to convert continuous data into discrete intervals?
Data normalization
Data discretization
Data integration
Data transformation
Which of the following is a technique used to standardize data by subtracting the mean and dividing by the standard deviation?
Data normalization
Data discretization
Data integration
Data transformation
Which of the following is a technique used to replace missing values in a dataset with the median value?
Data imputation
Data normalization
Outlier detection
Data discretization
Which of the following is a technique used to identify and handle inconsistent data in a dataset?
Data sampling
Data transformation
Data cleaning
Data imputation
Which of the following is a technique used to reduce the number of dimensions in a dataset while retaining important information?
Data discretization
Data reduction
Data normalization
Data transformation
Which of the following is a technique used to handle missing values in a dataset by predicting the missing values using statistical models?
Data sampling
Data normalization
Outlier detection
Data imputation
Which of the following is a technique used to transform data into a form that is more suitable for machine learning algorithms?
Data normalization
Data discretization
Data integration
Data transformation
Which of the following is a technique used to reduce the number of dimensions in a dataset by transforming the data into a new space with fewer dimensions?
Principal Component Analysis (PCA)
Data normalization
Outlier detection
Data discretization
Which of the following is a technique used to convert categorical data into numerical data by creating a new binary variable for each category?
One-hot encoding
Data discretization
Data sampling
Data transformation
Data cleaning is the process of identifying and correcting or removing inaccurate or irrelevant data from a dataset.
True
False
Data integration is the process of converting data from one format to another.
True
False
Data transformation is the process of converting data from one format to another.
True
False
Data discretization is the process of converting continuous data into discrete intervals.
True
False
Outlier detection is the process of identifying data points that are significantly different from the rest of the data.
True
False
Data sampling is the process of selecting a subset of data from a larger dataset.
True
False
Data normalization is the process of converting data into a standard format.
True
False
Data imputation is the process of identifying and removing missing values from a dataset.
True
False
Feature selection is the process of identifying and removing irrelevant or redundant variables from a dataset.
True
False
Data smoothing is the process of identifying and removing noise from a dataset.
True
False
Data deduplication is the process of identifying and removing duplicate records from a dataset.
True
False
Data discretization is the process of converting categorical data into numerical data.
True
False
Principal Component Analysis (PCA) is a technique used in data reduction.
True
False
Data transformation is the process of scaling data to a specific range.
True
False
Data normalization and standardization are the same thing.
True
False
Data integration is the process of combining data from multiple sources into a single format.
True
False
Data imputation is the process of identifying and handling inconsistent data in a dataset.
True
False
Data reduction is the process of increasing the number of dimensions in a dataset.
True
False
...... Refers to a data repository that is maintained separately from an organization’s operational databases.
. Database
. Data warehouse
Data lake
None of them
The major features of a data warehouse are.............
. subject-oriented, integrated
time-variant, nonvolatile
Both a and b
None of them
Data warehouses usually require only two operations in data accessing, which are .....
. Initial loading of data
access of data
Both a and b
None of them
Systems that perform online transaction and query processing are called....
OLTP
OLAP
DBMS
None of them
systems that can organize and present data in various formats in order to accommodate the diverse needs of different users, are called.....
OLTP
. OLAP
DBMS
None of them
Data warehouses often adopt a ..... architecture.
. Four-tier
Two-tier
Three-tier
None of them
From bottom to top, the order of the three-tier architecture levels of a data warehouse is.....
Warehouse database server, front-end client layer and OLAP server
warehouse database server, OLAP server and front-end client layer
front-end client layer, OLAP server and warehouse database server
None of them
Examples of data Warehouse Models are....
Enterprise Warehouse
. Data Mart
Virtual Warehouse
All of them
.......contains a subset of corporate-wide data that is of value to a specific group of users.
Enterprise Warehouse
Data Mart
Virtual Warehouse
None of them
.... Data marts are sourced from data captured from external information providers but ..... Data marts are sourced directly from enterprise data warehouses.
Independent, dependent
Dependent, Independent
Dependent, Dependent
None of them
.....gathers data from multiple, heterogeneous, and external sources.
Data extraction
Data cleaning
Data transformation
. None of them
......detects errors in the data and rectifies them when possible.
Data extraction
Data cleaning
Data transformation
None of them
.....which converts data from legacy or host format to warehouse format.
Data extraction
Data cleaning
Data transformation
. None of them
.....are the perspectives or entities with respect to which an organization wants to keep records.
Facts
Dimensions
Table
None of them
.....are numeric measures or quantities by which we want to analyze relationships between dimensions.
Facts
Dimensions
Table
None of them
The most popular data model for a data warehouse is a multidimensional model, which can exist in the form of ......
Star schema
snowflake schema
fact constellation schema
All of them
A concept hierarchy that is a total or partial order among attributes in a database schema is called ....
Schema hierarchy
Partial hierarchy
Total hierarchy
None of them
Measures can be organized into categories, such as....
distributive
algebraic
. holistic
All of them
.... Is an OLAP operation that performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction.
. Roll up
Drill down
Slice and dice
Pivot
.... Is an OLAP operation that's the reverse of roll-up and It navigates from less detailed data to more detailed data.
Roll up
Drill down
Slice and dice
Pivot
.... Is an OLAP operation that performs a selection on one dimension of the given cube, resulting in a subcube.
Roll up
Drill down
Slice and dice
Pivot
.... Is an OLAP operation that is a visualization operation that rotates the data axes in view to provide an alternative data presentation.
Roll up
Drill down
Slice and dice
Pivot
The construction of data warehouses involves data cleaning, data integration, and data transformation.
True
False
Data warehouses provide online transaction processing (OLTP) tools for the interactive analysis of multidimensional data
True
False
Data mining functions can be integrated with OLAP operations to enhance interactive mining of knowledge at multiple levels of abstraction.
True
False
The major features of a data warehouse are subject-oriented, integrated, time-variant, and nonvolatile
True
False
data warehouse doesn't focus on the modeling and analysis of data for decision makers
True
False
A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and online transaction records.
True
False
In data warehouses, data are stored to provide information from an historic perspective (e.g., the past 5–10 years).
True
False
A data warehouse does require transaction processing, recovery, and concurrency control mechanisms.
True
False
Data warehousing employs an query-driven approach.
True
False
query processing in data warehouses does not interfere with the processing at local sources
True
False
An OLTP system is customer-oriented but an OLAP system is market-oriented.
True
False
An OLAP system manages current data that are too detailed but an OLTP system manages large amounts of historic data
True
False
An OLTP system usually adopts an application-oriented database design but an OLAP system adopts an subject-oriented database design.
True
False
The access patterns of an OLTP system consist mainly of short, atomic transactions but accesses to OLAP systems are mostly read-only operations.
True
False
An OLAP server is typically implemented using either a ROLAP model or a MOLAP model
True
False
A virtual warehouse is a set of views over operational databases.
True
False
Metadata are data about data.
True
False
Data warehouses and OLAP tools are based on a one-dimensional data model.
True
False
A data cube allows data to be modeled and viewed in multiple dimensions
True
False
The cuboid that holds the lowest level of summarization is called the base cuboid.
True
False
The 0-Dimension cuboid, which holds the highest level of summarization, is called the apex cuboid
True
False
A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts
True
False
Drill-through is an OLAP operation that executes queries involving more than one fact table.
True
False
Drill-across is an OLAP operation that uses relational SQL facilities to drill through the bottom level of a data cube down to its back-end relational tables.
True
False
A statistical database is a database system that is designed to support statistical applications.
True
False
A starnet model is a model that consists of radial lines emanating from a central point, where each line represents a concept hierarchy for a dimension.
True
False
Concept hierarchies can be used to generalize data or to specialize data.
True
False
{"name":"Data Mining", "url":"https://www.quiz-maker.com/QPREVIEW","txt":") Which of the following refers to the problem of finding abstracted patterns (or structures) in the unlabeled data?, Which one of the following refers to querying the unstructured textual data?, Which of the following is an essential process in which the intelligent methods are applied to extract data patterns","img":"https://www.quiz-maker.com/3012/CDN/90-4381567/screenshot-2023-06-06-192034.png?sz=1200"}
More Quizzes
Hogyan szerepeltél volna a legutóbbi matekérettségin?
520
Testing With Cucumber
1050
Arian Or Nicean? Who Said it?
1059
Do you have any recommendations in new papers today?
100
Free Phys Sci: Conservation of Energy
201025553
Surf's Up: Face the Ultimate Penguin Surf Challenge
201028164
Free Measuring Angles Worksheets: & Review
201026780
Ultimate Muscular System - MCQs & Answers Challenge
201023853
Rent a Girlfriend Plot Twist: Only True Fans Score High
201035029
Discover Your Inner Superpower: Free Superpower
201025324
Titian's Most Famous Work: Can You Ace It?
201075879
NFL Teams: Can You Name All 32 Teams?
201034543