Survey Maker Survey Templates Contact Privacy & Security

Data Mining

) Which of the following refers to the problem of finding abstracted patterns (or structures) in the unlabeled data?

Supervised learning

Hybrid learning

Unsupervised learning

Reinforcement learning

Which one of the following refers to querying the unstructured textual data?

Information access

Information update

Information retrieval

Information manipulation

Which of the following is an essential process in which the intelligent methods are applied to extract data patterns

Data Mining

Warehousing

Text Mining

Data Selection

What is KDD in data mining?

Knowledge Discovery Data

Knowledge Discovery Database

Knowledge Data definition

Knowledge data house

What are the functions of Data Mining?

Association and correctional analysis classification

Prediction and characterization

Cluster analysis and Evolution analysis

All of the above

In the following given diagram, which type of clustering is used?

Naive Bayes

Hierarchal

Partitional

None of the above

Which of the following statements is incorrect about the hierarchal clustering?

The choice of an appropriate metric can influence the shape of the cluster

In general, the splits and merges both are determined in a greedy manner

All of the above

The hierarchal type of clustering is also known as the HCA

Which one of the following can be considered as the final output of the hierarchal type of clustering?

Assignment of each point to clusters

Finalize estimation of cluster centroids

A tree which displays how the close thing are to each other

None of the above

Which one of the following statements about the K-means clustering is incorrect?

The goal of the k-means clustering is to partition (n) observation into (k) clusters

The nearest neighbor is the same as the K-means

K-means clustering can be defined as the method of quantization

All of the above

Which one of the clustering technique needs the merging approach?

Partitioned

Naïve Bayes

None of the above

Hierarchical

The self-organizing maps can also be considered as the instance of _________ type of learning.

Supervised learning

Unsupervised learning

Missing data imputation

None of the above

Which of the following statement is true about the classification?

It is a subdivision of a set

It is a measure of accuracy

It is the task of assigning a classification

None of the above

Which of the following statements is correct about data mining?

It can be referred to as the procedure of mining knowledge from data

Data mining can be defined as the procedure of extracting information from a set of data

The procedure of data mining also involves several other processes like data cleaning, data transformation, and data integration

All of the above

Which of the following can be considered as the classification or mapping of a set or class with some predefined group or classes?

Data set

Data Characterization

Data Sub Structure

Data Discrimination

The analysis performed to uncover the interesting statistical correlation between associated -attributes value pairs are known as the _______.

Mining of correlation

Mining of association

Mining of clusters

All of the above

Which one of the following statements is not correct about the data cleaning?

It refers to the process of data cleaning

It refers to the transformation of wrong data into correct data

It refers to correcting inconsistent data

All of the above

Which one of the following correctly defines the term cluster?

Symbolic representation of facts or ideas from which information can potentially be extracted

Group of similar objects that differ significantly from other objects

Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm

All of the above

Which of the following correctly refers the data selection?

A subject-oriented integrated time-variant non-volatile collection of data in support of management

The actual discovery phase of a knowledge discovery process

The stage of selecting the right data for a KDD process

All of the above

Which one of the following correctly refers to the task of the classification?

A measure of the accuracy, of the classification of a concept that is given by a certain theory

The task of assigning a classification to a set of examples

A subdivision of a set of examples into a number of classes

None of the above

Which of the following correctly defines the term "Hybrid"?

Approach to the design of learning algorithms that is structured along the lines of the theory of evolution.

Decision support systems that contain an information base filled with the knowledge of an expert formulated in terms of if-then rules.

None of these

Combining different types of method or information

Which one of the following can be considered as the correct application of the data mining?

Fraud detection

Corporate Analysis & Risk management

All of the above

Management and market analysis

Which of the following refers to the steps of the knowledge discovery process, in which the several data sources are combined?

Data selection

Data integration

Data cleaning

Data transformation

The term "DMQL" stands for _____

Data Mining Query Language

None of the above

DBMiner Query Language

Data Marts Query Language

Which one of the following issues must be considered before investing in data mining?

Compatibility

All of the above

Functionality

Vendor consideration

Which of the following also used as the first step in the knowledge discovery process?

Data selection

Data cleaning

Data transformation

Data integration

�…. Is a data field, representing a characteristic or feature of a data object.

Dimension

Attribute

Fact

A and B

All of the following are attribute types except:

Descriptive

Binary

Numeric

Nominal

Given two objects represented by tuples (3, 5) and (2, 0). What is the Euclidean distance between two objects?

5.1

None

Given two objects represented by tuples (3, 5) and (2, 0). What is the Manhattan distance between two objects?

5.1

None

Euclidean distance is also known as ______________

L-1 norm

L-2 norm

L-3 norm

None

�.. Attribute has only a finite or countably infinite set of values while …. Attribute has real numbers as attribute values.

Discrete, continuous

Continuous, discrete

Quantitative, qualitative

Qualitative, quantitative

Value that occurs most frequently in the data?

Mean

Mode

Median

None

Middle value if odd number of values, or average of the middle two values otherwise.

Mean

Mode

Median

None

At symmetric data distribution

Mean = Mode

Mode = Median

A and B

None

The following graph represents …… data.

Positively correlated data

Uncorrelated data

Negatively correlated data

All of the above

All of the following measure data dispersion except:

Variance

Inter-Quartile range

Standard deviation

Mode

Basic statistical descriptions can be displayed graphically by:

Boxplot

Histogram

Scatter plot

All of the above

�… Graph display of tabulated frequencies, shown as bars

Boxplot

Histogram

Scatter plot

All of the above

�… is a numerical measure of how alike two data objects are.

Similarity

Dissimilarity

Proximity

All of the above

Is a numerical measure of how different two data objects are.

Similarity

Dissimilarity

Proximity

All of the above

We use data visualization to ….

To Gain insight into an information space

Search for patterns

Provide qualitative overview of large data sets

All of the above

In Pixel-Oriented Visualization Techniques for a data set of M dimensions we create ….. Windows on the screen.

M + 1

2 ^ M

M – 1

Data set has more than mode

Unimodal

Bimodal

Trimodal

All of the above

Data objects are described by attributes.

True

False

Data objects are made up of data sets

True

False

Dissimilarity value is lower when objects are more alike

True

False

Manhattan distance is represented by a direct line

True

False

Similarity value is lower when objects are more alike

True

False

We can achieve the Pixel-Oriented Visualization Techniques using direct visualization

True

False

Scatterplot is one of the ways to achieve Geometric Projection Visualization Techniques.

True

False

In Pixel-Oriented Visualization Techniques the colors of the pixels reflect the corresponding dimensions.

True

False

We use data visualization because it helps find interesting regions and suitable parameters for further quantitative analysis.

True

False

In Parallel Coordinates the axes are equidistant.

True

False

...........,routines work to “clean” the data by filling in missing values, smooth- ing noisy data, identifying or removing outliers, and resolving inconsistencies.

Data integration

Data mining

Data cleaning

None of them

)All of them are methods of Missing Values except.

Ignore the tuple:

Fill in the missing value manually

Use a global constant to fill in the missing value

integration

………, data encoding schemes are applied so as to obtain a reduced or “compressed” representation of the original data.

In dimensionality reduction

Numerosity reduction

A and B

None of them

..........., the data are replaced by alternative, smaller representa-tions using parametric models or nonparametric models

In dimensionality reduction

Numerosity reduction

A and B

None of them

……… methods smooth a sorted data value by consulting its “neighbor- hood,” that is, the values around it.

Smoothing

Binning

Storing

None of them

........... Says that each value of the given attribute must be different from all other values for that attribute.

A unique rule

A consecutive rule

A null rule

None of them

........says that there can be no miss- ing values between the lowest and highest values for the attribute, and that all values must also be unique.

A unique rule

A consecutive rule

A null rule

None of them

………..specifies the use of blanks, question marks, special characters, or other strings that may indicate the null condition.

A unique rule

A consecutive rule

A null rule

None of them

...........tools use simple domain knowledge to detect errors and make corrections in the data. These tools rely on parsing and fuzzy matching techniques when cleaning data from multiple sources.

Data cleaning

Data integration

Data mining

Data scrubbing

Which of the following is a common technique used in data cleaning?

Data normalization

Data aggregation

Outlier detection

Data sampling

Which of the following is a technique used to combine data from multiple sources?

Data transformation

Data reduction

Data integration

Data cleaning

Which of the following is a technique used to remove noise from data?

Data sampling

Data discretization

Data normalization

Data smoothing

Which of the following is a technique used in data reduction?

Data sampling

Data normalization

Outlier detection

Data integration

Which of the following is a technique used to convert categorical data into numerical data?

Data normalization

Encoding

Data integration

Data transformation

Which of the following is a technique used to identify and remove duplicate records in a dataset?

Data sampling

Data integration

Data normalization

Data deduplication

Which of the following is a technique used to remove missing values from a dataset

Data imputation

Data normalization

Outlier detection

Data discretization

Which of the following is a technique used to reduce the number of variables in a dataset?

Principal Component Analysis (PCA)

Data normalization

Outlier detection

Data discretization

Which of the following is a technique used to scale data to a specific range?

Data normalization

Data discretization

Data integration

Data transformation

Which of the following is a technique used to transform data into a new representation?

Data sampling

Data integration

Data normalization

Data transformation

Which of the following is a technique used to identify and remove irrelevant or redundant variables in a dataset?

Principal Component Analysis (PCA)

Data normalization

Outlier detection

Feature selection

Which of the following is a technique used to reduce the dimensionality of a dataset?

Principal Component Analysis (PCA)

Data normalization

Outlier detection

Data discretization

Which of the following is a technique used to convert continuous data into discrete intervals?

Data normalization

Data discretization

Data integration

Data transformation

Which of the following is a technique used to standardize data by subtracting the mean and dividing by the standard deviation?

Data normalization

Data discretization

Data integration

Data transformation

Which of the following is a technique used to replace missing values in a dataset with the median value?

Data imputation

Data normalization

Outlier detection

Data discretization

Which of the following is a technique used to identify and handle inconsistent data in a dataset?

Data sampling

Data transformation

Data cleaning

Data imputation

Which of the following is a technique used to reduce the number of dimensions in a dataset while retaining important information?

Data discretization

Data reduction

Data normalization

Data transformation

Which of the following is a technique used to handle missing values in a dataset by predicting the missing values using statistical models?

Data sampling

Data normalization

Outlier detection

Data imputation

Which of the following is a technique used to transform data into a form that is more suitable for machine learning algorithms?

Data normalization

Data discretization

Data integration

Data transformation

Which of the following is a technique used to reduce the number of dimensions in a dataset by transforming the data into a new space with fewer dimensions?

Principal Component Analysis (PCA)

Data normalization

Outlier detection

Data discretization

Which of the following is a technique used to convert categorical data into numerical data by creating a new binary variable for each category?

One-hot encoding

Data discretization

Data sampling

Data transformation

Data cleaning is the process of identifying and correcting or removing inaccurate or irrelevant data from a dataset.

True

False

Data integration is the process of converting data from one format to another.

True

False

Data transformation is the process of converting data from one format to another.

True

False

Data discretization is the process of converting continuous data into discrete intervals.

True

False

Outlier detection is the process of identifying data points that are significantly different from the rest of the data.

True

False

Data sampling is the process of selecting a subset of data from a larger dataset.

True

False

Data normalization is the process of converting data into a standard format.

True

False

Data imputation is the process of identifying and removing missing values from a dataset.

True

False

Feature selection is the process of identifying and removing irrelevant or redundant variables from a dataset.

True

False

Data smoothing is the process of identifying and removing noise from a dataset.

True

False

Data deduplication is the process of identifying and removing duplicate records from a dataset.

True

False

Data discretization is the process of converting categorical data into numerical data.

True

False

Principal Component Analysis (PCA) is a technique used in data reduction.

True

False

Data transformation is the process of scaling data to a specific range.

True

False

Data normalization and standardization are the same thing.

True

False

Data integration is the process of combining data from multiple sources into a single format.

True

False

Data imputation is the process of identifying and handling inconsistent data in a dataset.

True

False

Data reduction is the process of increasing the number of dimensions in a dataset.

True

False

...... Refers to a data repository that is maintained separately from an organization’s operational databases.

. Database

. Data warehouse

Data lake

None of them

The major features of a data warehouse are.............

. subject-oriented, integrated

time-variant, nonvolatile

Both a and b

None of them

Data warehouses usually require only two operations in data accessing, which are .....

. Initial loading of data

access of data

Both a and b

None of them

Systems that perform online transaction and query processing are called....

OLTP

OLAP

DBMS

None of them

systems that can organize and present data in various formats in order to accommodate the diverse needs of different users, are called.....

OLTP

. OLAP

DBMS

None of them

Data warehouses often adopt a ..... architecture.

. Four-tier

Two-tier

Three-tier

None of them

From bottom to top, the order of the three-tier architecture levels of a data warehouse is.....

Warehouse database server, front-end client layer and OLAP server

warehouse database server, OLAP server and front-end client layer

front-end client layer, OLAP server and warehouse database server

None of them

Examples of data Warehouse Models are....

Enterprise Warehouse

. Data Mart

Virtual Warehouse

All of them

.......contains a subset of corporate-wide data that is of value to a specific group of users.

Enterprise Warehouse

Data Mart

Virtual Warehouse

None of them

.... Data marts are sourced from data captured from external information providers but ..... Data marts are sourced directly from enterprise data warehouses.

Independent, dependent

Dependent, Independent

Dependent, Dependent

None of them

.....gathers data from multiple, heterogeneous, and external sources.

Data extraction

Data cleaning

Data transformation

. None of them

......detects errors in the data and rectifies them when possible.

Data extraction

Data cleaning

Data transformation

None of them

.....which converts data from legacy or host format to warehouse format.

Data extraction

Data cleaning

Data transformation

. None of them

.....are the perspectives or entities with respect to which an organization wants to keep records.

Facts

Dimensions

Table

None of them

.....are numeric measures or quantities by which we want to analyze relationships between dimensions.

Facts

Dimensions

Table

None of them

The most popular data model for a data warehouse is a multidimensional model, which can exist in the form of ......

Star schema

snowflake schema

fact constellation schema

All of them

A concept hierarchy that is a total or partial order among attributes in a database schema is called ....

Schema hierarchy

Partial hierarchy

Total hierarchy

None of them

Measures can be organized into categories, such as....

distributive

algebraic

. holistic

All of them

.... Is an OLAP operation that performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction.

. Roll up

Drill down

Slice and dice

Pivot

.... Is an OLAP operation that's the reverse of roll-up and It navigates from less detailed data to more detailed data.

Roll up

Drill down

Slice and dice

Pivot

.... Is an OLAP operation that performs a selection on one dimension of the given cube, resulting in a subcube.

Roll up

Drill down

Slice and dice

Pivot

.... Is an OLAP operation that is a visualization operation that rotates the data axes in view to provide an alternative data presentation.

Roll up

Drill down

Slice and dice

Pivot

The construction of data warehouses involves data cleaning, data integration, and data transformation.

True

False

Data warehouses provide online transaction processing (OLTP) tools for the interactive analysis of multidimensional data

True

False

Data mining functions can be integrated with OLAP operations to enhance interactive mining of knowledge at multiple levels of abstraction.

True

False

The major features of a data warehouse are subject-oriented, integrated, time-variant, and nonvolatile

True

False

data warehouse doesn't focus on the modeling and analysis of data for decision makers

True

False

A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and online transaction records.

True

False

In data warehouses, data are stored to provide information from an historic perspective (e.g., the past 5–10 years).

True

False

A data warehouse does require transaction processing, recovery, and concurrency control mechanisms.

True

False

Data warehousing employs an query-driven approach.

True

False

query processing in data warehouses does not interfere with the processing at local sources

True

False

An OLTP system is customer-oriented but an OLAP system is market-oriented.

True

False

An OLAP system manages current data that are too detailed but an OLTP system manages large amounts of historic data

True

False

An OLTP system usually adopts an application-oriented database design but an OLAP system adopts an subject-oriented database design.

True

False

The access patterns of an OLTP system consist mainly of short, atomic transactions but accesses to OLAP systems are mostly read-only operations.

True

False

An OLAP server is typically implemented using either a ROLAP model or a MOLAP model

True

False

A virtual warehouse is a set of views over operational databases.

True

False

Metadata are data about data.

True

False

Data warehouses and OLAP tools are based on a one-dimensional data model.

True

False

A data cube allows data to be modeled and viewed in multiple dimensions

True

False

The cuboid that holds the lowest level of summarization is called the base cuboid.

True

False

The 0-Dimension cuboid, which holds the highest level of summarization, is called the apex cuboid

True

False

A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts

True

False

Drill-through is an OLAP operation that executes queries involving more than one fact table.

True

False

Drill-across is an OLAP operation that uses relational SQL facilities to drill through the bottom level of a data cube down to its back-end relational tables.

True

False

A statistical database is a database system that is designed to support statistical applications.

True

False

A starnet model is a model that consists of radial lines emanating from a central point, where each line represents a concept hierarchy for a dimension.

True

False

Concept hierarchies can be used to generalize data or to specialize data.

True

False

{"name":"Data Mining", "url":"https://www.quiz-maker.com/QPREVIEW","txt":") Which of the following refers to the problem of finding abstracted patterns (or structures) in the unlabeled data?, Which one of the following refers to querying the unstructured textual data?, Which of the following is an essential process in which the intelligent methods are applied to extract data patterns","img":"https://www.quiz-maker.com/3012/CDN/90-4381567/screenshot-2023-06-06-192034.png?sz=1200"}