Data Science Course

Educational systems are under increasing pressure to reduce costs while maintaining or
improving outcomes for students. To improve educational productivity,1 many school
districts and states are turning to online learning. In the United States, online learning
alternatives are proliferating rapidly. Recent estimates suggest that 1.5 million elementary
and secondary students participated in some form of online learning in 2010 (Wicks 2010).
The term online learning can be used to refer to a wide range of programs that use the
Internet to provide instructional materials and facilitate interactions between teachers and
students and in some cases among students as well. Online learning can be fully online, with
all instruction taking place through the Internet, or online elements can be combined with
face-to-face interactions in what is known as blended learning (Horn and Staker 2010).

The purpose of this report is to support educational administrators and policymakers in
becoming informed consumers of information about online learning and its potential impact
on educational productivity. The report provides foundational knowledge needed to examine
and understand the potential contributions of online learning to educational productivity,
including a conceptual framework for understanding the necessary components of rigorous
productivity analyses, drawing in particular on cost-effectiveness analysis as an accessible
method in education. Five requirements for rigorous cost-effectiveness studies are described:
1) Important design components of an intervention are specified;
2) Both costs and outcomes are measured;

3) At least two conditions are compared;
4) Costs and outcomes are related using a single ratio for each model under study;
5) Other factors not related to the conditions being studied are controlled or held constant.

1 As defined in this report, productivity is a ratio between costs and outcomes that can be improved in one of three ways: by
reducing costs while maintaining outcomes, improving outcomes while maintaining costs or transforming processes in a
way that both reduces costs and improves outcomes. Any improvements in productivity are likely to require initial
investments, but successful efforts reduce costs over the long term, even after these initial investments are taken into
account.

The report also includes a review of ways that online learning might offer productivity
benefits compared with traditional place-based schooling. Unfortunately, the available
research examining the impact of online learning on educational productivity for secondary
school students is lacking. No analyses were found that rigorously measured the
productivity of an online learning system relative to place-based instruction in secondary
schools.2
This lack of evidence supports the call of the National Educational Technology Plan (U.S.
Department of Education 2010a) for a national initiative to develop an ongoing research
agenda dedicated to improving productivity in the education sector. The evidence
summarized in this report draws on literature that addressed either costs or effectiveness.
These studies typically were limited because they did not bring the two together in a
productivity ratio and compare results with other alternatives.

Given the limitations of the research regarding the costs and effects of online instruction for
secondary students, the review that follows also draws on examples and research about the
use of online learning for postsecondary instruction. While there are many differences
between higher education and elementary and secondary education (e.g., age and maturity of
students), postsecondary institutions have a broader and longer history with online learning
than elementary and secondary schools. The intention is to use the literature from higher
education to illustrate concepts that may apply to emerging practices in elementary and
secondary education. Findings from the studies of higher education should be applied with
caution to secondary education, as student populations, learning contexts and financial
models are quite different across these levels of schooling.

2 Two research reports—an audit for the Wisconsin State Legislature (Stuiber et al. 2010) and a study of the Florida Virtual
School (Florida Tax Watch Center for Educational Performance and Accountability 2007)—include data about costs and
effects. These reports suggest that online learning environments may hold significant potential for increasing educational
productivity. Both found that online learning environments produced better outcomes than face-to-face schools and at a
lower per-pupil cost than the state average. However, these conclusions must be viewed cautiously because both reports
lacked statistical controls that could have ruled out other explanations of the findings.

While rigorously researched models are lacking, the review of the available literature
suggested nine applications of online learning that are seen as possible pathways to
improved productivity:
1) Broadening access in ways that dramatically reduce the cost of providing access to
quality educational resources and experiences, particularly for students in remote
locations or other situations where challenges such as low student enrollments make
the traditional school model impractical;
2) Engaging students in active learning with instructional materials and access to a
wealth of resources that can facilitate the adoption of research-based principles and
best practices from the learning sciences, an application that might improve student
outcomes without substantially increasing costs;
3) Individualizing and differentiating instruction based on student performance on
diagnostic assessments and preferred pace of learning, thereby improving the
efficiency with which students move through a learning progression;
4) Personalizing learning by building on student interests, which can result in
increased student motivation, time on task and ultimately better learning outcomes;
5) Making better use of teacher and student time by automating routine tasks and
enabling teachers to focus their time on high-value activities;
6) Increasing the rate of student learning by increasing motivation and helping
students grasp concepts and demonstrate competency more efficiently;
7) Reducing school-based facilities costs by leveraging home and community spaces
in addition to traditional school buildings;
8) Reducing salary costs by transferring some educational activities to computers, by
increasing teacher-student ratios or by otherwise redesigning processes that allow for
more effective use of teacher time; and
9) Realizing opportunities for economies of scale through reuse of materials and their
large-scale distribution.
It is important to note that these pathways are not mutually exclusive, and interventions
intended to increase productivity usually involve multiple strategies to impact both the
benefit side (pathways 1–4) and cost side (pathways 5–9).
Determining whether online learning is more or less cost-effective than other alternatives
does not lend itself to a simple yes or no answer. Each of the nine pathways suggests a
plausible strategy for improving educational productivity, but there is insufficient evidence
to draw any conclusions about their viability in secondary schools. Educational stakeholders
at every level need information regarding effective instructional strategies and methods for
improving educational productivity. Studies designed to inform educational decisions should
follow rigorous methodologies that account for a full range of costs, describe key
implementation characteristics and use valid estimates of student learning.
Even less is known about the impact of online learning for students with disabilities.
Regarding potential benefits, the promise of individualized and personalized instruction
suggests an ability to tailor instruction to meet the needs of students with disabilities. For
example, rich multimedia can be found on the Internet that would seem to offer ready
inspiration for meeting the unique needs of the blind or the hearing impaired. In fact,
standards for universal design are available both for the Web and for printed documents. In
addition, tutorial models that rely on independent study are well suited to students with
medical or other disabilities that prevent them from attending brick-and-mortar schools.
However, while online learning offerings should be made accessible to students with
disabilities, doing so is not necessarily cheap or easy.
Any requirement to use a technology, including an online learning program, that is
inaccessible to individuals with disabilities is considered discrimination and is prohibited by
the Americans with Disabilities Act of 1990 and Section 504 of the Rehabilitation Act of
1973, unless those individuals are provided accommodations or modifications that permit
them to receive all the educational benefits provided by the technology in an equally
effective and equally integrated manner. The degree to which programs make such
accommodations is not yet known. To address this need, the U.S. Department of Education
recently funded the Center on Online Learning and Students With Disabilities, a five-year
research effort to identify new methods for using technology to improve learning. Similarly,
research regarding the degree to which current online learning environments meet the needs
of English language learners and how technology might provide a cost-effective alternative
to traditional strategies is just emerging.
The realization of productivity improvements in education will most likely require a
transformation of conventional processes to leverage new capabilities supported by
information and communications technologies. Basic assumptions about the need for seat
time and age-based cohorts may need to be reevaluated to sharpen focus on the needs and
interests of all students as individuals. And as rigorous evidence accumulates around
effective practices that may require institutional change, systemic incentives may be needed
to spur the adoption of efficient, effective paths to learning.

Implications for Online Learning

1. Strategies should be used to allow learners to perceive and attend to the information so that it can be transferred to working memory. Learners use their sensory systems to register the information in the form of sensations. Strategies to facilitate maximum sensation should be used. Examples include the proper location of the information on the screen, the attributes of the screen (color, graphics, size of text, etc.), the pacing of the information, and the mode of delivery (audio, visuals, animations, video). Learners must receive the information in the form of sensations before perception and processing can occur; however, they must not be overloaded with sensations, which could be counterproductive to the learning process. Non-essential sensations should be avoided to allow learners to attend to the important information. Strategies to promote perception and attention for online learning include those listed below.

• Important information should be placed in the center of the screen for reading, and learners must be able to read from left to right.
• Information critical for learning should be highlighted to focus learners’ attention. For example, in an online lesson, headings should be used to organize the details, and formatted to allow learners to attend to and process the information they contain.
• Learners should be told why they should take the lesson, so that they can attend to the information throughout the lesson.
• The difficulty level of the material must match the cognitive level of the learner, so that the learner can both attend to and relate to the material. Links to both simpler and more complicated materials can be used to accommodate learners at different knowledge levels.

2. Strategies should be used to allow learners to retrieve existing information from long-term memory to help make sense of the new information. Learners must construct a memory link between the new information and some related information already stored in long-term memory. Strategies to facilitate the use of existing schema are listed below.

• Use advance organizers to activate an existing cognitive structure or to provide the information to incorporate the details of the lesson (Ausubel, 1960). A comparative advance organizer can be used to recall prior knowledge to help in processing, and an expository advance organizer can be used to help incorporate the details of the lesson (Ally, 1980). Mayer (1979) conducted a meta-analysis of advance organizer studies, and found that these strategies are effective when students are learning from text that is presented in an unfamiliar form. Since most courses contain materials that are new to learners, advance organizers should be used to provide the framework for learning.
• Provide conceptual models that learners can use to retrieve existing mental models or to store the structure they will need to use to learn the details of the lesson.
• Use pre-instructional questions to set expectations and to activate the learners’ existing knowledge structure. Questions presented before the lesson facilitate the recall of existing

TensorFlow

Google’s machine intelligence framework is the new hotness right now. And when TensorFlow became installable on the Raspberry Pi, working with it became very easy to do. In a short time I made a neural network that counts in binary. So I thought I’d pass on what I’ve learned so far. Hopefully this makes it easier for anyone else who wants to try it, or for anyone who just wants some insight into neural networks.

What Is TensorFlow?

To quote the TensorFlow website, TensorFlow is an “open source software library for numerical computation using data flow graphs”. What do we mean by “data flow graphs”? Well, that’s the really cool part. But before we can answer that, we’ll need to talk a bit about the structure for a simple neural network.
Binary counter neural network
Basics of a Neural Network

A simple neural network has some input units where the input goes. It also has hidden units, so-called because from a user’s perspective they’re literally hidden. And there are output units, from which we get the results. Off to the side are also bias units, which are there to help control the values emitted from the hidden and output units. Connecting all of these units are a bunch of weights, which are just numbers, each of which is associated with two units.

The way we instill intelligence into this neural network is to assign values to all those weights. That’s what training a neural network does: find suitable values for those weights. Once trained, in our example, we’ll set the input units to the binary digits 0, 0, and 0 respectively, TensorFlow will do stuff with everything in between, and the output units will magically contain the binary digits 0, 0, and 1 respectively. In case you missed that, it knew that the next number after binary 000 was 001. For 001, it should spit out 010, and so on up to 111, whereupon it’ll spit out 000. Once those weights are set appropriately, it’ll know how to count.
Binary counter neural network with matrices

One step in “running” the neural network is to multiply the value of each weight by the value of its input unit, and then to store the result in the associated hidden unit.

We can redraw the units and weights as arrays, or what are called lists in Python. From a math standpoint, they’re matrices. We’ve redrawn only a portion of them in the diagram. Multiplying the input matrix with the weight matrix involves simple matrix multiplication resulting in the five element hidden matrix/list/array.
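To make that concrete, here is a minimal NumPy sketch (not from the article; the weight values are random placeholders chosen only for illustration) of the step just described: one input row times a 3 x 5 weight matrix gives the five hidden-unit values.

import numpy as np

inputs = np.array([[0, 0, 1]])   # one input: binary 001, shape (1, 3)
weights = np.random.rand(3, 5)   # one weight for every input/hidden pair
hidden = inputs.dot(weights)     # simple matrix multiplication, shape (1, 5)
print(hidden)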
From Matrices to Tensors

In TensorFlow, those lists are called tensors. And the matrix multiplication step is called an operation, or op in programmer-speak, a term you’ll have to get used to if you plan on reading the TensorFlow documentation. Taking it further, the whole neural network is a collection of tensors and the ops that operate on them. Altogether they make up a graph.
Binary counter’s full graph
layer1 expanded

Shown here are snapshots taken of TensorBoard, a tool for visualizing the graph as well as examining tensor values during and after training. The tensors are the lines, and written on the lines are the tensor’s dimensions. Connecting the tensors are all the ops, though some of the things you see can be double-clicked on in order to expand for more detail, as we’ve done for layer1 in the second snapshot.

At the very bottom is x, the name we’ve given for a placeholder op that allows us to provide values for the input tensor. The line going up and to the left from it is the input tensor. Continue following that line up and you’ll find the MatMul op, which does the matrix multiplication with that input tensor and the tensor which is the other line leading into the MatMul op. That tensor represents the weights.

All this was just to give you a feel for what a graph and its tensors and ops are, giving you a better idea of what we mean by TensorFlow being a “software library for numerical computation using data flow graphs”. But why would we want to create these graphs?
Why Create Graphs?

The API that’s currently stable is one for Python, an interpreted language. Neural networks are compute intensive and a large one could have thousands or even millions of weights. Computing by interpreting every step would take forever.

So we instead create a graph made up of tensors and ops, describing the layout of the neural network, all mathematical operations, and even initial values for variables. Only after we’ve created this graph do we then pass it to what TensorFlow calls a session. This is known as deferred execution. The session runs the graph using very efficient code. Not only that, but many of the operations, such as matrix multiplication, are ones that can be done on a supported GPU (Graphics Processing Unit) and the session will do that for you. Also, TensorFlow is built to be able to distribute the processing across multiple machines and/or GPUs. Giving it the complete graph allows it to do that.
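As a minimal sketch of that build-then-run workflow, using the same TensorFlow 1.x-style API as the rest of this article (a toy graph, not the binary counter itself):

import tensorflow as tf

a = tf.constant([[1.0, 2.0]])    # building the graph: nothing is computed yet
b = tf.constant([[3.0], [4.0]])
c = tf.matmul(a, b)              # still just an op and its output tensor in the graph

with tf.Session() as sess:       # only now is anything actually computed
    print(sess.run(c))           # [[11.]]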
Creating The Binary Counter Graph

And here’s the code for our binary counter neural network. You can find the full source code on this GitHub page. Note that there’s additional code in it for saving information for use with TensorBoard.

We’ll start with the code for creating the graph of tensors and ops.

import tensorflow as tf
sess = tf.InteractiveSession()

NUM_INPUTS = 3
NUM_HIDDEN = 5
NUM_OUTPUTS = 3

We first import the tensorflow module, create a session for use later, and, to make our code more understandable, we create a few variables containing the number of units in our network.

x = tf.placeholder(tf.float32, shape=[None, NUM_INPUTS], name='x')
y_ = tf.placeholder(tf.float32, shape=[None, NUM_OUTPUTS], name='y_')

Then we create placeholders for our input and output units. A placeholder is a TensorFlow op for things that we’ll provide values for later. x and y_ are now tensors in a new graph and each has a placeholder op associated with it.

You might wonder why we define the shapes as [None, NUM_INPUTS] and [None, NUM_OUTPUTS], two dimensional lists, and why None for the first dimension? In the overview of neural networks above it looks like we’ll give it one input at a time and train it to produce a given output. It’s more efficient though, if we give it multiple input/output pairs at a time, what’s called a batch. The first dimension is for the number of input/output pairs in each batch. We won’t know how many are in a batch until we actually give one later. And in fact, we’re using the same graph for training, testing, and for actual usage so the batch size won’t always be the same. So we use the Python placeholder object None for the size of the first dimension for now.
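Here is a small standalone sketch (separate from the binary counter code, TensorFlow 1.x style, values invented) showing that a placeholder whose first dimension is None happily accepts batches of different sizes:

import numpy as np
import tensorflow as tf

p = tf.placeholder(tf.float32, shape=[None, 3])
doubled = p * 2.0                # any op will do; we only care about the shapes

with tf.Session() as sess:
    print(sess.run(doubled, feed_dict={p: [[0, 0, 1]]}).shape)       # (1, 3): batch of one
    print(sess.run(doubled, feed_dict={p: np.zeros((8, 3))}).shape)  # (8, 3): batch of eight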

W_fc1 = tf.truncated_normal([NUM_INPUTS, NUM_HIDDEN], mean=0.5, stddev=0.707)
W_fc1 = tf.Variable(W_fc1, name='W_fc1')

b_fc1 = tf.truncated_normal([NUM_HIDDEN], mean=0.5, stddev=0.707)
b_fc1 = tf.Variable(b_fc1, name='b_fc1')

h_fc1 = tf.nn.relu(tf.matmul(x, W_fc1) + b_fc1)

That’s followed by creating layer one of the neural network graph: the weights W_fc1, the biases b_fc1, and the hidden units h_fc1. The “fc” is a convention meaning “fully connected”, since the weights connect every input unit to every hidden unit.

tf.truncated_normal results in a number of ops and tensors which will later assign random numbers, drawn from a truncated normal distribution, to all the weights.

The Variable ops are given a value to do initialization with, random numbers in this case, and keep their data across multiple runs. They’re also handy for saving the neural network to a file, something you’ll want to do once it’s trained.

You can see where we’ll be doing the matrix multiplication using the matmul op. We also insert an add op which will add on the bias weights. The relu op performs what we call an activation function. The matrix multiplication and the addition are linear operations. There’s a very limited number of things a neural network can learn using just linear operations. The activation function provides some non-linearity. In the case of the relu activation function, it sets any values that are less than zero to zero, and all other values are left unchanged. Believe it or not, doing that opens up a whole other world of things that can be learned.
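If you want to see relu’s behavior for yourself, here is a tiny standalone sketch (TensorFlow 1.x style, with invented values): negatives become zero, everything else passes through unchanged.

import tensorflow as tf

vals = tf.constant([-2.0, -0.5, 0.0, 0.7, 3.0])
with tf.Session() as sess:
    print(sess.run(tf.nn.relu(vals)))   # [0.  0.  0.  0.7 3. ]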

W_fc2 = tf.truncated_normal([NUM_HIDDEN, NUM_OUTPUTS], mean=0.5, stddev=0.707)
W_fc2 = tf.Variable(W_fc2, name='W_fc2')

b_fc2 = tf.truncated_normal([NUM_OUTPUTS], mean=0.5, stddev=0.707)
b_fc2 = tf.Variable(b_fc2, name='b_fc2')

y = tf.matmul(h_fc1, W_fc2) + b_fc2

The weights and biases for layer two are set up the same as for layer one but the output layer is different. We again will do a matrix multiplication, this time multiplying the weights and the hidden units, and then adding the bias weights. We’ve left the activation function for the next bit of code.

results = tf.sigmoid(y, name='results')

cross_entropy = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=y, labels=y_))

Sigmoid is another activation function, like the relu we encountered above, there to provide non-linearity. I used sigmoid here partly because the sigmoid equation results in values between 0 and 1, ideal for our binary counter example. I also used it because it’s good for outputs where more than one output unit can have a large value. In our case, to represent the binary number 111, all the output units can have large values. When doing image classification we’d want something quite different, we’d want just one output unit to fire with a large value. For example, we’d want the output unit representing giraffes to have a large value if an image contains a giraffe. Something like softmax would be a good choice for image classification.
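As a quick standalone illustration (TensorFlow 1.x style, with made-up logit values), sigmoid squashes each value independently into the range 0 to 1, so several output units can be near 1 at the same time:

import tensorflow as tf

logits = tf.constant([4.0, 3.5, 5.0])    # invented output values for the three units
with tf.Session() as sess:
    print(sess.run(tf.sigmoid(logits)))  # roughly [0.982 0.971 0.993]: all three can be "high" at once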

On close inspection, it looks like there’s some duplication. We seem to be inserting sigmoid twice. We’re actually creating two different, parallel outputs here. The cross_entropy tensor will be used during training of the neural network. The results tensor will be used when we run our trained neural network later for whatever purpose it’s created, for fun in our case. I don’t know if this is the best way of doing this, but it’s the way I came up with.

train_step = tf.train.RMSPropOptimizer(0.25, momentum=0.5).minimize(cross_entropy)

The last piece we add to our graph is the training. This is the op or ops that will adjust all the weights based on training data. Remember, we’re still just creating a graph here. The actual training will happen later when we run the graph.

There are a few optimizers to choose from. I chose tf.train.RMSPropOptimizer because, like the sigmoid, it works well for cases where all output values can be large. For classification tasks such as image classification, tf.train.GradientDescentOptimizer might be better.
Training And Using The Binary Counter

Having created the graph, it’s time to do the training. Once it’s trained, we can then use it.

inputvals = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1],
             [1, 1, 0], [1, 1, 1]]
targetvals = [[0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1], [1, 1, 0],
              [1, 1, 1], [0, 0, 0]]

First, we have some training data: inputvals and targetvals. inputvals contains the inputs, and for each one there’s a corresponding targetvals target value. For inputvals[0] we have [0, 0, 0], and the expected output is targetvals[0], which is [0, 0, 1], and so on.

if do_training == 1:
    sess.run(tf.global_variables_initializer())

    for i in range(10001):
        if i % 100 == 0:
            train_error = cross_entropy.eval(feed_dict={x: inputvals, y_: targetvals})
            print("step %d, training error %g" % (i, train_error))
            if train_error < 0.0005:
                break

        sess.run(train_step, feed_dict={x: inputvals, y_: targetvals})

    if save_trained == 1:
        print("Saving neural network to %s.*" % (save_file))
        saver = tf.train.Saver()
        saver.save(sess, save_file)

do_training and save_trained can be hardcoded, and changed for each use, or can be set using command line arguments.

We first go through all those Variable ops and have them initialize their tensors.

Then, for up to 10001 times we run the graph from the bottom up to the train_step tensor, the last thing we added to our graph. We pass inputvals and targetvals to train_step’s op or ops, which we’d added using RMSPropOptimizer. This is the step that adjusts all the weights such that the given inputs will result in something close to the corresponding target outputs. If the error between target outputs and actual outputs gets small enough sooner, then we break out of the loop.

If you have thousands of input/output pairs then you could give it a subset of them at a time, the batch we spoke of earlier. But here we have only eight, and so we give all of them each time.

If we want to, we can also save the network to a file. Once it’s trained well, we don’t need to train it again.

else:   # if we're not training then we must be loading from file
    print("Loading neural network from %s" % (save_file))
    saver = tf.train.Saver()
    saver.restore(sess, save_file)
    # Note: the restore both loads and initializes the variables

If we’re not training it then we instead load the trained network from a file. The file contains only the values for the tensors that have Variable ops. It doesn’t contain the structure of the graph. So even when running an already trained graph, we still need the code to create the graph. There is a way to save and load graphs from files using MetaGraphs but we’re not doing that here.

print('\nCounting starting with: 0 0 0')
res = sess.run(results, feed_dict={x: [[0, 0, 0]]})
print('%g %g %g' % (res[0][0], res[0][1], res[0][2]))
for i in range(8):
    res = sess.run(results, feed_dict={x: res})
    print('%g %g %g' % (res[0][0], res[0][1], res[0][2]))

In either case we try it out. Notice that we’re running it from the bottom of the graph up to the results tensor we’d talked about above, the duplicate output we’d created especially for when making use of the trained network.

We give it 000, and hope that it returns something close to 001. We pass what was returned, back in and run it again. Altogether we run it 9 times, enough times to count from 000 to 111 and then back to 000 again.
Running the binary counter

Here’s the output during successful training and subsequent counting. Notice that it trained within 200 steps through the loop. Very occasionally it does all 10001 steps without reducing the training error sufficiently, but once you’ve trained it successfully and saved it, that doesn’t matter.
The Next Step

As we said, the code for the binary counter neural network is on our GitHub page. You can start with that, start from scratch, or use any of the many tutorials on the TensorFlow website. Getting it to do something with hardware is definitely my next step, taking inspiration from this robot that [Lukas Biewald] made to recognize objects around his workshop.

What are you using, or planning to use TensorFlow for? Let us know in the comments below and maybe we’ll give it a try in a future article!

What is Deep Learning?

Deep Learning (also known as deep structured learning, hierarchical learning or deep machine learning) is a class of machine learning algorithms that use a cascade of many layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input. The algorithms may be supervised or unsupervised, and applications include pattern analysis (unsupervised) and classification (supervised). These algorithms are based on the (unsupervised) learning of multiple levels of features or representations of the data: higher-level features are derived from lower-level features to form a hierarchical representation. They are part of the broader machine learning field of learning representations of data. They learn multiple levels of representations that correspond to different levels of abstraction; the levels form a hierarchy of concepts. In a simple case, there might be two sets of neurons: one set that receives an input signal and one that sends an output signal. When the input layer receives an input, it passes on a modified version of the input to the next layer.
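As a rough sketch of that layer cascade, assuming the Keras API that ships with modern TensorFlow (the layer sizes and input shape here are arbitrary, chosen only for illustration), each layer consumes the previous layer’s output:

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(10,)),  # receives the input signal
    keras.layers.Dense(16, activation='relu'),                     # uses the previous layer's output as input
    keras.layers.Dense(1, activation='sigmoid'),                   # sends the output signal
])
model.summary()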

What is Deep Belief Network?

Deep Belief Nets are probabilistic generative models that are composed of multiple layers of stochastic, latent variables. The latent variables typically have binary values and are often called hidden units or feature detectors. The top two layers have undirected, symmetric connections between them and form an associative memory. The lower layers receive top-down, directed connections from the layer above. The states of the units in the lowest layer represent a data vector.

The two most significant properties of deep belief nets are:

• There is an efficient, layer-by-layer procedure for learning the top-down, generative weights that determine how the variables in one layer depend on the variables in the layer above.
• After learning, the values of the latent variables in every layer can be inferred by a single, bottom-up pass that starts with an observed data vector in the bottom layer and uses the generative weights in the reverse direction.

Deep belief nets are learned one layer at a time by treating the values of the latent variables in one layer, when they are being inferred from data, as the data for training the next layer. This efficient, greedy learning can be followed by, or combined with, other learning procedures that fine-tune all of the weights to improve the generative or discriminative performance of the whole network.
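A hedged sketch of that greedy, layer-by-layer idea, using scikit-learn’s BernoulliRBM as a stand-in for each layer (the data and layer sizes are invented, and this shows only the stacking step, not a full deep belief net with fine-tuning):

import numpy as np
from sklearn.neural_network import BernoulliRBM

X = (np.random.rand(100, 20) > 0.5).astype(float)   # invented binary data vectors

rbm1 = BernoulliRBM(n_components=10, n_iter=20, random_state=0).fit(X)
H1 = rbm1.transform(X)            # layer 1's latent variables become training data for layer 2
rbm2 = BernoulliRBM(n_components=5, n_iter=20, random_state=0).fit(H1)
H2 = rbm2.transform(H1)           # layer 2's latent variables
print(H2.shape)                   # (100, 5)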

Discriminative fine-tuning can be performed by adding a final layer of variables that represent the desired outputs and backpropagating error derivatives. When networks with many hidden layers are applied to highly-structured input data, such as images, backpropagation works much better if the feature detectors in the hidden layers are initialized by learning a deep belief net that models the structure in the input data (Hinton & Salakhutdinov, 2006).

What is Decision Tree?

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm. Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal, but they are also a popular tool in machine learning. Each internal node of a decision tree represents a “test” on an attribute (e.g., whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (the decision taken after computing all attributes). The paths from root to leaf represent classification rules. In decision analysis, a decision tree and the closely related influence diagram are used as a visual and analytical decision support tool, where the expected values of competing alternatives are calculated. Decision trees are also commonly used in operations management.
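A minimal example of the machine-learning use of a decision tree, assuming scikit-learn is available (the four records and their labels are invented purely to show the API):

from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]   # two made-up attributes per record
y = [0, 0, 0, 1]                        # class label for each record
clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[1, 1]]))            # [1]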

What is Data Mining?

Data Mining is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. It is an interdisciplinary subfield of computer science. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating. The term is frequently applied to any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics) as well as any application of computer decision support system, including artificial intelligence, machine learning, and business intelligence. The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies (association rule mining, sequential pattern mining).
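As a small illustration of one of the tasks mentioned above, cluster analysis, here is a hedged sketch using scikit-learn’s KMeans on an invented four-point dataset:

from sklearn.cluster import KMeans

points = [[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9]]   # two obvious groups
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
print(labels)   # two records land in one cluster, two in the other, e.g. [0 0 1 1]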

What is Cross-Validation?

Cross-Validation is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. The idea is to define a dataset to “test” the model in the training phase (i.e., the validation dataset) in order to limit problems like overfitting and to give insight into how the model will generalize to an independent dataset. One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, multiple rounds of cross-validation are performed using different partitions, and the validation results are averaged over the rounds. One of the main reasons for using cross-validation instead of conventional validation (a single split into training and test sets) is that there is often not enough data available to partition into separate training and test sets without losing significant modeling or testing capability.
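A minimal sketch of k-fold cross-validation, assuming scikit-learn (the iris dataset and logistic regression model are stand-ins chosen only to show the mechanics):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=5)   # five rounds, each with its own train/validation split
print(scores, scores.mean())                  # the averaged score estimates generalization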

What is Correlation?

Correlation is a statistical measure that can show whether and how strongly pairs of variables are related. For example, height and weight are related; taller people tend to be heavier than shorter people. The relationship isn’t perfect. People of the same height vary in weight, and you can easily think of two people you know where the shorter one is heavier than the taller one. Nonetheless, the average weight of people 5’5” is less than the average weight of people 5’6”, and their average weight is less than that of people 5’7”, etc. Correlation does not imply causation, rather it implies an association between two variables. The strength of a correlation can be indicated by the correlation coefficient.

The correlation coefficient is a measure of how closely two variables move in relation to one another. It indicates which way the other variable tends to move when one variable changes, and how consistently. A correlation coefficient ranges from -1 through zero to +1. When two variables, X and Y, have a correlation coefficient approaching -1, increases in X tend to be accompanied by proportional decreases in Y. If their correlation coefficient approaches +1, increases in X tend to be accompanied by proportional increases in Y. A correlation coefficient of zero means movement of the X and Y variables is unrelated.
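A quick numerical illustration, assuming NumPy (the height/weight pairs are invented): np.corrcoef returns the correlation coefficient in the range -1 to +1.

import numpy as np

height = [63, 64, 66, 69, 71]          # inches (invented sample)
weight = [127, 135, 146, 160, 171]     # pounds (invented sample)
r = np.corrcoef(height, weight)[0, 1]  # correlation coefficient between the two variables
print(r)                               # close to +1: taller people in this sample tend to be heavier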

What is Convolutional Neural Network (CNN)?

A Convolutional Neural Network (CNN) is made up of neurons that have learnable weights and biases. CNNs are a category of neural networks that have proven very effective in areas such as image recognition and classification. ConvNets have been successful in identifying faces, objects and traffic signs, apart from powering vision in robots and self-driving cars. A CNN transforms the original image layer by layer from the original pixel values to the final class scores.
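As a hedged sketch of a small CNN, assuming the Keras API bundled with TensorFlow (the filter counts, the 32x32 RGB input shape, and the ten output classes are arbitrary choices for illustration):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(32, 32, 3)),  # learnable filters over pixels
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(32, (3, 3), activation='relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation='softmax'),   # final class scores
])
model.summary()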

What is collaborative filtering?

Collaborative Filtering (CF) is a technique used by recommender systems. Collaborative filtering has two senses: a narrow one and a more general one. In the narrower sense, collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). The underlying assumption of the collaborative filtering approach is that if a person A has the same opinion as a person B on an issue, A is more likely to have B’s opinion on a different issue than that of a randomly chosen person. This differs from the simpler approach of giving an average (non-specific) score for each item of interest. In the more general sense, collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc. Applications of collaborative filtering typically involve very large data sets. Collaborative filtering methods have been applied to many different kinds of data, including: sensing and monitoring data, such as in mineral exploration or environmental sensing over large areas or with multiple sensors; financial data, such as financial service institutions that integrate many financial sources; and electronic commerce and web applications, where the focus is on user data.
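A minimal, hand-rolled sketch of the narrower, user-based sense of collaborative filtering (the ratings matrix is invented, and cosine similarity is one common but not the only choice of similarity measure):

import numpy as np

# rows = users, columns = items, 0 = not yet rated (all values invented)
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [1.0, 0.0, 5.0, 4.0],
])

def cosine(a, b):
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

target = ratings[0]
sims = [cosine(target, other) for other in ratings[1:]]  # similarity to every other user
neighbour = ratings[1 + int(np.argmax(sims))]            # the most similar user
print(neighbour)   # that neighbour's ratings suggest scores for user 0's unrated items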