# Machine Learning — K-Nearest Neighbors(K-NN)

1. Definition

The K-Nearest Neighbors (K-NN) algorithm is one of the simplest and most fundamental supervised learning methods, and it can be used for both classification and regression problems. In industry, however, it is mostly used for classification. K-NN relies on ‘feature similarity’ to predict the value of a new data point: the new point is assigned a value that depends on how close it is to the data points in the training set.
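The idea of ‘feature similarity’ can be sketched in a few lines of Python. This is a minimal illustration, not the article's own implementation: the function name `knn_predict`, the Euclidean distance, the majority vote and the toy data are all assumptions made for the example.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, made up for illustration: two small clusters.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["A", "A", "A", "B", "B", "B"]
prediction = knn_predict(points, labels, query=(2.0, 2.0), k=3)
```

Here the query point sits inside the first cluster, so its three nearest neighbours all carry the label "A". For regression, the vote would typically be replaced by an average of the neighbours' values.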

2. The Algorithm

Suppose we have a dataset and would like to implement the K-NN algorithm…

# Statistics — Statistical Sampling and Sampling Distributions

Statistical Sampling

1. Population & Sample

Since this article is about statistical sampling, the first thing we should know is the difference between a population and a sample. Suppose we want to investigate the average share of income that households in the U.K. spend in restaurants. Our population in this case is all households currently located in the U.K. Theoretically speaking, we could knock on the door of every household and get the answer. However, collecting and aggregating all the answers this way would be costly and time-consuming. …
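The difference between a population and a sample can be illustrated with a small simulation. The figures below are synthetic: the population size, the uniform spread of spending shares and the sample size of 500 are assumptions chosen only to make the sketch runnable, not data about U.K. households.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: restaurant spending share (%) for 100,000 households.
population = [random.uniform(0.0, 10.0) for _ in range(100_000)]
population_mean = statistics.mean(population)

# Survey a simple random sample of 500 households instead of all of them.
sample = random.sample(population, k=500)
sample_mean = statistics.mean(sample)

# How far the sample estimate lands from the true population value.
estimation_error = abs(sample_mean - population_mean)
```

The sample mean is computed from a tiny fraction of the households, yet it usually lands close to the population mean, which is why sampling is preferred to a full census.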

# Statistics — Distributions of Discrete random variables

In this article, we introduce three main distributions of discrete random variables: the Bernoulli distribution, the Binomial distribution and the Poisson distribution. For each one, we give a definition, examples and its basic properties.

1. Bernoulli Distribution

1.1. Definition

A Bernoulli distribution models, in essence, a single trial such as one flip of a coin. It is the probability distribution of a random variable X defined on a random experiment that can have two outcomes, ‘0’ or ‘1’. Because X is discrete, its distribution is described by a probability mass function. The probability mass function of X is shown below:
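For reference, the standard form of the Bernoulli probability mass function is given here. The symbol p, for the probability of the outcome ‘1’, is an assumption of this sketch, since the excerpt is cut off before its own formula.

```latex
P(X = x) = p^{x}\,(1 - p)^{1 - x}, \qquad x \in \{0, 1\}, \quad 0 \le p \le 1
```

Setting p = 1/2 gives the fair-coin case, in which both outcomes have probability 1/2.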

# Statistics — ANOVA

1. What is ANOVA?

Analysis of variance (ANOVA) is a statistical method for finding out whether the differences between the means of three or more groups are significant. The null hypothesis for ANOVA is that the population means of all groups are equal, while the alternative hypothesis is that at least one mean differs from the others (as shown below). In other words, ANOVA helps us decide whether to reject the null hypothesis.

The name ‘ANOVA’ suggests that we are actually analyzing variances between the groups. We compute a measure of the variance between…
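As a rough sketch of how comparing variances tests the equality of means, the one-way ANOVA F statistic can be computed as the ratio of between-group to within-group mean squares. The function name `one_way_anova_f` and the three small groups are made up for illustration and do not come from the article.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic of a one-way ANOVA for a list of samples."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    group_means = [mean(g) for g in groups]
    grand_mean = mean([x for g in groups for x in g])

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical measurements for three groups.
f_statistic = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```

A large F value means the group means are spread out much more than the noise within groups would explain, which is evidence against the null hypothesis. Turning F into a p-value requires the F distribution with (k - 1, n - k) degrees of freedom, which is left out of this sketch.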

# Machine Learning — Complete Linear Regression

Linear regression is a method for describing dependence relationships: a linear model establishes the relationship between a dependent variable y (the target) and one or more independent variables, denoted X (the inputs).

Let us follow a simple example to get an idea of linear regression. Our sample data set is shown below, followed by a plot of the data.

The values in the column ‘GrLivArea’ are the independent variable because they are pre-determined; they are not changed by the other variables. The values in the column ‘Price’ are the dependent variable, since we can predict them from changes in the independent variable. Through observation of the diagram, we…
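A minimal sketch of fitting a straight line to such data by ordinary least squares is shown here. The helper `fit_simple_linear_regression` is hypothetical, and the numbers standing in for the ‘GrLivArea’ and ‘Price’ columns are invented for the example, since the article's data set is not reproduced in this excerpt.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # slope = (sum of cross-deviations) / (sum of squared x-deviations)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Made-up values standing in for the 'GrLivArea' and 'Price' columns.
gr_liv_area = [1000.0, 1500.0, 2000.0, 2500.0]
price = [150_000.0, 200_000.0, 250_000.0, 300_000.0]

intercept, slope = fit_simple_linear_regression(gr_liv_area, price)
predicted_price = intercept + slope * 1800.0  # prediction for a new input
```

The fitted slope is the predicted change in the dependent variable per unit change in the independent variable, and the intercept is the predicted value when the input is zero.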

# Finance — Investment Appraisal (Discounted Cash Flow Approach — NPV)

There are basically two discounted cash flow (DCF) methods of investment appraisal: the net present value (NPV) and the internal rate of return (IRR) methods. This article takes a look at the first one, the NPV investment appraisal method.

The NPV investment appraisal method is a straightforward approach that works on one fundamental principle: whenever the money received from an investment is equal to or greater than the money put in, the investment is regarded as worth undertaking. Thus, the decision rule of the NPV method is to accept all investments with a zero or a positive…
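The decision rule can be sketched as follows, assuming the truncated sentence above refers to a zero or positive NPV. The function names, the annual end-of-year timing of the cash flows and the example project are assumptions made for illustration.

```python
def net_present_value(rate, cash_flows):
    """NPV of cash flows, where cash_flows[t] occurs t years from now.

    cash_flows[0] is the initial outlay (negative) at time 0.
    """
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def accept_investment(rate, cash_flows):
    # Accept when the NPV is zero or positive.
    return net_present_value(rate, cash_flows) >= 0.0


# Hypothetical project: pay 1,000 now, receive 600 at the end of years 1 and 2.
project = [-1000.0, 600.0, 600.0]
npv_at_10_percent = net_present_value(0.10, project)
accepted_at_10_percent = accept_investment(0.10, project)
accepted_at_20_percent = accept_investment(0.20, project)
```

The same project is accepted at a 10% discount rate and rejected at 20%, which shows how strongly the outcome depends on the rate chosen.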

# Finance — Investment Appraisal (Traditional Methods)

Stephen Lumby (1981) defined an investment decision as one which, made by the investors or the top management, involves the company in a cash outlay with the aim of receiving future cash inflows.

This article looks at two traditional methods of investment appraisal: Pay-back and Return on Capital Employed. Before we get to the methods, we should keep in mind that all investment appraisal methods act only as a decision guide; they cannot tell investors or decision makers whether to invest or not. However, investment appraisal helps us to communicate information to decision makers…
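As a rough illustration of the first of these methods, the pay-back period is commonly taken to be the time needed for cumulative cash inflows to recover the initial outlay. The sketch below is one possible reading, not the article's own procedure: the function name, the yearly cash flows and the even-within-year interpolation are all assumptions.

```python
def payback_period(initial_outlay, annual_inflows):
    """Years until cumulative inflows recover the outlay, or None if never."""
    cumulative = 0.0
    for year, inflow in enumerate(annual_inflows, start=1):
        previous = cumulative
        cumulative += inflow
        if cumulative >= initial_outlay:
            # Assume cash arrives evenly within the year and interpolate.
            return (year - 1) + (initial_outlay - previous) / inflow
    return None


# Hypothetical project: 1,000 outlay, 400 received in each of three years.
years_to_recover = payback_period(1000.0, [400.0, 400.0, 400.0])
```

A project that never recovers its outlay within the listed years returns `None` rather than a number.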

# Statistics — Hypothesis Testing

“A thesis is something that has been proven to be true. A hypothesis is something that has not yet been proven to be true.”

1. The null hypothesis

The procedure of hypothesis testing aims to determine, through statistical methods, whether a given hypothesis should be rejected or not. The first step of a hypothesis test is to set up the null hypothesis. A null hypothesis is an assertion about the value of a population parameter. It is considered to be true until we have sufficient statistical evidence to reject it. For example, a delivery vendor claims that his company could deliver parcels, on the…
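One simple way to test a null hypothesis about a population mean is a two-sided z-test, sketched below under the assumption that the population standard deviation is known. The function name and every number in the example are hypothetical and are not taken from the article's truncated delivery example.

```python
from statistics import NormalDist
import math


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided z-test of H0: population mean == mu0, with known sigma."""
    # Standardise the gap between the sample mean and the claimed mean.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Probability of a gap at least this large if H0 were true.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical numbers, chosen only for illustration.
z, p_value = one_sample_z_test(sample_mean=31.0, mu0=30.0, sigma=4.0, n=64)
reject_null = p_value < 0.05  # 5% significance level, an assumed choice
```

The null hypothesis is rejected only when the p-value falls below the chosen significance level; otherwise it continues to be treated as true.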

## Bu Yifan
