The post Travel Industry Must Outgrow Its Past to Thrive in an AI World appeared first on Levi Brackman's Website.

The airline industry was at the forefront of many of the significant innovations of the last century. Besides all the advances related directly to aviation, airlines were also pioneers in developing computer systems that could be accessed around the globe to book and reserve airline tickets. Airlines also established revenue management systems to optimize revenue from ticket sales. Airline inventions in this area have informed revenue management models for multiple other industries as well, most notably hotels and car rental companies.

Yet, in my view, the current state of the art in airline pricing and revenue management is best described as an evolution rather than a revolution. Here is what I mean. In the not-too-distant past, when access to massive amounts of data and compute was prohibitively expensive, scientists who wanted to optimize a process would hypothesize about the mathematical makeup of the problem. Based on that postulation they would build a mathematical model that represented the problem to be solved. They then validated the mathematical model by checking whether it properly explained the underlying real-world (or, in the absence of real-world data, simulated) data.

In the present world, where we have easy access to massive amounts of compute and big data, there is an alternative approach. When we want to predict a certain outcome, instead of hypothesizing what a mathematical representation of the problem would be, we hypothesize what the underlying causes of the predicted outcome are. We then use data to represent those causes. This is done using data transformations, modeling variables and joining multiple datasets together. Data scientists call this process Feature Engineering. We then choose machine learning algorithms that can build us a solid predictive model. We split the data into at least three parts: one to train the model, one to validate the model and one to test it. The optimal result is a model that can predict an event or a state on new data that was not used in the model building process.
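The three-way split described above can be sketched in a few lines of R (the data frame and the 60/20/20 proportions here are invented for illustration):

```r
set.seed(42)  # for reproducibility
df <- data.frame(x = rnorm(1000), y = rnorm(1000))

# Shuffle the row indices, then carve them into train/validation/test (60/20/20)
idx   <- sample(nrow(df))
train <- df[idx[1:600], ]
valid <- df[idx[601:800], ]
test  <- df[idx[801:1000], ]

nrow(train)  # 600 rows used to fit the model
nrow(valid)  # 200 rows used to compare and tune candidate models
nrow(test)   # 200 held-out rows used only for the final evaluation
```

The model is fit on the training set, tuned against the validation set, and judged once, at the end, on the test set it has never seen.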

Key to the difference between these approaches is that in the latter there is no need for a human to build the mathematical model; the algorithm builds it by finding patterns in the massive amounts of data it trains on. Another key difference is that while the former starts with a hypothesis about the underlying distribution of the problem from a mathematical perspective, the latter starts with a hypothesis about which features cause the predicted outcome and which algorithm might build the optimal model; finding the underlying distribution and patterns is work the machine learning algorithm does.

It must be pointed out that there are also many similarities between these two approaches. For example, both use classical Exploratory Data Analysis and data visualization techniques, and many of the underlying statistical and scientific methods are similar if not identical. In addition, data scientists will often use mathematical modeling in the feature engineering stage of the model building process.

In my perusal of the airline revenue management literature, however, it is clear that the main approaches to airline revenue problems such as dynamic pricing and bundling still use the former, older method. Meanwhile, companies such as Uber, Lyft and Airbnb are leapfrogging the airline industry in innovation in this area because they use the more contemporary machine learning and AI approach.

This is understandable given that the practical use of machine learning is relatively new and most airline revenue management professionals did their work when the older methods were all we had. People are often slow to catch up with the fast-changing technological landscape. Furthermore, the airline industry is saddled with formats and streamlined processes that are hard to change quickly. Despite all of this, if current airline players are to thrive they must outgrow the past and move into the machine learning and AI era.

The post AI and Travel Distribution appeared first on Levi Brackman's Website.

We live in a world of platforms. Facebook, Airbnb, Uber and Twitter are all platforms. In essence platforms are intermediaries. They are the modern-day equivalent of middlemen who facilitate transactions or communication between two entities. To understand how middlemen evolved into platforms and how platforms thrive in the 21st century we must very briefly review postmodernism.

**Derrida and Postmodernism**

The famous postmodern philosopher Jacques Derrida (1930-2004) is known for the concepts of deconstruction and de-centering. Whilst his ideas related to textual criticism and philosophy, they can also be extended to middlemen and platforms. The idea of de-centering is that what was once the center is no longer the center; the periphery becomes the new center. This has turned out to be an accurate description of where we currently are in the West. Let me explain, using the travel industry as an example.

**Deconstruction/De-centering in the Travel Industry**

There was a time when, if one wanted to go on holiday, one needed to use a central service such as a travel agency to buy and book travel. Hotels were owned and controlled by large corporations and bookings needed to be made through a travel agent. In this sense the hotelier, car-rental company, airline, Global Distribution System (GDS) etc. represented the center. The traveler was the periphery. The barrier to entering the center was high; most individuals were unable to do so unless they got a job working for a travel agency, GDS or airline.

What we now see is the deconstruction or de-centering of the travel industry. Here are a few examples. Through Airbnb people can book rooms in private homes directly with the owners of those homes. With companies such as Turo we have peer to peer car rentals. So, hotels and car rentals are being deconstructed or de-centered by companies such as Airbnb and Turo.

This is of course a trend that is impacting not only hotels and cars but will impact the rest of travel as well (including the airlines). This is not a new phenomenon, Derrida saw this trend starting and wrote about it in the 1970s and 1980s. Every aspect of business will be impacted, Blockchain and cryptocurrencies are just one manifestation of the trend of de-centering or deconstruction.

**Platforms rather than Middlemen**

So how is it possible that Airbnb, Twitter and Facebook are able to thrive as centers of power in a culture and society that is de-centering and deconstructing, where the periphery becomes the new center? The answer is that they act as platforms rather than as middlemen. The difference is subtle but important.

A middleman is someone who mediates a transaction. When people work through a middleman, the two parties do not want to deal directly with each other; they just care about doing the deal. An old-fashioned travel agent, for example, acted as a mediator because the traveler and the airline did not care to deal directly with each other. In many ways the GDS was always the platform that facilitated between the airlines and the travel agent.

Facebook, Twitter, Google and Airbnb are really good at being platforms that facilitate. To the extent that they must mediate, they make that mediation invisible. They facilitate the direct interaction between buyer and seller, or between communicator and communicatee. This allows the deconstruction and de-centering of the middleman. Observe that the middleman is not eliminated; rather, the role the middleman plays, and how they play it, changes.

In summary, in a pre-modern and modern world the middleman acted as a mediator in a transaction. In a postmodern world the middleman acts as an almost invisible facilitator. This is the definition of a platform. People use a platform to interact directly with friends (Facebook), drivers (Uber), hosts (Airbnb, Turo) without having to actively go through the platform. Instead they are on the platform.

**Personalization Platforms Powered by AI**

But how do these companies make themselves facilitators rather than mediators? The answer is two letters: AI.

In the case of Facebook, AI is used to make sure that your newsfeed is full of personalized content that you are interested in — no two people have the exact same experience on Facebook (or Amazon). Airbnb uses AI to ensure that you are shown properties that you are most likely to be interested in booking both in terms of location and type.

In other words, AI-driven personalization and recommendation allow you to easily and quickly get closer to what you are uniquely interested in. An AI-powered platform allows you to effortlessly interact and transact and gives you exactly what you want the moment you want it, and in the best case before you know you want it. A platform deconstructs and de-centers the middleman without becoming obsolete. AI is integral to allowing this.

This very concept of personalization and recommendation facilitated by AI is a manifestation of deconstruction and de-centering. When the corporation was the center and the client or customer the periphery there was a relationship where few products were consumed by a myriad of customers. Now that the customer is the center and the corporation is the periphery, companies that want to play, have to cater to, and have a relationship with, each customer uniquely. One, two or three sizes no longer fit all. There has to be a unique offering for each individual customer. This can only be facilitated by AI. Companies that do that well, thrive.

**NDC as a Manifestation of Deconstruction**

So how does IATA’s New Distribution Capability (NDC) where airlines host their own content and prices fit into all of this? If airlines are to continue to exist, never mind thrive, they need to develop individualized offerings and personalization. To do that, they need to own their customers and de-commoditize themselves. This is why airlines are pushing NDC and direct channels — it is an existential need for them.

So where does this leave the GDS as middleman?

In the old GDS model the travel agent dealt with the GDS, not the airline, and the GDS mediated between the airline and the travel agent. This model divorced the customer from the airline and turned the airplane seat into a commodity. Now the GDS needs to pivot to become a platform that facilitates rather than one that mediates.

What does this mean in practice? There are two sides to this: 1. the end customer; 2. the suppliers. Both have needs that are entirely compatible with each other. Suppliers need to deconstruct, de-center and de-commoditize. Customers are the new center and are demanding personalization and recommendations.

But who is going to facilitate this?

**GDS as AI Powered Platform**

A pivoted GDS as an AI powered platform that facilitates rather than mediates is very well positioned to become this new invisible facilitator.

The GDS is well positioned for this role for many reasons. First, it is a natural evolution. The GDS was the original intelligent mediator. It used old-fashioned rules-based intelligence to create personalized offerings for customers.

Second, there are risks for individual suppliers to do this on their own. Without being on a common platform the supplier(s) most successful at personalization will end up dominating. But if they are all on a common platform they all have a chance of reaching an end customer. And in a world where customers are demanding real personalization there is room for all suppliers to have differentiated offerings that will be personalized to meet the need of a customer.

One vision, therefore, is for the GDS to become an Airbnb-style platform where suppliers have full control of their content and pricing, all tailored to different end customers, and the new GDS becomes the invisible AI-powered platform that facilitates the transaction with the end customer. In that brave new world the new GDS still offers pricing, but instead of ATPco-style pricing it creates dynamic prices based on the airline's revenue management needs, real-time demand and bundled merchandizing, and GDS revenue is still generated based on segments.

The post Function in R for Word and Line Count Table appeared first on Levi Brackman's Website.

Here I present a new function I created to find the count of lines and words in a text document and return them in the form of a table. It uses the **wc** function from the “qdap” package in R as well as base R functions.

**The Problem:**

How can one find both the number of lines and the number of words in a potentially large document using R and return them as a table?

**The solution:**

First install and load qdap package

install.packages("qdap");library(qdap)

**Load text document**

doc = readLines("doc.txt", ok = TRUE)

**Read in the “WordsLines” Function**

WordsLines = function(dataframe, names1, names2){
  Words = as.data.frame(dataframe)    # the input is a character vector, so put it into a data frame
  Wc = wc(Words[, 1])                 # get the word count of each row of the first column
  Words1 = as.data.frame(Wc)          # put those word counts into a data frame
  Words1$Wc = as.numeric(Words1$Wc)   # make sure the counts are numeric
  names(Words1)[1] = paste("Words")   # change the column name to "Words"
  Words1 = sum(Words1, na.rm = T)     # sum the word counts of the entire column
  Lines = nrow(Words)                 # the number of rows is the number of lines
  final = cbind(Lines, Words1)        # combine the line count and word count into one table
  colnames(final) = c(names1, names2) # rename the columns to fit the particular dataset
  final                               # return the table
}

**Call function**

WordsLines(doc, "Doc Lines", "Doc Words")

**Should return something like this:**

     Doc Lines Doc Words
[1,]   1010242  33482314

The post Goodness of Fit Measures Table APA for Factor Analysis appeared first on Levi Brackman's Website.

This post is for social science researchers and research psychologists who are doing factor analysis and want to create tables with fit measures in R. If you do not fit that very narrow audience you might not find this post interesting.

**The Problem:**

How to take the fit measures of multiple models and place them into a table (APA style) that can be put directly into a paper.

**The solution:**

First remove scientific notation from outputs (this is a personal preference of mine).

options(scipen=999)

**For the psych package**, note that we have manually calculated the CFI in this piece of code because the psych package does not provide the CFI.

First compute the CFI (if you want that measure):

CFImodel = 1 - ( ( model$STATISTIC - model$dof)/(model$null.chisq-model$null.dof ) )

Extract measures and save them to GoodnessfitMeasures variable.

GoodnessfitMeasures = c(model$STATISTIC, model$dof, model$PVAL, CFImodel, model$TLI, model$RMSEA, model$rms)

Put into columns

require(reshape)

GoodnessfitMeasures = as.data.frame(GoodnessfitMeasures)

GoodnessfitMeasures = melt(GoodnessfitMeasures, id.vars="GoodnessfitMeasures")

Skip the next section if you are not using Lavaan and go to “Continued” below.

**For Lavaan**

GoodnessfitMeasures = fitmeasures(fit, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "rmsea.ci.lower", "rmsea.ci.upper", "srmr"))

Create names for columns

namess = c("Chisq", "DF", "P-Value", "CFI", "TLI", "RMSEA", "RMSEA ci lower", "RMSEA ci upper", "SRMR")

Put GoodnessfitMeasures into data frame (if you already did this for the psych package above no need to do it again).

GoodnessfitMeasures = data.frame(GoodnessfitMeasures)

**Continued**

Use dplyr to bind the names and fit measures. This will result in at least two columns: 1. the names of the fit measures (Chisq, DF etc.); 2. the corresponding fit measures (there can be more than one such column depending on how many models you have).

all = bind_cols(namess, GoodnessfitMeasures)

There might be long numbers in each column. Create a function to round all the numbers of a data frame (the function was found here: http://stackoverflow.com/questions/9063889/rounding-a-dataframe-in-r and works very well).

round_df <- function(df, digits) {
  nums <- vapply(df, is.numeric, FUN.VALUE = logical(1))  # find the numeric columns
  df[, nums] <- round(df[, nums], digits = digits)        # round only those columns
  (df)
}

Round the numbers on the data frame

GoodnessfitMeasures = round_df(all, digits=3)

Transpose the data frame so that the measures of each model take up a row rather than a column. This makes it easier to compare models if there are many of them. The result will be a matrix

GoodnessfitMeasures = t(GoodnessfitMeasures)

I like to turn the matrix into a data frame

GoodnessfitMeasures = as.data.frame(GoodnessfitMeasures)

Print the table as LaTeX, then use Sweave to compile it as a PDF. I personally then turn the PDF into a Word doc using Adobe Acrobat Pro. If you don’t have that, there are other online options for turning PDFs into Word docs.

require(xtable)

xtable(GoodnessfitMeasures)

The post Polls, Margin of Errors and Standard Deviations appeared first on Levi Brackman's Website.

See My App that Explains Standard Deviations Intuitively Here

This coming week there are big primaries with lots of delegates up for grabs in New York. It seems from the polls that both Trump and Clinton are ahead. How reliable are those polls? There are many ways to answer that question and it really depends on many complex considerations, such as how the poll was taken, how many people were sampled etc. But without going into all of that, there is something each of us can look up to determine reliability. Each poll comes with a margin of error, which tells us that we can expect the poll to be correct give or take that many points. We ought to take note of those numbers. Whilst the poll number itself is helpful, it does not tell us the entire story. When we take the margin of error into account we get a more accurate picture of what the poll is really saying.
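As a rough sketch of where that number comes from: for a simple random sample, the 95 percent margin of error for a poll percentage can be approximated in R as follows (the poll size and share below are made up):

```r
# Hypothetical poll: 900 respondents, 54% supporting a candidate
n <- 900
p <- 0.54

# 95% margin of error under the normal approximation
moe <- 1.96 * sqrt(p * (1 - p) / n)
round(100 * moe, 1)  # about 3.3 percentage points
```

So a reported 54 percent really means somewhere between roughly 50.7 and 57.3 percent, 95 times out of 100.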

The main number of the poll is like the average. We like to use the average a lot because it conveys a summary of a population we are interested in. Politicians love talking about the “average American”. In sports you have Batting Averages and Average Scoring Margins, amongst others. These are all very valuable but do not convey the entire picture. As was mentioned in a previous post, the median is also an important number to know. You can read more about the median and see its accompanying app here. However, even knowing the mean and the median we do not have the entire story. We need another piece of information, and that is how variable (spread out) the population is around the mean.

Technically the average American family income is $53,657 per year. But the population might be very spread out around that number, in which case that summary statistic does not give us enough information about most Americans. If the population is very spread out, you could have a huge chunk earning considerably less or more than $53,657. Whereas if the population is tightly clustered around the mean, then the $53,657 tells us a great deal about the population as a whole.

The Standard Deviation is a number which tells us how spread out or how closely clustered a population is around its mean. A higher Standard Deviation tells us that the population is more spread out; a lower one tells us it is more tightly clustered around the mean.
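A quick way to see this in R, with two invented income samples that share a mean but differ in spread:

```r
# Two made-up income samples (in dollars) with the same mean
tight  <- c(52, 53, 54, 54, 55) * 1000   # clustered around the mean
spread <- c(20, 35, 54, 70, 89) * 1000   # widely dispersed

mean(tight)   # 53,600
mean(spread)  # also 53,600
sd(tight)     # about 1,140
sd(spread)    # about 27,400 - same mean, wildly different spread
```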

The Standard Deviation and the Margin of Error are similar to each other. They both tell us how reliable the main number — the poll or the mean — is, and how much we can expect the actual result to vary.

Thus, whether we are talking about polls or general statistics we should be careful to look for more than just the headline number and ask for the median and the standard deviation so that we can get a more accurate picture.

For a deeper and intuitive understanding of Standard Deviations see the App I created here

The post How Politicians Lie to You With Statistics appeared first on Levi Brackman's Website.

We are in the midst of an intense election season here in the United States and it seems that some politicians will say anything in order to get votes. This should come as no surprise. As my father used to tell me, the most important qualification needed to be a successful politician is to be an artful liar. One of the most effective tactics used by politicians, as well as companies and public officials, to mislead is statistics. As British Prime Minister Benjamin Disraeli reportedly said: “There are three kinds of lies: lies, damned lies, and statistics.”

This does not mean that statistics are lies. It means that statistics are often used as an elegant way to mislead without having to outright lie. In order not to be fooled, it is important to understand how this is done. One of the most effective ways of lying with statistics is to cherry-pick the statistics that agree with one’s point and ignore those that do not. Politicians do this all the time and most people do not notice.

One of the most common ways of misleading by cherry-picking statistics is reporting only one measure of central tendency and leaving out the rest when they do not support the given argument. There are three measures of central tendency: the mean (also known as the average), the median and the mode. To get a full understanding of what is going on in any dataset, it is important to know all three. But most often we only hear about the mean or the median, not both. This is often because reporting both would be inconvenient for the point being made.
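To see how the three measures can tell different stories, here is a small R sketch using invented incomes (R has no built-in mode function, so a tiny helper is defined):

```r
# Invented incomes in thousands of dollars; one high earner skews the mean
incomes <- c(25, 25, 25, 30, 35, 40, 250)

stat_mode <- function(x) {                 # most frequently occurring value
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

mean(incomes)       # about 61.4 - pulled up by the single high earner
median(incomes)     # 30 - the middle income
stat_mode(incomes)  # 25 - the most common income
```

A politician who wants the group to sound prosperous quotes the mean; one who wants it to sound struggling quotes the mode.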

I have created an interactive app that will explain this more clearly and intuitively, using the example of a group of people’s incomes. The app will allow you to play with the numbers and see clearly, on your own, how leaving one indicator of central tendency out can easily mislead. See that app here.

There is another element of a dataset that should also be reported: how the data varies, known as the variance and/or the standard deviation. Leaving that out can also seriously mislead. That will be the topic of my next post. But to understand it, it is important to understand central tendency first. See the central tendency app here.

The post Why I love Las Vegas: The Law of Large Numbers appeared first on Levi Brackman's Website.

Las Vegas caters to people’s vices of all kinds. Sheindy and I love visiting Sin City although we do not gamble, drink or partake in any of the other entertainment and activities designed to cater to human weakness. We go there because the accommodation is relatively inexpensive, family-friendly shows are often free or greatly discounted, we can drive there from our home, and there are lots of really good Kosher restaurants to eat at.

So how do hotels in Las Vegas manage to offer inexpensive accommodation, cut-price shows and free attractions and still make money? The answer is gambling, of course. The hotels make money when they get you to gamble. The gambling subsidizes the accommodation and the cut-price shows.

But how does that work? Does gambling not offer the promise of outsized wins to those who partake? Well, maybe to some, but the Law of Large Numbers guarantees that the casino will never lose money and in fact will make loads of money overall. Whilst I have never gambled, since I learned this law any attraction to gambling has vanished.

So what is the Law of Large Numbers that guarantees that the house will always make money and makes gambling so unattractive to me?

Simply put, this law says that whilst you can beat the odds of something happening for a time, you cannot do it consistently over and over again. Think of flipping a coin, for example. It is conceivable that you can get three or four heads in a row. One can even get ten heads in a row. But if you do a thousand or more coin flips, on average you will get heads fifty percent of the time and tails fifty percent of the time.

What if someone doctors the coin so that the way the weight is distributed means there is only a 20 percent chance it will land on heads and an 80 percent chance it will land on tails? The Law of Large Numbers still applies. Over many, many coin flips, 20 percent of them will be heads and 80 percent will be tails. The Law of Large Numbers as I have described it is a fact. To prove it to you, I have created an app that simulates coin flips so you can try it for yourself. See here.
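The same experiment the app runs can be simulated in a few lines of R (the number of flips and the bias are arbitrary choices):

```r
set.seed(1)  # for reproducibility

# Simulate 100,000 flips of a doctored coin: heads with probability 0.2
flips <- sample(c("heads", "tails"), size = 100000,
                replace = TRUE, prob = c(0.2, 0.8))

mean(flips == "heads")  # very close to 0.2, as the Law of Large Numbers predicts
```

Run it with 10 flips instead of 100,000 and the proportion of heads can wander far from 0.2; the guarantee only kicks in over many flips.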

This is what the casinos do. The odds of you losing, and the casino winning, are stacked in favor of the casino. Whilst it is entirely possible that you as an individual might win, overall, over many plays, you are guaranteed to lose. Thus, the more people they get through the doors to gamble, the more money they make.

A related idea is the Gambler’s Fallacy, where the gambler believes that since something has happened many times in the recent past it is less likely to happen again, or that if something did not happen recently it is due to happen soon. People think that since they lost previously they are “due” for a win, or vice versa. In reality, the chances of winning in a game that depends on “luck” are completely uninfluenced by what went before or what will happen after.

This law is also at play when you buy insurance coverage, and it is why you should never buy extended warranties on items you could easily afford to replace. In making decisions about life it is important to keep this law in mind. It will help you avoid drawing false conclusions that could hurt you in the long run. It will also help you avoid people and schemes that are designed to part you from your hard-earned money.

This is in essence why I love Las Vegas. Where else can I go on vacation and have it subsidized by the multitudes who either choose to ignore or are ignorant of a fundamental law of the universe?

(View the app I created that illustrates the Law of Large Numbers here)

The post We (and our investments) Regress to Mediocrity appeared first on Levi Brackman's Website.

(Here is an interactive app I created to help you understand the ideas expressed in this post. Take a look either before or after reading the post.)

We all have days when we are much more productive than usual. At such times we feel good about ourselves and think that we are turning over a new leaf in terms of our productivity. Over time, however, we often find ourselves reverting back to our standard level of productivity. This concept is called “Regression to the Mean”^{1)} and it is a rule in statistics about how the world works.

This rule builds on what we discussed in the previous post about how many things in the natural world fit the Normal Distribution. Most members of the distribution will be in the middle, around the mean,^{2)} and the minority will be at the edges, known as “the tails”, of the distribution. Let’s take a real-world example. The distribution (on the right) is of the heights of people; the average person is 68.3 inches tall, indicated by the blue line in the center of the graph (this kind of graph is called a histogram). The majority of the population is somewhere around the average indicated by the blue line. Very few are above 72 inches (6 feet) or below 65 inches (5 feet 5 inches).^{3)}

If one compares the heights of parents to those of their children, one finds that parents who are very tall have children who are slightly shorter than themselves, and parents who are short have children who are slightly taller than them. This makes sense intuitively, because if tall parents always had children who were taller than them, part of the population would incrementally get taller until we had a population of giants. Similarly, if short parents consistently had even shorter children, we would end up with part of the population getting unendingly shorter. Neither of these happens in the real world. So Regression to the Mean tells us that any extreme occurrence will not be permanent. Over time it will revert back to the mean.

Another real-life example of this is hedge fund and mutual fund managers. At any given time some will beat the market and outperform the others. Yet this rarely lasts. According to a New Yorker Magazine piece from 2014, roughly a third of hedge funds failed over a three-year period:

“Out of an estimated seventy-two hundred hedge funds in existence at the end of 2010, seven hundred and seventy-five failed or closed in 2011, as did eight hundred and seventy-three in 2012, and nine hundred and four in 2013.”

Thus, whilst some people can beat the market average some of the time, Regression to the Mean informs us that it is very rare for people to be able to do it consistently. This law also applies to sports, as well as to many other fields.

In order to illustrate this idea using real data, I have created a simple online application that you can play around with to see how the heights of people regress to the mean. This app uses the well-known Galton dataset, collected in 1885, of nearly 900 pairs of parents and their children and shows that the children of tall parents are on average shorter than their parents and the children of short parents are on average taller than their parents.
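For readers who want to look at the numbers themselves, here is a sketch in R, assuming the version of Galton's data shipped in the HistData package (with columns named parent and child, heights in inches):

```r
# install.packages("HistData")  # Galton's 1885 parent/child height data
library(HistData)
data(Galton)

# Among taller-than-average parents, the children are on average
# closer to the overall mean than their parents are
tall <- subset(Galton, parent > mean(parent))
mean(tall$parent) - mean(tall$child)  # a positive gap: the children regressed
```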

Next post will build on this idea and will be about “The Law of Large Numbers.”

Notes

1. Also known as Regression to Mediocrity.

2. Also known as the arithmetic average; we will use average and mean interchangeably.

3. Note that in this dataset 1.08 inches was added to female heights to even out gender differences.

The post The Normal Distribution – Explained Intuitively appeared first on Levi Brackman's Website.

Of all the people that you know, how many of them are truly extraordinary in any domain? How many world-class dancers do you know? How many of the people on the Forbes list of richest people do you personally know? Unless you are a professional dancer or extremely wealthy, I already know the answer to these two questions: you most likely don’t know any. How do I know that? Because of something called the Normal Distribution (although clearly wealth is not distributed evenly).

In 1795 the German mathematician Carl Friedrich Gauss observed that astronomical measurement errors were always distributed in the same way, which is why the Normal Distribution is often referred to as the Gaussian Distribution, named after Gauss. But what is this Normal Distribution and how does it allow me to make fairly well-founded assumptions about the type of people you probably do or do not know?

It’s a rule about how often things occur in the world around us. For example, there are many scientists but very few who, like Einstein, came up with theories that impact almost every aspect of the world as we know it. Since I live in the Rocky Mountains let’s use mountains as our example. There are millions of mountains but very few with an elevation close to that of Mount Everest — 29,029 feet tall. The Normal Distribution predicts all of this. It does this by telling us how things, like mountains, are generally distributed. But what is a distribution?

Imagine you are making a peanut butter sandwich. You have your slice of bread and you distribute your peanut butter over the slice. In this case you’d want the peanut butter to be evenly distributed with the same amount spread over the entire surface of the slice. Now imagine the slice of bread is enormous and instead of peanut butter you spread mountains over the slice of bread. And instead of spreading them evenly you distribute the mountains so that mountains of average height are stacked in the middle and the taller and smaller mountains are stacked progressively further away towards the edges of the slice.

What would your slice look like?

Something like this (the histogram itself looks like a mountain range!):

The vast majority of the mountains will be stacked in the middle, with very few out toward the edges^{2}. This pattern holds not only for mountains but for most other things that occur in the natural world. If you did this exercise with people’s heights, the results would be the same. And the same is true for extraordinary talent in any domain: most people fall within the range of average, and the further a person is from average (taller/shorter, more/less talented, etc.), the fewer such people there are, so much so that the Albert Einsteins or Peyton Mannings of this world are extremely rare^{3}.
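The mountain-shaped pile on the slice of bread is exactly what statisticians call a histogram. Here is a small sketch that builds one from hypothetical data (the mean of 68 inches and standard deviation of 3 are illustrative assumptions, not a real census):

```python
import random

random.seed(0)

# Hypothetical data: 10,000 "heights" drawn from a normal distribution
# with mean 68 inches and standard deviation 3 (illustrative numbers).
heights = [random.gauss(68, 3) for _ in range(10_000)]

# Count how many heights land in each 2-inch bin -- this is a histogram.
bins = {}
for h in heights:
    edge = int(h // 2) * 2          # lower edge of the bin; 66 covers [66, 68)
    bins[edge] = bins.get(edge, 0) + 1

# Print a text "slice of bread": one '#' per 100 people in each bin.
# The tallest stacks sit in the middle, tapering off toward the edges.
for edge in sorted(bins):
    print(f"{edge:>3}-{edge + 2:<3} {'#' * (bins[edge] // 100)}")
```

Turned on its side, the printout is the stacked-mountains picture: the bins near the mean hold the vast majority of the values, and the extreme bins hold almost nothing.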

Now isn’t this intuitive? You knew this already, didn’t you? Yet this simple idea is a fundamental concept in statistics, and it allows statisticians, researchers, pollsters, and data scientists to make all kinds of predictions about the world we live in.
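One concrete prediction the Normal Distribution licenses is the so-called 68-95-99.7 rule: about 68% of values fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three. A quick simulation sketch (with an arbitrary mean and standard deviation) checks this empirically:

```python
import random

random.seed(1)

# The 68-95-99.7 rule for a normal distribution, checked by simulation.
# MU and SIGMA are arbitrary; the rule holds for any normal distribution.
MU, SIGMA, N = 0.0, 1.0, 100_000
samples = [random.gauss(MU, SIGMA) for _ in range(N)]

# Fraction of samples within k standard deviations of the mean.
fracs = {k: sum(abs(x - MU) <= k * SIGMA for x in samples) / N for k in (1, 2, 3)}

for k, frac in fracs.items():
    print(f"within {k} standard deviation(s): {frac:.3f}")
```

This is why knowing only that a quantity is roughly normally distributed, plus its mean and standard deviation, is enough to say how rare a given value is, and hence why I can be fairly sure you don’t personally know anyone on the Forbes list.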

Notes

1. Although clearly wealth is not distributed evenly.

2. I have not done the analysis on all mountains in the world, so this is a hypothesis of what a histogram of the elevation of all mountains might look like. In a dataset of UK hills that I analyzed for this post, the distribution was somewhat right-skewed, meaning there were more small hills than really tall ones. This makes sense: the minimum size of a hill or mountain is arbitrary, and since most ground is flat you would expect more small hills than super tall ones. I also analyzed a dataset of 80 Colorado peaks with prominence of 2,000 ft. and greater, and found that it was left-skewed. Neither dataset was representative of all mountains. The larger UK dataset was more representative of all the hills in the UK, and although right-skewed it seemed to approximate a normal distribution in my analysis (it failed the Anderson-Darling normality test, but that may simply be because the dataset was so large; the Q-Q plot looked “approximately” normal). A study of all mountains would be interesting and fun to conduct; collecting all the data would be the difficult part.

3. Intelligence is normally distributed, although I don’t have evidence that the quality of football players is normally distributed.

The post The Normal Distribution – Explained Intuitively appeared first on Levi Brackman's Website.
