Trend Analysis of TED Talks with Python Codes – Hacker Noon

TED is a non-profit organization founded in 1984 by Richard Saul Wurman.

TED began as a conference where Technology, Entertainment and Design converged, and today it covers almost every topic in more than 100 languages. TED’s mission is to “spread ideas” in the form of short, powerful talks.

I have learned a lot from TED Talks about fields I knew nothing about. In this story, I want to dig into TED’s trends using Python.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
%matplotlib inline

I will use the TED Talks dataset from Kaggle. Note that the data covers the years 2006 to 2017, so the analysis only runs up to 2017. The dataset’s Kaggle page describes its features in more detail.

#Getting the data
df = pd.read_csv("../input/ted_main.csv")

#Setting Date Format
month_order = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
day_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
df['film_date'] = df['film_date'].apply(lambda x: datetime.datetime.fromtimestamp( int(x)).strftime('%d-%m-%Y'))
df['published_date'] = df['published_date'].apply(lambda x: datetime.datetime.fromtimestamp( int(x)).strftime('%d-%m-%Y'))
df["published_year"] = df["published_date"].apply(lambda x: x.split("-")[2])

25 Most Viewed Talks of All Time

df = df.sort_values('views', ascending=False)
df[['title', 'main_speaker', 'views', 'published_year']].head(25)

Ken Robinson’s talk “Do Schools Kill Creativity?” is the most popular TED Talk of all time, with 47.2 million views. It was also published in 2006, which makes it one of the oldest talks in the dataset.

The second most viewed talk, “Your body language may shape who you are”, belongs to Amy Cuddy, with 43.1 million views, and it was published in 2012! Given its publication date, Amy’s talk has performed very well compared to Ken’s: it gathered almost as many views despite being online six fewer years. Below these two, view counts drop off sharply.
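To make the “very well for its age” comparison concrete, one option is to divide each talk’s views by the number of years it has been online. A minimal sketch, assuming `published_date` is the `dd-mm-YYYY` string produced by the date-formatting step above; the `reference_date` argument stands in for the date the view counts were collected and is an assumption, not a value from the dataset. The helper name `views_per_year` and the toy numbers are hypothetical.

```python
import pandas as pd

def views_per_year(frame, reference_date):
    """Hypothetical helper: average views per year since publication.

    Assumes 'published_date' is a 'dd-mm-YYYY' string and that
    `reference_date` approximates when the view counts were collected.
    """
    published = pd.to_datetime(frame["published_date"], format="%d-%m-%Y")
    years_online = (pd.Timestamp(reference_date) - published).dt.days / 365.25
    return frame["views"] / years_online

# Toy data, not the real dataset: equal views, published four years apart
toy = pd.DataFrame({
    "title": ["older talk", "newer talk"],
    "published_date": ["01-01-2008", "01-01-2012"],
    "views": [8_000_000, 8_000_000],
})
toy["views_per_year"] = views_per_year(toy, "2016-01-01")
print(toy[["title", "views_per_year"]])
```

With equal total views, the newer talk scores twice as high per year online, which is the intuition behind comparing Amy Cuddy’s talk with Ken Robinson’s.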

Years with the Most Talks Published at TED


It seems the TED team grew its output steadily up to 2012, its most productive year. After that, the number of talks decreases slightly.
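The snippet behind this chart is not included in the text. A minimal sketch that would produce the counts, assuming the string `published_year` column created earlier, is a `value_counts` sorted by year; sorting the index works because the years are four-digit strings. The helper name `talks_per_year` is hypothetical.

```python
import pandas as pd

def talks_per_year(frame):
    """Count talks per 'published_year', sorted chronologically."""
    return frame["published_year"].value_counts().sort_index()

# Toy data, not the real dataset
toy = pd.DataFrame({"published_year": ["2012", "2011", "2012", "2013"]})
counts = talks_per_year(toy)
print(counts.to_dict())  # {'2011': 1, '2012': 2, '2013': 1}
```

On the real data, the result can be drawn the same way as the views chart below, e.g. `sns.barplot(x=counts.index, y=counts.values)`.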

What about views by year?

# Sum only the 'views' column; summing the whole frame also tries to add up text columns
views_by_year = df.groupby("published_year")["views"].sum()
sns.barplot(x=views_by_year.index, y=views_by_year.values)

By views, TED peaked in 2013 and crashed hard in 2017, roughly back to where it started. Part of that drop is likely an artifact of the data: the dataset ends in 2017, so the most recent talks have had little time to collect views. I think the rise in 2012 and 2013 was helped by Amy Cuddy’s 43.1-million-view talk, and those years were TED’s most popular period.
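One rough way to check how much a single hit like Amy Cuddy’s drives a year’s total is to compute what share of that year’s views comes from its most-viewed talk. A minimal sketch, assuming the `published_year` and `views` columns used above; the helper name `top_talk_share` and the toy numbers are hypothetical.

```python
import pandas as pd

def top_talk_share(frame):
    """Hypothetical helper: fraction of each year's total views that comes
    from that year's single most-viewed talk."""
    views = frame.groupby("published_year")["views"]
    return views.max() / views.sum()

# Toy data, not the real dataset
toy = pd.DataFrame({
    "published_year": ["2012", "2012", "2012", "2013", "2013"],
    "views": [40, 5, 5, 10, 10],
})
print(top_talk_share(toy).to_dict())  # {'2012': 0.8, '2013': 0.5}
```

A year whose share is much higher than its neighbours’ owes a large part of its total to one talk.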

Top 10 Occupations Most Likely to Speak at TED


As the output suggests, writers lead by a big gap. TED seems to attach importance to speakers’ intellectual knowledge, and if you have achievements in an artistic field, there is no obstacle either.
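The code behind this ranking is not shown either. A minimal sketch that would produce it, assuming the `speaker_occupation` column of `ted_main.csv`, is a plain `value_counts`. Note that `value_counts` drops missing occupations by default, and differently spelled entries (for example different capitalization) would be counted separately if they occur. The helper name `top_occupations` is hypothetical.

```python
import pandas as pd

def top_occupations(frame, n=10):
    """Return the n most frequent speaker occupations with their talk counts."""
    return frame["speaker_occupation"].value_counts().head(n)

# Toy data, not the real dataset
toy = pd.DataFrame({
    "speaker_occupation": ["Writer", "Artist", "Writer", "Designer", "Writer", "Artist"]
})
print(top_occupations(toy, n=2).to_dict())  # {'Writer': 3, 'Artist': 2}
```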

Average Views for the Top 5 Occupations

print("Writer: ",int(df[df["speaker_occupation"]=="Writer"]["views"].sum() / len(df[df["speaker_occupation"]=="Writer"])))
print("Designer: ", int(df[df["speaker_occupation"]=="Designer"]["views"].sum() / len(df[df["speaker_occupation"]=="Designer"])))
print("Artist: ",int(df[df["speaker_occupation"]=="Artist"]["views"].sum() / len(df[df["speaker_occupation"]=="Artist"])))
print("Jornalist: ",int(df[df["speaker_occupation"]=="Journalist"]["views"].sum() / len(df[df["speaker_occupation"]=="Journalist"])))
print("Entrepreneur",int(df[df["speaker_occupation"]=="Entrepreneur"]["views"].sum() / len(df[df["speaker_occupation"]=="Entrepreneur"])))

Looking at viewer demand, people seem to like entrepreneurship talks: with an average of 1.9 million views, entrepreneurs come second, even though writers’ talks also perform very well.
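The loop above computes each mean one occupation at a time. A more general sketch, assuming the same `speaker_occupation` and `views` columns, computes the talk count and mean views for every occupation in one `groupby` and drops occupations with only a handful of talks, since an average over one or two talks is easy to skew. The helper name `occupation_view_stats` and the `min_talks` threshold are arbitrary assumptions.

```python
import pandas as pd

def occupation_view_stats(frame, min_talks=5):
    """Hypothetical helper: talk count and mean views per occupation,
    keeping only occupations with at least `min_talks` talks."""
    stats = frame.groupby("speaker_occupation")["views"].agg(["count", "mean"])
    return stats[stats["count"] >= min_talks].sort_values("mean", ascending=False)

# Toy data, not the real dataset
toy = pd.DataFrame({
    "speaker_occupation": ["Writer"] * 3 + ["Entrepreneur"] * 2 + ["Poet"],
    "views": [1, 2, 3, 4, 6, 100],
})
print(occupation_view_stats(toy, min_talks=2))
```

In the toy data the single "Poet" talk has by far the highest views, but it is filtered out because one talk says little about an occupation’s typical audience.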

10 Most Used Tags of All Time

import ast

# Each 'tags' cell is a string such as "['children', 'creativity']"; parse it into a real list
tags = []
for tag_list in df['tags']:
    tags.extend(ast.literal_eval(tag_list))
tags = pd.DataFrame(tags, columns=["tags"])
tags = tags["tags"].value_counts().head(10).rename_axis("tags").reset_index(name="count")

As you might expect, technology is the most used tag of all time, followed by science.
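The `tags` column stores each talk’s tags as the string form of a Python list. Slicing off the brackets and splitting on commas leaves stray quotes and spaces, so the same tag can be counted under several spellings. A minimal standard-library sketch, assuming that list-literal format, parses each entry with `ast.literal_eval` and counts tags with `collections.Counter`; `count_tags` is a hypothetical helper name.

```python
import ast
from collections import Counter

def count_tags(tag_strings):
    """Count tags across talks.

    Assumes each entry is the string form of a Python list, e.g.
    "['children', 'creativity']", as in the 'tags' column of ted_main.csv.
    """
    counter = Counter()
    for raw in tag_strings:
        counter.update(ast.literal_eval(raw))
    return counter

raw = "['children', 'creativity', 'culture']"
print(raw[2:-2].split(","))   # ["children'", " 'creativity'", " 'culture"]
print(ast.literal_eval(raw))  # ['children', 'creativity', 'culture']
print(count_tags([raw, "['culture', 'science']"]).most_common(1))  # [('culture', 2)]
```

On the real data, `count_tags(df["tags"]).most_common(10)` would give the same top-10 list without going through a DataFrame.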

That graph gave me an idea: analyzing how tag trends changed year by year over the last three years.

df2017 = df[df["published_year"]=="2017"]
tags2017 = []
for i in range(len(df2017.loc[:,'tags'])):
ls = list(df2017.loc[:,'tags'])[i][2:-2].split(',')
for c in range(len(ls)):
value= list(df2017.loc[:,'tags'])[i][2:-2].split(',')[c]
tags2017 =pd.DataFrame(tags2017,columns=["tags"])
df2016 = df[df["published_year"]=="2016"]
tags2016 = []
for i in range(len(df2016.loc[:,'tags'])):
ls = list(df2016.loc[:,'tags'])[i][2:-2].split(',')
for c in range(len(ls)):
value= list(df2016.loc[:,'tags'])[i][2:-2].split(',')[c]
tags2016 =pd.DataFrame(tags2016,columns=["tags"])
df2015 = df[df["published_year"]=="2015"]
tags2015 = []
for i in range(len(df2015.loc[:,'tags'])):
ls = list(df2015.loc[:,'tags'])[i][2:-2].split(',')
for c in range(len(ls)):
value= list(df2015.loc[:,'tags'])[i][2:-2].split(',')[c]
tags2015 =pd.DataFrame(tags2015,columns=["tags"])
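The per-year tag tables hold the raw tags but do not yet compare them. A minimal sketch of that comparison, assuming the string `published_year` values and the list-literal `tags` format used throughout, counts tags per year and keeps the top `n`; the helper name `top_tags_by_year` and the toy rows are hypothetical.

```python
import ast
from collections import Counter

def top_tags_by_year(rows, years, n=3):
    """Hypothetical helper: the n most common tags for each year in `years`.

    `rows` is an iterable of (published_year, tags_string) pairs, where
    tags_string is a Python-list literal such as "['science', 'technology']".
    """
    counters = {year: Counter() for year in years}
    for year, raw in rows:
        if year in counters:
            counters[year].update(ast.literal_eval(raw))
    return {year: [tag for tag, _ in counters[year].most_common(n)] for year in years}

# Toy data, not the real dataset
rows = [
    ("2016", "['technology', 'science']"),
    ("2016", "['technology', 'health']"),
    ("2017", "['society', 'technology']"),
    ("2017", "['society', 'science']"),
]
print(top_tags_by_year(rows, ["2016", "2017"], n=1))  # {'2016': ['technology'], '2017': ['society']}
```

On the real DataFrame, the pairs can be supplied as `zip(df["published_year"], df["tags"])` with `years=["2015", "2016", "2017"]`.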
