browser history
This post is inspired by the Bilibili uploader 偶尔有点小迷糊, whose original post has since been removed from Bilibili for some reason. In his video, he showed how to use Python to read your browser history and draw a pie chart of it. I borrow his idea here and use R to explore which sites I browse most, shown as a bar chart.
R Packages Required
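library(DBI) # database interface; RSQLite must also be installed, since it provides the SQLite driver used below
library(dplyr) # data manipulation (tibble, count, arrange, mutate)
library(urltools) # url_parse() for splitting URLs into parts
library(ggplot2) # plotting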
history file
On Windows, Chrome stores its history in an SQLite file, so we can use the DBI package (with the RSQLite driver)
to connect to it and read it.
con <- dbConnect(RSQLite::SQLite(), "C:/Users/liumi/AppData/Local/Google/Chrome/User Data/Default/History")
Replace liumi in the path with your own Windows user name.
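Hard-coding the user name can be avoided, and Chrome may keep the History file locked while it is running, in which case reads can fail with a "database is locked" error. Below is a minimal sketch that builds the path from the environment and works on a temporary copy. It assumes the default profile location under %LOCALAPPDATA%; the helper names chrome_history_path and snapshot_file are made up for illustration and are not part of any package.

```r
# Hypothetical helper: default Chrome History path on Windows.
# Assumes the "Default" profile; other profiles live in sibling folders.
chrome_history_path <- function(local_appdata = Sys.getenv("LOCALAPPDATA")) {
  file.path(local_appdata, "Google", "Chrome", "User Data", "Default", "History")
}

# Hypothetical helper: copy a file to a temporary location and return the
# path of the copy, so the original is never opened directly.
snapshot_file <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  tmp <- tempfile(fileext = ".sqlite")
  if (!file.copy(path, tmp)) {
    stop("Could not copy ", path)
  }
  tmp
}

# Show the path that would be used on this machine
print(chrome_history_path())
```

With these helpers, the connection could be opened as dbConnect(RSQLite::SQLite(), snapshot_file(chrome_history_path())).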
Some basic DBI commands for inspecting the SQLite database:
dbListTables(con)
dbListFields(con, "urls")
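Besides listing tables and fields, DBI can also run SQL directly, which avoids loading a whole table into R. The sketch below uses an in-memory database with a made-up urls table, so it runs without a real History file. Only the url and visit_count columns used in this post are mimicked, and the rows are invented.

```r
library(DBI)

# In-memory stand-in for the real History file; the rows are invented
demo_con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(demo_con, "urls", data.frame(
  url = c("https://example.com/a", "https://example.org/", "https://example.com/b"),
  visit_count = c(5L, 2L, 1L)
))

# Let SQLite sort and limit the rows instead of reading everything into R
top_urls <- dbGetQuery(
  demo_con,
  "SELECT url, visit_count FROM urls ORDER BY visit_count DESC LIMIT 2"
)
print(top_urls)

dbDisconnect(demo_con)
```

The same dbGetQuery call should work on the real connection, with con in place of demo_con.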
the URL
urls = dbReadTable(con, "urls") # read the urls table
domain = url_parse(urls$url)$domain # host name of each URL (subdomains such as "www." are kept)
domaint = tibble(domain, weights = urls$visit_count) # keep visit_count so each URL is weighted by its number of visits
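# total visits per domain, most visited first
domain_count = domaint %>%
  count(domain, wt = weights) %>%
  arrange(desc(n))
# keep the top 10 domains (exactly 10 rows, even with ties); percentages are relative to these 10 only
domain_count_top = domain_count %>%
  slice_max(n, n = 10, with_ties = FALSE) %>%
  mutate(prop = n / sum(n) * 100) %>%
  mutate(ypos = prop / 2, rank = row_number(), propc = scales::label_percent()(prop / 100))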
# site list
sitelist = domain_count_top$domain
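The reason for carrying visit_count along as weights is that the urls table has one row per URL, not one row per visit. A small sketch on invented data shows how count() behaves with and without wt:

```r
library(dplyr)

# Invented rows: one row per URL, weights = how often that URL was visited
toy <- tibble(
  domain  = c("example.com", "example.com", "example.org"),
  weights = c(3, 1, 10)
)

# Without wt: counts rows (distinct URLs) per domain
by_rows <- toy %>% count(domain)

# With wt: sums the visit counts per domain, as in the code above
by_visits <- toy %>% count(domain, wt = weights)

print(by_rows)
print(by_visits)
```

Here example.com has more distinct URLs, but example.org has more visits, so only the weighted version ranks the domains by how often they were actually visited.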
Plot
# Basic bar chart
ggplot(domain_count_top, aes(x = rank, y = prop, fill = domain)) +
  geom_tile(aes(y = ypos, height = prop, fill = domain), width = 0.9) +
  geom_text(aes(label = propc), hjust = "right", nudge_y = .01, colour = "grey30") +
  coord_flip(clip = "off") +
  scale_x_reverse("", labels = sitelist, breaks = 1:10) +
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.x = element_blank(),
        legend.position = "none",
        axis.text.x = element_blank(),
        axis.title.x = element_blank()
  )
# disconnect DB
dbDisconnect(con)
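The chart above draws its bars with geom_tile and a precomputed midpoint (ypos). As an alternative, not the approach used in this post, a similar horizontal bar chart can be sketched with geom_col, which needs no midpoint or rank column. The data frame below is an invented stand-in for domain_count_top.

```r
library(ggplot2)

# Invented stand-in for domain_count_top
demo_top <- data.frame(
  domain = c("example.com", "example.org", "example.net"),
  prop   = c(60, 30, 10)
)

# reorder() sorts the bars by prop, so the largest ends up at the top
p <- ggplot(demo_top, aes(x = prop, y = reorder(domain, prop), fill = domain)) +
  geom_col(width = 0.9) +
  geom_text(aes(label = sprintf("%.0f%%", prop)), hjust = -0.1, colour = "grey30") +
  scale_x_continuous(expand = expansion(mult = c(0, 0.15))) +  # room for the labels
  labs(x = NULL, y = NULL) +
  theme(legend.position = "none")

print(p)
```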
Result
Please note that these percentages are relative to the top 10 domains only, not to the whole history. To show each domain's share of the total history instead, compute the percentage before filtering down to the top 10.
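The difference between the two kinds of percentage can be sketched on invented data. The table demo_count stands in for domain_count, and the top 2 are kept instead of the top 10 to keep the example small:

```r
library(dplyr)

# Invented stand-in for domain_count (total visits per domain)
demo_count <- tibble(
  domain = c("example.com", "example.org", "example.net", "example.edu"),
  n      = c(50, 30, 15, 5)
)

top_demo <- demo_count %>%
  mutate(prop_total = n / sum(n) * 100) %>%   # share of all visits
  slice_max(n, n = 2, with_ties = FALSE) %>%  # then keep the top 2
  mutate(prop_top = n / sum(n) * 100)         # share within the top 2 only

print(top_demo)
```

In this toy case example.com accounts for 50% of all visits but 62.5% of the visits within the top 2, which is why the two definitions should not be mixed up when reading the chart.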