This post is inspired by the Bilibili uploader 偶尔有点小迷糊, whose post has since been removed from Bilibili for some reason. In his video, he showed how to use Python to read your browser history and draw a pie chart of it. I borrow his idea here and use R to explore which sites I browse most, shown as a bar chart.

R Packages Required

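library(DBI)       # database interface; the RSQLite package must also be installed
library(dplyr)     # data manipulation
library(urltools)  # URL parsing
library(ggplot2)   # plotting; the scales package is used for the percent labels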

History File

On Windows, Chrome stores its history in a SQLite database file, so we can use the DBI package (with the RSQLite driver) to connect to it and read it.

con <- dbConnect(RSQLite::SQLite(), "C:/Users/liumi/AppData/Local/Google/Chrome/User Data/Default/History")

Replace liumi in the path with your own Windows user name.
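One practical snag: while Chrome is running it usually keeps the History file locked, and the connection may fail with a "database is locked" error. Here is a minimal sketch of a workaround, assuming a snapshot of the history is good enough: copy the file to a temporary location and connect to the copy. The helper name connect_history_copy is hypothetical and not part of the original idea.

```r
library(DBI)

# Hypothetical helper: copy the History file to a temporary location and open
# the copy, so a lock held by a running Chrome does not block the read.
connect_history_copy <- function(history_path) {
  stopifnot(file.exists(history_path))
  tmp <- tempfile(fileext = ".sqlite")
  ok <- file.copy(history_path, tmp, overwrite = TRUE)
  if (!ok) stop("Could not copy history file: ", history_path)
  dbConnect(RSQLite::SQLite(), tmp)
}

# Example usage; this is the path from the post and will differ per machine.
history_path <- "C:/Users/liumi/AppData/Local/Google/Chrome/User Data/Default/History"
if (file.exists(history_path)) {
  con <- connect_history_copy(history_path)
}
```

The copy lives in the R session's temporary directory, so it is cleaned up when the session ends. Closing Chrome before running the script is the simpler alternative.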

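Some basic commands for inspecting the database:

dbListTables(con) # list all tables in the history database

dbListFields(con, "urls") # list the columns of the urls table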
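Since the file is an ordinary SQLite database, plain SQL works too. The sketch below runs a query with dbGetQuery against a small in-memory table with made-up rows standing in for the real urls table; only the url, title and visit_count columns are assumed here, and demo_con is a name chosen for this example.

```r
library(DBI)

# Toy in-memory database standing in for Chrome's History file (made-up rows).
demo_con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(demo_con, "urls", data.frame(
  url = c("https://www.example.com/a",
          "https://www.example.org/",
          "https://www.example.com/b"),
  title = c("A", "Org", "B"),
  visit_count = c(5L, 2L, 9L)
))

# Pull only the columns needed, most visited first.
top_urls <- dbGetQuery(
  demo_con,
  "SELECT url, visit_count FROM urls ORDER BY visit_count DESC LIMIT 2"
)
print(top_urls)

dbDisconnect(demo_con)
```

Against the real file, the same query can be run on con instead of demo_con. Reading the whole table with dbReadTable is equally fine for a history of ordinary size.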

The URLs

urls = dbReadTable(con, "urls") # read the urls table (one row per visited URL)

domain = url_parse(urls$url)$domain # parse each URL and keep its host name

domaint = tibble(domain, weights = urls$visit_count) # keep visit_count as the weight so repeat visits are counted
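To see what url_parse returns, here is a small sketch on two made-up URLs (they are not from any real history). It splits each URL into its components, and the domain column holds the full host name.

```r
library(urltools)

# Made-up URLs for illustration only.
example_urls <- c(
  "https://www.example.com/video/123?from=search",
  "https://docs.example.org/guide/index.html"
)

# One row per URL, one column per component (scheme, domain, path, ...).
parsed <- url_parse(example_urls)

# The domain column keeps the subdomain as part of the host name.
print(parsed$domain)
```

Because the subdomain is kept, hosts such as www.example.com and docs.example.com would be counted as separate sites in the chart.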

# total visits per domain, most visited first

domain_count = domaint %>% 
  count(domain, wt=weights) %>% 
  arrange(desc(n)) 
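The wt argument is what makes this a visit count rather than a count of distinct URLs. A minimal sketch on made-up data shows the difference; toy, weighted and unweighted are names chosen for this example.

```r
library(dplyr)

# Made-up data: two URLs on a.com (3 and 4 visits), one on b.com (1 visit).
toy <- tibble(
  domain  = c("a.com", "b.com", "a.com"),
  weights = c(3, 1, 4)
)

# With wt, n is the sum of the weights per domain (total visits).
weighted <- toy %>% count(domain, wt = weights)

# Without wt, n is just the number of rows per domain (distinct URLs).
unweighted <- toy %>% count(domain)

print(weighted)
print(unweighted)
```

Without the weights, a site with many rarely visited pages would outrank a site with a single page opened every day.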

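# get the top 10 domains by visit count
domain_count_top = domain_count %>% 
  slice_max(n, n = 10, with_ties = FALSE) %>% # at most 10 rows, even when counts tie
  mutate(prop = n / sum(n) * 100) %>% # share of visits within the top 10, in percent
  mutate(ypos = prop / 2, rank = row_number(), propc = scales::label_percent()(prop / 100))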

# site list
sitelist = domain_count_top$domain

Plot

# Basic bar chart
ggplot(domain_count_top, aes(x=rank, y=prop, fill=domain)) +
  geom_tile(aes(y = ypos, height = prop, fill = domain), width = 0.9) +
  geom_text(aes(label = propc), hjust = "right", nudge_y = .01, colour = "grey30") +
  coord_flip(clip="off") +
  scale_x_reverse("", labels = sitelist, breaks = seq_along(sitelist)) + # one break per bar
  theme(panel.grid.major.y=element_blank(),
        panel.grid.minor.x=element_blank(), 
        legend.position = "none",
        axis.text.x = element_blank(), 
        axis.title.x = element_blank()
        )

# disconnect DB
dbDisconnect(con)

Result

Note that these percentages are computed over the top 10 domains only, so the bars add up to 100%. You could instead compute them against the total visit count of the whole history.
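A minimal sketch of the second option, assuming the only change is to compute the proportion before trimming to the top rows. The table domain_count_demo holds made-up numbers standing in for domain_count, and the top 2 are kept instead of the top 10 to keep the example short.

```r
library(dplyr)

# Made-up stand-in for domain_count: total visits per domain.
domain_count_demo <- tibble(
  domain = c("a.com", "b.com", "c.com", "d.com"),
  n      = c(50, 30, 15, 5)
)

top_demo <- domain_count_demo %>%
  mutate(prop = n / sum(n) * 100) %>%       # share of ALL visits, computed first
  slice_max(n, n = 2, with_ties = FALSE)    # then keep only the top rows

print(top_demo)
```

With this ordering the bars no longer sum to 100%; the remainder is the share of visits to sites outside the top list.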



Published

20 January 2021

Modified

20 January 2021
