Best IT jobs in Canada – scraping Glassdoor.com

To try out RStudio and the R plotting libraries ggplot2 and plotly, I decided to collect the number of job postings for various IT occupations in Canada and then visualize the data with these libraries. The first step was to choose a job website and compile a list of occupations.
Website: Glassdoor.
IT Job List:
  • Web developer
  • Software developer
  • Data analyst
  • Data scientist
  • Database administrator
  • Network administrator
  • Java developer
  • IT security specialist
  • JavaScript developer
I keyed in the job titles from the list above one by one, using ‘Canada’ as the location, and clicked the Search button. I recorded the results in a text file. That’s what I call real scraping! No modern tricks. I told myself that for a first attempt, it’s OK! I then imported the data into RStudio, wrote a few lines of code, created a plot and exported it as a PNG image. Here is what I got (download source code and data):
There is no point in discussing the “results”: numerous articles analyzing job postings are easily available on the Internet.
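
The import, plot and export steps described above could look roughly like the sketch below. It is not the downloadable source code: the column names `title` and `postings`, the function names, the file names `jobs.txt` and `it_jobs_canada.png`, and the tab-separated file layout are all assumptions made for illustration.

```r
library(ggplot2)

# Build a horizontal bar chart of posting counts per job title.
# `counts` is assumed to be a data frame with two columns:
#   title    - job title (character)
#   postings - number of postings found for that title (numeric)
plot_job_counts <- function(counts) {
  ggplot(counts, aes(x = reorder(title, postings), y = postings)) +
    geom_col() +
    coord_flip() +  # horizontal bars keep long job titles readable
    labs(x = NULL, y = "Job postings in Canada (Glassdoor)")
}

# Export the chart as a PNG image and return the file path invisibly.
save_job_plot <- function(counts, path = "it_jobs_canada.png") {
  ggsave(path, plot = plot_job_counts(counts), width = 8, height = 5, dpi = 150)
  invisible(path)
}

# Hypothetical usage, assuming a tab-separated text file with one
# "title<TAB>count" pair per line and no header row:
# counts <- read.delim("jobs.txt", header = FALSE,
#                      col.names = c("title", "postings"))
# save_job_plot(counts)
```

Sorting the bars with `reorder()` makes it easy to compare occupations at a glance, which is about all a static chart like this can offer.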

Here are a few comments:
1. The data visualization approach used here has no interactivity and is quickly becoming obsolete. I will keep it as it is for a while, since scraping comes first! In a couple of weeks I plan to try Shiny for bridging R and Plotly.

2. After a closer look I found that Glassdoor Search uses “OR” logic to select job postings, i.e. a ‘Data Entry Clerk’ posting is included in the results of a ‘Data Scientist’ search. I will fix this issue next time.

3. About using JavaScript to graph plots
AmCharts generates plots online.
To use the JavaScript features for your R charts
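
Regarding comment 2, one possible way to work around the “OR” matching is to filter the returned job titles afterwards and keep only those that contain the whole search phrase. The sketch below assumes the posting titles have already been collected into a character vector, which the manual approach in this post did not do. The function name `match_phrase` and the sample title ‘Senior Data Scientist’ are hypothetical.

```r
# Keep only the titles that contain the full search phrase.
# Both sides are lower-cased so the comparison ignores case, and
# fixed = TRUE treats the phrase as plain text rather than a regex.
match_phrase <- function(titles, phrase) {
  titles[grepl(tolower(phrase), tolower(titles), fixed = TRUE)]
}

# Illustrative titles only, not real search results.
titles <- c("Data Scientist", "Senior Data Scientist", "Data Entry Clerk")

# 'Data Entry Clerk' shares only the word "Data" with the query,
# so it is dropped; the other two titles are kept.
print(match_phrase(titles, "data scientist"))
```

This only tightens the results after the fact. It does not change how Glassdoor itself selects postings, so the count shown on the search page would still include the loose matches.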
