

Basic data scraping using Ruby

Ruby     Published at about 2 months ago    Bishnu Basyal

Data scraping is the process of taking output that was originally intended to be human readable and extracting data from it so you can use that data in your program.

 

To scrape the web using Ruby, we need two gems:
1) nokogiri
2) httparty

 

Install nokogiri
Open your terminal and run the following command

$ gem install nokogiri

 

Install httparty
Open your terminal and run the following command

$ gem install httparty

 

In this tutorial we are going to scrape the news headlines from Ekantipur

step 1:

Create a Ruby (.rb) file, say scrap.rb, and define a class in it

  # scrap.rb

  require 'nokogiri'
  require 'httparty'
 

  class Scrap
  end

Do not forget the require 'nokogiri' and require 'httparty' lines at the top

 

step 2:

Send an HTTP request to the page

  # scrap.rb

  require 'nokogiri'
  require 'httparty'
 

  class Scrap
    doc = HTTParty.get('http://www.ekantipur.com/')
  end

 

step 3:

Parse the response using Nokogiri

  # scrap.rb

  require 'nokogiri'
  require 'httparty'
 

  class Scrap
    doc = HTTParty.get('http://www.ekantipur.com/')
    @parse_page = Nokogiri::HTML(doc)
  end

 

step 4:

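Move the request and the parsing into initialize so that they run each time an object is created, expose the parsed page with attr_accessor, and then create an object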

  # scrap.rb

  require 'nokogiri'
  require 'httparty'
 

  class Scrap
    attr_accessor :parse_page

    def initialize    
      doc = HTTParty.get('http://www.ekantipur.com/')
      @parse_page = Nokogiri::HTML(doc)
    end

  end

  s = Scrap.new
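
Why the code belongs in initialize can be seen in a small sketch that needs no network. The class names ClassBodyDemo and InitializeDemo and the string values are made up for this illustration.

```ruby
# Class-body version: the assignment runs once, when the class is defined,
# and @page belongs to the class object, not to its instances.
class ClassBodyDemo
  @page = "set in class body"
  attr_accessor :page
end

# initialize version: the assignment runs for every new object,
# so @page belongs to that object and the accessor can read it.
class InitializeDemo
  attr_accessor :page

  def initialize
    @page = "set in initialize"
  end
end

puts ClassBodyDemo.new.page.inspect   # prints nil
puts InitializeDemo.new.page.inspect  # prints "set in initialize"
```

This is why an instance method such as a title extractor can only see the parsed page once it is assigned inside initialize.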

 

step 5:

Figure out what you want to scrape and select it using CSS selectors

  # scrap.rb

  require 'nokogiri'
  require 'httparty'

  class Scrap
    attr_accessor :parse_page

    def initialize  
      doc = HTTParty.get('http://www.ekantipur.com/')
      @parse_page = Nokogiri::HTML(doc)
    end


    def get_titles
      parse_page.css(".display-news-title").css("h1").css("a").children.map { |name| name.text }.compact
    end

  end

  s = Scrap.new
  titles = s.get_titles

 

step 6:

Display data

  # scrap.rb

  require 'nokogiri'
  require 'httparty'

  class Scrap
    attr_accessor :parse_page

    def initialize  
      doc = HTTParty.get('http://www.ekantipur.com/')
      @parse_page = Nokogiri::HTML(doc)
    end
    
    def get_titles
      parse_page.css(".display-news-title").css("h1").css("a").children.map { |name| name.text }.compact
    end

  end

  s = Scrap.new
  titles = s.get_titles

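  # each_with_index stops at the last title; the inclusive range
  # 0..titles.size would run one step too far and print an empty entry.
  titles.each_with_index do |title, index|
    puts "#{index + 1}) #{title}"
  end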
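
If you want the numbered lines as data (for example to write them to a file) rather than printing them directly, a small helper can build them. The name numbered_lines is made up for this sketch, and the sample titles are taken from the output listed in this post.

```ruby
# Hypothetical helper: turns a list of titles into numbered lines.
def numbered_lines(titles)
  # with_index(1) starts counting at 1, so no "index + 1" arithmetic is needed.
  titles.each.with_index(1).map { |title, number| "#{number}) #{title.strip}" }
end

sample_titles = ["Nepal announce 14-member squad", "India pip Nepal for title "]
puts numbered_lines(sample_titles)
```

An empty list simply produces no lines, so nothing is printed when the selector matches nothing.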

 

The output should look like this (the headlines depend on the day you run it):

1) Desai arrives in Kathmandu as batting consultant  
2) Indian court slaps 10-yr imprisonment on Ram Raheem for rape  
3) Police raids Dugad Foods after contamination in Frooti  
4) India visit successful: PM Deuba  
5) India, China agree to end border dispute at Doklam  
6) Number of viral fever patients up in Parbat  
7) 1 killed, 4 injured in Syangja microbus accident  
8) Affiliation to Nat'l Medical College a bizarre move: Dahal  
9) Sarlahi flood survivors compelled to live under tents  
10) NA man electrocuted in Sindhupalchok  
11) Micromax launches four new phones  
12) Yaks face extinction due to low birth rate  
13) Govt announces Rs1.25b aid for flood-hit farmers  
14) National Trading likely to get new lease of life  
15) Banking sector faces threat from its regulator  
16) Nepal announce 14-member squad  
17) India pip Nepal for title  
18) ‘Make employees feel valued’  
.
.
.
.
.
.
.

 

Check it out on GitHub: Basic Web Scrap with Ruby

Thank you.....
