
I am scraping a website which returns HTML containing both single and double quotation marks. An example text is:

<div class="article__content">                                    <font face="Arial Helvetica sans-serif" size="3">Successful hires will expand the group's ongoing efforts applying machine learning to drug discovery biomolecular simulation and biophysics.  Ideal candidates will have demonstrated expertise in developing deep learning techniques as well as strong Python programming skills.  Relevant areas of experience might include molecular dynamics structural biology medicinal chemistry cheminformatics and/or quantum chemistry but specific knowledge of any of these areas is less critical than intellectual curiosity versatility and a track record of achievement and innovation in the field of machine learning.</font>                                </div>

When I run the following query in phpMyAdmin:

SELECT COUNT(*) FROM scrappedjobs WHERE JobDescription = '"<div class="article__content">                                    <font face="Arial Helvetica sans-serif" size="3">Successful hires will expand the group's ongoing efforts applying machine learning to drug discovery biomolecular simulation and biophysics.  Ideal candidates will have demonstrated expertise in developing deep learning techniques as well as strong Python programming skills.  Relevant areas of experience might include molecular dynamics structural biology medicinal chemistry cheminformatics and/or quantum chemistry but specific knowledge of any of these areas is less critical than intellectual curiosity versatility and a track record of achievement and innovation in the field of machine learning.</font>                                </div>"'

I get either an error or count = 0, even though this text is present in the database. Please tell me how to deal with strings containing quotation marks in scraped data. I am new to this, and all the answers I found about it are for PHP, not Python.

EDIT:

The Python code is as follows:

self.Cursor = self.db.cursor(buffered=True)
FetchQuery = "SELECT COUNT(*) FROM scrappedjobs where URL = %s AND JobDescription = %s"
self.Cursor.execute(FetchQuery, ("'" + item['url'] + "'", item['text']))

if self.Cursor.fetchone()[0] == 0:  # If the url does not exist in database
    print("Inserting into db...\n")
    InsertQuery = "INSERT INTO scrappedjobs (URL, JobTitle, JobDescription, CompanyName) VALUES (%s, %s, %s, %s)"
    self.Cursor.execute(InsertQuery, (item['url'], item['title'], item['text'], item['companyName']))
    self.db.commit()

Basically, the if condition is not triggering, despite the data being there in the database.

2 Answers


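  1. You need something like this (the #scrappedjobs temporary-table name below is SQL Server syntax; in MySQL, # starts a comment, so use CREATE TEMPORARY TABLE with a plain table name instead):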

      create table #scrappedjobs
        (
           JobDescription NVARCHAR(1000)
        )
        
        insert into #scrappedjobs (JobDescription)
        VALUES('"<div class="article__content">"')
        
        select * from #scrappedjobs
        
        SELECT COUNT(*) FROM #scrappedjobs WHERE JobDescription = '"<div class="article__content">"'
    
    -- Second select with LIKE:
        SELECT COUNT(*) FROM #scrappedjobs WHERE JobDescription like '%"<div class="article__content">"%' 
    

    Remember that you need to put ' at the start and end of the JobDescription value.

  2. Your example string begins like this: Successful hires will expand the group's. The single quote in group's will be interpreted by MySQL as the closing of the string condition in your SELECT statement.
    In order to make that work, you have to write every ' inside the string literal as '' in the query. MySQL stores it as a single ', so nothing has to be reversed when you read the text back.
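
A minimal sketch of the quoting rules discussed above, under a few assumptions: Python's built-in sqlite3 module stands in for MySQL so the example runs without a server (with mysql.connector the placeholder would be %s rather than ?), and the URL, the shortened text, and the count_matches helper are hypothetical. It shows that values passed as query parameters need no manual quoting, that wrapping a parameter in extra quotes (as the question's "'" + item['url'] + "'" does) makes the comparison fail, and that a hand-written literal works once each ' is doubled.

```python
import sqlite3

# sqlite3 is used only so the sketch is self-contained; it is not MySQL.
db = sqlite3.connect(":memory:")
cur = db.cursor()
cur.execute("CREATE TABLE scrappedjobs (URL TEXT, JobDescription TEXT)")

url = "https://example.com/job/1"  # hypothetical URL
# Shortened sample containing both double quotes and a single quote.
text = '<div class="article__content">the group\'s ongoing efforts</div>'

# Values travel separately from the SQL text, so no escaping is needed.
cur.execute(
    "INSERT INTO scrappedjobs (URL, JobDescription) VALUES (?, ?)",
    (url, text),
)
db.commit()


def count_matches(cursor, url_value, description):
    """Count rows whose URL and JobDescription equal the given values."""
    cursor.execute(
        "SELECT COUNT(*) FROM scrappedjobs WHERE URL = ? AND JobDescription = ?",
        (url_value, description),
    )
    return cursor.fetchone()[0]


# Parameters passed as-is: the row is found.
found = count_matches(cur, url, text)

# Extra quotes become part of the value being compared, so nothing matches.
found_with_extra_quotes = count_matches(cur, "'" + url + "'", text)

# Hand-written literal: every ' inside the value is doubled.
literal = "'" + text.replace("'", "''") + "'"
cur.execute("SELECT COUNT(*) FROM scrappedjobs WHERE JobDescription = " + literal)
found_with_literal = cur.fetchone()[0]

print(found, found_with_extra_quotes, found_with_literal)  # prints: 1 0 1
```

Building the literal by hand is shown only to illustrate the doubling rule; for scraped text, passing the value as a parameter is usually the safer choice because the driver handles the escaping.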
