
I want to scrape data from a webpage. Here’s the code I have. It is supposed to get all the authors, but it only gets the first one (‘Simon Butler’).

Array.from(document.querySelectorAll('#author-group'))
  .map(e =>
      e.querySelector('[class="button-link workspace-trigger button-link-primary"]'))
  .map(e =>
      e.querySelector('[class="button-link-text"]'))
  .map(e =>
      e.querySelector('[class="react-xocs-alternative-link"]'))
  .map(e =>
      e.querySelector('[class="given-name"]').textContent + ' '
      + e.querySelector('[class="text surname"]').textContent)
  .join(', ')

As I understand it, the problem comes from using querySelector, which returns only the first matching element. However, when I use querySelectorAll instead, I get the following error: e.querySelectorAll is not a function.

I want to scrape data from https://www.sciencedirect.com/science/article/pii/S0164121219302262.
I haven’t included any HTML because the part of the page source that holds the author information is huge, and I’m not familiar enough with HTML or JS to reduce it to a minimal sample.

2 Answers


  1. Array.from(document.querySelectorAll('#author-group'))
    

    This creates an array with one element in it.


    The code you posted uses querySelector, which returns only one item (not what you want), but you said you also tried querySelectorAll:

    .map(e =>
          e.querySelectorAll('[class="button-link workspace-trigger button-link-primary"]'))
    

    Since the previous step returned an array with an element in it, e is an element.

    Elements have querySelectorAll so this is fine.

    However, now you are returning a NodeList, not an Element.


    .map(e =>
          e.querySelectorAll('[class="button-link-text"]'))

    Now e is a NodeList. It isn’t an Element.

    NodeLists don’t have querySelector or querySelectorAll methods.

    You need to loop over the NodeList (for example, by converting it with Array.from and then calling map) and deal with each element one by one.


    Probably what you should be doing is calling querySelectorAll once and using descendant combinators to describe the elements containing each author in a single query.

    Then you would be able to:

    Array.from(document.querySelectorAll('#author-group etc etc etc'))
        .map(e =>  
            e.querySelector('[class="given-name"]').textContent
            + ' '
            + e.querySelector('[class="text surname"]').textContent
        )
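
    If you would rather keep the step-by-step structure from the question, the "loop over the NodeList" advice can be sketched roughly as below. This is only an illustration: the helper name queryAllWithin and its parameters are hypothetical, and it relies on flatMap to merge the per-element results into one flat array of elements, so the next step receives elements rather than NodeLists.

    ```javascript
    // Hypothetical helper: run `selector` inside every element of `elements`
    // and return all matches as one flat array of elements.
    function queryAllWithin(elements, selector) {
      return Array.from(elements).flatMap(el =>
        // Convert each NodeList to a real array so flatMap can flatten it.
        Array.from(el.querySelectorAll(selector))
      );
    }
    ```

    In the browser console you could then chain calls such as queryAllWithin(document.querySelectorAll('#author-group'), '[class="button-link-text"]'), using the selectors from the question (not verified against the live page).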
    
  2. There’s only one #author-group. The elements containing authors are #author-group button. The name elements are .given-name and .surname.

    Array.from(document.querySelectorAll('#author-group button'))
    .map(e =>
      e.querySelector('.given-name').textContent + ' '
      + e.querySelector('.surname').textContent)
    .join(', ')
    

    Have a look at a CSS Selector reference to learn how they work. Ctrl-F in the Elements view of Chrome’s developer tools (F12) lets you test selectors.

    You can also use textContent of an element containing all the elements you want. #author-group button contains some extra letters, but #author-group .react-xocs-alternative-link contains just the names.

    Array.from(
      document.querySelectorAll(
        '#author-group .react-xocs-alternative-link'
      )
    ).map(e => e.textContent).join(', ')
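
    One caveat with the first snippet in this answer: querySelector returns null when nothing matches, so a button without a name inside would throw on .textContent. A slightly more defensive sketch, assuming the same selectors as above, might look like this. The function name authorNames and its root parameter are hypothetical, and the selectors have not been re-checked against the live page.

    ```javascript
    // Hypothetical helper: collect "given surname" strings found under `root`
    // (pass `document` when running in a browser).
    function authorNames(root) {
      return Array.from(root.querySelectorAll('#author-group button'))
        .map(button => {
          // querySelector returns null when nothing matches, so guard with ?.
          const given = button.querySelector('.given-name')?.textContent ?? '';
          const surname = button.querySelector('.surname')?.textContent ?? '';
          return `${given} ${surname}`.trim();
        })
        // Drop buttons that contained no name parts at all.
        .filter(name => name !== '');
    }
    ```

    In the console, authorNames(document).join(', ') would then give the same kind of comma-separated string as the snippets above.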
    