
I’m working on a crawler for an answer site. How can I get only the question text inside each td, without the text of the nested tags?

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Document</title>
  </head>
  <body>
    <table
      border="0"
      width="100%"
      onclick="GiveAns(event.srcElement||event.target)"
      onmouseover="ChangeColor(event.srcElement||event.target)"
    >
      <tbody>
        <tr>
          <th class="w">Question number</th>
          <th class="w">key<br />answer</th>
          <th class="w">Choose your <br />own answer</th>
          <th class="w">Selected Topics<span id="cdes"></span></th>
          <th class="w">Error<br />Notification</th>
        </tr>
      </tbody>
      <tbody id="s1234">
        <tr id="d1">
          <th><a name="P1">1</a></th>
          <th><b>(1)</b></th>
          <th><tt> </tt></th>
          <td>
            question1
            <i>
              <a>(1)ans1</a>
            </i>
            <i>(2)ans2</i>
            <i>(3)ans3</i>
            <i>ans4</i>。<q>360 02-137</q>
          </td>
          <th class="h" onclick="E(this)"><img src="/e.gif" /></th>
        </tr>
        <tr id="d2">
          <th><a name="P2">2</a></th>
          <th><b>(4)</b></th>
          <th><tt> </tt></th>
          <td>
            question2
            <i>(1)ans1</i>
            <i>(2)ans2</i>
            <i>(3)ans3</i>
            <i>
              <a>(4)ans4</a>
            </i>
            。
            <q>1149 </q>
          </td>
          <th class="h" onclick="E(this)"><img src="/e.gif" /></th>
        </tr>
      </tbody>
    </table>
  </body>
</html>

This is the table from the site.

I tried these methods

document.querySelectorAll('#s1234 tr > td:not(i)').forEach((e)=>{console.log(e)})
document.querySelectorAll('#s1234 tr > td')

But the results of both still include the <i> and <a> tags, so how do I get just the question text?

The result I need is like this: "question1"

3 Answers


  1. It isn’t entirely clear what you are asking. Do you just need the innerText? For example:

    document.querySelectorAll('#s1234 tr > td').forEach((e) => {
      console.log(e.innerText)
    })
    

    Gives

    question1 (1)ans1 (2)ans2 (3)ans3 ans4。360 02-137
    question2 (1)ans1 (2)ans2 (3)ans3 (4)ans4 。 1149 
    

    Edit:

    if you just need the question part then…

    document.querySelectorAll('#s1234 tr > td').forEach((e) => {
      console.log(e.firstChild.data.trim())
    })
    

    gives…

    question1
    question2
    
  2. You can’t do it with a CSS selector (see this question).

    But since you’re already in JS, you can get text content in a few other ways, for which there is also a dedicated question with many options (probably this is currently the best one).

    Applied to the question’s code:

    const extractText = (node) => {
      // Assuming there's 1 text node you want.
      // Change to `filter` if you want to extract all text nodes in an element.
      const text = [...node.childNodes].find(child => child.nodeType === Node.TEXT_NODE);
      return text && text.textContent.trim();
    };
    
    const allTextNodes = [...document.querySelectorAll('#s1234 tr > td')].map(extractText);
    
  3. I believe you only want to extract the question; your statements are a little confusing.

    document.querySelectorAll('#s1234 tr > td').forEach((e)=>{console.log(e.firstChild.data)}) // this will give you only the question
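
For anyone comparing the approaches above: `firstChild.data` only works while the question is the very first child of the td. Below is a rough sketch of a slightly more defensive variant that keeps only the td's direct text nodes and skips whitespace-only ones. The helper names `ownText` and `questionText` are made up for illustration, and it assumes the question is the first non-empty direct text node, as in the posted table. Note that in this markup the "。" before the `<q>` is also a direct text node, so collecting all of them returns more than the question.

```javascript
// Numeric value of Node.TEXT_NODE in the DOM; hard-coded so the sketch
// does not depend on the global `Node` object.
const TEXT_NODE = 3;

// Hypothetical helper: returns the trimmed, non-empty text nodes that sit
// directly inside `cell`, ignoring text nested in <i>, <a>, <q> and so on.
function ownText(cell) {
  return [...cell.childNodes]
    .filter((child) => child.nodeType === TEXT_NODE)
    .map((child) => child.textContent.trim())
    .filter((text) => text.length > 0);
}

// Hypothetical helper: assumes the question is the first non-empty
// direct text node; returns null when the cell has none.
function questionText(cell) {
  const texts = ownText(cell);
  return texts.length > 0 ? texts[0] : null;
}

// In the browser, with the table from the question on the page:
// const questions = [...document.querySelectorAll('#s1234 tr > td')].map(questionText);
```

Unlike `innerText`, this never reads into the child elements, so the answer options and the `<q>` reference are left out.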
    