I want to extract content from two different tags using PHP. I want to associate each h1 tag with the content of the div tags that immediately follow it, like a parent-child relationship.
<h1>Title 1</h1>
<div class="items">some data and divs here 1</div>
<h1>Title 2</h1>
<div class="items">some data and divs here 2</div>
<div class="items">some data and divs here 3</div>
<h1>Title 3</h1>
<div class="items">some data and divs here 4</div>
<div class="items">some data and divs here 5</div>
<div class="items">some data and divs here 6</div>
The number of div items between two h1 tags varies.
I know how to scrape all tags of one kind with simple_html_dom or Goutte’s Client, which gives me either:
<h1>Title 1</h1>
<h1>Title 2</h1>
<h1>Title 3</h1>
Or
<div class="items">some data and divs here 1</div>
<div class="items">some data and divs here 2</div>
<div class="items">some data and divs here 3</div>
<div class="items">some data and divs here 4</div>
<div class="items">some data and divs here 5</div>
<div class="items">some data and divs here 6</div>
But I can’t associate each title with its data. I can’t figure out how to build an array like this:
array (
  0 =>
  array (
    'item' => 'Title 1',
    'data' => 'some data and divs here 1',
  ),
  1 =>
  array (
    'item' => 'Title 2',
    'data' => 'some data and divs here 2',
  ),
  2 =>
  array (
    'item' => 'Title 2',
    'data' => 'some data and divs here 3',
  ),
  3 =>
  array (
    'item' => 'Title 3',
    'data' => 'some data and divs here 4',
  ),
  4 =>
  array (
    'item' => 'Title 3',
    'data' => 'some data and divs here 5',
  ),
  5 =>
  array (
    'item' => 'Title 3',
    'data' => 'some data and divs here 6',
  ),
)
I’ve tried something based on sibling traversal, but didn’t find a way.
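For comparison, sibling traversal can work with PHP’s built-in DOMDocument: start at each h1 and follow nextSibling until the next h1, collecting the div.items elements on the way. This is a minimal sketch, assuming the h1 and div elements are siblings under one parent as in the sample markup; the function name groupBySiblings is hypothetical.

```php
<?php
// Sketch: pair each <h1> with the <div class="items"> siblings after it by
// walking nextSibling until the next <h1>. Assumes headings and items are
// siblings under one parent; groupBySiblings is a hypothetical name.

function groupBySiblings(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate warnings about loose markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $result = [];
    foreach ($doc->getElementsByTagName('h1') as $h1) {
        $title = trim($h1->textContent);
        for ($node = $h1->nextSibling; $node !== null; $node = $node->nextSibling) {
            if (!($node instanceof DOMElement)) {
                continue; // skip whitespace text nodes between tags
            }
            if ($node->tagName === 'h1') {
                break; // the next heading starts a new group
            }
            if ($node->tagName === 'div' && $node->getAttribute('class') === 'items') {
                $result[] = ['item' => $title, 'data' => trim($node->textContent)];
            }
        }
    }
    return $result;
}

$html = <<<'HTML'
<h1>Title 1</h1>
<div class="items">some data and divs here 1</div>
<h1>Title 2</h1>
<div class="items">some data and divs here 2</div>
<div class="items">some data and divs here 3</div>
<h1>Title 3</h1>
<div class="items">some data and divs here 4</div>
<div class="items">some data and divs here 5</div>
<div class="items">some data and divs here 6</div>
HTML;

// Prints a nested array shaped like the one above.
var_export(groupBySiblings($html));
```

The class check is an exact string comparison, so a div with several classes (for example class="items big") would be skipped; loosen it if the real markup needs that.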
2 Answers
Here’s an idea: use some string manipulation to wrap the parts between the h1 tags in a span (for example). Then read the result with PHP’s DOMDocument, getting the HTML by tag name (h1 and span).
Here’s my attempt:
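A minimal sketch of this wrap-and-parse idea follows; it illustrates the technique rather than reproducing the answerer’s original code. It makes two assumptions: headings are written exactly as h1 tags without attributes, and the wrapper is a div with a hypothetical class "group" instead of a span, because a span containing divs is not valid HTML. The i-th wrapper is paired with the i-th h1 by index.

```php
<?php
// Sketch of the wrap-then-parse idea. Assumptions: headings are plain
// <h1>...</h1> (no attributes); the wrapper is <div class="group">
// (hypothetical class name) rather than a <span>.

function groupByWrapping(string $html): array
{
    // Wrap everything between a </h1> and the next <h1> (or the end of the
    // input) in a wrapper div.
    $wrapped = preg_replace('~</h1>(.*?)(?=<h1>|\z)~s', '</h1><div class="group">$1</div>', $html);

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate warnings about loose markup
    $doc->loadHTML($wrapped);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $titles = $doc->getElementsByTagName('h1');
    $groups = $xpath->query('//div[@class="group"]');

    $result = [];
    // Every </h1> gets exactly one wrapper, so the i-th wrapper holds the
    // items of the i-th heading.
    for ($i = 0; $i < $groups->length; $i++) {
        $title = trim($titles->item($i)->textContent);
        foreach ($xpath->query('div[@class="items"]', $groups->item($i)) as $div) {
            $result[] = ['item' => $title, 'data' => trim($div->textContent)];
        }
    }
    return $result;
}

$html = <<<'HTML'
<h1>Title 1</h1>
<div class="items">some data and divs here 1</div>
<h1>Title 2</h1>
<div class="items">some data and divs here 2</div>
<div class="items">some data and divs here 3</div>
<h1>Title 3</h1>
<div class="items">some data and divs here 4</div>
<div class="items">some data and divs here 5</div>
<div class="items">some data and divs here 6</div>
HTML;

var_export(groupByWrapping($html));
```

Because the wrapping step edits raw HTML with a regular expression, it will misbehave on h1 tags that carry attributes or that appear inside comments or scripts.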
Output for $items and $titles:
Based on the answer on XPath until next tag, I’ve made a few small modifications to generate the desired result.
Code: (Demo)
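A sketch of the general "XPath until next tag" pattern with PHP’s DOMXPath is shown here; it is a reconstruction of the technique, not the answer’s exact code. For the n-th h1, its items are the following div.items siblings that have exactly n h1 siblings before them, which stops the selection at the next h1. It assumes headings and items are siblings under one parent; groupByXPath is a hypothetical name.

```php
<?php
// Sketch of the "XPath until next tag" pattern. Assumes every <h1> and
// <div class="items"> are siblings under one parent; groupByXPath is a
// hypothetical name.

function groupByXPath(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate warnings about loose markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $result = [];
    $headings = $xpath->query('//h1');
    for ($i = 0; $i < $headings->length; $i++) {
        $h1 = $headings->item($i);
        // 1-based position of this heading among its h1 siblings.
        $n = (int) $xpath->evaluate('count(preceding-sibling::h1)', $h1) + 1;
        // Items after this heading with exactly $n headings before them,
        // i.e. the ones that come before the next <h1>.
        $items = $xpath->query("following-sibling::div[@class='items'][count(preceding-sibling::h1) = {$n}]", $h1);
        foreach ($items as $div) {
            $result[] = ['item' => trim($h1->textContent), 'data' => trim($div->textContent)];
        }
    }
    return $result;
}

$html = <<<'HTML'
<h1>Title 1</h1>
<div class="items">some data and divs here 1</div>
<h1>Title 2</h1>
<div class="items">some data and divs here 2</div>
<div class="items">some data and divs here 3</div>
<h1>Title 3</h1>
<div class="items">some data and divs here 4</div>
<div class="items">some data and divs here 5</div>
<div class="items">some data and divs here 6</div>
HTML;

var_export(groupByXPath($html));
```

An alternative under the same assumption is to iterate the items instead and take preceding-sibling::h1[1] as each item’s title: preceding-sibling is a reverse axis, so [1] selects the nearest h1 before the item.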