How do I extract the text from the last <li> in the following snippet? (Černošice.)

<footer class="SearchResultCard__footer">
    <ul class="SearchResultCard__footerList">
        <li class="SearchResultCard__footerItem">
            <svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" viewBox="0 0 24 24" fill="none" id="7c37b661a1f4030a0673d3e5cb419678" aria-hidden="true">
                <path fill-rule="evenodd" clip-rule="evenodd" d="M6.16146 2H9.83854C10.3657 1.99998 10.8205 1.99997 11.195 2.03057C11.5904 2.06287 11.9836 2.13419 12.362 2.32698C12.9265 2.6146 13.3854 3.07354 13.673 3.63803C13.8658 4.01641 13.9371 4.40963 13.9694 4.80497C14 5.17955 14 5.63432 14 6.16148V10L17.8385 10C18.3657 9.99998 18.8205 9.99997 19.195 10.0306C19.5904 10.0629 19.9836 10.1342 20.362 10.327C20.9265 10.6146 21.3854 11.0735 21.673 11.638C21.8658 12.0164 21.9371 12.4096 21.9694 12.805C22 13.1795 22 13.6343 22 14.1614V20C22.5523 20 23 20.4477 23 21C23 21.5523 22.5523 22 22 22H2C1.44772 22 1 21.5523 1 21C1 20.4477 1.44772 20 2 20V6.16146C1.99998 5.63431 1.99997 5.17955 2.03057 4.80497C2.06287 4.40963 2.13419 4.01641 2.32698 3.63803C2.6146 3.07354 3.07354 2.6146 3.63803 2.32698C4.01641 2.13419 4.40963 2.06287 4.80497 2.03057C5.17954 1.99997 5.63431 1.99998 6.16146 2ZM4 20H12V6.2C12 5.62345 11.9992 5.25118 11.9761 4.96784C11.9539 4.69617 11.9162 4.59546 11.891 4.54601C11.7951 4.35785 11.6422 4.20487 11.454 4.109C11.4045 4.0838 11.3038 4.04612 11.0322 4.02393C10.7488 4.00078 10.3766 4 9.8 4H6.2C5.62345 4 5.25117 4.00078 4.96784 4.02393C4.69617 4.04612 4.59545 4.0838 4.54601 4.109C4.35785 4.20487 4.20487 4.35785 4.10899 4.54601C4.0838 4.59546 4.04612 4.69617 4.02393 4.96784C4.00078 5.25117 4 5.62345 4 6.2V20ZM14 12V20H20V14.2C20 13.6234 19.9992 13.2512 19.9761 12.9678C19.9539 12.6962 19.9162 12.5955 19.891 12.546C19.7951 12.3578 19.6422 12.2049 19.454 12.109C19.4045 12.0838 19.3038 12.0461 19.0322 12.0239C18.7488 12.0008 18.3766 12 17.8 12H14ZM5.5 7C5.5 6.44772 5.94772 6 6.5 6H9.5C10.0523 6 10.5 6.44772 10.5 7C10.5 7.55229 10.0523 8 9.5 8H6.5C5.94772 8 5.5 7.55229 5.5 7ZM5.5 11C5.5 10.4477 5.94772 10 6.5 10H9.5C10.0523 10 10.5 10.4477 10.5 11C10.5 11.5523 10.0523 12 9.5 12H6.5C5.94772 12 5.5 11.5523 5.5 11ZM5.5 15C5.5 14.4477 5.94772 14 6.5 14H9.5C10.0523 14 10.5 14.4477 10.5 15C10.5 15.5523 10.0523 16 9.5 16H6.5C5.94772 16 5.5 15.5523 5.5 15Z" 
fill="currentColor"></path>
            </svg>
            <span translate="no">Aquaconsult, s.r.o.</span>
        </li>
        <li data-test="serp-locality" class="SearchResultCard__footerItem">
            <svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" viewBox="0 0 24 24" fill="none" id="75c13f0b214e56e726e671a0486a4038" aria-hidden="true">
                <path fill-rule="evenodd" clip-rule="evenodd" d="M12 3C8.13401 3 5 6.13401 5 10C5 12.2992 6.25613 14.0528 7.97135 15.8393C8.38164 16.2666 8.80735 16.6853 9.24218 17.1129L9.29106 17.161C9.73889 17.6015 10.1961 18.0521 10.6317 18.5111C11.112 19.0173 11.581 19.5489 12 20.1148C12.419 19.5489 12.888 19.0173 13.3683 18.5111C13.8039 18.0521 14.2611 17.6015 14.7089 17.161L14.7579 17.1129C15.1927 16.6853 15.6184 16.2666 16.0287 15.8393C17.7439 14.0528 19 12.2992 19 10C19 6.13401 15.866 3 12 3ZM3 10C3 5.02944 7.02944 1 12 1C16.9706 1 21 5.02944 21 10C21 13.1191 19.2561 15.3655 17.4713 17.2244C17.0407 17.673 16.5965 18.1098 16.1671 18.5321L16.1114 18.5869C15.6608 19.0301 15.2274 19.4575 14.8192 19.8878C13.9985 20.7526 13.3284 21.5792 12.8944 22.4472C12.725 22.786 12.3788 23 12 23C11.6212 23 11.275 22.786 11.1056 22.4472C10.6716 21.5792 10.0015 20.7526 9.18085 19.8878C8.77261 19.4575 8.33924 19.0301 7.88863 18.5869L7.83285 18.5321C7.40346 18.1098 6.95932 17.673 6.52865 17.2244C4.74387 15.3655 3 13.1191 3 10ZM12 7.5C10.8954 7.5 10 8.39543 10 9.5C10 10.6046 10.8954 11.5 12 11.5C13.1046 11.5 14 10.6046 14 9.5C14 8.39543 13.1046 7.5 12 7.5ZM8 9.5C8 7.29086 9.79086 5.5 12 5.5C14.2091 5.5 16 7.29086 16 9.5C16 11.7091 14.2091 13.5 12 13.5C9.79086 13.5 8 11.7091 8 9.5Z" fill="currentColor"></path>
            </svg>
            Černošice
        </li>
    </ul>
</footer>

The following works with the first <li>:

foo.css("ul.SearchResultCard__footerList > li:first-child > span::text").get()

Result: Aquaconsult, s.r.o.

For the second one, I tried last-child and nth-child, but got nothing (no error, just an empty result).

2 Answers


  1. Chosen as BEST ANSWER

    This is what I ended up doing (test code):

    from scrapy.selector import Selector
    import re
    
    html = """
        <footer class="SearchResultCard__footer">
            <ul class="SearchResultCard__footerList">
                <li class="SearchResultCard__footerItem">
                    <svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" viewBox="0 0 24 24" fill="none" id="7c37b661a1f4030a0673d3e5cb419678" aria-hidden="true">
                        <path fill-rule="evenodd" clip-rule="evenodd" d="M6.16146 2H9.83854C10.3657 1.99998 10.8205 1.99997 11.195 2.03057C11.5904 2.06287 11.9836 2.13419 12.362 2.32698C12.9265 2.6146 13.3854 3.07354 13.673 3.63803C13.8658 4.01641 13.9371 4.40963 13.9694 4.80497C14 5.17955 14 5.63432 14 6.16148V10L17.8385 10C18.3657 9.99998 18.8205 9.99997 19.195 10.0306C19.5904 10.0629 19.9836 10.1342 20.362 10.327C20.9265 10.6146 21.3854 11.0735 21.673 11.638C21.8658 12.0164 21.9371 12.4096 21.9694 12.805C22 13.1795 22 13.6343 22 14.1614V20C22.5523 20 23 20.4477 23 21C23 21.5523 22.5523 22 22 22H2C1.44772 22 1 21.5523 1 21C1 20.4477 1.44772 20 2 20V6.16146C1.99998 5.63431 1.99997 5.17955 2.03057 4.80497C2.06287 4.40963 2.13419 4.01641 2.32698 3.63803C2.6146 3.07354 3.07354 2.6146 3.63803 2.32698C4.01641 2.13419 4.40963 2.06287 4.80497 2.03057C5.17954 1.99997 5.63431 1.99998 6.16146 2ZM4 20H12V6.2C12 5.62345 11.9992 5.25118 11.9761 4.96784C11.9539 4.69617 11.9162 4.59546 11.891 4.54601C11.7951 4.35785 11.6422 4.20487 11.454 4.109C11.4045 4.0838 11.3038 4.04612 11.0322 4.02393C10.7488 4.00078 10.3766 4 9.8 4H6.2C5.62345 4 5.25117 4.00078 4.96784 4.02393C4.69617 4.04612 4.59545 4.0838 4.54601 4.109C4.35785 4.20487 4.20487 4.35785 4.10899 4.54601C4.0838 4.59546 4.04612 4.69617 4.02393 4.96784C4.00078 5.25117 4 5.62345 4 6.2V20ZM14 12V20H20V14.2C20 13.6234 19.9992 13.2512 19.9761 12.9678C19.9539 12.6962 19.9162 12.5955 19.891 12.546C19.7951 12.3578 19.6422 12.2049 19.454 12.109C19.4045 12.0838 19.3038 12.0461 19.0322 12.0239C18.7488 12.0008 18.3766 12 17.8 12H14ZM5.5 7C5.5 6.44772 5.94772 6 6.5 6H9.5C10.0523 6 10.5 6.44772 10.5 7C10.5 7.55229 10.0523 8 9.5 8H6.5C5.94772 8 5.5 7.55229 5.5 7ZM5.5 11C5.5 10.4477 5.94772 10 6.5 10H9.5C10.0523 10 10.5 10.4477 10.5 11C10.5 11.5523 10.0523 12 9.5 12H6.5C5.94772 12 5.5 11.5523 5.5 11ZM5.5 15C5.5 14.4477 5.94772 14 6.5 14H9.5C10.0523 14 10.5 14.4477 10.5 15C10.5 15.5523 10.0523 16 9.5 16H6.5C5.94772 16 5.5 15.5523 5.5 15Z" 
fill="currentColor"></path>
                    </svg>
                    <span translate="no">Company</span>
                </li>
                <li data-test="serp-locality" class="SearchResultCard__footerItem">
                    <svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" viewBox="0 0 24 24" fill="none" id="75c13f0b214e56e726e671a0486a4038" aria-hidden="true">
                        <path fill-rule="evenodd" clip-rule="evenodd" d="M12 3C8.13401 3 5 6.13401 5 10C5 12.2992 6.25613 14.0528 7.97135 15.8393C8.38164 16.2666 8.80735 16.6853 9.24218 17.1129L9.29106 17.161C9.73889 17.6015 10.1961 18.0521 10.6317 18.5111C11.112 19.0173 11.581 19.5489 12 20.1148C12.419 19.5489 12.888 19.0173 13.3683 18.5111C13.8039 18.0521 14.2611 17.6015 14.7089 17.161L14.7579 17.1129C15.1927 16.6853 15.6184 16.2666 16.0287 15.8393C17.7439 14.0528 19 12.2992 19 10C19 6.13401 15.866 3 12 3ZM3 10C3 5.02944 7.02944 1 12 1C16.9706 1 21 5.02944 21 10C21 13.1191 19.2561 15.3655 17.4713 17.2244C17.0407 17.673 16.5965 18.1098 16.1671 18.5321L16.1114 18.5869C15.6608 19.0301 15.2274 19.4575 14.8192 19.8878C13.9985 20.7526 13.3284 21.5792 12.8944 22.4472C12.725 22.786 12.3788 23 12 23C11.6212 23 11.275 22.786 11.1056 22.4472C10.6716 21.5792 10.0015 20.7526 9.18085 19.8878C8.77261 19.4575 8.33924 19.0301 7.88863 18.5869L7.83285 18.5321C7.40346 18.1098 6.95932 17.673 6.52865 17.2244C4.74387 15.3655 3 13.1191 3 10ZM12 7.5C10.8954 7.5 10 8.39543 10 9.5C10 10.6046 10.8954 11.5 12 11.5C13.1046 11.5 14 10.6046 14 9.5C14 8.39543 13.1046 7.5 12 7.5ZM8 9.5C8 7.29086 9.79086 5.5 12 5.5C14.2091 5.5 16 7.29086 16 9.5C16 11.7091 14.2091 13.5 12 13.5C9.79086 13.5 8 11.7091 8 9.5Z" fill="currentColor"></path>
                    </svg>
                    Foo
                    Bar
                    Baz
                </li>
            </ul>
        </footer>
    """
    
    
    # Serialize each <li> to an HTML string, then re-parse the items individually.
    list_items = Selector(text=html).css("ul.SearchResultCard__footerList li").getall()
    first_li = Selector(text=list_items[0]).css("li:first-child > span::text").get()
    # getall() returns every direct text node of the <li>, including whitespace-only ones.
    last_li = Selector(text=list_items[1]).css("li::text").getall()
    
    print(first_li) # Company
    print(last_li) # ['\n                ', '\n                Foo\n                Bar\n                Baz\n            ']
    print("".join(re.sub(r"\s+","",s) for s in last_li)) # FooBarBaz
    

    get() was not actually empty: it returned only the first text node of the <li>, which is the newline and indentation before the <svg>.
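
The cleanup step can also be done without a regular expression. The sketch below is a minimal, standard-library-only variant; the helper name clean_text_nodes is hypothetical, and the sample list is assumed to have the same shape as the getall() output printed above.

```python
def clean_text_nodes(nodes):
    """Collapse whitespace in each text node and drop nodes that end up empty."""
    cleaned = (" ".join(node.split()) for node in nodes)
    return [text for text in cleaned if text]


# Sample input shaped like the getall() output from the test code (an assumption).
last_li = [
    "\n                ",
    "\n                Foo\n                Bar\n                Baz\n            ",
]

parts = clean_text_nodes(last_li)
print(parts)            # ['Foo Bar Baz']
print(" ".join(parts))  # Foo Bar Baz
```

Unlike removing every whitespace character, this keeps a single space between words, which matters for multi-word place names.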


  2. In the second li the text is not enclosed in a span as it is in the first. Make sure that your selector isn’t looking for this span.

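    The following should work. Note that get() would return only the first text node, which here is the whitespace before the <svg>, so collect all text nodes with getall() and strip the result:

    "".join(foo.css("ul.SearchResultCard__footerList > li:last-child::text").getall()).strip()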
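
To see why the order of text nodes matters here, the sketch below uses the standard library's xml.etree.ElementTree instead of Scrapy (a swapped-in tool, chosen only so the example runs without extra packages). The trimmed-down snippet is an assumed stand-in for the second <li>, not the original markup.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the second <li>
# (SVG attributes and path data omitted; this is an assumption).
snippet = """<li data-test="serp-locality" class="SearchResultCard__footerItem">
    <svg aria-hidden="true"></svg>
    Černošice
</li>"""

li = ET.fromstring(snippet)

# Text node before the <svg>: whitespace only. A "::text" selector
# followed by get() would return the equivalent of this node first.
first_text = li.text

# Text node after the <svg>: ElementTree stores it as the child's "tail".
second_text = li[0].tail

print(repr(first_text))
print(repr(second_text))

# Joining all text under the element and stripping yields the locality.
locality = "".join(li.itertext()).strip()
print(locality)  # Černošice
```

The same idea carries over to the Scrapy selector: the wanted text is the second text node of the <li>, so all text nodes need to be collected and stripped rather than only the first one.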
