BeautifulSoup: AttributeError: 'NavigableString' object has no attribute 'name'

2023-04-08 07:57

Do you know why the first example in the BeautifulSoup tutorial (http://www.crummy.com/software/BeautifulSoup/documentation.html#QuickStart) gives AttributeError: 'NavigableString' object has no attribute 'name'? According to this answer, whitespace in the HTML causes the problem. I tried it with the source of a few pages after removing the spaces: one worked and the others gave the same error. Can you explain what "name" refers to and why this error happens? Thanks.

Just ignore NavigableString objects while iterating through the tree:

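import requests
from bs4 import BeautifulSoup, NavigableString, Tag

# url is assumed to hold the address of the page to parse
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

for body_child in soup.body.children:
    # Text between tags (including bare whitespace) is a NavigableString; skip it
    if isinstance(body_child, NavigableString):
        continue
    # Only Tag objects are handled here
    if isinstance(body_child, Tag):
        print(body_child.name)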

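name refers to the name of the tag when the object is a Tag (for example, for <html> the name is "html"). A NavigableString is a piece of text, not a tag, which is why the error says it has no name.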

If your markup has whitespace between nodes, BeautifulSoup turns it into NavigableString objects. So if you index into contents to grab nodes, you might get a NavigableString instead of the next Tag.
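A minimal sketch of that behaviour, assuming bs4 is installed and using a small made-up HTML string (not the tutorial's document) with the built-in 'html.parser':

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical markup: note the newlines between the tags
html = "<body>\n<p>first</p>\n<div>second</div>\n</body>"
soup = BeautifulSoup(html, "html.parser")

children = soup.body.contents

# The newline right after <body> becomes the first child, so index 0 is text
first_is_text = isinstance(children[0], NavigableString)

# Record the class of every child to see text and tags alternating
kinds = [type(node).__name__ for node in children]

# Reading .name only on Tag objects avoids the AttributeError
tags_only = [node.name for node in children if isinstance(node, Tag)]

print(kinds)
print(tags_only)
```

Here contents[1], not contents[0], is the <p> tag, which is how index-based code ends up holding a NavigableString.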

To avoid this, search for the node you are looking for (see "Searching the Parse Tree" in the documentation).

Or, if you know the name of the next tag you want, use that name as an attribute: it returns the first Tag with that name, or None if no tag with that name exists (see "Using Tag Names as Members").
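Both approaches can be sketched as follows, assuming bs4 is installed; the HTML string is made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with whitespace between the tags
html = "<body>\n<p>first</p>\n<div>second</div>\n</body>"
soup = BeautifulSoup(html, "html.parser")

# Searching: find() returns the first match, find_all() returns every match
first_p = soup.body.find("p")
all_divs = soup.body.find_all("div")

# Tag name as an attribute: same result as find("p")
same_p = soup.body.p

# No <table> in this markup, so the attribute lookup gives None
missing = soup.body.table

print(first_p.name, [d.get_text() for d in all_divs], missing)
```

Neither form depends on the position of the tag in contents, so the whitespace nodes never get in the way.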

If you want to use contents, you have to check the type of each object you get back. The error just means the code is accessing the name attribute on an object it assumes is a Tag, but which is actually a NavigableString.
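One way to do that check, as a sketch only: first_tag_child is a hypothetical helper written for this example, not part of BeautifulSoup, and the HTML is made up.

```python
from bs4 import BeautifulSoup, Tag

def first_tag_child(parent):
    """Return the first direct child that is a Tag, or None if there is none.

    Hypothetical helper for illustration; it is not a bs4 function.
    """
    for node in parent.contents:
        if isinstance(node, Tag):
            return node
    return None

html = "<body>\n<p>first</p>\n<div>second</div>\n</body>"
soup = BeautifulSoup(html, "html.parser")

child = first_tag_child(soup.body)   # skips the leading newline text node
no_child = first_tag_child(soup.p)   # <p> contains only text

# Built-in alternative: find(True, recursive=False) matches any direct child tag
builtin = soup.body.find(True, recursive=False)

print(child.name, no_child, builtin.name)
```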

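You can use try/except to skip the cases where a NavigableString turns up in the loop. Catch AttributeError (NavigableString is a class, not an exception, so it cannot be used in an except clause), like this:

    for j in soup.find_all(...):
        try:
            print(j.find(...))
        except AttributeError:
            # the object does not support this attribute; skip it
            pass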

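Here is up-to-date working code that prints the names of the tags directly under body:

import requests
from bs4 import BeautifulSoup, Tag

# url is assumed to hold the address of the page to parse
res = requests.get(url).content
soup = BeautifulSoup(res, 'lxml')  # the 'lxml' parser requires the lxml package

for child in soup.body.children:
    # Only Tag objects have a tag name; text nodes are skipped
    if isinstance(child, Tag):
        print(child.name)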
