[BeautifulSoup4] From installation to HTML parsing

November 2, 2021 / June 4, 2022

Overview

In this article, we use Beautiful Soup to get the page title of yahoo.co.jp.

Environment

Windows

Libraries

beautifulsoup4
lxml (parser)
requests

Steps

Preparation
Importing the libraries
Specifying the URL
Fetching the URL content
Getting an element (title)

Preparation

Installing the libraries

Beautiful Soup

pip install beautifulsoup4

lxml

pip install lxml

requests

pip install requests
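
To confirm that the three packages were installed, a quick check such as the following may help. This is a minimal sketch that reads the installed versions through the standard library's importlib.metadata; the names PACKAGES and installed_versions are made up for illustration and are not part of any of the libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to pip in this article.
PACKAGES = ["beautifulsoup4", "lxml", "requests"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

If any line reports "not installed", rerunning the corresponding pip command above should fix it.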

Importing the libraries

import requests
from bs4 import BeautifulSoup

Specifying the URL

url = 'https://www.yahoo.co.jp/'

Fetching the URL content

response = requests.get(url)
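
By default, requests.get does not raise an exception for HTTP error statuses and has no timeout, so a slightly more defensive fetch can be useful in practice. The following is a sketch rather than part of the original steps; the function name fetch_html and the 10-second timeout are assumptions chosen for illustration.

```python
import requests


def fetch_html(url, timeout=10):
    """Fetch a page and return its raw bytes, raising on network or HTTP errors."""
    response = requests.get(url, timeout=timeout)
    # Turn 4xx/5xx status codes into requests.HTTPError
    # instead of silently parsing an error page.
    response.raise_for_status()
    return response.content


# Usage (performs a real network request):
# html = fetch_html("https://www.yahoo.co.jp/")
```

The returned bytes can be passed to BeautifulSoup in the same way as response.content.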

Getting an element (title)

soup = BeautifulSoup(response.content, 'lxml')
title = soup.title
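
Note that soup.title is a Tag object rather than a plain string. The sketch below uses a small inline HTML string (an assumption, so that no network access is needed) to show how the tag, its text, and a missing element differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML used instead of a live page.
html = "<html><head><title>Sample page</title></head><body><p>Hello</p></body></html>"
soup = BeautifulSoup(html, "lxml")

title_tag = soup.title          # Tag object, printed together with its markup
title_text = soup.title.string  # only the text inside <title>

print(title_tag)
# <title>Sample page</title>
print(title_text)
# Sample page

# An element that does not exist comes back as None, so check before using it.
missing = soup.h1
print(missing is None)
# True
```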

Final code

import requests
from bs4 import BeautifulSoup
url = "https://www.yahoo.co.jp/"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
title = soup.title
print(title)
# <title>Yahoo! JAPAN</title>

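Additional notes

Getting elements from HTML

soup.find(<tag name>) (find the first match)
soup.find_all(<tag name>) (find all matching tags)
soup.find(<tag name>, <attribute name>=<attribute value>) (search by attribute)
soup.find(<tag name>, id=<id>) (search by id)
soup.find(<tag name>, class_=<class name>) (search by class)
soup.<tag name>.<tag name>… (navigate by tag name)
An example follows. (The HTML is prepared as a string inside the Python code.)

Example: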

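from bs4 import BeautifulSoup  # lxml only needs to be installed; no import is required

# Sample HTML, defined inline as a string
html_text = '''
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title">The Dormouse's story</p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" id="link1">Elsie</a>,
<a href="http://example.com/lacie" id="link2">Lacie</a> and
<a href="http://example.com/tillie" id="link3">Tillie</a></p>
</body></html>
'''
soup = BeautifulSoup(html_text, "lxml")
print(soup.find("title"))
# <title>The Dormouse's story</title>
print(soup.find_all("a"))
# [<a href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a href="http://example.com/tillie" id="link3">Tillie</a>]
print(soup.find("a", href="http://example.com/lacie"))
# <a href="http://example.com/lacie" id="link2">Lacie</a>
print(soup.find("a", id="link3"))
# <a href="http://example.com/tillie" id="link3">Tillie</a>
print(soup.find("p", class_="title"))
# <p class="title">The Dormouse's story</p>
print(soup.html.head)
# <head><title>The Dormouse's story</title></head>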
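
Once a tag has been found, its text and attributes are usually what is actually needed. As a small sketch with made-up sample HTML (the URLs and ids are illustrative only), the following shows get_text(), attribute access, and a guard for the case where find() matches nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML for illustration.
html_text = '''
<html><body>
<p class="story">
<a href="http://example.com/elsie" id="link1">Elsie</a>
<a href="http://example.com/lacie" id="link2">Lacie</a>
</p>
</body></html>
'''
soup = BeautifulSoup(html_text, "lxml")

# Collect the text and the href attribute of every <a> tag.
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a")]
print(links)
# [('Elsie', 'http://example.com/elsie'), ('Lacie', 'http://example.com/lacie')]

# .get() returns None instead of raising KeyError when an attribute is absent.
first = soup.find("a")
print(first.get("id"), first.get("class"))
# link1 None

# find() returns None when nothing matches, so guard before using the result.
table = soup.find("table")
print(table.get_text() if table is not None else "no <table> found")
# no <table> found
```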

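Getting elements with CSS selectors

soup.select(<tag name>) (search by tag)
soup.select_one(<tag name>) (find the first match)
soup.select(<attribute name>) (search by attribute presence)
soup.select(<attribute name and value>) (search by attribute value)
soup.select(<CSS selector>) (search by CSS selector)

An example follows. (The HTML is prepared as a string inside the Python code.)

Example: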

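from bs4 import BeautifulSoup  # lxml only needs to be installed; no import is required

# Sample HTML, defined inline as a string
html_text = '''
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title">The Dormouse's story</p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" id="link1">Elsie</a>,
<a href="http://example.com/lacie" id="link2">Lacie</a> and
<a href="http://example.com/tillie" id="link3">Tillie</a></p>
</body></html>
'''
soup = BeautifulSoup(html_text, "lxml")
print(soup.select("title"))
# [<title>The Dormouse's story</title>]
print(soup.select_one("a"))
# <a href="http://example.com/elsie" id="link1">Elsie</a>
print(soup.select("a[href]"))
# [<a href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a href="http://example.com/tillie" id="link3">Tillie</a>]
print(soup.select('a[href="http://example.com/lacie"]'))
# [<a href="http://example.com/lacie" id="link2">Lacie</a>]
print(soup.select("p.title"))
# [<p class="title">The Dormouse's story</p>]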
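
select() also accepts other CSS selector forms, such as id selectors, child combinators, and attribute suffix matches. The following is a short sketch using made-up sample HTML (the URLs and ids are illustrative only), not an exhaustive list of what is supported.

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML for illustration.
html_text = '''
<html><body>
<p class="title">The Dormouse's story</p>
<p class="story">
<a href="http://example.com/elsie" id="link1">Elsie</a>
<a href="http://example.com/lacie" id="link2">Lacie</a>
<a href="http://example.com/tillie" id="link3">Tillie</a>
</p>
</body></html>
'''
soup = BeautifulSoup(html_text, "lxml")

# id selector: at most one element is expected, so select_one fits.
print(soup.select_one("#link3").get_text())
# Tillie

# Child combinator: <a> tags that are direct children of <p class="story">.
print([a.get_text() for a in soup.select("p.story > a")])
# ['Elsie', 'Lacie', 'Tillie']

# Attribute suffix match: href values ending in "lacie".
print([a["id"] for a in soup.select('a[href$="lacie"]')])
# ['link2']

# select() returns an empty list when nothing matches; select_one() returns None.
print(soup.select("table"), soup.select_one("table"))
# [] None
```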
