[Python] Learning Scraping 2

by TUZA 2024. 1. 22. 22:11

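Summary of what I learned

1. Connect to the weworkremotely site with requests to fetch the HTML, then parse it with beautifulsoup4.

2. Read through the fetched HTML, decide which data you want, and use the tag that holds that information to pull the data out.

3. You can use find and find_all to locate the data.

# find returns only the first matching tag, even when several tags match.

# find_all, by contrast, returns every matching tag collected in a list.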

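4. A for loop is not the only way to pull data out of a list; you can also unpack it.

ex)

letters = ['a', 'b', 'c']

a, b, c = letters  # declare as many variables as the list has items

The caveat with this approach is that you have to know the length of the list in advance.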
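As a small sketch of the unpacking rule above: the variable names below (first, rest, x, y, error_message) are made up for illustration. It shows exact unpacking, starred unpacking for when the length is not known, and the error Python raises when the counts do not match.

```python
letters = ['a', 'b', 'c']

# Exact unpacking: the number of names must equal the list length.
a, b, c = letters

# Starred unpacking: `rest` collects whatever is left over,
# so the exact length does not need to be known in advance.
first, *rest = letters

# A mismatch between the number of names and items raises ValueError.
error_message = None
try:
    x, y = letters
except ValueError as err:
    error_message = str(err)

print(a, b, c)
print(first, rest)
print(error_message)
```

Starred unpacking is handy when only the first item or two matter, though it still needs at least as many items as there are unstarred names.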

Usage examples for find and find_all

find("tag name", attribute="value")

find_all("tag name", attribute="value")
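To make the difference concrete, here is a minimal sketch that runs find and find_all against a small HTML string. The HTML snippet and the variable names are hypothetical, made up only for this example, so nothing here depends on the real site.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, made up for illustration only.
html = """
<ul>
  <li class="item">first</li>
  <li class="item">second</li>
  <li class="other">third</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# find returns the first matching tag, or None when nothing matches.
first_item = soup.find("li", class_="item")
missing = soup.find("li", class_="nope")

# find_all returns every matching tag, or an empty list when nothing matches.
all_items = soup.find_all("li", class_="item")
none_found = soup.find_all("li", class_="nope")

print(first_item.text)
print([li.text for li in all_items])
print(missing)
print(len(none_found))
```

Because find can return None, calling .text directly on its result fails with AttributeError whenever the tag is missing, so it is worth checking the result first on pages you do not control.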

import requests 
from bs4 import BeautifulSoup

url = "https://weworkremotely.com/categories/remote-full-stack-programming-jobs#job-listings"

response = requests.get(url)

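# print(response.status_code)  # shows the HTTP status code of the request

# print(response.content)  # the raw HTML source
soup = BeautifulSoup(response.content, "html.parser")


# When filtering by class, use class_ with a trailing underscore, because class is a reserved keyword in Python.
# find_all returns its results as a list; the [1:-1] slice drops the first and last <li>.
jobs = soup.find("section", class_="jobs").find_all("li")[1:-1]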

for job in jobs:
    title = job.find("span", class_="title").text
    # Unpacking works here only if each job has exactly two "company" spans.
    company, position = job.find_all("span", class_="company")
    company = company.text
    position = position.text
    print(title, company, position)


# How to extract the data inside a list
'''
letters = ['a','b','c']
a, b, c = letters

'''
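The loop above assumes every li has a title span and exactly two company spans. Below is a hedged sketch of a more defensive variant that skips entries that do not fit. The function name parse_jobs and the SAMPLE_HTML markup are hypothetical: the sample only imitates the structure the loop expects, and the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the structure the loop above expects;
# the real page may differ.
SAMPLE_HTML = """
<section class="jobs">
  <ul>
    <li class="ad">advertisement</li>
    <li>
      <span class="company">Acme</span>
      <span class="title">Backend Developer</span>
      <span class="company">Full-Time</span>
    </li>
    <li>
      <span class="title">Listing with no company spans</span>
    </li>
    <li class="view-all">view all</li>
  </ul>
</section>
"""


def parse_jobs(html):
    """Return a list of job dicts, skipping entries that lack expected tags."""
    soup = BeautifulSoup(html, "html.parser")
    section = soup.find("section", class_="jobs")
    if section is None:
        # No jobs section at all: nothing to extract.
        return []

    results = []
    # Same slice as the original loop: drop the first and last <li>.
    for job in section.find_all("li")[1:-1]:
        title = job.find("span", class_="title")
        details = job.find_all("span", class_="company")
        # Unpacking needs exactly two items, so check the length first.
        if title is None or len(details) != 2:
            continue
        company, position = details
        results.append({
            "title": title.text.strip(),
            "company": company.text.strip(),
            "position": position.text.strip(),
        })
    return results


jobs = parse_jobs(SAMPLE_HTML)
print(jobs)
```

With the requests code from earlier, the same function could be called as parse_jobs(response.content), so the parsing logic can be tried on a saved HTML string without hitting the site each time.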
