Web scraping is widely used to collect financial data. Python is a versatile language that suits a wide variety of tasks, including web scraping.
We’ll scrape stock quote data from Yahoo Finance using Python.
Web scraping is divided into two parts:
- Fetching data by making an HTTP request. We’ll be using the requests library to make a GET request to the website that we want to scrape.
- Extracting important data by parsing the HTML DOM. We’ll be using the BeautifulSoup library to parse the HTML document that we get back from the website.
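The two parts can be sketched in isolation before applying them to Yahoo Finance. This is a minimal sketch, not the code used in the rest of the walkthrough: the function names `fetch_html` and `parse_title` and the 10-second timeout are assumptions chosen for illustration.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Part 1: download the raw HTML with an HTTP GET request."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def parse_title(html):
    """Part 2: parse the HTML and extract one piece of data (the page title)."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None or soup.title.string is None:
        return None
    return soup.title.string.strip()
```

Chaining the two, as in `parse_title(fetch_html(some_url))`, gives the full fetch-then-parse flow. Keeping the parts separate also means the parsing step can be tried on a saved HTML string without touching the network.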
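import re
from datetime import datetime

import requests
from bs4 import BeautifulSoup

# Return the data-value attribute of the <fin-streamer> element for a field.
def ExtractField(sHtml, fieldName):
    it = sHtml.select_one(f'fin-streamer[data-field="{fieldName}"]')
    return it['data-value'] if it else "N/A"

# Return the value shown next to a label, or "N/A" if the label is missing.
def ExtractValueByLabel(sHtml, labelName):
    label_pattern = re.compile(labelName)
    oLabel = sHtml.find('span', class_='label', string=label_pattern)
    if oLabel:
        return oLabel.find_next('span', class_='value').text.strip()
    else:
        return "N/A"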
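To see what the two lookups above match, it can help to run the same selectors against a small inline fragment. The markup below is a hypothetical sample shaped to fit those selectors, and its values are made up; the live Yahoo Finance page may be structured differently.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical fragment for illustration only; not copied from Yahoo Finance.
SAMPLE_HTML = """
<div data-testid="quote-statistics">
  <ul>
    <li><span class="label">Previous Close</span>
        <span class="value"><fin-streamer data-field="regularMarketPreviousClose"
              data-value="135.40">135.40</fin-streamer></span></li>
    <li><span class="label">Forward Dividend &amp; Yield</span>
        <span class="value">0.04 (0.03%)</span></li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# CSS attribute selector: match the custom <fin-streamer> tag by its data-field.
node = soup.select_one('fin-streamer[data-field="regularMarketPreviousClose"]')
previous_close = node["data-value"]

# Regex label match: find the label span, then the next value span after it.
label = soup.find("span", class_="label", string=re.compile("Forward Dividend"))
dividend = label.find_next("span", class_="value").text.strip()

print(previous_close, dividend)
```

The regular expression only needs to match part of the label text, which is why `'Forward Dividend'` is enough to find a label that also mentions the yield.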
Extract the quote fields from the HTML parsed by BeautifulSoup.
def ParseStockData(sHtml, oQuote):
    oQuote['PreviousClose'] = ExtractField(sHtml, 'regularMarketPreviousClose')
    oQuote['Open'] = ExtractField(sHtml, 'regularMarketOpen')
    # The day range arrives as a single "low - high" string
    sRange = ExtractField(sHtml, 'regularMarketDayRange')
    aRange = sRange.split(' - ')
    if len(aRange) == 2:
        oQuote['Low'] = aRange[0]
        oQuote['High'] = aRange[1]
    # The dividend text looks like '6.64 (4.91%)'; keep only the percentage
    sDividendYield = ExtractValueByLabel(sHtml, 'Forward Dividend')
    pattern = r'[(]([0-9.]+)%[)]'
    match = re.search(pattern, sDividendYield)
    if match is not None:
        oQuote['Yield'] = match.group(1)
    # Convert the date from the "%b %d, %Y" format to ISO format (YYYY-MM-DD)
    dt = ExtractValueByLabel(sHtml, 'Ex-Dividend Date')
    if dt != 'N/A':
        oQuote['ExDividendDate'] = datetime.strptime(dt, "%b %d, %Y").strftime('%Y-%m-%d')
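The yield and date handling above can be tried on their own, using only the standard library. This is a small sketch: the helper names `parse_yield` and `to_iso_date` are made up for illustration, and the date string is a hypothetical input. Note that `%b` matches abbreviated month names according to the current locale, so the example assumes an English locale.

```python
import re
from datetime import datetime


def parse_yield(text):
    """Return the percentage inside parentheses, e.g. '4.91' from '6.64 (4.91%)'."""
    match = re.search(r"\(([0-9.]+)%\)", text)
    return match.group(1) if match else None


def to_iso_date(text):
    """Convert a date such as 'Sep 11, 2024' to ISO format (YYYY-MM-DD)."""
    return datetime.strptime(text, "%b %d, %Y").strftime("%Y-%m-%d")


print(parse_yield("6.64 (4.91%)"))  # 4.91
print(parse_yield("N/A"))           # None
print(to_iso_date("Sep 11, 2024"))  # 2024-09-11
```

Returning `None` when nothing matches makes the missing-dividend case explicit, which mirrors the way `ParseStockData` leaves out the `Yield` key when the pattern does not match.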
Make a GET request to the target URL to get the raw HTML data.
def GetStockData(symbol, oQuote):
    url = 'https://finance.yahoo.com/quote/' + symbol
    # Send a browser-like User-Agent header with the request
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
    page = requests.get(url, headers=headers, timeout=10)
    page.raise_for_status()
    soup = BeautifulSoup(page.text, 'html.parser')
    # Optional debugging aid: save the prettified HTML (the folder must already exist)
    with open('C:/Export/soup.html', 'wb') as file:
        file.write(soup.prettify('utf-8'))
    # Find the div that holds the quote statistics
    sHtml = soup.find('div', {'data-testid' : 'quote-statistics'})
    if sHtml is None:
        raise ValueError('quote-statistics section not found for ' + symbol)
    ParseStockData(sHtml, oQuote)
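When quotes for several symbols are needed, one possible variation is to reuse a single `requests.Session` and to keep the download step separate from parsing. The sketch below is an assumption-laden illustration rather than part of the script above: `build_quote_url`, `fetch_quote_html`, the shortened User-Agent string, and the 10-second timeout are all hypothetical choices.

```python
import requests

# Hypothetical User-Agent value for illustration; any browser-like string could be used.
USER_AGENT = "Mozilla/5.0 (compatible; example-scraper)"


def build_quote_url(symbol):
    """Build the quote page URL for a ticker symbol."""
    return "https://finance.yahoo.com/quote/" + symbol


def fetch_quote_html(symbol, session=None, timeout=10):
    """Download the quote page and return its HTML text.

    A session can be passed in so that several symbols share one
    connection pool; otherwise a new session is created.
    """
    http = session if session is not None else requests.Session()
    response = http.get(
        build_quote_url(symbol),
        headers={"User-Agent": USER_AGENT},
        timeout=timeout,
    )
    response.raise_for_status()  # stop early on 4xx/5xx responses
    return response.text
```

Because the session is a parameter, the function can also be exercised offline by passing in a stand-in object with a compatible `get` method.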
Build a Python dictionary containing the quote data for the company Nvidia.
symbol = 'NVDA'
oQuote = {}
oQuote['Symbol'] = symbol
GetStockData(symbol, oQuote)
print(oQuote)