Financial data is a common target for web scraping, and Python, as a versatile general-purpose language, is well suited to the task.
In this article, we'll scrape stock data from Yahoo Finance using Python.
Web scraping is divided into two parts:
- Fetching data by making an HTTP request. We’ll be using the requests library to make a GET request to the website that we want to scrape.
- Extracting important data by parsing the HTML DOM. We’ll be using the BeautifulSoup library to parse the HTML document that we get back from the website.
The code in this article relies on the following imports. The requests and BeautifulSoup libraries are third-party packages and can be installed with pip install requests beautifulsoup4.
import re
from datetime import datetime

import requests
from bs4 import BeautifulSoup
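def ExtractField(sHtml, fieldName):
    # Return the data-value of the matching <fin-streamer> tag, or "N/A" if absent.
    it = sHtml.select_one(f'fin-streamer[data-field="{fieldName}"]')
    return it.get('data-value', 'N/A') if it is not None else 'N/A'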
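def ExtractValueByLabel(sHtml, labelName):
    # labelName is treated as a regular expression and matched against the
    # text of each label <span>; the value <span> that follows is returned.
    label_pattern = re.compile(labelName)
    oLabel = sHtml.find('span', class_='label', string=label_pattern)
    if oLabel:
        return oLabel.find_next('span', class_='value').text.strip()
    else:
        return "N/A"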
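To see what kind of markup the two helpers expect, here is a minimal sketch that runs them against a small hand-written HTML snippet. The snippet is an assumption made up for illustration, not a copy of the real Yahoo Finance page, and the values in it are invented. The helpers are repeated in condensed form so the example runs on its own.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure the helpers look for.
SAMPLE_HTML = """
<div data-testid="quote-statistics">
  <ul>
    <li>
      <span class="label">Previous Close</span>
      <span class="value">
        <fin-streamer data-field="regularMarketPreviousClose" data-value="100.50">100.50</fin-streamer>
      </span>
    </li>
    <li>
      <span class="label">Ex-Dividend Date</span>
      <span class="value">Jan 02, 2024</span>
    </li>
  </ul>
</div>
"""

def ExtractField(sHtml, fieldName):
    # Look up a <fin-streamer> tag by its data-field attribute.
    it = sHtml.select_one(f'fin-streamer[data-field="{fieldName}"]')
    return it.get('data-value', 'N/A') if it is not None else 'N/A'

def ExtractValueByLabel(sHtml, labelName):
    # Look up a label <span> by its text, then read the next value <span>.
    oLabel = sHtml.find('span', class_='label', string=re.compile(labelName))
    if oLabel:
        return oLabel.find_next('span', class_='value').text.strip()
    return 'N/A'

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
stats = soup.find('div', {'data-testid': 'quote-statistics'})

previous_close = ExtractField(stats, 'regularMarketPreviousClose')
ex_dividend = ExtractValueByLabel(stats, 'Ex-Dividend Date')
missing = ExtractValueByLabel(stats, 'Forward Dividend')

print(previous_close)  # 100.50
print(ex_dividend)     # Jan 02, 2024
print(missing)         # N/A
```

Testing against a saved or hand-written snippet like this makes it easier to notice when the site's markup changes and the selectors stop matching.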
Use BeautifulSoup to parse HTML.
def ParseStockData(sHtml, oQuote):
    oQuote['PreviousClose'] = ExtractField(sHtml, 'regularMarketPreviousClose')
    oQuote['Open'] = ExtractField(sHtml, 'regularMarketOpen')

    # The day range arrives as "low - high"; split it into two fields.
    sRange = ExtractField(sHtml, 'regularMarketDayRange')
    aRange = sRange.split(' - ')
    if len(aRange) == 2:
        oQuote['Low'] = aRange[0]
        oQuote['High'] = aRange[1]

    # Keep only the percentage from the forward dividend text.
    sDividendYield = ExtractValueByLabel(sHtml, 'Forward Dividend')
    #sDividendYield = '6.64 (4.91%)'
    pattern = '[(]([0-9.]+)%[)]'
    match = re.search(pattern, sDividendYield)
    if match is not None:
        oQuote['Yield'] = match.group(1)

    # Convert the date from "%b %d, %Y" form to ISO format (YYYY-MM-DD).
    dt = ExtractValueByLabel(sHtml, 'Ex-Dividend Date')
    if dt != 'N/A':
        oQuote['ExDividendDate'] = datetime.strptime(dt, "%b %d, %Y").strftime('%Y-%m-%d')
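The string handling in the parsing step can be tried without any network access. The sketch below isolates the three transformations using only the standard library. The function names are hypothetical and chosen for illustration, and apart from the dividend string taken from the comment above, the sample inputs are invented.

```python
import re
from datetime import datetime

def split_range(text):
    """Split a 'low - high' string into a (low, high) tuple, or return None."""
    parts = text.split(" - ")
    return (parts[0], parts[1]) if len(parts) == 2 else None

def parse_yield_percent(text):
    """Return the number inside '(...%)' as a string, or None if absent."""
    match = re.search(r"[(]([0-9.]+)%[)]", text)
    return match.group(1) if match is not None else None

def to_iso_date(text):
    """Convert a date in '%b %d, %Y' form to 'YYYY-MM-DD'."""
    # %b matches abbreviated month names in the current locale
    # (English names are assumed here).
    return datetime.strptime(text, "%b %d, %Y").strftime("%Y-%m-%d")

print(split_range("100.10 - 105.25"))        # ('100.10', '105.25')
print(parse_yield_percent("6.64 (4.91%)"))   # 4.91
print(parse_yield_percent("N/A"))            # None
print(to_iso_date("Jan 02, 2024"))           # 2024-01-02
```

Keeping these conversions in small functions makes each one easy to check on its own before running the scraper against the live page.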
Make a GET request to the target URL to get the raw HTML data.
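def GetStockData(symbol, oQuote):
    url = 'https://finance.yahoo.com/quote/' + symbol
    # Send a browser-like User-Agent header with the request.
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
    page = requests.get(url, headers=headers, timeout=10)
    page.raise_for_status()  # Stop early on an HTTP error response
    #print(page.text)
    soup = BeautifulSoup(page.text, 'html.parser')

    # Save a prettified copy of the page for debugging (the folder must exist).
    with open('C:/Export/soup.html', 'wb') as file:
        file.write(soup.prettify('utf-8'))

    # Find the specific div tag
    sHtml = soup.find('div', {'data-testid' : 'quote-statistics'})
    if sHtml is None:
        raise ValueError('quote-statistics block not found for ' + symbol)
    ParseStockData(sHtml, oQuote)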
Build a Python dictionary containing the financial data for Nvidia (ticker symbol NVDA) and print it.
symbol = 'NVDA'
oQuote = {}
oQuote['Symbol'] = symbol
GetStockData(symbol, oQuote)
print(oQuote)