Python Finance

Web scraping is a widely used way to collect financial data, and Python, with its libraries for HTTP requests and HTML parsing, is well suited to the task.

We’ll scrape stock quote data from Yahoo Finance using Python.

Web scraping is divided into two parts:

  1. Fetching data by making an HTTP request. We’ll be using the requests library to make a GET request to the website that we want to scrape.
  2. Extracting important data by parsing the HTML DOM. We’ll be using the BeautifulSoup library to parse the HTML document that we get back from the website.
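
The two steps above can be sketched as a pair of small functions. This is a minimal illustration, not the scraper itself: `fetch_html` and `parse_title` are hypothetical helper names, and the sample HTML string is made up so that the parsing step can be tried without a network call.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Step 1: make a GET request and return the response body as text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def parse_title(html):
    """Step 2: parse the HTML and extract one piece of data, the page title."""
    soup = BeautifulSoup(html, 'html.parser')
    return soup.title.string if soup.title else None


# Parsing works on any HTML string, so no request is needed to try it.
sample_html = '<html><head><title>Example page</title></head><body></body></html>'
print(parse_title(sample_html))
```

Keeping the fetch and the parse in separate functions makes it possible to test the parsing logic against saved HTML.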

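import re
from datetime import datetime

import requests
from bs4 import BeautifulSoup

def ExtractField(sHtml, fieldName):
    # Look up a <fin-streamer> element by its data-field attribute.
    it = sHtml.select_one(f'fin-streamer[data-field="{fieldName}"]')
    # Return "N/A" when the element or its data-value attribute is missing.
    return it.get('data-value', 'N/A') if it else 'N/A'

def ExtractValueByLabel(sHtml, labelName):
    # Find the label <span> whose text matches labelName,
    # then read the value <span> that follows it.
    label_pattern = re.compile(labelName)
    oLabel = sHtml.find('span', class_='label', string=label_pattern)
    oValue = oLabel.find_next('span', class_='value') if oLabel else None
    return oValue.text.strip() if oValue else "N/A"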
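
To see the two lookup patterns in isolation, here is a small sketch that runs them against a hand-written HTML fragment. The fragment and its values are assumptions modeled on the selectors used above; the real Yahoo Finance markup may differ and can change at any time.

```python
import re
from bs4 import BeautifulSoup

# Hand-written fragment for illustration only; the values are made up.
sample = '''
<div data-testid="quote-statistics">
  <fin-streamer data-field="regularMarketOpen" data-value="101.5">101.50</fin-streamer>
  <ul>
    <li><span class="label">Ex-Dividend Date</span> <span class="value"> Jan 02, 2024 </span></li>
  </ul>
</div>
'''

soup = BeautifulSoup(sample, 'html.parser')

# Pattern 1: CSS attribute selector, then read an attribute of the matched tag.
tag = soup.select_one('fin-streamer[data-field="regularMarketOpen"]')
open_value = tag['data-value']

# Pattern 2: find the label <span> by its text, then take the next value <span>.
label = soup.find('span', class_='label', string=re.compile('Ex-Dividend Date'))
date_value = label.find_next('span', class_='value').text.strip()

print(open_value, date_value)
```

Both `select_one` and `find` return `None` when nothing matches, which is why a lookup should be checked before its result is used.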

Use these helper functions to pull the individual fields out of the parsed HTML and store them in the quote dictionary.

def ParseStockData(sHtml, oQuote):
    oQuote['PreviousClose'] = ExtractField(sHtml, 'regularMarketPreviousClose')
    oQuote['Open'] = ExtractField(sHtml, 'regularMarketOpen')

    # The day range arrives as one string in the form 'low - high'.
    sRange = ExtractField(sHtml, 'regularMarketDayRange')
    aRange = sRange.split(' - ')
    if len(aRange) == 2:
        oQuote['Low'], oQuote['High'] = aRange

    # The dividend field looks like '6.64 (4.91%)'; keep only the percentage.
    sDividendYield = ExtractValueByLabel(sHtml, 'Forward Dividend')
    match = re.search(r'\(([0-9.]+)%\)', sDividendYield)
    if match is not None:
        oQuote['Yield'] = match.group(1)

    # Convert the '%b %d, %Y' date to ISO format (YYYY-MM-DD).
    dt = ExtractValueByLabel(sHtml, 'Ex-Dividend Date')
    if dt != 'N/A':
        oQuote['ExDividendDate'] = datetime.strptime(dt, "%b %d, %Y").strftime('%Y-%m-%d')
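
The string handling in ParseStockData can be tried on its own, without fetching a page. The sketch below uses only the standard library; the helper names are hypothetical, the dividend string is the sample from the code comment above, and the range and date inputs are made-up values. Note that `%b` depends on the locale, so the date example assumes English month abbreviations.

```python
import re
from datetime import datetime


def split_day_range(text):
    """Split 'low - high' into a (low, high) tuple, or return None if the format differs."""
    parts = text.split(' - ')
    return tuple(parts) if len(parts) == 2 else None


def extract_yield(text):
    """Return the percentage inside parentheses, or None if there is none."""
    match = re.search(r'\(([0-9.]+)%\)', text)
    return match.group(1) if match else None


def to_iso_date(text):
    """Convert a '%b %d, %Y' date string to 'YYYY-MM-DD'."""
    return datetime.strptime(text, '%b %d, %Y').strftime('%Y-%m-%d')


print(split_day_range('100.25 - 104.75'))
print(extract_yield('6.64 (4.91%)'))
print(extract_yield('N/A'))
print(to_iso_date('Jan 02, 2024'))
```

Returning `None` for input that does not match lets the caller decide whether to skip the field or store a placeholder.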

Make a GET request to the target URL to get the raw HTML data.

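def GetStockData(symbol, oQuote):
    url = 'https://finance.yahoo.com/quote/' + symbol

    # Send a browser-like User-Agent; the site may reject requests without one.
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

    page = requests.get(url, headers=headers, timeout=10)
    page.raise_for_status()  # stop early on HTTP errors (4xx/5xx)

    soup = BeautifulSoup(page.text, 'html.parser')
    # Debugging aid: save the prettified HTML for inspection.
    # The target directory must already exist; adjust the path as needed.
    with open('C:/Export/soup.html', 'wb') as file:
        file.write(soup.prettify('utf-8'))

    # Find the div that holds the quote statistics
    sHtml = soup.find('div', {'data-testid': 'quote-statistics'})
    if sHtml is None:
        raise ValueError('quote-statistics block not found for ' + symbol)

    ParseStockData(sHtml, oQuote)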

Get a Python dictionary containing the financial data for Nvidia (ticker symbol NVDA).

symbol = 'NVDA'
oQuote = {}
oQuote['Symbol'] = symbol

GetStockData(symbol, oQuote)

print(oQuote)