Python and LLM for Market Analysis — Part III — Allow Your Trading System to React to Daily News

If you’re a day trader, swing trader, or an active follower of business news, you’re likely aware of the significant impact that news can have on the markets — sometimes it’s exhilarating to ride those movements, and other times it’s frustrating if we miss the right moment to act.

While Fundamental Analysis provides a solid foundation for strategic decision-making, combining it with Technical Analysis enhances our approach. Taking it a step further, integrating current market conditions, particularly news, can provide a more comprehensive perspective. This article explores the integration of these factors into our trading decisions.

For those of us who engage in part-time trading, manually sifting through multiple sources of business news, gauging market sentiment, and making daily decisions can be challenging. While this manual approach offers valuable learning experiences, there’s a need for scalability. This is where we turn to the power of Python, LLM models, and data science to streamline the process. In this article, we’ll delve deep into this fascinating intersection.

1. Extract the latest news articles list from news aggregator APIs.
2. Reach out to each news source and scrape the complete articles.
3. Summarize each article using an LLM.
4. Perform sentiment analysis using finance LLM models (Bloomberg, PaLM finance models).
5. Combine them with technical indicators and build a recommender system.

First things First!

Are Large Language Models (LLMs) truly finance experts when provided with the right data? Ongoing research in this field suggests they are not there yet. While some fine-tuned models show promise, their reliability is not absolute. A noteworthy research paper shedding light on this is available at https://arxiv.org/pdf/2310.12664.pdf.

Now, the question arises: Why should we consider using LLMs for finance? In my perspective, the potential lies in the future. As time progresses, we anticipate the development of highly sophisticated models that can offer expert insights in finance. As part of this series, we’ll take a closer look at this concept. Towards the conclusion, we’ll embark on the journey of finetuning our own Finance LLM model using a dataset comprising 50,000 business news articles. This hands-on approach will showcase how a finetuned model can outperform generic models.

In the realm of financial markets, predicting reactions can be exceptionally challenging. For instance, positive results announcements may lead to profit bookings and a subsequent price fall. Conversely, a negative result, if less negative than expected, might cause a price surge. The scenarios are diverse and complex. Only models specifically trained for such nuanced cases can make more accurate judgment calls in these situations.

So what are we building?

Take a good look below — this is what our final file structure looks like.

News Aggregator API

First, we need a way to collect all the daily news in one place. This is where newsapi.org comes in handy; we are particularly interested in their business news. Fortunately, newsapi offers both REST APIs and a Python library that we can install and use. Let's install the library:

pip3 install newsapi-python

Register for an account and get their API key. While the service is paid for business usage, the developer version has limited features and is free to use.

Let's create a simple Python class that extracts news articles.

#news_api.py
import os
import pandas as pd
from datetime import date
from newsapi import NewsApiClient

class NewsApi:
    def __init__(self):
        self.token = os.getenv("NEWS_API_TOKEN")
        self.country = 'in'
        self.category = 'business'
        self.csv_location = './news/'
        self.filename = f'news_{date.today()}.csv'
        self.newsapi = NewsApiClient(api_key=self.token)

    def extract_news(self):
        page = 1
        all_articles = []
        try:
            while True:
                top_headlines = self.newsapi.get_top_headlines(
                    country=self.country,
                    language='en',
                    category=self.category,
                    page=page
                )
                if top_headlines.get('status') == 'ok':
                    for row in top_headlines.get('articles', []):
                        article = [row['publishedAt'], row['title'], row['description'], row['url']]
                        all_articles.append(article)
                    page += 1

                if len(all_articles) >= top_headlines['totalResults']:
                    break
            self.all_articles = pd.DataFrame(all_articles, columns=['Date', 'Title', 'Description', 'URL'])
            self.all_articles.to_csv(f'{self.csv_location}{self.filename}')
        except Exception as e:
            print(e)

Long story short: we start with page 1, extract all the news on that page, and move to the next. We collect the news in the all_articles list and finally store it in a CSV file. While the API results contain a few other columns, we are only interested in Date, Title, Description and URL.
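The core pagination logic above can be distilled into a small client-agnostic helper, which makes it easy to exercise with a stub client instead of the live API. This is my refactoring for illustration, not code from the repo:

```python
def fetch_all_headlines(client):
    """Page through top headlines until totalResults articles are collected."""
    page, articles = 1, []
    while True:
        resp = client.get_top_headlines(page=page)
        if resp.get("status") != "ok":
            break
        articles.extend(resp.get("articles", []))
        page += 1
        if len(articles) >= resp["totalResults"]:
            break
    return articles

# A stub client to exercise the loop without hitting the real API.
class StubClient:
    pages = {1: ["a1", "a2"], 2: ["a3"]}

    def get_top_headlines(self, page):
        return {"status": "ok",
                "articles": self.pages.get(page, []),
                "totalResults": 3}
```

Separating the loop from the client this way also makes a unit test trivial: pass in StubClient() and check the combined list.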

Extract Complete News Articles

When exploring newsapi.org, you might observe that news articles are truncated beyond a certain character limit. However, a saving grace is that the link to the original source is provided. This presents a crucial need to extract the complete news content. By doing so, we empower Large Language Models (LLMs) to read the entire news, enabling them to make more informed judgments on the sentiment of the news.

In our quest to harness the full potential of LLMs, the extraction of complete news articles becomes a pivotal step. This not only ensures a comprehensive understanding of the content but also allows for a more accurate analysis of sentiments, ultimately enhancing the efficacy of our trading decisions.

Is it okay to scrape news articles from a website? The answer depends on the website's terms and conditions. But most news articles can be scraped, since the information is intentionally made available to the public. As a rule of thumb, public data can be scraped as long as your scraping method does not harm the data or the website owner.

There are already really good Medium articles written about scraping news sites. I will provide the links below.

https://dorianlazar.medium.com/scraping-medium-with-python-beautiful-soup-3314f898bbf5
https://realpython.com/beautiful-soup-web-scraper-python/
https://medium.com/@poojan.s/stock-market-news-scraper-using-beautifulsoup-bc7db5c75f99

Note that each news site is different, and we will need to scrape each source differently. This is a classic use case for the Factory pattern.
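As a sketch of what such a factory might look like (the class and method names here are hypothetical, not the repo's actual code), each site gets its own scraper class and the factory picks one by domain:

```python
from urllib.parse import urlparse

# Hypothetical site-specific scrapers; each would know one site's HTML layout.
class MoneycontrolScraper:
    def scrape(self, url):
        # In a real scraper: fetch the page and extract the article body.
        return f"moneycontrol article from {url}"

class GenericScraper:
    def scrape(self, url):
        return f"generic article from {url}"

class ScraperFactory:
    """Pick a scraper based on the article URL's domain."""
    _registry = {"www.moneycontrol.com": MoneycontrolScraper}

    def create_and_scrape(self, url):
        domain = urlparse(url).netloc
        scraper_cls = self._registry.get(domain, GenericScraper)
        return scraper_cls().scrape(url)

factory = ScraperFactory()
```

The registry dictionary keeps adding a new source down to two steps: write the scraper class and register its domain.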

Although I have skipped writing about news scraping here, my completed code for scraping different news sites is available. Watch my GitHub, as I actively keep adding more scrapers with every update.

Summarization and Sentiment Analysis with LLM

While I'm personally using PaLM 2 for summarization and a BloombergGPT model from the Hugging Face Hub that I have fine-tuned on a decent business news dataset for news sentiment, in this article I will be using PaLM 2 from Google for both purposes. With the recent buzz around Gemini, Google's best-known multimodal model, PaLM 2 is definitely worth a try!

Note: This article is intended to show the technical possibility of using LLMs to support trading decisions. Readers are requested to test thoroughly and perform due diligence before using any of this to make real-time decisions.

Let's start by building a simple PaLM 2 interface that takes a user query and extracts results from the LLM through its REST API. Next, we will build a simple prompt that helps us extract the data in the format we want.

First, we will need the API key. Follow this link to get one: https://developers.generativeai.google/products/palm

Make sure to store the API key in a .env file as PALM2_API_TOKEN.
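A quick defensive sketch for loading the key (the helper name is mine, not part of the original code) so that a missing .env entry fails early with a clear message instead of surfacing later as a cryptic API error:

```python
import os

def load_api_key(var_name="PALM2_API_TOKEN"):
    """Read the API key from the environment and fail fast if it's missing."""
    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set; add it to your .env file")
    return key
```

If you use python-dotenv, call load_dotenv() once at startup so the .env values land in os.environ before this runs.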

Installing the PaLM 2 client is quite simple:

pip3 install google-generativeai

#palm_interface.py
import os
import json
import google.generativeai as palm

palm.configure(api_key=os.getenv('PALM2_API_TOKEN'))

class PalmInterface:
    def prompt(self, query):
        try:
            payload = self.build_input(query)
            defaults = {'model': 'models/text-bison-001'}
            response = palm.generate_text(**defaults, prompt=payload)
            json_response = json.loads(response.result)
            return json_response
        except Exception as e:
            print(e)
            return None

Time for some prompt engineering…

    def summarize(self, full_news):
        template3 = f"""You excel in succinctly summarizing business and finance-related news articles.
        Upon receiving a news article, your objective is to craft a concise and accurate summary while
        retaining the name of the company mentioned in the original article. The essence of the article
        should be preserved in your summary. A job well done in summarizing may earn you a generous tip.
        Please proceed with the provided full news article. {full_news}"""
        try:
            defaults = {'model': 'models/text-bison-001'}
            response = palm.generate_text(**defaults, prompt=template3)
            return response.result
        except Exception as e:
            print(e)
            return None

Prompt Engineering…If this term is hitting your ears for the first time, think of Prompting as the Yoda of manifestation. The clearer your requests to the universe, paired with an unwavering belief, the more likely your desires will land in your lap. :-) But hey, it’s not just that — let’s take a breather, digest this PJ, and then promptly forget it.

This time a few-shot prompting…

    def build_input(self, article):
        article_1 = "Tech giant Apple Inc. reports record-breaking quarterly earnings, surpassing market expectations and driving stock prices to new highs. Investors express optimism for the company's future prospects."
        article_2 = "Alphabet Inc., the parent company of Google, faces a setback as regulatory concerns lead to a sharp decline in share prices. The market reacts negatively to uncertainties surrounding the company's antitrust issues."
        prompt = f"""
        Few-shot prompt:
        Task: Analyze the impact of the news on stock prices.
        Instructions: As a seasoned finance expert specializing in the Indian stock market, you possess a keen understanding
        of how news articles can influence market dynamics. In this task, you will be provided with a news article
        or analysis. Upon thoroughly reading the article, if it contains specific information about a company's
        stock, please provide the associated Stock Symbol (NSE or BSE symbol), the Name of the stock, and the
        anticipated Impact of the news. The Impact value should range between -1.0 and 1.0, with -1.0 signifying
        highly negative news likely to cause a significant decline in the stock price in the coming days/weeks,
        and +1.0 representing highly positive news likely to lead to a surge in share price in the next few days/weeks.
        Your response must be strictly in the JSON format. Consider the following factors while determining the impact:
        the magnitude of the news, the sentiment of the news, market conditions at the date of the news, liquidity
        of the stock, and the sector in which the company operates. The JSON response should include the keys: symbol,
        name, and impact. Do not consider indices such as NIFTY. If the news is not related to the stock market or any
        specific company, leave the values blank. Do not invent values; maintain accuracy and integrity in your response.
        Examples:
        1. Article: "{article_1}"
           Response: {{"symbol": "AAPL", "name": "Apple Inc.", "impact": 0.9}}

        2. Article: "{article_2}"
           Response: {{"symbol": "GOOGL", "name": "Alphabet Inc.", "impact": -0.5}}

        3. Article: "{article}"
           Response:
        """
        return prompt

Well, there are definitely better prompting techniques available than the one above.

For a good read on prompting techniques, check out the guide below and modify as needed.

https://developers.generativeai.google/guide/prompt_best_practices

Some examples to get inspired by: https://developers.generativeai.google/prompt-gallery

Develop Trading Strategy

So far we have seen things in bits and pieces. It’s time to connect them together.

We are going to develop a very simple strategy using news sentiment and RSI.

Strong Buy — if the news has a positive sentiment and the current RSI < 35 (oversold zone)

Buy — if the news has a positive sentiment

Sell — if the news has a negative sentiment

Strong Sell — if the news has a negative sentiment and the current RSI > 75 (overbought zone)

The strategy is pretty basic and shouldn't be used as is; the intent of this post is to combine news sentiment with technical indicators, not to recommend a strategy itself. Readers are requested to develop their own proven strategy and back-test it. Sometimes bad news on a fundamentally strong company gives us an opportunity to buy; negative sentiment can become a buying opportunity too.
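The four rules above can be expressed as a small pure function, which also makes them easy to unit-test in isolation. The function name is mine; the 0.4 impact cut-offs match the thresholds used later in NewsTrader.run:

```python
def classify(impact, rsi):
    """Map a sentiment impact score (-1.0..1.0) and an RSI reading to a signal."""
    if impact > 0.4:
        return "strong_buy" if rsi <= 35 else "buy"
    if impact < -0.4:
        return "strong_sell" if rsi >= 75 else "sell"
    return "hold"
```

Keeping the rules in one pure function like this means a future, smarter strategy only has to replace a single well-tested unit.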

Let's put this strategy into action: a simple class with the four categories.

#news_tech_trader.py
import os
import pandas as pd
from datetime import date
from news_api import news
from palm_interface import palm_interface
from yahoo_finance import yfi
from news_scrapper import factory

class NewsTrader:
    def __init__(self):
        self.strong_buy = []
        self.buy = []
        self.strong_sell = []
        self.sell = []

The imports might confuse you a bit; don't worry, take a look at the GitHub repo and they will make sense.

A helper method to obtain RSI for a given Symbol.

    def get_rsi(self, symbol):
        row = yfi.download_data(symbol, './data/')
        if row is None:
            return 50.0
        else:
            return row['RSI14'] if row['RSI14'] else 50.0

Since we have already discussed downloading historical data, calculating the RSI, etc. in an earlier part of this series, we will skip that topic in this article; refer to GitHub for how it's done. Also note how we default the RSI to 50 when it is not available: a neutral value, making the stock neither oversold nor overbought.
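For readers who don't want to dig through the repo, here is a compact sketch of a 14-period RSI using Wilder's smoothing. It works on a plain list of closing prices so it's easy to verify by hand; the repo's actual calculation over the downloaded price DataFrame may differ in details:

```python
def rsi(closes, period=14):
    """14-period RSI with Wilder's smoothing; None if there isn't enough data."""
    gains, losses = [], []
    for prev, curr in zip(closes, closes[1:]):
        change = curr - prev
        gains.append(max(change, 0.0))
        losses.append(max(-change, 0.0))
    if len(gains) < period:
        return None
    # Seed with simple averages, then apply Wilder's exponential smoothing.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
    if avg_loss == 0:
        return 100.0  # all gains: maximally overbought
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)
```

A strictly rising series pins the RSI at 100 and a strictly falling one at 0, which is a handy sanity check for any implementation you use.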

Now connecting the pieces together…

    def run(self):
        # PART I - check the explanation below
        results_list = []
        extracted_news_file = f'./data/news/{date.today()}.csv'
        df = pd.DataFrame()
        if os.path.exists(extracted_news_file):
            df = pd.read_csv(extracted_news_file)
        else:
            extracted_news_file = news.extract_news()
            df = pd.read_csv(extracted_news_file)

        # PART II - check the explanation below
        if 'symbol' not in df:
            for index, row in df.iterrows():
                text = factory.create_and_scrape(row['URL'])
                if text is None or len(text) < 10:
                    print('scrape not successful!')
                    text = str(row['Title']) + ' ' + str(row['Description'])
                text = palm_interface.summarize(text)
                data = palm_interface.prompt(text)
                results_list.append({
                    'symbol': data['symbol'] if data else "",
                    'name': data['name'] if data else "",
                    'impact': data['impact'] if data else 0.0
                })
            df2 = pd.DataFrame(results_list)
            df = pd.concat([df, df2], axis=1)
            df.to_csv(extracted_news_file, index=False)

        # PART III, IV - check the explanation below
        for index, row in df.iterrows():
            if not pd.isna(row['symbol']) and len(row['symbol']) > 0:
                rsi = self.get_rsi(row['symbol'])
                if row['impact'] > 0.4:
                    if rsi <= 35:
                        self.strong_buy.append(row['symbol'])
                    else:
                        self.buy.append(row['symbol'])
                elif row['impact'] < -0.4:
                    if rsi >= 75:
                        self.strong_sell.append(row['symbol'])
                    else:
                        self.sell.append(row['symbol'])

Don't get overwhelmed by the long lines of code; it's actually easier than it looks. Here's what's happening:

1. Read the news CSV file if it exists; if not, download the news articles into a CSV first.
2. Iterate through each article, scrape the full article as needed, and run inference through the PaLM API. Concatenate the inference results with the news DataFrame itself and store it back to the same file.
3. Iterate through each article again, getting the RSI value of each stock by calling get_rsi.
4. Depending on the category it falls into, store the symbol in the corresponding list.

For some of the articles, the symbol may be unavailable or incorrect. That's fine for now; in one of the coming articles we will look at eliminating such issues with ElasticSearch and ensure the extraction results from the LLM are proper.

Complete code is available on GitHub. Please star or fork the project if you like it.

Conclusion

One of my personal favorite quotes goes like this:

Great investment opportunities come around when excellent companies are surrounded by unusual circumstances that cause the stock to be misappraised. ~ Warren Buffett

And what better way to understand a company’s current circumstances than through its latest news?

Our strategy implementation here is refreshingly simple, yet the real excitement unfolds when we blend the power of news and technical analysis with fundamental insights. In the upcoming articles, we'll leverage the Screener API to add a layer of complexity in real time. For instance, consider the lasting impact of news such as positive revenue or an increase in the order book: it often extends beyond a single day in the market. Conversely, negative news can influence stock performance for weeks. So, why limit ourselves to just one day of news? As an engaging exercise, readers are encouraged to explore multiple days of news for each company, averaging the sentiment, combining it with volume, and experimenting to find that sweet spot. Trust me, it's a fascinating journey.

In the next set of articles, we’ll delve into fine-tuning Language Models (LLMs) with historical news data, aligning them with actual price movements and considering macroeconomic conditions. We’ll explore the process of publishing and testing the model to assess its performance. Additionally, we’ll venture into scraping Twitter feeds for listed companies, integrating essential fundamental data such as Debt-to-Equity ratio (D/E), Return on Equity (ROE), Return on Capital Employed (ROCE), Price-to-Earnings ratio (P/E), and more. To enhance the recommendation system further, we’ll incorporate volume movement analysis. Get ready for an exciting journey as we build a more valuable and comprehensive recommendation system together.

Thanks for reading this far. If you find this post useful, please leave a clap or two, and if you have suggestions or feedback, please feel free to comment. It would mean a lot to me!

In case of queries or for more details, please feel free to connect with me on LinkedIn or X (formerly Twitter).

Python and LLM for Market Analysis — Part III — Allow your trading system to react for daily news was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.

