CS491‑2 Senior Project Design Logbook

Week of October 1, 2024

We officially began the project this week. Although preliminary discussions took place over the summer, this marked the start of structured work. I reviewed several broad domains and reaffirmed our group’s interest in applying AI techniques to financial data. No meeting was held with our instructor this week as we were still formulating the initial concept.

Week of October 7, 2024

This week, we had our first formal meeting with Prof. Altay. We presented our preliminary idea and received encouraging feedback. The discussion helped clarify expectations and scope. I began researching financial reporting standards and time series prediction methods to assess the technical feasibility of our concept.

Week of October 14, 2024

We continued to develop the project’s structure and objectives. I focused on understanding SEC filings (10-K and 8-K reports) and explored the potential of extracting value from unstructured financial documents. Additionally, we looked at existing sentiment analysis models and whether they could be applied to financial narratives.

Week of October 21, 2024

Our weekly meeting with Prof. Altay led us to refine our research question and narrow our scope. I investigated previous academic work on financial prediction using NLP and deep learning. We also discussed various model architectures we could test later on, including RNNs and transformers.

Week of November 1, 2024

As November began, we completed our initial literature review. The focus was on aligning our approach with realistic data access and processing constraints. I contributed ideas for dataset design and potential evaluation metrics for financial prediction accuracy. Overall, the project direction became more defined as we transitioned into implementation planning. We finalized our project roadmap and assigned initial responsibilities within the group. I was tasked with looking into the feasibility of accessing financial documents via the SEC’s EDGAR database. Our instructor emphasized setting measurable milestones during this phase.

Week of November 4, 2024

In our Wednesday meeting with Prof. Altay, we presented our technical setup plans. I started exploring large language models (LLMs) for sentiment analysis. We also discussed beginning with a baseline model before implementing more advanced ones.

Week of November 11, 2024

This week, I began testing the FinBERT language model. I ran into problems gathering the inline text of each quarterly report: the API does not provide it, so we had to fetch it from the web by scraping the HTML.
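
The scraping step can be illustrated with a minimal sketch that pulls visible text out of a report's HTML using only the Python standard library. The class name FilingTextExtractor and the helper extract_text are hypothetical names chosen for this example, not the project's actual code, and downloading the page itself is left out.

```python
from html.parser import HTMLParser


class FilingTextExtractor(HTMLParser):
    """Collects visible text from an HTML page, skipping script and style."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Collapse runs of whitespace left over from the page layout.
            text = " ".join(data.split())
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML string as one line (hypothetical helper)."""
    parser = FilingTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The resulting plain text could then be split into passages short enough for a sentiment model. How the text was actually chunked in the project is not recorded here.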

Week of November 18, 2024

I began working on frontend planning for the product website that would showcase our results. I also contributed to outlining the different pages, including a company profile page and a comparison analysis section. The week’s meeting focused on balancing frontend development with modeling progress. I continued refining the dataset pipeline and finalized the structure for storing downloaded reports.

Week of December 1, 2024

This week, I completed the initial setup for the project’s frontend. We created the project repository, configured the folder structure, and implemented routing. The group divided responsibilities for individual pages. During our meeting with Prof. Altay, we showcased the routing demo and received feedback to keep the UI lightweight and focused.

Week of December 8, 2024

I developed the “About Us” and basic layout pages. At the same time, I supported the team in refining the structured dataset based on the previously downloaded reports. We faced some difficulty with inconsistent company naming conventions, which we planned to address through a name normalization function.
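
A name normalization function of this kind might look like the sketch below. The suffix list, the handling of "&", and the name normalize_company_name are assumptions made for illustration. The rules the team eventually used are not recorded in this logbook.

```python
import re

# Hypothetical list of legal suffixes to strip from the end of a name.
LEGAL_SUFFIXES = {
    "inc", "incorporated", "corp", "corporation", "co",
    "company", "ltd", "limited", "llc", "plc",
}


def normalize_company_name(name):
    """Map spelling variants of a company name to one canonical key."""
    lowered = name.lower().replace("&", " and ")
    # Remove periods and apostrophes without leaving a gap ("Inc." -> "inc").
    cleaned = re.sub(r"[.']", "", lowered)
    # Turn every other run of punctuation or whitespace into a single space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", cleaned)
    tokens = cleaned.split()
    # Drop trailing legal suffixes, but never reduce the name to nothing.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)
```

With this sketch, "Apple Inc." and "APPLE INC" both map to the same key, so records from different sources can be joined on the normalized name.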

Week of December 15, 2024

I wrapped up the initial development of the profile and main pages. Meanwhile, our team finalized the structured dataset that would be used for modeling. Our meeting with Prof. Altay included a short demo of the frontend progress, and we agreed to shift attention to model development starting in January.

Week of December 22, 2024

This week was a major milestone for me personally. I completed the entire frontend of the project, including page routing, authentication, the company profile page, the comparison page, and several smaller sections. It was a tight timeline, but I managed to deliver a fully functioning prototype in time for our live demo. The demo was successful and received positive feedback. On the data side, our team finalized and stored cleaned versions of the structured and unstructured datasets. With the UI and data infrastructure in place, we were ready to move on to modeling in January.

Week of December 23, 2024

Established project team and defined individual responsibilities.
Researched NLP techniques for financial sentiment analysis.
Collected initial dataset of financial reports for model training.
Set up development environment for machine learning experiments.

Week of December 30, 2024

Created data preprocessing pipeline for financial text analysis.
Implemented baseline sentiment analysis model using BERT.
Evaluated performance metrics for sentiment classification accuracy.
Developed data visualization tools for sentiment analysis results.

Week of January 1, 2025

January was relatively quiet due to the semester break. Most team members had returned to their hometowns, and we couldn’t maintain our regular meetings or collaborative momentum. The general exhaustion from the fall semester was noticeable, and everyone took some time to rest and focus on individual tasks. During this period, I continued to monitor and refine the frontend, ensuring everything remained functional and up to date. I also revisited our dataset to validate its completeness and structure. Despite our initial plan to begin modeling this month, we fell slightly behind schedule due to the lack of group coordination. Although progress was slower than expected, the downtime also gave us a chance to recharge. We expect February to be a more productive month as the team regroups with renewed motivation.

Week of February 1, 2025

We resumed active project work at the beginning of the new semester. The team reconvened and started realigning with our original timeline. I shifted my full focus to working with our dataset. Since the frontend was already near completion by the end of the previous term, I did not make any new changes to it.

Week of February 8, 2025

This week, I spent considerable time cleaning and organizing the dataset. I expanded it to include more companies and financial quarters, ensuring consistency across records. Our Wednesday meeting focused on progress updates, and I shared early insights from the expanded data.

Week of February 15, 2025

With the dataset finalized, I began preparing the environment to train our first predictive model. I experimented with different configurations and preprocessing pipelines. There were minor challenges with aligning structured data inputs, but they were resolved with a few adjustments.

Week of February 22, 2025

I successfully ran our initial model on the refined dataset. This was a critical milestone, as we could finally start evaluating prediction performance. Our meeting this week centered on model results and deciding how to improve accuracy. I began drafting a plan for tuning and comparing models in March.

Week of March 1, 2025

Following February’s initial modeling phase, we began analyzing our model’s performance. Unfortunately, the prediction error was higher than expected. During this week’s meeting, we decided to prioritize parameter optimization and model tuning. I began working with LSTM and GRU architectures to assess their ability to better capture temporal patterns in financial data.

Week of March 8, 2025

My focus this week was experimenting with different configurations of LSTM networks, including layer depth and sequence length. In parallel, Emre worked on regression-based models such as Random Forest and a heuristic-based method. Our team explored how different algorithms might complement each other or be used in an ensemble.
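
Sequence length here can be read as the number of past time steps fed to the network for each prediction. The sketch below shows one common way to cut a single series into fixed-length windows with next-step targets. The function name make_windows and the next-step target are assumptions for illustration, not a record of the exact setup used in these experiments.

```python
def make_windows(series, sequence_length):
    """Split a series into (input window, next value) training pairs."""
    if sequence_length < 1:
        raise ValueError("sequence_length must be at least 1")
    inputs, targets = [], []
    # Each window covers sequence_length steps; the target is the step after it.
    for start in range(len(series) - sequence_length):
        inputs.append(series[start:start + sequence_length])
        targets.append(series[start + sequence_length])
    return inputs, targets
```

A longer sequence length gives the model more history per sample but yields fewer samples from the same series, which matters when only a limited number of quarters is available per company.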

Week of March 15, 2025

I shifted to tuning GRU models and began comparing their performance with LSTMs using validation datasets. I also evaluated the impact of different activation functions and optimizers. Emre shared results from Random Forest and heuristic models, and we discussed merging insights for improved interpretability.

Week of March 22, 2025

By the end of the month, I had a clearer understanding of the strengths and weaknesses of LSTM vs. GRU in our context. We documented each model’s metrics and began thinking about presenting these findings in our final report. Our Wednesday meeting focused on refining model evaluation criteria and setting priorities for April.

Week of April 1, 2025

At the start of April, we decided to focus on the Random Forest model, which had shown the most promising performance among all the models tested so far. Emre and I worked closely to improve its predictive accuracy. Meanwhile, other team members concentrated on polishing the frontend and completing the backend infrastructure.

Week of April 8, 2025

To enhance our model’s input quality, I began collecting 8-K reports, the current reports that companies file to announce material events and that often contain timely, high-impact financial information (frequently with press releases attached). Compared to quarterly reports, these were far more numerous. Gathering and organizing them took nearly a full week, as I had to handle large volumes and ensure consistency.

Week of April 15, 2025

After completing the 8-K collection, I manually reviewed their contents and assigned sentiment-based scores to each document. These scores were designed to quantify the tone and potential financial implications of each report. I then carefully integrated these new features into our main dataset, aligning them by company and reporting date.
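
The alignment step can be sketched as follows, assuming each 8-K score is attached to the calendar quarter in which the report was filed and then averaged per company and quarter. The field names (company, filed, score, year, quarter), the calendar-quarter bucketing, and the function add_8k_sentiment are all hypothetical. Companies with non-calendar fiscal years would need a different mapping.

```python
from collections import defaultdict
from datetime import date


def quarter_of(day):
    """Return (year, calendar quarter) for a date, e.g. (2024, 2) for May."""
    return (day.year, (day.month - 1) // 3 + 1)


def add_8k_sentiment(quarterly_rows, filings):
    """Attach mean 8-K sentiment and filing count to each quarterly row."""
    scores = defaultdict(list)
    for filing in filings:
        key = (filing["company"], quarter_of(filing["filed"]))
        scores[key].append(filing["score"])

    enriched = []
    for row in quarterly_rows:
        key = (row["company"], (row["year"], row["quarter"]))
        values = scores.get(key, [])
        new_row = dict(row)  # copy so the input rows are left unchanged
        # Assumption: quarters without any 8-K get a neutral score of 0.0.
        new_row["sentiment_8k_mean"] = sum(values) / len(values) if values else 0.0
        new_row["sentiment_8k_count"] = len(values)
        enriched.append(new_row)
    return enriched
```

Keeping the filing count next to the mean lets a tree-based model distinguish a quarter with no 8-K reports from one whose reports were neutral on average.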

Week of April 22, 2025

With the enriched dataset, I retrained the Random Forest model. The new input features significantly improved our model’s performance, as reflected in a notable decrease in Mean Absolute Error (MAE). Our Wednesday meeting focused on reviewing these results and planning how to present the impact of 8-K integration in our final evaluation.
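
For reference, MAE is the average absolute difference between predicted and actual values, so a lower value means predictions sit closer to the targets. A minimal sketch of the computation follows. The function name is chosen for this example, and the numbers in the usage note are toy values, not project results.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all samples."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) for a, p in zip(actual, predicted))
    return total / len(actual)
```

For example, actual values [10.0, 12.0, 8.0] and predictions [9.0, 14.0, 8.0] give absolute errors of 1, 2, and 0, so the MAE is 1.0.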

Week of April 28, 2025

Finalized machine learning model deployment and integration.
Documented model architecture and training methodology.
Created explainability components for sentiment analysis results.
Participated in final project presentation and demonstration.