
CS491‑2 Senior Project Design Logbook

Emre Akgül
September 16, 2024 – May 2, 2025

Week of October 1, 2024

Started initial research on financial forecasting and reviewed related literature.
Began exploring SEC EDGAR API for fetching financial datasets.
Participated in project meeting with Atakan Erdem.
Attended seminar: "Introduction to CS491-2 Senior Design Project" by Selim Aksoy.

Week of October 7, 2024

Encountered problems with SEC EDGAR API caused by inconsistent data and documentation.
Started searching for alternative sources and datasets.
Participated in weekly meeting with Prof. Altay Güvenir.
Attended seminar: "Atakan Erdem and Mert Bıçakçı Introduction" by Atakan Erdem and Mert Bıçakçı.

Week of October 14, 2024

Discovered a Kaggle dataset covering NASDAQ financial data from 2008–2023.
Performed initial analysis to assess data completeness and quality.
Participated in weekly meeting with Prof. Altay Güvenir.
Attended seminar: "Innovation Life Cycle" by Atakan Erdem.

Week of October 21, 2024

Identified major data gaps and inconsistencies in Kaggle dataset, traced back to original SEC EDGAR API issues.
Began extensive data cleaning and preprocessing.
Attended seminar: "The Role of Documentation in (OO) Software Development" by Uğur Doğrusöz.

Week of November 1, 2024

Continued intensive data cleaning and validation to create a robust dataset for net income (loss) prediction.
Participated in weekly meeting with Prof. Altay Güvenir.
Attended seminar: "La Chanson de Shannon: Shannon's Song" by Fazlı Can.

Week of November 4, 2024

Established a reasonably reliable dataset structure for net income prediction.
Started investigating Kaggle competitions related to financial forecasting.

Week of November 11, 2024

Joined a Kaggle competition focused on financial predictions to gain insights into effective modeling techniques.
Experimented with initial feature engineering approaches.
Participated in weekly meeting with Prof. Altay Güvenir.
Attended seminar: "Tools and Processes in Software Development Lifecycle" by Murat Ergun.

Week of November 18, 2024

Explored various tabular data models including Random Forest, XGBoost, LightGBM, and CatBoost.
Evaluated their preliminary performance on financial forecasting tasks.
Attended seminar: "Technology Entrepreneurship and Investment Ecosystem" by Numan Numan.

Week of December 1, 2024

Started studying multimodal neural network architectures for potential integration of textual data from reports with tabular data.
Investigated relevant literature and existing implementations.
Participated in weekly meeting with Prof. Altay Güvenir.
Attended seminar: "AI-Driven Mobile Apps: Building Profitable Solutions on a Budget" by Melih Gurgah.

Week of December 8, 2024

Explored text-tabular and image-tabular multimodal architectures, focusing particularly on text-tabular designs.
Began drafting initial architecture designs suitable for our data.
Attended seminar: "Computational Law" by Dilek Küçük.

Week of December 15, 2024

Implemented a basic multimodal text-tabular neural network.
Identified issues with handling multi-chunk report data.
Participated in weekly meeting with Prof. Altay Güvenir.
Attended seminar: "Bitcoin's First ZK Rollup" by Murat Karademir and Ömer Talip Akalın.

Week of December 22, 2024

Revised dataset structure and prepared documentation for smoother onboarding and future work.
Participated in project retrospectives and planning sessions for upcoming semester.
Participated in weekly meeting with Prof. Altay Güvenir and project meeting with Atakan Erdem; attended senior design seminar.

Week of December 30, 2024

Finalized semester work; conducted dataset quality assurance checks.
Prepared initial presentations and reports summarizing semester progress.
Participated in weekly meeting with Prof. Altay Güvenir and project meeting with Atakan Erdem; attended senior design seminar.

Week of January 1, 2025

Final exam period and spring break; paused project work for this period.

Week of February 1, 2025

Realized dataset was insufficiently rich and complex for optimal modeling performance; began considering strategies for dataset enhancement.
Continued standard machine learning work to better understand model limitations.
Participated in weekly meeting with Prof. Altay Güvenir.

Week of February 8, 2025

Began collaborating with Alara to improve data acquisition using SEC EDGAR API, significantly enriching the dataset.
Downloaded financial reports from 2001 to 2025 to improve dataset quality.
Participated in weekly meeting with Prof. Altay Güvenir.

Week of February 15, 2025

Started part-time role as an LLM researcher, limiting project progress temporarily due to simultaneous work commitments.

Week of February 22, 2025

Continued balancing new role responsibilities with minimal project work.

Week of March 1, 2025

Began integrating LLM and chunking strategies into the data pipeline to extract structured financial data directly from textual reports.
Participated in weekly meeting with Prof. Altay Güvenir.
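
A minimal sketch of the kind of chunking step mentioned above, assuming fixed-size word windows with overlap. The function name `chunk_words` and the `size`/`overlap` values are hypothetical; the log does not record the actual chunking rule or the LLM prompt applied to each chunk.

```python
def chunk_words(text, size=200, overlap=50):
    """Split text into overlapping word windows (hypothetical helper)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the report
    return chunks


# Illustrative input only, not text from a real filing.
report = "Net income for the quarter increased compared to the prior year period"
for chunk in chunk_words(report, size=6, overlap=2):
    print(chunk)
```

Overlap keeps a figure and its label in the same chunk more often when a sentence straddles a window boundary, at the cost of processing some words twice.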

Week of March 8, 2025

Successfully enriched dataset with chunked text data, extending dataset coverage back to 2001.
Encountered missing data points; began manual correction and data filling.

Week of March 15, 2025

Completed the laborious manual dataset enrichment; started training with the updated dataset.
Noted performance improvement in predictive models over baseline heuristics (latest quarter, latest year).
Participated in weekly meeting with Prof. Altay Güvenir.
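
A rough sketch of the two baseline heuristics named above, under my reading that "latest quarter" predicts the previous quarter's value and "latest year" predicts the value from four quarters earlier. The helper names and the sample series are illustrative assumptions, not project data.

```python
def lag_baseline(values, lag):
    """Return (predictions, actuals): each prediction is the value `lag` quarters earlier."""
    if not 1 <= lag < len(values):
        raise ValueError("lag must be between 1 and len(values) - 1")
    return values[:-lag], values[lag:]


def mean_absolute_error(predictions, actuals):
    """Average absolute difference between predictions and actuals."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)


# Made-up quarterly net income values for one company (oldest first).
net_income = [10, 12, 11, 13, 14, 15]

for name, lag in (("latest quarter", 1), ("latest year", 4)):
    preds, actuals = lag_baseline(net_income, lag)
    print(name, mean_absolute_error(preds, actuals))
```

A model is only worth keeping if its error on the same quarters is lower than both of these.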

Week of March 22, 2025

Implemented extensive hyperparameter optimization for predictive models; faced difficulties due to financial data characteristics.
Experimented extensively with various validation schemes to find an effective evaluation strategy.

Week of April 1, 2025

Selected a validation strategy: train on quarterly data and validate on 2023 data; validation results correlated well with 2024 performance.
Participated in weekly meeting with Prof. Altay Güvenir.
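
The split described above can be sketched as a simple year-based partition. This assumes each record carries a fiscal `year` field and that 2024 is kept aside as the final check; the field names and sample rows are hypothetical.

```python
def split_by_year(rows, valid_year=2023, test_year=2024):
    """Partition records by year so no future quarter leaks into training."""
    train = [r for r in rows if r["year"] < valid_year]
    valid = [r for r in rows if r["year"] == valid_year]
    test = [r for r in rows if r["year"] == test_year]
    return train, valid, test


# Illustrative records only; the real dataset has many more columns.
rows = [
    {"year": 2021, "quarter": 4, "net_income": 1.0},
    {"year": 2022, "quarter": 1, "net_income": 1.2},
    {"year": 2023, "quarter": 1, "net_income": 1.1},
    {"year": 2023, "quarter": 2, "net_income": 1.3},
    {"year": 2024, "quarter": 1, "net_income": 1.4},
]

train, valid, test = split_by_year(rows)
print(len(train), len(valid), len(test))
```

Splitting on time rather than at random matters here because a random split would let the model see later quarters of a company while being scored on earlier ones.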

Week of April 8, 2025

Evaluated models: Random Forest, XGBoost, LightGBM, and CatBoost. Random Forest consistently outperformed the others.
Identified overfitting issues with boosting methods despite high regularization.

Week of April 15, 2025

Finalized Random Forest model, achieving 15% lower mean absolute error (net income) and 30% lower RMSE (earnings per share) than the baselines.
Participated in weekly meeting with Prof. Altay Güvenir.
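
For reference, a small sketch of how a "percent better than baseline" figure like the ones above can be computed, assuming it means the relative reduction in error. The function names and the numbers in the example are illustrative, not the project's actual scores.

```python
import math


def rmse(predictions, actuals):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(baseline_error, model_error):
    """Fraction by which the model's error is lower than the baseline's."""
    return (baseline_error - model_error) / baseline_error


# Made-up error values: a baseline error of 100 and a model error of 85
# correspond to a 15% improvement under this definition.
print(f"{relative_improvement(100.0, 85.0):.0%}")
```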

Week of April 22, 2025

Prepared final model documentation and started compiling results for project presentation.

Week of April 28, 2025

Reviewed final model and performance metrics, ensuring readiness for final presentation and report submission.
Participated in final project review meeting with Prof. Altay Güvenir.