Learning New Skills for Fun (and Profit?)
Around 2010, I was looking for a way to learn Windows PowerShell that I could apply to something personal. I decided to write an API client that would download end-of-day (EOD) stock market data and upload it to SQL Server for analysis. For incentive, I put a little bit of money into a trading account and used a strategy called mean reversion trading to try to find stocks that were ideal to flip for a profit. As I write this in 2023, it’s clear that this idea didn’t lead to my retirement, but it didn’t lose money either.
Through the magic of PowerShell, I had a scheduled script that would download market data, load it into SQL Server, perform some analysis, and generate a PDF using SQL Server Reporting Services.
Studying Graphs in the Hope of Profit
While it seemed like a good idea, the algorithm wasn’t quite right. Each day, it would produce a list of “candidates” that I then had to look up manually to decide whether they were worth trading. A candidate shouldn’t have too many wild swings in price; it shouldn’t be near an earnings announcement that might send the price down; and it shouldn’t have any recent news that would make its future price too unpredictable. I was successful in learning PowerShell, but the system required more time to maintain than I had available. A rough sketch of the automated part of that screen appears below.
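To make the screening concrete, here’s a minimal sketch of that kind of mean-reversion filter in Python with pandas. The window sizes and thresholds are illustrative assumptions, not the original algorithm.

```python
import pandas as pd

def find_candidates(prices: pd.DataFrame,
                    window: int = 20,
                    zscore_entry: float = -2.0,
                    max_volatility: float = 0.04) -> pd.DataFrame:
    """Flag mean-reversion candidates from EOD close prices.

    prices: DataFrame indexed by date, one column per ticker.
    All thresholds here are placeholders, not the original strategy.
    """
    rolling_mean = prices.rolling(window).mean()
    rolling_std = prices.rolling(window).std()

    # How far today's close sits below its recent average, in std devs.
    zscore = (prices - rolling_mean) / rolling_std

    # Normalized volatility, used to reject tickers with wild swings.
    volatility = rolling_std / rolling_mean

    latest = pd.DataFrame({
        "zscore": zscore.iloc[-1],
        "volatility": volatility.iloc[-1],
    })

    # Oversold relative to the mean, but not too erratic to trade.
    return latest[(latest["zscore"] <= zscore_entry)
                  & (latest["volatility"] <= max_volatility)]
```

The earnings-calendar and news checks stayed manual, and they were the part that made the system expensive to maintain.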
A Solution in Search of a Problem
Fast forward 13 years… I have a Big Data Engineering certificate from UW, and I’ve been working somewhere that uses Apache Spark, Airflow, Redshift, Kinesis, and Druid, all technologies I recently ramped up on to round out my skills. But I never set any of it up myself. Conceptually, I know how it all works; hands-on, I haven’t done it. I could get a low-cost AWS or Azure account, but I always have this lingering fear that I’ll leave something running and end up with a large credit card bill (or use up my monthly allotment). So… why not do this at home with used hardware?
Coming up next…
Many of the bullet points below will become posts of their own. As I write this, the current state is:
- Rewritten in Python: Last year, I rewrote my EOD stock downloader in Python, but the data provider I was using has recently become unreliable. My authentication token expires within minutes, making it impossible to download any new data. I’ve canceled my account and I’m looking for a new source. (A sketch of the download call appears after this list.)
- Improved to run as an Airflow DAG: Before shutting down the account, I rewrote the Python solution to run in Airflow; a minimal version of the DAG also appears after this list. The immediate lesson was that a 4 GB RAM VM wasn’t enough to run it (necessitating a new solution). It’s also possible that the async changes I made are what caused the provider to force my token to expire.
- Need a new data store: I no longer have a developer SQL Server box. My old algorithms are on a backup somewhere, but they’re not worth digging out. I’m trying to decide on the best homelab options for a data lake, a database, and processing.
- Need an architecture: The VLAN is set up; Vault is running on one box but isn’t storing or serving secrets yet (a sketch of what retrieving a secret will look like follows this list); I can programmatically spin up hardware, but I still need to deploy services. MAAS can customize builds, so I’m setting up a template build that will run against each machine as it provisions. Will I use Docker, VMs, or a combination of the two?
- Current technologies of interest: a Jenkins-and-Docker build solution, Proxmox for live VM migration, and a Spark cluster.
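For reference, the download call from the first bullet looks roughly like the sketch below. The endpoint URL, parameters, and token handling are hypothetical placeholders; every provider’s API differs, and I don’t yet know which one I’ll land on.

```python
import os
import requests

# Hypothetical provider endpoint; the real URL and parameters depend
# entirely on which EOD data source replaces the one I canceled.
BASE_URL = "https://api.example-eod-provider.com/v1/eod"

def fetch_eod(ticker: str, trade_date: str) -> dict:
    """Fetch one ticker's end-of-day bar as JSON."""
    token = os.environ["EOD_API_TOKEN"]  # short-lived with my last provider
    response = requests.get(
        f"{BASE_URL}/{ticker}",
        params={"date": trade_date},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    response.raise_for_status()  # surfaces the 401s an expired token causes
    return response.json()
```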
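The Airflow wrapper from the second bullet is structurally simple. A minimal sketch (Airflow 2.x, with placeholder task callables standing in for the real downloader and loader):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholders for the real download/load functions.
def download_eod(**context):
    ...

def load_to_warehouse(**context):
    ...

with DAG(
    dag_id="eod_stock_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older 2.x uses schedule_interval
    catchup=False,      # don't backfill missed days on a small VM
) as dag:
    download = PythonOperator(task_id="download_eod",
                              python_callable=download_eod)
    load = PythonOperator(task_id="load_to_warehouse",
                          python_callable=load_to_warehouse)

    download >> load
```

Nothing about this structure requires much memory; it was the work inside the tasks that overwhelmed the 4 GB VM.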
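Once Vault is actually serving secrets, pipeline code like the downloader above stops carrying tokens in environment variables. A minimal sketch using the hvac client; the address, path, and field names are assumptions about how I’ll organize the KV store:

```python
import hvac

# Address and token source are assumptions; in practice the client token
# would come from an AppRole login or Vault Agent, not a literal.
client = hvac.Client(url="https://vault.homelab.local:8200")
client.token = "s.placeholder-token"

# Read the EOD provider credentials from a KV v2 mount.
secret = client.secrets.kv.v2.read_secret_version(path="eod-provider")
api_token = secret["data"]["data"]["api_token"]
```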