In this post we explore the Instruct Llama2 model created by Sridykhan [1], who altered Karpathy’s Baby Llama [2] model to “follow instructions and write tiny stories accordingly.” This new Instruct Llama2 model uses the same model architecture and dataset as Karpathy’s original. However, with Instruct Llama2, the model is trained with different inputs …
Understanding Baby Llama2 Training – A Visual Design Walkthrough
In this post we explore how to train the Llama2 model using the Baby Llama2 created by Andrej Karpathy [1], which is based on his original minGPT model [2] and has the same basic transformer architecture used by other generative AI models, such as ChatGPT [3]. Generative transformer models employ a stack of transformer blocks …
Continue reading “Understanding Baby Llama2 Training – A Visual Design Walkthrough”
Understanding Llama2.c And ChatGPT Inferencing – A Visual Design Walkthrough
ChatGPT is an amazing technology that has taken the world by storm. Under the hood, a highly trained large language model (LLM) creates the response to each query sent to the service. In July 2023 Meta released the open-source Llama2 LLM, and on September 29, 2023, Meta released an open-source Llama2-Long LLM [1] which appears …
Continue reading “Understanding Llama2.c And ChatGPT Inferencing – A Visual Design Walkthrough”
Understanding LLM Fine Tuning with Low-Rank Adaptation (LoRA)
In this blog post we discuss a MyCaffe implementation design of the paper “LoRA: Low-Rank Adaptation of Large Language Models” by Hu et al. [1] and describe how LoRA helps leverage the knowledge of the trained LLM to solve new specific problems in an efficient manner through fine-tuning. LLMs are immensely powerful but are created at …
Continue reading “Understanding LLM Fine Tuning with Low-Rank Adaptation (LoRA)”
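The core idea of LoRA can be sketched in a few lines: instead of updating a full pretrained weight matrix W, learn a low-rank update B·A (rank r much smaller than the matrix dimensions) scaled by alpha/r. The sketch below uses the paper’s notation (r, alpha) with arbitrary example sizes; it is illustrative only, not the MyCaffe implementation discussed in the post.

```python
import numpy as np

d, k, r, alpha = 512, 512, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))          # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01   # trainable, small random init
B = np.zeros((d, r))                     # trainable, zero init

def lora_forward(x):
    # Frozen path plus the scaled low-rank path B @ A.
    return x @ W.T + (x @ A.T @ B.T) * (alpha / r)

x = rng.standard_normal((1, k))
# With B = 0 the LoRA path contributes nothing, so training starts
# exactly at the pretrained model's behavior.
assert np.allclose(lora_forward(x), x @ W.T)
```

Note the efficiency win: the trainable parameters number r·(d+k) = 8,192 here versus d·k = 262,144 for a full update of W.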
Using Synthetic Data (change points) to enhance the Momentum Transformer for High(er) Sharpe Ratios
In this post we describe a method of calculating change points using Gaussian processes, as described in the paper “Slow Momentum with Fast Reversion: A Trading Strategy Using Deep Learning and Changepoint Detection” by Wood et al. [1], published in 2021. In addition, we show how the change point synthetic data enhances the Momentum Transformer …
Understanding TFT Momentum Rebalancing for High Sharpe Ratios
In this post we describe the Temporal Fusion Transformer-based Momentum Rebalancing Transformer described in the paper “Trading with the Momentum Transformer: An Intelligent and Interpretable Architecture” by Wood et al. [1], published in 2022. The original code analyzed can be found on GitHub at [2]. Time-series momentum (TSMOM) strategies such as ‘buying the winners and …
Continue reading “Understanding TFT Momentum Rebalancing for High Sharpe Ratios”
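The ‘buying the winners’ rule that TSMOM strategies start from can be sketched very simply: hold each asset long or short according to the sign of its own trailing return. The 20-step lookback below is an arbitrary example, not a value from the paper.

```python
import numpy as np

def tsmom_positions(prices, lookback=20):
    # Trailing return over the lookback window for each time step.
    trailing_ret = prices[lookback:] / prices[:-lookback] - 1.0
    # +1 = long a "winner", -1 = short a "loser".
    return np.sign(trailing_ret)

up = np.linspace(100.0, 150.0, 40)    # steadily rising price path
down = np.linspace(100.0, 60.0, 40)   # steadily falling price path
print(tsmom_positions(up)[-1], tsmom_positions(down)[-1])   # 1.0 -1.0
```

The Momentum Transformer’s contribution is to learn when to lean into or fade this basic signal rather than apply it mechanically.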
Understanding Adaptive LSTM-Autoencoder Change Point Detection
As discussed in our previous post, Change Point Detection (CPD) is an important part of time-series analysis used in numerous fields such as Medicine, Aerospace, Finance, Business, Metrology, and Entertainment. In this post, we expand our analysis of an adaptive, online algorithm for Change Point Detection based on the paper “Memory-free Online Change-point Detection: A …
Continue reading “Understanding Adaptive LSTM-Autoencoder Change Point Detection”
Understanding Contrastive Change Point Detection
Change Point Detection (CPD) is an important field of time-series analysis that provides methods of detecting changes in mean, variance, and distribution structure within time-series data. It has many applications in different fields. In Medicine, for example, change point detection can help monitor the health condition of patients, detect anomalies in vital signs, diagnose diseases, and …
Continue reading “Understanding Contrastive Change Point Detection”
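To make the mean-shift case concrete, here is a toy illustration of the kind of problem CPD addresses: score every candidate split of a series by the difference of means on either side and pick the largest (a basic CUSUM-style heuristic, not the contrastive method from the post).

```python
import numpy as np

def best_split(x):
    # Score each candidate split point by the absolute difference of
    # the means on either side; the largest score marks the most
    # likely change point.
    n = len(x)
    scores = [abs(x[:t].mean() - x[t:].mean()) for t in range(2, n - 2)]
    return 2 + int(np.argmax(scores))

rng = np.random.default_rng(1)
# Series whose mean jumps from 0 to 3 at index 50.
x = np.concatenate([rng.normal(0, 0.1, 50), rng.normal(3, 0.1, 50)])
print(best_split(x))   # near 50, the true change point
```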
Understanding the PatchTST Model for Time Series Prediction
In this blog post, we evaluate, from a programmer’s perspective, the PatchTST model described in “A Time Series is Worth 64 Words: Long-term Forecasting with Transformers” by Nie et al., 2022. The PatchTST is a transformer-based model for multivariate time-series prediction that separates the input data into ‘patches’ that are then fed into a standard …
Continue reading “Understanding the PatchTST Model for Time Series Prediction”
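The patching step can be sketched in a few lines: slide a fixed-length window over the series with a stride, so each patch plays the role of a “word” fed to the transformer. The patch length and stride below are arbitrary examples, not values from the paper’s code.

```python
import numpy as np

def make_patches(series, patch_len=16, stride=8):
    # Slice the series into (possibly overlapping) fixed-length patches;
    # each row of the result is one "word" for the transformer.
    n = len(series)
    starts = range(0, n - patch_len + 1, stride)
    return np.stack([series[s:s + patch_len] for s in starts])

series = np.arange(64, dtype=float)   # toy series of 64 time steps
patches = make_patches(series)
print(patches.shape)   # (7, 16)
```

Patching shortens the sequence the attention layers see (here 64 steps become 7 tokens), which is a large part of why the approach scales to long forecasting horizons.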
Understanding FSNets Learning Fast and Slow for Online Time Series Forecasting
In this blog post, we evaluate, from a programmer’s perspective, the FSNet described in “Learning Fast and Slow for Online Time Series Forecasting” by Pham et al., 2022 [1]. The authors of FSNet describe the model as inspired by “Complementary Learning Systems (CLS) theory” to provide “a novel framework to address the challenges of online forecasting” …
Continue reading “Understanding FSNets Learning Fast and Slow for Online Time Series Forecasting”