AI is a rapidly growing technology with substantial promise for enhancing data analysis and comprehension. At the core of the large language models driving this momentum are vast troves of data that shape the generated responses. Despite its ubiquity, semiconductor manufacturing often remains in the background, making it a difficult field for individuals to delve into. Consequently, data in this domain can be hard to acquire, reflecting the more closed nature of the technology.
With this backdrop, applying a large language model to semiconductor data could prove invaluable. Despite the moniker "OpenAI," popular platforms like ChatGPT are proprietary and offer few guarantees about data privacy. Conversely, there are emerging open-source models that can be run locally. Mistral has emerged as a key open-source contender that rivals the likes of ChatGPT, with its 7B model offering strong performance at a modest size. However, all of these models are limited in their knowledge of semiconductor physics, processing, and nuances.
In this endeavor, Low-Rank Adaptation (LoRA) of a large language model (LLM) can be performed to tune the existing Mistral 7B model on a semiconductor dataset. LoRA fine-tuning provides a very cost-effective way to train an existing model: the pretrained weights are kept frozen, and only small low-rank matrices injected into selected layers are trained. This approach helps ensure that the fine-tuning provides an understanding of the new data, while also retaining base knowledge and understanding.
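The idea behind LoRA can be shown in a few lines: the frozen weight matrix W is supplemented by a trainable low-rank correction (alpha/r)·B·A. Below is a minimal numpy sketch with scaled-down, assumed dimensions (1024×1024, rank 8); it is an illustration of the technique, not the actual Mistral training code.

```python
import numpy as np

# Minimal LoRA sketch (illustrative only, not the Mistral code path).
# The frozen base weight W stays untouched; only the low-rank factors A, B train.
d_out, d_in, r, alpha = 1024, 1024, 8, 16  # scaled-down sizes; r and alpha are assumptions

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, initialized to zero

def lora_forward(x):
    # Frozen path plus the scaled low-rank correction.
    return W @ x + (alpha / r) * (B @ (A @ x))

# Trainable parameters are a tiny fraction of the full matrix:
print(f"trainable fraction: {(A.size + B.size) / W.size:.4%}")  # → 1.5625%
```

Because B starts at zero, the adapted model initially reproduces the base model exactly; training then moves only A and B, which is why the cost (and the adapter file) is so small compared to full fine-tuning.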
To derive the most insightful output, I supply data relevant to my research areas. I have digitized hundreds of papers alongside my annotations, as well as presentations, lectures, and notes from my undergraduate through PhD studies. I maintain this digital archive because it is immensely beneficial in today's landscape: it facilitates easy data access, ensures security, and promotes data longevity. These files are meticulously organized locally, and from them over 45,000 initial data points are generated for training.
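The post does not show the exact preprocessing pipeline, but a dataset like this is typically built by chunking the digitized text into records. The sketch below assumes plain-text source files and a simple one-chunk-per-record JSONL layout; the chunk size and record fields are hypothetical.

```python
import json
from pathlib import Path

CHUNK_CHARS = 2000  # rough chunk size; an assumption, not the pipeline's actual value

def chunk_text(text, size=CHUNK_CHARS):
    # Split a document into fixed-size character chunks.
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_dataset(source_dir, out_path):
    # Walk the archive, chunk every .txt file, and write one JSON record per chunk.
    records = []
    for path in sorted(Path(source_dir).glob("**/*.txt")):
        for chunk in chunk_text(path.read_text(encoding="utf-8")):
            records.append({"source": path.name, "text": chunk})
    with open(out_path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return len(records)
```

Running this over a few hundred digitized papers and notebooks at this chunk size would plausibly yield tens of thousands of records, on the order of the 45,000 data points mentioned above.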
Cabinet with some scientific papers read and annotated (left), and notebooks I have scanned in (right)
The LoRA fine-tuning is done locally on an RTX 4090 graphics card, running through several epochs over a total of more than 75 hours. The adapter output from training is converted to the versatile 'gguf' format, with an initial size of ~14 GB, and then 4-bit quantized down to only 4 GB. The final model can be loaded into the system's RAM and run on the CPU, or into VRAM and run on the GPU. The model performs best with the parallel processing provided by a GPU, giving near-instantaneous responses to inquiries.
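The ~14 GB to ~4 GB reduction comes from storing each weight in roughly 4 bits plus a shared scale per small block, instead of 16 bits. The toy numpy sketch below shows blockwise symmetric 4-bit quantization; real gguf Q4 formats store per-block scales in a similar spirit but with a different exact layout, so treat this as an illustration only.

```python
import numpy as np

BLOCK = 32  # weights per block sharing one scale (an assumed block size)

def quantize_q4(w):
    # Map each block of weights to 4-bit integers with one float scale per block.
    blocks = w.reshape(-1, BLOCK)
    scale = np.abs(blocks).max(axis=1, keepdims=True) / 7  # map the block into [-7, 7]
    scale[scale == 0] = 1.0                                # avoid divide-by-zero on all-zero blocks
    q = np.clip(np.round(blocks / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_q4(q, scale):
    # Reconstruct approximate weights from the integers and per-block scales.
    return (q * scale).reshape(-1)

w = np.random.default_rng(1).standard_normal(1024).astype(np.float32)
q, s = quantize_q4(w)
w_hat = dequantize_q4(q, s)
err = np.abs(w - w_hat).max()  # bounded by half a quantization step per block
```

At 4 bits per weight plus one scale per 32-weight block, storage works out to roughly 4.5 bits per parameter, consistent with a ~14 GB fp16 model shrinking to about 4 GB.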
The open-source Ollama software is utilized to run the model locally, along with the Ollama web-ui, which provides a 'ChatGPT'-like interface on my local network for interacting with the model. The advantage of the local web-ui is that any device on my network can utilize the hardware power of the desktop computer running the model, such that even an iPhone can readily interface with it for questions.
Desktop computer for training and running the model (left), and Ollama web-ui running the trained model (right)
Below are illustrative questions and responses generated by the model.
Going forward, I plan to expand the scope of the data and continue to refine the model. For a semiconductor engineer today, a tailored AI is an invaluable asset that greatly aids problem-solving. Harnessing emerging technologies to address contemporary challenges is pivotal to navigating the complex landscape of the semiconductor industry.