6 November 2023

Fine-Tuning GPT-2 for Generating Scientific Abstracts

In the ever-evolving field of Natural Language Processing, fine-tuning established models like GPT-2 offers a targeted approach to task-specific challenges. This article delves into the process and benefits of customizing GPT-2 through fine-tuning, compares it with the alternative of building custom transformer models from scratch, and presents a direct experimental comparison of both methods in generating scientific abstracts.

Understanding Fine-Tuning

Fine-tuning takes a pre-trained model such as GPT-2 and continues training it on a smaller, task-specific dataset, adapting the capabilities it already has to a specialized task.
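For a causal language model like GPT-2, fine-tuning usually means continuing training with the same next-token prediction objective on domain text. The sketch below shows one such training step with the Hugging Face transformers library. The article does not show its own training loop, so treat this as an assumption: the helper name fine_tune_step, the learning rate, and the random "batch" are hypothetical. A tiny randomly initialized GPT2LMHeadModel is built from a GPT2Config so the sketch runs without downloading weights; real fine-tuning would start from GPT2LMHeadModel.from_pretrained("gpt2") and tokenized abstracts.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel


def fine_tune_step(model, input_ids, optimizer):
    """One causal-LM update: learn to predict each token from the tokens before it.

    Hypothetical helper; input_ids is a LongTensor of shape (batch, seq_len).
    """
    model.train()
    # With labels=input_ids, GPT2LMHeadModel shifts the labels internally and
    # returns the next-token cross-entropy loss.
    loss = model(input_ids=input_ids, labels=input_ids).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()


torch.manual_seed(0)
# Tiny random model so the sketch runs offline (assumption for illustration only).
config = GPT2Config(vocab_size=100, n_positions=32, n_embd=32, n_layer=2, n_head=2)
model = GPT2LMHeadModel(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Random token IDs stand in for a batch of tokenized abstracts.
batch = torch.randint(0, config.vocab_size, (4, 16))
losses = [fine_tune_step(model, batch, optimizer) for _ in range(30)]
print(f"first loss {losses[0]:.2f}, last loss {losses[-1]:.2f}")
```

With real abstracts of different lengths you would also need padding: GPT-2's tokenizer has no pad token by default, so a common choice is to reuse the end-of-text token and set padded label positions to -100, which the loss ignores.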

Advantages of Fine-Tuning

  • Customization: Not all businesses or applications need a jack-of-all-trades. Some require an expert. Fine-tuning tailors GPT-2’s vast knowledge to cater to specific domains, be it healthcare, finance, or creative writing.
  • Efficiency: Training a model of GPT-2’s magnitude from the ground up demands enormous resources. Fine-tuning is the shortcut to customizing without starting from scratch.
  • Accuracy: While GPT-2 is remarkable, it is not tailored to every niche out of the box. Refining it on domain-specific data typically improves its accuracy in that domain.

Disadvantages of Fine-Tuning

  • Data Requirements: Fine-tuning still requires a substantial amount of relevant data, which can be a limitation if such data is scarce or expensive to obtain.
  • Overfitting Risk: There’s a risk of the model overfitting to the fine-tuning data, which can reduce its generalizability to new, unseen data.
  • Resource Intensive: Although less so than training from scratch, fine-tuning large models like GPT-2 still demands significant computational resources.

The Pros and Cons of Creating Custom Transformer Models

Custom transformer models are tailored to excel at specific tasks by addressing unique data nuances that general models might overlook. However, developing them requires significant time, expertise, and resources.

Advantages of Custom Transformer Models

  • Tailored Architecture: Designing a custom transformer model allows for an architecture that is finely adjusted to the specific needs and nuances of the task at hand.
  • Specialized Insights: A model built from scratch can learn patterns unique to its data that pre-trained models, which carry prior knowledge, may not easily adapt to.

Challenges of Custom Transformer Models

  • Resource Intensity: The creation of a custom transformer demands significant computational resources, as well as time for development and training.
  • Expertise Requirement: This route requires a team with deep understanding and expertise in the intricacies of transformer model architectures and machine learning.
  • Data Volume: Securing a sufficiently large and relevant dataset is crucial, and often a substantial challenge, for training a custom model effectively.

About GPT-2/3

GPT-2, developed by OpenAI, is an earlier iteration of the Generative Pre-trained Transformer series, which utilizes deep learning to produce human-like text. It can handle a wide range of tasks, such as translation, summarization, and question-answering, by predicting the next word in a sentence in a way that’s coherent and contextually relevant.

Building on this, GPT-3 made significant strides in language model performance. With 175 billion parameters, it is far larger than GPT-2 (1.5 billion parameters in its largest version), which improves its understanding and generation of text. Its capabilities include writing essays, creating poetry, coding, and engaging in dialogue that can be hard to distinguish from human conversation.

Metrics Used for Evaluation

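The following metrics were used to evaluate the generated text against the real abstracts:

  • METEOR: Originally designed for machine translation, it matches words between the generated and reference text (including stems and synonyms) and combines precision and recall, weighting recall more heavily, with a penalty for fragmented word order.
  • ROUGE: Measures n-gram overlap between the generated text and the reference text, emphasizing recall; it is widely used to evaluate summarization.
  • BLEU: Measures the n-gram precision of the generated text against human-written reference text, with a brevity penalty for outputs that are too short.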
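To make the difference between the precision-oriented BLEU and the overlap-based ROUGE concrete, here is a minimal pure-Python sketch of single-reference sentence BLEU and ROUGE-1 F1. These are simplified versions (whitespace tokenization, no smoothing, one reference); the article does not say which implementation or ROUGE variant produced its scores, and libraries such as sacrebleu, rouge-score, or nltk differ in tokenization and smoothing, so numbers from this sketch are not comparable to the results table.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU: clipped n-gram precisions, geometric mean, brevity penalty.

    No smoothing, so any n-gram order with zero matches yields 0.0.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_ngrams.values())
        # Clip each candidate n-gram count by how often it occurs in the reference.
        overlap = sum(min(count, ref_ngrams[g]) for g, count in cand_ngrams.items())
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


def rouge1_f1(candidate, reference):
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())  # & keeps the minimum count per word
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


ref = "the free energy of the ising lattice is calculated exactly"
print(sentence_bleu(ref, ref))                # 1.0
print(rouge1_f1("the free energy", ref))      # ≈ 0.46
print(sentence_bleu("the free energy", ref))  # 0.0: too short to contain any 4-gram
```

The last line shows why unsmoothed sentence-level BLEU is harsh on short outputs, which is one reason BLEU is usually reported at corpus level or with smoothing.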

Although this article focuses on GPT-2, it’s pertinent to note that other models, such as GPT-3 and BART, are also amenable to fine-tuning for diverse requirements.

The Experiment: Fine-tuned GPT-2 vs. Custom Transformer Model

The ever-growing volume of scientific papers makes it challenging to stay informed. The task in this experiment was to generate a paper's abstract from its title, and two methods were tried:

  • Creating a custom transformer model
  • Fine-tuning the GPT-2 model

Custom Transformer Model Architecture

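import math
import torch
import torch.nn as nn


# Assumed helper: PositionalEncoding is used but not defined in the original listing.
# This is the standard sinusoidal encoding from the PyTorch transformer tutorial
# (expects input of shape (seq_len, batch, d_model) and an even d_model).
class PositionalEncoding(nn.Module):
    def __init__(self, d_model, dropout=0.1, max_len=5000):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, 1, d_model)
        pe[:, 0, 0::2] = torch.sin(position * div_term)
        pe[:, 0, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe)

    def forward(self, x):
        x = x + self.pe[:x.size(0)]
        return self.dropout(x)


class CustomTransformerModel(nn.Module):
    def __init__(self, ntoken, ninp, nhead, nhid, nlayers, dropout=0.5):
        super(CustomTransformerModel, self).__init__()
        self.model_type = 'Transformer'
        self.ninp = ninp
        self.pos_encoder = PositionalEncoding(ninp, dropout)
        encoder_layers = nn.TransformerEncoderLayer(ninp, nhead, nhid, dropout)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layers, nlayers)
        decoder_layers = nn.TransformerDecoderLayer(ninp, nhead, nhid, dropout)
        self.transformer_decoder = nn.TransformerDecoder(decoder_layers, nlayers)
        self.encoder = nn.Embedding(ntoken, ninp)  # token IDs -> ninp-dim vectors (shared by src and tgt)
        self.decoder = nn.Linear(ninp, ntoken)     # ninp-dim vectors -> vocabulary scores
        self.init_weights()

    def generate_square_subsequent_mask(self, sz):
        # Causal mask: 0.0 where attention is allowed (j <= i), -inf where it is blocked
        mask = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1)
        mask = mask.float().masked_fill(mask == 0, float('-inf')).masked_fill(mask == 1, float(0.0))
        return mask

    def init_weights(self):
        initrange = 0.1
        self.encoder.weight.data.uniform_(-initrange, initrange)
        self.decoder.weight.data.uniform_(-initrange, initrange)

    def forward(self, src, tgt):
        # src, tgt: LongTensors of token IDs with shape (seq_len, batch)
        src = self.pos_encoder(self.encoder(src) * math.sqrt(self.ninp))
        tgt = self.pos_encoder(self.encoder(tgt) * math.sqrt(self.ninp))
        memory = self.transformer_encoder(src)
        tgt_mask = self.generate_square_subsequent_mask(tgt.size(0)).to(tgt.device)
        out = self.transformer_decoder(tgt, memory, tgt_mask=tgt_mask)
        return self.decoder(out)
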
  1. Initialization (__init__):

    • It sets up the building blocks of the encoder-decoder transformer model.
    • pos_encoder adds information about each word's position, transformer_encoder reads the input sequence, and transformer_decoder produces the output sequence while attending to the encoder's result.
    • nn.Embedding (self.encoder) maps token IDs to ninp-dimensional vectors, and nn.Linear (self.decoder) maps the decoder's output back to a score for every token in the vocabulary.
  2. Masks (generate_square_subsequent_mask):

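    • A language model must not see the words it is supposed to predict, so a mask blocks the decoder from attending to those positions.
    • This function builds a square "causal" mask: position i may attend only to positions 0 through i (value 0.0), while all later positions are set to -inf so they receive zero attention weight.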
  3. Weights (init_weights):

    • A model's weights are its learned parameters, which are adjusted during training.
    • This function sets the starting values of the embedding and output-layer weights to small random numbers drawn uniformly from [-0.1, 0.1].
  4. Forward Pass (forward):

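    • This is where data flows through the model.
    • The source and target token IDs are embedded, scaled by the square root of the embedding size, and given positional encodings. The encoder turns the source into a "memory", the decoder processes the target under the causal mask while attending to that memory, and the final linear layer produces a score for each vocabulary token at every position.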
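The causal mask from step 2 is easiest to understand by inspecting it. This standalone sketch copies the mask logic of generate_square_subsequent_mask into a plain function and then applies a softmax over uniform (all-zero) attention scores plus the mask; the uniform scores are an assumption made only to show that -inf entries end up with zero attention weight.

```python
import torch


def generate_square_subsequent_mask(sz):
    """Same logic as the method in the listing: 0.0 = may attend, -inf = blocked."""
    mask = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1)
    return mask.float().masked_fill(mask == 0, float('-inf')).masked_fill(mask == 1, float(0.0))


mask = generate_square_subsequent_mask(4)
print(mask)  # lower triangle (incl. diagonal) is 0.0, everything above it is -inf

# Illustration: equal raw scores everywhere, then add the mask before softmax.
scores = torch.zeros(4, 4) + mask
weights = scores.softmax(dim=-1)
print(weights)
# Row 0 attends only to itself: [1, 0, 0, 0]
# Row 3 spreads attention evenly over positions 0-3: [0.25, 0.25, 0.25, 0.25]
```

Adding the mask to the scores, rather than multiplying, is what lets softmax turn blocked positions into exact zeros, since exp(-inf) = 0.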

Fine-tuned GPT-2 Architecture

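import torch
import torch.nn as nn
from transformers import GPT2Tokenizer, GPT2LMHeadModel  # GPT2Tokenizer turns text into input_ids


class GPT2Model(nn.Module):
    """Thin wrapper around Hugging Face's GPT2LMHeadModel (not transformers.GPT2Model)."""

    def __init__(self, model_name="gpt2"):
        super(GPT2Model, self).__init__()
        # Load pre-trained GPT-2 weights together with the language-modeling head
        self.gpt2 = GPT2LMHeadModel.from_pretrained(model_name)

    def forward(self, input_ids, attention_mask=None):
        # input_ids: (batch, seq_len) token IDs; attention_mask marks real tokens vs. padding
        outputs = self.gpt2(input_ids=input_ids, attention_mask=attention_mask)
        # logits: (batch, seq_len, vocab_size) next-token scores
        return outputs.logits
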
  1. Initialization (__init__):

    • When the model is set up, it loads pre-trained GPT-2 weights, together with a language-modeling head, from the checkpoint named by model_name ("gpt2", the smallest GPT-2 release, by default).
  2. Processing Data (forward):

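    • This part takes in a batch of token IDs (input_ids), which the GPT-2 tokenizer produces from text.
    • GPT-2 then processes these tokens and returns its output.
    • The output, called "logits", contains unnormalized scores, one per vocabulary token at each position; applying softmax to the scores at a position gives the model's probabilities for the next token.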
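To see how logits turn into generated text, here is a minimal greedy-decoding sketch: at each step it takes the scores for the last position, picks the highest-scoring token, and appends it. The function name greedy_continue and the tiny randomly initialized model (built from a GPT2Config so it runs without downloading weights) are assumptions for illustration; in practice you would load GPT2LMHeadModel.from_pretrained("gpt2"), encode the title with GPT2Tokenizer, and could use the library's model.generate instead of a hand-written loop.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel


@torch.no_grad()
def greedy_continue(model, input_ids, max_new_tokens):
    """Append tokens one at a time, always choosing the highest-scoring logit."""
    model.eval()  # disable dropout so decoding is deterministic
    for _ in range(max_new_tokens):
        logits = model(input_ids=input_ids).logits            # (batch, seq_len, vocab_size)
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # scores at the last position
        input_ids = torch.cat([input_ids, next_token], dim=1)
    return input_ids


torch.manual_seed(0)
config = GPT2Config(vocab_size=100, n_positions=32, n_embd=32, n_layer=2, n_head=2)
model = GPT2LMHeadModel(config)

prompt = torch.tensor([[5, 17, 42]])  # stand-in for a tokenized title
out = greedy_continue(model, prompt, max_new_tokens=5)
print(out.shape)  # torch.Size([1, 8])
```

Softmax is not needed here because it preserves the ordering of the scores, so the argmax of the logits equals the argmax of the probabilities; sampling-based decoding would apply softmax first.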

Evaluation/Performance Metrics

The experiment was conducted on a MacBook Pro with an Apple M1 Pro chip and 16GB memory.

Metric                                          GPT-2 Fine-tuned    Custom Transformer
Training time, 10,000 samples (hh:mm:ss)        08:54:12            75:05:24
Evaluation time, 1,000 samples (hh:mm:ss)       00:03:47            00:23:13
METEOR (average score)                          18.1%               3.7%
ROUGE (average score)                           18.3%               1.8%
BLEU (average score)                            21.6%               3.1%

In an experiment comparing a fine-tuned GPT-2 model with a custom transformer model for generating scientific abstracts, the fine-tuned GPT-2 outperformed the custom model by a wide margin on every metric. It was also much faster: training on 10,000 samples took about 9 hours compared with about 75 hours for the custom model, and evaluation on 1,000 samples took under 4 minutes versus over 23 minutes. Its METEOR, ROUGE, and BLEU scores were consistently higher, indicating better text generation quality.
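The relative speedups can be computed directly from the hh:mm:ss values in the results table. This small sketch (the helper name to_seconds is hypothetical) converts them to seconds and divides:

```python
def to_seconds(hms):
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


# Times copied from the results table (custom transformer / fine-tuned GPT-2).
train_speedup = to_seconds("75:05:24") / to_seconds("08:54:12")
eval_speedup = to_seconds("00:23:13") / to_seconds("00:03:47")
print(f"training: {train_speedup:.2f}x, evaluation: {eval_speedup:.2f}x")
# → training: 8.43x, evaluation: 6.14x
```

So the fine-tuned GPT-2 trained roughly 8.4 times faster and evaluated roughly 6 times faster than the custom transformer on the same machine.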

Examples of Created Text

Title: The free energy of the non-isotropic Ising lattice with Brascamp-Kunz boundary conditions

Real abstract: The free energy of the finite and non-isotropic Ising lattice with Brascamp-Kunz boundary conditions is calculated exactly as a series in the absence of an external magnetic field.

Fine-tuned GPT-2: The free energy of the non-isotropic Ising lattice with Brascamp-Kunz boundary conditions in the L

Custom transformer: Classes analogy motion sample hypersurface relation recent surface used article spherical analytic performed withdrawn martian multipleoutput maximum polymer photon field specialized truncated fermionic highquality equation cells formed review

In terms of the quality of the generated text, the fine-tuned GPT-2 produced more coherent, on-topic output that resembled real scientific abstracts much more closely. For instance, for the title about the free energy of an Ising lattice, GPT-2's output stayed on-topic and closely followed the wording of the real abstract, while the custom transformer generated a random collection of scientific terms with no coherent meaning.

Discussion and Future Work

This experiment shows how advances in deep learning can be applied to Natural Language Processing (NLP) tasks. For generating scientific abstracts, fine-tuning the pre-trained GPT-2 proved far more effective and efficient than training a custom Transformer from scratch on the same hardware. The presented prototype demonstrates the feasibility of an automated system for producing scientific abstracts.

Yet, there’s room for improvement:

  • Expanding the Dataset: Training on more data could improve the quality of the generated text.
  • Enhancing Model Information: Providing additional inputs, such as authorship, might make the generated abstracts more precise and relevant.
  • Broadening Capabilities: These models have potential applications beyond abstracts and could reshape academic writing by generating comprehensive research sections or even full papers.

This research highlights the expansive potential within NLP and deep learning, suggesting promising avenues for future advancements in scientific text generation.

Your Contribution is Valued

The NLP domain continues to evolve. Readers are encouraged to share insights, perspectives, and innovative ideas. Such contributions can further refine the models, introduce new pathways, or even steer the project towards uncharted territories. The collective expertise can truly redefine the boundaries of text generation!



Arapsih Güngör

Arapsih Güngör is a Senior Software Developer at OpenValue with expertise in multiple programming languages and frameworks. He is dedicated to producing high-quality work, enjoys problem-solving, and is passionate about writing clean, efficient code. Outside of work, Arapsih enjoys badminton, tennis, dancing, and lifting weights.