Evaluating Steel Performance Using Evolutionary Algorithms

Student: Yichun Li, Northwestern University

Principal Investigator: Mark Messner, Argonne National Laboratory


  1. Introduction

Grade-91 steel was developed around 30 years ago. A new nuclear reactor design has since been proposed and is now under review. Existing reactors use water as both moderator and coolant, whereas the new design replaces water with liquid sodium as the coolant. Steam is still used to convert the energy from the nuclear reactions into electricity, so the new design adds an intermediate heat exchanger, with both water and liquid sodium flowing through it, to carry out the heat exchange. Grade-91 steel is the material chosen for building this heat exchanger. Because this steel has not been used for such a structure in a nuclear reactor before, it is important to understand and examine its properties, such as its stress-strain behavior, and its performance, especially at high temperatures. The proposed design life of the steel in the heat exchanger is 60 years, but there has not been enough research on this topic. This project aims to fill that gap by examining experimental stress and strain data for this steel under different physical parameters. Two other materials, for which less data is available, are being considered for the same project.


The original goal of this project was to use machine learning techniques to build a model that helps us better understand the steel's performance (its deformation at high temperatures). Two approaches were planned:

  1. EA (Evolutionary Algorithms): ODE-based models
  2. RNN (Recurrent Neural Network): train and test the network on data from the synthetic model first, before using the large dataset.


Due to time constraints, only the EA was completed, and only on a simulated dataset. The general timeline was as follows:

Week 1-4: Building and debugging the EA using the DEAP package;

Week 5-9: Testing the algorithm on simulated datasets and analyzing results;

Week 10: Final wrap-up.


  1. Setup
  2. Data Processing (scaffold.py)

The file includes functions that read and load entries from a dataset of 200 experiments, parse them, and pass them to an evaluation model that assesses individuals.

The residual of the evaluation (R) is a sum of Euclidean norms over all experiments. To fit the conventions of the DEAP package, the algorithm instance is set up to maximize a fitness (F) that is derived from R and takes values between 0 and 1 (inclusive), with F = 1 corresponding to R = 0.

In addition, a penalty (R = 10000) is applied when evaluating F, because the evaluation can produce invalid values despite the mathematical bound on F. In those cases the individual is assigned the penalty value, which results in the individual being eliminated during selection for the next generation.
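The evaluation described above can be sketched in plain Python. This is a minimal sketch, not the code in scaffold.py: the function names are hypothetical, and the mapping F = 1 / (1 + R) is an assumption chosen only because it satisfies the stated properties (F lies between 0 and 1, and F = 1 when R = 0). The report does not state the actual formula.

```python
import math

PENALTY = 10000.0  # penalty residual quoted in the text


def residual(predicted, observed):
    """Sum of Euclidean norms of (predicted - observed), one norm per experiment."""
    total = 0.0
    for pred, obs in zip(predicted, observed):
        total += math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)))
    return total


def fitness(r):
    """Map a residual R >= 0 to a fitness F in (0, 1].

    ASSUMPTION: F = 1 / (1 + R). The source only says that F is bounded
    by 0 and 1 and that F = 1 corresponds to R = 0.
    """
    if not math.isfinite(r) or r < 0:
        r = PENALTY  # invalid evaluation: fall back to the penalty residual
    return 1.0 / (1.0 + r)
```

Under this assumed mapping, a penalized individual receives F = 1 / 10001, which is far below any valid fitness reported later in this document, so it would not survive selection.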

  1. Algorithm Build (small.py)

DEAP is a Python package built for Evolutionary Algorithms. small.py implements an instance of an Evolutionary Algorithm using DEAP. The instance is structured as follows:

  1. Registering, in the instance toolbox, the objective (maximize fitness), an individual, and a population made up of a set number of individuals. An individual is a list of attribute values with an attached fitness class, and a population is a list of such individuals. The attributes of an individual represent, in order: E (Young's modulus), ranging from 0 to 300,000; Sy (yield stress), ranging from 0 to 1000; H (kinematic hardening modulus), ranging from 0 to 120,000; and K (isotropic hardening modulus), ranging from 0 to 120,000.
  2. Registering evolutionary methods into the instance toolbox, namely:
    • Mutation: functions that will mutate and generate a new offspring from an individual in the population.
    • Crossover(mate): functions that will mate and reproduce a new offspring from two individuals in the population.
    • Selection: functions that will select a pre-determined number of individuals either from the offspring alone or from both the offspring and the old individuals in the population.
  3. Setting up the selected algorithm with the following parameters: the number of experiments used to evaluate the individuals, the number of individuals, the maximum number of generations, and the number of individuals selected for the next generation, as well as a statistics object that records the mean, standard deviation, minimum and maximum for each generation.
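The individual and population layout from step 1, and the per-generation statistics from step 3, can be illustrated without DEAP. This is a minimal sketch using only the standard library. The helper names are hypothetical, and drawing the initial values uniformly within the bounds is an assumption; small.py registers the equivalent pieces in the DEAP toolbox instead.

```python
import random
import statistics

# Bounds from the text, in attribute order E, Sy, H, K.
BOUNDS = [(0.0, 300000.0), (0.0, 1000.0), (0.0, 120000.0), (0.0, 120000.0)]


def make_individual(rng):
    """One individual: a list of four attribute values (uniform draw is an assumption)."""
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]


def make_population(n, rng):
    """A population is simply a list of individuals."""
    return [make_individual(rng) for _ in range(n)]


def generation_stats(fitnesses):
    """Per-generation record: mean, standard deviation, minimum and maximum."""
    return {
        "avg": statistics.fmean(fitnesses),
        "std": statistics.pstdev(fitnesses),
        "min": min(fitnesses),
        "max": max(fitnesses),
    }


rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = make_population(50, rng)
```

Calling generation_stats once per generation on the list of fitness values gives the same four quantities that the statistics object in the text records.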


  • Methodology

The core idea of this optimization is to find the best feasible combination of algorithm and mutation, crossover and selection methods. The probabilities used by those methods also need to be tested and adjusted.

For algorithms, the following have been tested:

  • OneMax (similar to eaSimple in DEAP): both crossover and mutation can be applied to the same individual, each with its own probability, and the resulting offspring replace the entire population.
  • Evolution Strategies (using the eaMuCommaLambda algorithm): either mutation or crossover is performed on a selected number of individuals to produce a set number of offspring. The next generation is then selected only from among the offspring. For crossover and mutation, the strategy constrains the standard deviation of the offspring.
  • eaMuPlusLambda: either mutation or crossover is performed on a select number of individuals to produce a determined number of offspring. Then, the next generation is selected among both the offspring and the parents.
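The difference between the last two schemes comes down to which pool the next generation is drawn from. The following is a minimal sketch of that one step, not DEAP's implementation. The function names are hypothetical, individuals are represented by their fitness values for brevity, and keeping the best mu candidates stands in for whichever selection method is registered.

```python
def select_best(candidates, fitness, k):
    """Keep the k fittest candidates (a stand-in for the registered selection method)."""
    return sorted(candidates, key=fitness, reverse=True)[:k]


def next_generation_plus(parents, offspring, fitness, mu):
    """(mu + lambda): survivors are chosen from parents and offspring together."""
    return select_best(parents + offspring, fitness, mu)


def next_generation_comma(parents, offspring, fitness, mu):
    """(mu, lambda): survivors are chosen from the offspring only."""
    return select_best(offspring, fitness, mu)


def score(individual):
    """Illustration only: each individual is represented by its own fitness value."""
    return individual


parents = [0.9, 0.5]
offspring = [0.6, 0.4, 0.3]
plus_survivors = next_generation_plus(parents, offspring, score, 2)    # [0.9, 0.6]
comma_survivors = next_generation_comma(parents, offspring, score, 2)  # [0.6, 0.4]
```

In this toy case the comma scheme discards the best parent (0.9) even though no offspring matches it, while the plus scheme keeps it.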

For crossover, the following functions have been tested:

  • cxTwoPoint: individuals swap the segment of attribute values lying between two randomly chosen crossover points.
  • cxUniform: individuals conditionally swap each attribute, depending on a probability value.
  • cxESBlend: individuals blend both their attribute values and their strategy values.
  • cxSimulatedBinaryBounded: performs simulated binary crossover, which operates directly on the real-valued attributes and mimics the effect of single-point crossover on binary strings. A lower and upper bound for each attribute value (the same bounds as in the initial setup) need to be set, as well as a crowding degree that controls how closely the resulting values resemble the parental values.
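The two swap-based operators can be sketched in a few lines. This is a plain-Python illustration of the idea, not DEAP's code: the function names are hypothetical, the parent values are made up, and unlike DEAP's operators these functions return new lists instead of modifying the parents in place.

```python
import random


def two_point_crossover(a, b, rng):
    """Swap the slice between two distinct cut points; returns two new children."""
    size = min(len(a), len(b))
    i, j = sorted(rng.sample(range(1, size + 1), 2))  # 1 <= i < j <= size
    child_a = a[:i] + b[i:j] + a[j:]
    child_b = b[:i] + a[i:j] + b[j:]
    return child_a, child_b


def uniform_crossover(a, b, indpb, rng):
    """Swap each attribute independently with probability indpb."""
    child_a, child_b = list(a), list(b)
    for k in range(min(len(a), len(b))):
        if rng.random() < indpb:
            child_a[k], child_b[k] = child_b[k], child_a[k]
    return child_a, child_b


rng = random.Random(1)  # fixed seed so the sketch is repeatable
parent_a = [200000.0, 300.0, 3000.0, 4000.0]    # made-up values for E, Sy, H, K
parent_b = [150000.0, 500.0, 10000.0, 20000.0]  # made-up values for E, Sy, H, K
tp_a, tp_b = two_point_crossover(parent_a, parent_b, rng)
un_a, un_b = uniform_crossover(parent_a, parent_b, 0.5, rng)
```

In both cases every child attribute is copied from one of the two parents at the same position, so these operators recombine existing values but never create a new one.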

For mutation, the following functions have been tested:

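  • mutFlipBit: applies the logical NOT operator to each selected attribute. The operator is intended for binary (boolean) individuals, so on real-valued attributes the result is either 0 or 1. An independent probability for each attribute to be flipped needs to be provided.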
  • mutShuffleIndexes: shuffle the attribute values inside an individual. An independent probability for each attribute value to be exchanged to another attribute needs to be provided.
  • mutESLogNormal: mutates the evolution strategy according to an extended log-normal rule. The individual's attribute values are then mutated using the generated strategy values as standard deviations.
  • mutPolynomialBounded: mutate the attribute values using polynomial mutation as implemented in NSGA-II algorithm in C by Deb.
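The first two mutation operators are simple enough to sketch directly, which also shows how they behave on real-valued individuals. This is a plain-Python illustration with hypothetical function names and made-up attribute values, not DEAP's code, and it returns new lists instead of mutating in place.

```python
import random


def flip_bit(individual, indpb, rng):
    """Apply logical NOT to each attribute with probability indpb, keeping its type."""
    return [type(x)(not x) if rng.random() < indpb else x for x in individual]


def shuffle_indexes(individual, indpb, rng):
    """With probability indpb per position, swap that attribute with another position."""
    mutant = list(individual)
    size = len(mutant)
    for i in range(size):
        if rng.random() < indpb:
            j = rng.randrange(size - 1)
            if j >= i:
                j += 1  # never swap a position with itself
            mutant[i], mutant[j] = mutant[j], mutant[i]
    return mutant


rng = random.Random(2)  # fixed seed so the sketch is repeatable
original = [200000.0, 300.0, 0.0, 4000.0]  # made-up attribute values
flipped = flip_bit(original, 1.0, rng)     # [0.0, 0.0, 1.0, 0.0]
shuffled = shuffle_indexes(original, 0.5, rng)
```

On floats, the flip collapses every nonzero attribute to 0.0 and every zero to 1.0, while the shuffle only rearranges the values already present in the individual.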

For selection, the following functions have been tested:

  • selTournament: select k individuals from the population using k tournaments.
  • selRoulette: select k individuals from the population using k spins of a roulette.
  • selBest: select k best individuals from the population.
  • selWorst: select k worst individuals from the population.
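Tournament and roulette selection can be sketched as follows. This is a plain-Python illustration rather than DEAP's implementation. The function names and the tournsize parameter name are chosen for the sketch, and individuals are represented by their fitness values for brevity.

```python
import random


def tournament(population, fitness, k, tournsize, rng):
    """Run k tournaments; each returns the fittest of tournsize random entrants."""
    chosen = []
    for _ in range(k):
        entrants = [rng.choice(population) for _ in range(tournsize)]
        chosen.append(max(entrants, key=fitness))
    return chosen


def roulette(population, fitness, k, rng):
    """Spin k times; each pick is proportional to the (non-negative) fitness."""
    weights = [fitness(ind) for ind in population]
    return rng.choices(population, weights=weights, k=k)


def score(individual):
    """Illustration only: each individual is represented by its own fitness value."""
    return individual


rng = random.Random(3)  # fixed seed so the sketch is repeatable
pool = [0.1, 0.5, 0.9]
picked_by_tournament = tournament(pool, score, 4, 3, rng)
picked_by_roulette = roulette(pool, score, 4, rng)
```

Both methods can pick the same individual more than once. Larger tournaments raise the selection pressure, whereas roulette selection gives every individual with nonzero fitness some chance of being chosen.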

In addition, the following parameters are tested:

  • MUTPB and CXPB: the probability that an individual undergoes mutation and the probability that it undergoes crossover. Their sum must not exceed 1.
  • indpb: the independent probability of each attribute being changed during crossover and/or mutation.

Since the algorithm maximizes fitness, the target is an individual with a fitness (F) of 1, which corresponds to an R value of 0.


  1. Results and analysis

Initial Fitness

Maximum fitness is the most important statistic in the fitting process. The initial max fitness decreases as the number of experiments increases, which increases the number of generations needed to converge to F = 1.

Number of Experiments            5      50     100
Average Max Initial Fitness (F)  0.25   0.03   0.01

Because legitimate fitness values can be this low, the penalty in scaffold.py needs to be a very large number in order to reliably exclude individuals with very “bad” fitness.

On an additional note, the fitting process did not exactly follow a systematic plan, because both the number of experiments and the number of individuals in the population affect the time each trial takes to finish.


Premature Convergence

Premature convergence was a significant issue for a large part of the fitting process: the population tends to get stuck on a local maximum of F, so that from around generation 15 the standard deviation becomes very low and both it and the max F stop changing. Most of the crossover methods tend to decrease the diversity of the population, as follows:

Method                    Influence on diversity
cxTwoPoint                Same as mutShuffleIndexes
cxUniform                 Same as mutShuffleIndexes
cxESBlend                 Same as mutShuffleIndexes
cxSimulatedBinaryBounded  Maintains good diversity.
mutFlipBit                Can produce values that are potential outliers. The issue is that it can also generate invalid values.
mutShuffleIndexes         Shuffling does not necessarily bring new values into the attribute pool, which makes it hard for the algorithm to converge to the right F.
mutESLogNormal            The range of values after mutation depends on the strategy value.

Another factor in premature convergence is the choice of values for CXPB, MUTPB and indpb, as mentioned towards the end of the previous section. From the trials, it was also apparent that mutation does better than crossover at increasing or maintaining a healthy diversity in the population. However, a heavy tilt towards mutation can make it hard for the algorithm to lock onto the correct direction for increasing the fitness of most individuals, which results in outliers and some hopping around. A CXPB:MUTPB ratio of roughly 0.4:0.6 therefore works well.


In addition, indpb determines how likely each attribute value is to be crossed over or mutated. The probability of a single attribute being changed is therefore usually indpb * CXPB for crossover or indpb * MUTPB for mutation. Maintaining a high indpb (0.9) helps crossover and mutation maintain diversity and evolve the population correctly and in a timely manner.
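As a worked example of the products above, the following sketch evaluates them for the settings quoted in the text (CXPB = 0.4, MUTPB = 0.6, indpb = 0.9). The lower indpb of 0.1 is a made-up value added for contrast, and the function name is hypothetical.

```python
def per_attribute_probability(operator_prob, indpb):
    """Chance that a given attribute is touched: the operator is applied AND the attribute is picked."""
    return operator_prob * indpb


CXPB, MUTPB = 0.4, 0.6  # ratio recommended in the text

p_cross_high = per_attribute_probability(CXPB, 0.9)    # about 0.36
p_mutate_high = per_attribute_probability(MUTPB, 0.9)  # about 0.54
p_mutate_low = per_attribute_probability(MUTPB, 0.1)   # about 0.06 (contrast case)
```

With indpb = 0.9, roughly half of all attributes are mutated in each generation, compared with about 6% for indpb = 0.1. This is consistent with the observation that a high indpb helps maintain diversity.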


A third factor in premature convergence is the relationship between the number of experiments and the size of the population. With 5 experiments, a population of 50 is large enough to allow evolution to happen. With 100 experiments, a population of 400 does the same. The following table shows the best performance under 5 experiments versus 100 experiments using eaMuPlusLambda:


Number of Experiments  Population Size  Initial Max F  Final Max F  E            Sy        H          K
5                      50               0.2642         0.9739       199807.2398            3172.1629  3856.7229
100                    400              0.0172         0.9999       200000.0000  300.0000  2999.9999  4000.0000


It is apparent that with more experiments and a much larger population, the algorithm fits the model very closely, reaching a final max F of 0.9999 compared with 0.9739.


Results of different algorithms

No statistics were recorded for the trials run with the OneMax algorithm. As recalled, however, the algorithm produced only a marginal improvement over the initial average max F.

The following table is a comparison between the other two algorithms:

Algorithm Tested                        Initial Avg Max F  Final Avg Max F  Change
Evolution Strategies (eaMuCommaLambda)  0.1896             0.2769           66.06547%
eaMuPlusLambda                          0.06702            0.7246           981.1407%



For both OneMax and Evolution Strategies, trials could not run past 25 generations because of a value error triggered by individuals with very “bad” fitness, and both algorithms also suffered from premature convergence. Regardless of the number of experiments and the size of the population, eaMuPlusLambda performed better than the other two. A likely reason is that the other two algorithms both have, to a certain degree, an exclusive selection rule: upon selection, they replace the old population exclusively with offspring. This can be problematic for the following reasons:

  • The parents with “good” fitness in the old population are not carried over into the next generation, which might slow down the evolution.
  • The selection measure would allow more offspring with “bad” fitness to be in the next generation, which might slow down the evolution.

See Appendix for graphs on individual trials.


  1. Conclusion and Future Development

In conclusion, the evolutionary algorithm was able to fit the stress-strain model well, finding a set of physical parameters consistent with the given experimental data. The following components are key to building an algorithm that works well:

  • A big population
  • A balanced crossover-to-mutation ratio that can maintain a good diversity in the population
  • A comprehensive selection scheme that considers both parents and offspring

Given more time, Evolution Strategies could be studied further as a way to fit the model, since it learns and adapts the evolution according to the distribution of the individuals. In addition, the optimization routines in the SciPy package could be used as a benchmark for the performance of the EA. An RNN could also be developed and trained to fit the model. Finally, fitting a more complex model with more attributes could be valuable.