Evaluating Steel Performance Using Evolutionary Algorithms

Student: Yichun Li, Northwestern University

Principal Investigator: Mark Messner, Argonne National Laboratory

 

  1. Introduction

Grade-91 steel was developed around 30 years ago. A new design of nuclear reactor has now been proposed and is under review. In existing reactors, water serves as both moderator and coolant; the new design forgoes water as a coolant and instead uses liquid sodium. However, steam is still used to convert the energy from the nuclear reaction into electricity, so the new design adds an intermediate heat exchanger, with both water and liquid sodium flowing through it, to perform the heat exchange. The material chosen for building this heat exchanger is Grade-91 steel. Because the steel has not been used in such a structure in a nuclear reactor before, it is important to understand and examine its properties, such as its stress-strain behavior, and its performance, especially at high temperatures. The proposed design life of the steel in the heat exchanger is 60 years, but there has not been enough research on this topic. This project therefore aims to fill that gap by examining experimental stress-strain data for this steel under different physical conditions. Two other materials, with less available data, are being considered for the same application.

 

The original goal of this project was to use machine learning techniques to build a model that helps us better understand this performance (deformation at high temperature). Two approaches were considered:

  1. EA (Evolutionary Algorithms): ODE-based models.
  2. RNN (Recurrent Neural Network): train and run the model on a synthetic dataset first, before using the full dataset.

 

Due to time constraints, only the EA approach was completed, using a simulated dataset. The following is the general timeline:

Weeks 1-4: Building and debugging the EA using the DEAP package;

Weeks 5-9: Testing the algorithm on simulated datasets and analyzing results;

Week 10: Final wrap-up.

 

  2. Setup
  2.1. Data Processing (scaffold.py)

This file includes functions to read and parse entries from a dataset of 200 experiments and feed them into an evaluation model used to assess individuals.

The evaluation computes a residual, R, as a sum of Euclidean norms over all experiments. To fit the DEAP convention, the goal of the algorithm instance is to maximize a fitness, F, derived from R, where F takes values between 0 and 1 (inclusive) and F = 1 corresponds to R = 0.

Additionally, a penalty value (R = 10000) is used when evaluating F, because the evaluation, despite F's mathematical bound, can produce invalid values. In those cases, the individual's fitness is set using the penalty, which effectively eliminates that individual from selection into the next generation.
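A minimal sketch of what this evaluation might look like is shown below. The simulate_experiment helper and the exact mapping from R to F are assumptions for illustration (here F = 1/(1 + R), which is consistent with F lying in [0, 1] and equaling 1 when R = 0); the actual scaffold.py may differ.

```python
import numpy as np

PENALTY_R = 10000.0  # residual assigned when a simulation produces invalid values

def evaluate(individual, experiments):
    """Sum Euclidean-norm residuals over all experiments, then map the
    residual R to a fitness F in [0, 1] that DEAP will maximize."""
    E, Sy, H, K = individual  # Young's modulus, yield stress, hardening moduli
    R = 0.0
    for strain, measured_stress in experiments:
        # simulate_experiment is a hypothetical model call returning predicted stress
        predicted_stress = simulate_experiment(E, Sy, H, K, strain)
        R += np.linalg.norm(np.asarray(predicted_stress) - np.asarray(measured_stress))
    if not np.isfinite(R):
        R = PENALTY_R            # invalid simulation -> penalty residual
    F = 1.0 / (1.0 + R)          # assumed mapping: F = 1 when R = 0, F -> 0 as R grows
    return (F,)                  # DEAP expects a tuple of fitness values
```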

  2.2. Algorithm Build (small.py)

DEAP is a Python package built for evolutionary algorithms. small.py implements an instance of an evolutionary algorithm using DEAP. The structure of the instance is as follows:

  1. Registering, into the instance toolbox, a fitness to be maximized, an individual, and a population made up of a certain number of individuals. An individual is a list of attribute values with a fitness class, and a population is a list of those individuals. In an individual, the attributes represent, in order: E (Young's modulus), ranging from 0 to 300,000; Sy (yield stress), ranging from 0 to 1,000; H (kinematic hardening modulus), ranging from 0 to 120,000; and K (isotropic hardening modulus), ranging from 0 to 120,000. (A sketch of this registration appears after this list.)
  2. Registering evolutionary methods into the instance toolbox, namely:
    • Mutation: functions that will mutate and generate a new offspring from an individual in the population.
    • Crossover (mate): functions that will mate and reproduce a new offspring from two individuals in the population.
    • Selection: functions that will select a pre-determined number of individuals from the offspring or both offspring and old individuals in the population.
  3. Setting up the selected algorithm with the following parameters: the number of experiments used to evaluate the individuals, the number of individuals, the maximum number of generations, and the number of individuals selected into the next generation, as well as a statistics object that records the mean, standard deviation, minimum, and maximum fitness for each generation.
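To make step 1 concrete, here is a minimal sketch of how the creator classes, individuals, and population might be registered in DEAP. The variable names, the uniform initialization, and the assumption that evaluate and experiments come from scaffold.py are illustrative choices, not code taken from small.py.

```python
import random
from deap import base, creator, tools

# Attribute bounds in the order E, Sy, H, K, as described above
LOW = [0.0, 0.0, 0.0, 0.0]
UP = [300000.0, 1000.0, 120000.0, 120000.0]

# Single-objective fitness to be maximized
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()

def random_individual():
    # One uniformly random value per attribute, drawn within its bounds
    return creator.Individual(random.uniform(lo, up) for lo, up in zip(LOW, UP))

toolbox.register("individual", random_individual)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

# evaluate(individual, experiments) is assumed to come from scaffold.py:
# toolbox.register("evaluate", evaluate, experiments=experiments)
```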

 

  3. Methodology

The core idea of this optimization study is to find the best-performing algorithm and the best methods for mutation, crossover, and selection, where the probabilities associated with each method also need to be tested and tuned.

For algorithms, the following have been tested:

  • OneMax (similar to eaSimple in DEAP): both mutation and crossover are performed on every individual, and the entire population is replaced with the resulting offspring.
  • Evolution Strategies (using the eaMuCommaLambda algorithm): either mutation or crossover is performed on a selected number of individuals to produce a fixed number of offspring. The next generation is then selected only from among the offspring. For crossover and mutation, the strategy constrains the standard deviation of the offspring.
  • eaMuPlusLambda: either mutation or crossover is performed on a selected number of individuals to produce a fixed number of offspring. The next generation is then selected from among both the offspring and the parents.

For crossover, the following functions have been tested:

  • cxTwoPoint: individuals swap the attribute values lying between two crossover points.
  • cxUniform: individuals conditionally swap each attribute depending on an independent probability value.
  • cxESBlend: individuals blend both their attribute and strategy values.
  • cxSimulatedBinaryBounded: performs simulated binary crossover on the real-valued attributes, mimicking the spread of one-point crossover on binary strings. A lower and upper bound for each attribute value (the same bounds as in the initial setup), as well as a crowding degree controlling how closely the resulting values resemble the parental values, need to be provided.

For mutation, the following functions have been tested:

  • mutFlipBit: treats each attribute value as a bit and flips it (NOT operation). An independent probability for each attribute to be flipped needs to be provided.
  • mutShuffleIndexes: shuffles the attribute values inside an individual. An independent probability for each attribute value to be exchanged with another position needs to be provided.
  • mutESLogNormal: mutates the evolution strategy according to an extended log-normal rule, then mutates the individual's attribute values using the resulting strategy values as standard deviations.
  • mutPolynomialBounded: mutates the attribute values using polynomial mutation, as implemented in the original NSGA-II C code by Deb.

For selection, the following functions have been tested (a registration sketch for these crossover, mutation, and selection operators follows this list):

  • selTournament: select k individuals from the population using k tournaments.
  • selRoulette: select k individuals from the population using k spins of a roulette.
  • selBest: select k best individuals from the population.
  • selWorst: select k worst individuals from the population.
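The sketch below shows one way these operators could be registered; it uses the bounded crossover and mutation variants from the lists above with indpb = 0.9 (the value recommended later in the report), while the eta values and tournament size are illustrative guesses rather than settings taken from small.py.

```python
from deap import base, tools

LOW = [0.0, 0.0, 0.0, 0.0]                    # E, Sy, H, K lower bounds
UP = [300000.0, 1000.0, 120000.0, 120000.0]   # E, Sy, H, K upper bounds

toolbox = base.Toolbox()  # or reuse the toolbox from the earlier sketch

# Crossover: bounded simulated binary crossover keeps children inside the bounds
toolbox.register("mate", tools.cxSimulatedBinaryBounded, eta=20.0, low=LOW, up=UP)

# Mutation: bounded polynomial mutation; indpb is the per-attribute probability
toolbox.register("mutate", tools.mutPolynomialBounded, eta=20.0, low=LOW, up=UP, indpb=0.9)

# Selection: tournament selection with an illustrative tournament size of 3
toolbox.register("select", tools.selTournament, tournsize=3)
```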

In addition, the following parameters are tested:

  • MUTPB and CXPB: the probability of an individual undergoing mutation and the probability of it undergoing crossover, respectively. The two must sum to at most 1.
  • indpb: the independent probability of each attribute being changed during crossover and/or mutation.

As the goal is to maximize fitness, the target is to generate an individual with a fitness (F) of 1, which corresponds to an R of 0. A sketch of a full run configuration is shown below.
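Putting the pieces together, a sketch of the run configuration for step 3 of the setup might look like the following. The population size matches the 400 used for the 100-experiment trials, while lambda, the generation count, and the 0.4:0.6 CXPB:MUTPB split (recommended in the results below) are illustrative choices, not values copied from small.py.

```python
import numpy as np
from deap import algorithms, tools

# Statistics on the single fitness value of each individual, per generation
stats = tools.Statistics(key=lambda ind: ind.fitness.values[0])
stats.register("avg", np.mean)
stats.register("std", np.std)
stats.register("min", np.min)
stats.register("max", np.max)

MU, LAMBDA = 400, 800     # parents kept / offspring produced per generation
CXPB, MUTPB = 0.4, 0.6    # crossover and mutation probabilities; sum must be <= 1
NGEN = 100                # maximum number of generations (illustrative)

# Assumes toolbox.evaluate, mate, mutate, and select are registered as sketched above
pop = toolbox.population(n=MU)
hof = tools.HallOfFame(1)  # keeps the best individual seen over the whole run

pop, logbook = algorithms.eaMuPlusLambda(
    pop, toolbox, mu=MU, lambda_=LAMBDA, cxpb=CXPB, mutpb=MUTPB,
    ngen=NGEN, stats=stats, halloffame=hof, verbose=True)

print("Best individual (E, Sy, H, K):", hof[0])
print("Best fitness F:", hof[0].fitness.values[0])
```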

 

  4. Results and Analysis

Initial Fitness

Maximum fitness is the most important statistic in the fitting process. The initial max fitness decreases as the number of experiments increases, which extends the number of generations needed to converge to F = 1.

Number of Experiments               5       50      100
Average Max Initial Fitness (F)     0.25    0.03    0.01

Because of that, the penalty in scaffold.py needs to be a very big number in order to exclude individuals with very “bad” fitness.

On an additional note, the fitting process did not follow an entirely systematic fashion, since the number of experiments and the number of individuals in the population both affect how long each trial takes to finish.

 

Premature Convergence

Premature convergence was a significant issue during a large part of the fitting process: the population tends to get stuck at a local maximum of F, resulting in a very low and nearly constant standard deviation, with the max F stalling around generation 15. Most of the crossover and mutation methods tend to reduce the diversity of the population, as follows:

Method and its influence on diversity:

  • cxTwoPoint: same as mutShuffleIndexes.
  • cxUniform: same as mutShuffleIndexes.
  • cxESBlend: same as mutShuffleIndexes.
  • cxSimulatedBinaryBounded: maintains good diversity.
  • mutFlipBit: can produce outlier values; the issue is that it can also generate invalid values.
  • mutShuffleIndexes: shuffling does not necessarily bring new values into the attribute pool, which makes it hard for the algorithm to converge to the right F.
  • mutESLogNormal: the range of values after mutation depends on the strategy value.

 

Another factor playing into premature convergence is the choice of CXPB, MUTPB, and indpb, as mentioned towards the end of the previous section. From the trials, it was also very apparent that mutation performs better than crossover at increasing or maintaining healthy diversity in the population. However, a heavy tilt towards mutation can leave the algorithm struggling to lock onto the correct direction for increasing the fitness of most individuals, which results in outliers and some hopping around. Therefore, a CXPB:MUTPB ratio of roughly 0.4:0.6 works well.

 

In addition, indpb dictates how likely each attribute value is to be crossed over or mutated. The probability of a single attribute being changed is therefore roughly indpb * CXPB or indpb * MUTPB. Maintaining a high indpb (0.9) helps crossover and mutation maintain diversity while still evolving the population correctly and in a timely manner.

 

A third factor in premature convergence is the relationship between the number of experiments and the size of the population. With 5 experiments, a population of size 50 is enough to allow evolution to happen; with 100 experiments, a population of size 400 is needed to do the same. The following table shows the best performance of the population under 5 experiments versus 100 experiments using eaMuPlusLambda:

 

Number of Experiments   Population Size   Initial Max F   Final Max F   E             Sy         H           K
5                       50                0.2642          0.9739        199807.2398   299.9368   3172.1629   3856.7229
100                     400               0.0172          0.9999        200000.0000   300.0000   2999.9999   4000.0000

 

It is apparent that with more experiments and a much bigger population size, the algorithm is able to fit the model really well.

 

Results of different algorithms

No statistics were recorded for the trials done with the OneMax algorithm; however, from recollection, the algorithm produced only a marginal improvement over the initial average max F.

The following table is a comparison between the other two algorithms:

Algorithm Tested                          Initial Avg Max F   Final Avg Max F   Change
Evolution Strategies (eaMuCommaLambda)    0.1896              0.2769            66.06547%
eaMuPlusLambda                            0.06702             0.7246            981.1407%

 

 

For both OneMax and Evolution Strategies, trials were not able to run past 25 generations because of the value error triggered by individuals with very “bad” fitness. Both algorithms also encountered premature convergence. Regardless of the number of experiments used and the size of the population, eaMuPlusLambda performed better than the other two. The reason might be that the other two both use, to a certain degree, an exclusive selection rule: upon selection, they replace the old population exclusively with offspring. This could potentially be problematic for the following reasons:

  • The parents with “good” fitness in the old population are not carried into the next generation, which might slow down the evolution.
  • The selection measure allows more offspring with “bad” fitness into the next generation, which might also slow down the evolution.

See Appendix for graphs on individual trials.

 

  5. Conclusion and Future Development

In conclusion, the evolutionary algorithm could fit the stress-strain model rather well, predicting a set of physical parameters that reproduces the given experimental data. The following key components are important to building an algorithm that works well:

  • A large population
  • A balanced crossover-to-mutation ratio that maintains good diversity in the population
  • A comprehensive selection measure (one that considers both parents and offspring)

Given more time, Evolution Strategies could be studied further, since it learns and adapts the evolution according to the distribution of the individuals. Furthermore, optimization routines in the SciPy package could be used to benchmark the performance of the EA. An RNN could also be developed and trained to fit the model. Finally, trying to fit a more complex model with more attributes could be valuable.
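As a rough illustration of the SciPy benchmarking idea, scipy.optimize.differential_evolution could be run over the same four parameter bounds; the total_residual helper below is hypothetical and stands in for the residual R computed in scaffold.py.

```python
from scipy.optimize import differential_evolution

# Bounds in the order E, Sy, H, K, matching the EA setup
bounds = [(0.0, 300000.0), (0.0, 1000.0), (0.0, 120000.0), (0.0, 120000.0)]

def objective(params):
    E, Sy, H, K = params
    # total_residual is a hypothetical wrapper returning the residual R
    # (sum of Euclidean norms over all experiments) from scaffold.py
    return total_residual(E, Sy, H, K)

result = differential_evolution(objective, bounds, maxiter=200, seed=0)
print("Best parameters (E, Sy, H, K):", result.x)
print("Residual R:", result.fun)
```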

 

 

 

 

Northwestern Undergraduates: Plan now for a Summer 2018 internship at Argonne National Laboratory through DOE’s Summer Undergraduate Laboratory Internships (SULI) program

In 2017, Northwestern undergraduate students worked at Argonne National Laboratory on the projects you can read about on this blog. The U.S. Department of Energy’s (DOE) Summer Undergraduate Laboratory Internships (SULI) program is one route to working at Argonne (or another national lab) over the summer. (Although other routes to summer internships at the lab are possible, this is the best one for undergraduates.) The deadline for summer 2018 will be in Dec/Jan, so now is the time to start thinking about applying! Watch this space for the deadline when it is announced.

Participation in the SULI Program is a great way to get a taste of research and life at the national laboratories.  You’ll meet national laboratory researchers and many other undergraduates carrying out research in many disciplines.   Please apply to the SULI program through this DOE website link:

https://science.energy.gov/wdts/suli/

Argonne PIs have provided descriptions below about projects that will likely be available in 2018.  Many other projects will also be available.  As you complete your SULI application, you’ll be asked about your research interests. Please feel free to mention the topics in one of the projects below if they meet your research interests.  Please let NAISE know if you’ve applied (naise_summer(at)northwestern.edu).

Some details: SULI students can live on-site at the lab.  You must be a U.S. Citizen or Lawful Permanent Resident at the time of application.

Biology

Fe and S cycles’ role in contaminant mobility

Research in the Biogeochemical Process Group at Argonne National Laboratory is investigating the interplay between the Fe and S cycles and their roles in controlling contaminant mobility, carbon and nutrient cycling, and greenhouse gas emissions. The project’s long-term vision is to integrate their findings into multiscale modeling approaches to understand and predict relevant environmental processes. The program integrates two unique strengths—(1) the Advanced Photon Source (APS) for synchrotron-based interrogation of systems, and (2) next-generation DNA sequencing and bioinformatics approaches for microbial community and metabolic pathway analysis—with biogeochemistry and microbial ecology.

Bioinformatics and Computational Biology

We apply a wide range of computational approaches within the DOE Systems Biology Knowledgebase (KBase) to answer questions about complex biological systems, including: (i) how microbes and plants degrade or produce specific metabolites; (ii) how microbes, plants, and fungi interact within an environment (e.g. human gut, soil, bioreactor) to display a specific phenotype; and (iii) how microbial genomes evolve in response to stress, stimuli, and selection. Students in the Henry lab will learn to apply tools like (meta)genome assembly, genome annotation, RNA-seq read alignment, contig binning, and metabolic modeling to answer these questions. Students with programming skills can also contribute to the KBase platform by integrating new apps, visualizations, and algorithms.

Materials

The goal of the project is to significantly improve the understanding and prediction of thermodynamic stability/metastability of “imperfect” (e.g., highly defective, non-stoichiometric, or doped) oxide material phases, via innovative theory (i.e., embedded uncertainty), advanced experiments at APS, and “intelligent” software (i.e., able to learn and quickly solve new problems). We envision building the knowledge and capabilities that will allow, over the next decade, the prediction of thermodynamic properties of imperfect materials, with impact on materials design, synthesis, and smart manufacturing. Furthermore, we expect this methodology to accelerate the development of the material genome and next generation computers. We focus on high-k dielectric materials for complementary metal-oxide-semiconductor (CMOS), which are of particular importance for creating dynamic random-access memory (DRAM) devices. Many CMOS properties strongly depend on material defects such as vacancies, interstitials, and defect clusters that occur during synthesis and thermal treatment. Inclusion of other chemical elements (e.g., dopants) in CMOS can significantly change physical properties such as thermal conductivity, electrical conductivity, and magnetism. Our approach is original and is based on calculating the free energy of each phase as a function of temperature and composition (oxygen and dopant content) using atomistic (quantum mechanical, ab-initio Molecular Dynamics), meso-scale (reactive force fields and kinetic Monte Carlo), and continuum (phase diagram calculation) methods. Uncertainty evaluation is embedded in this multi-scale methodology via Bayesian analysis. We validate the models and computer simulations using high-temperature experiments at APS. Furthermore, we develop a machine learning (ML) open code to perform supervised and unsupervised computations on Mira (Aurora when available) for calculations/simulations, and on Athena for big data analytics. The intelligent software assists the development of interatomic potentials and force fields, performs analysis of massive sets of CMOS phases and defect structures, evaluates uncertainty of phase diagrams, and guides the experimental characterization measurements.

Chemical/Environmental Engineering

Bio-manufacturing of Porous Carbon as Catalyst Supports from Organic Waste Streams

Porous carbon materials, like activated carbon (AC), have demonstrated unmatched efficiency in applications such as filtration, catalysis, and energy storage. The problem is that conventional AC is produced from supply-limited coal or coconut shells using multistage manufacturing processes that are energy intensive, polluting, and result in sub-par performance. In fact, an estimated 4 million metric tons of AC will be produced in 2025, requiring the harvesting and shipping of significant feedstock from around the world. We have been developing a biomanufacturing process to produce high performance, low cost porous carbon materials from low or negative value waste streams. This high performance biocarbon manufacturing process (Patent App. No. PCT/US 2017-043304) has been scaled up from bench to pilot scale. The performance, cost, and life-cycle impact of AC and its end-uses are primarily determined by how it is fabricated.

Arrested Methanogenesis for Volatile Fatty Acid Production

Huge quantities of high organic strength wastewater and organic solid waste are produced and disposed of in the US each year (EPA, 2016). We have been developing a new high rate arrested anaerobic digestion (AD) process for transforming organic waste into VFAs and alcohols, supplanting the starch, sucrose, or glucose currently used as feedstock. We will design and construct high rate sequencing batch reactor (SBR) and fluidized anaerobic membrane bioreactor (FAnMBR) technologies to produce and separate VFAs and alcohols from the fermenters to facilitate high product yield, minimize the toxicity of VFAs, reduce mass transfer limitations, and ensure the health, stability, and productivity of AD communities. This research will specifically determine the links between organic wastewater characteristics, microbe community structure, and the design and operation of high rate arrested AD systems at the bench scale. Specific research targets include the isolation and integration of highly diverse microbial functionalities within high rate arrested AD fermenters for high strength organic wastewater treatment coupled to renewable chemical production.

Ecosystem, Environment, Water Resources

Ecosystem services of bioenergy

The Fairbury Project studies sustainable ways to produce bioenergy and evaluates the dual provision of biomass (as a commodity crop) and ecosystem services (environmental benefits) through the integration of short rotation shrub willow buffers into a Midwest row cropping system. The project started in 2009 on a 16-acre agricultural corn field in Fairbury, IL. The field site is close to Indian Creek, which sits at the headwaters of the Vermillion River, considered impaired by the Illinois EPA. The strategic placement of the willow buffers on the landscape was designed to improve land use efficiency by providing farmers and landowners with an alternative land management strategy. In this case, the placement of the willow buffers was to target areas on the field that would have the greatest impact on nutrient reduction while mitigating conflicts with grain production by also targeting areas that are underproductive. In order to assess the success of the use of willow buffers in a traditional row-cropping system on biomass production and ecosystem service provision, many field and crop based parameters are continually or annually measured. These parameters include assessing crop impact on water quality (water collection, ion exchange resins), water quantity (soil moisture, transpiration and water table elevation), nutrient uptake and storage (vegetation collection), biomass production (vegetation collection and allometric measurements), soil health (chemical & physical parameters), greenhouse gas flux (gas sample collections), and habitat provision (soil microbiology and macroinvertebrates including pollinators).

Student involvement:
As part of The Fairbury Project, students will work alongside Argonne staff and fellow interns doing an array of tasks in the field, lab, and office. Students are expected to travel to the field once or more a week under various weather conditions for data and sample collection. In the lab, students may be involved in sample processing and analysis including ion exchange resin extraction, water quality testing (UV spectroscopy), greenhouse gas analysis (gas chromatograph), aboveground vegetation processing, and root analysis. In the office, students will be tasked with processing and analyzing data using software including but not limited to Excel, R, DNDC, and ArcGIS. Additional tasks may include literature reviews and method development. Students will work both collectively with their fellow interns and staff as well as independently on various assigned tasks.

Qualified candidates:
Candidates must meet the general requirements for SULI. Additional requirements include but are not limited to previous experience or general interests related to water, soil, greenhouse gases, biodiversity, bioenergy production, environmental engineering, environmental sciences, and agriculture. Candidates should have a flexible schedule over the 10 weeks of the internship and must be available for the full 10 weeks. Field days start at 6 am; therefore, qualified candidates are required to have some form of transportation to the lab on field days (transportation from the lab to the field site will be provided) if not living on site.

Water Resources- Fuel production

Water resources are a critical component of energy production. Water resource availability varies by region throughout the United States. Population growth, energy development, and increased production put increasing pressure on water demand. This project evaluates the potential of using groundwater resources and municipal wastewater for fuel production in the United States. It will examine various levels of water quality and estimate the water available for use from both historical and future production perspectives. Factors affecting regional resource use, feedstock production, and technology deployment, and their trade-offs, will be analyzed.

Water Resources- Crop production

Agricultural crop production requires water. However, not all crop production requires irrigation, and the irrigation needs for the same crop vary from region to region. This project will analyze the amount of fresh water used for irrigation by different crops and irrigation technologies as surveyed over the last few years in the United States. Spatial and temporal analysis will be conducted to calculate the amount of irrigation water applied to produce a unit of grain and other products. The dataset will be compared with historical irrigation data to identify potential issues related to the production of food, fiber, and fuel.

How can we more easily produce high quality protein crystals to be used for x-ray crystallography for protein structure identification?

Hi! My name is Josh Werblin and I am a rising junior at Northwestern University studying biomedical engineering. This summer, I have been working in Gyorgy Babnigg’s lab, investigating the applicability of droplet-based microfluidics in obtaining high-quality protein crystals.

He is part of the Midwest Center for Structural Genomics (MCSG, PI: Andrzej Joachimiak) and the Center for Structural Genomics of Infectious Diseases (CSGID, co-directors: Karla Satchell, Andrzej Joachimiak), two structural biology efforts that offer atomic-level insights into the function of proteins and their complexes.  Currently, high-quality protein crystals are difficult to generate, and account for one of the major bottlenecks in structural biology. During my summer project, I tested the feasibility of using droplet-based microfluidics to generate crystals amenable for structural studies.

Generation of protein droplets GIF
Slowed footage of droplets being generated. The two channels from the left are the reagent channels, protein and crystallization screen, respectively. The vertical channels are oil that pinches off droplets of the reagents.

To get crystals to form, you need to combine a purified protein with crystallization screens, which are sets of 96 mixes of buffers, salts, and precipitants. Unfortunately, finding the right crystallization condition to make crystals for a protein can take a lot of testing. Even when, after testing many crystallization screens and incubation conditions, you find a compatible condition that yields crystals, the resulting crystals are often too tiny for data collection, only a few microns across. Droplet-based microfluidics has the potential to test the many combinations using only a small amount of protein.

I tested the crystallization of a previously characterized protein using the microfluidic setup. Small aqueous droplets are formed in fluorinated oil. This can be done by using a small chip and pumping your reagents in. The streams of protein and crystallization screen are pinched off by two streams of oil, which creates the droplets.

Using droplet-based microfluidics, I generated tiny droplets containing the protein and crystallization screen at the right concentrations, and was able to grow relatively large crystals (over 50 microns).

The protein I’ve focused on this summer is an enzyme called sialate O-acetylesterase from one of the good bacteria in our gut (Bacteroides vulgatus). After testing different concentrations and conditions, I finally generated some really good-looking crystals!

Two images of droplets showing multiple different sized crystals
Left: Small variations in the ratio of protein and crystallization solution result in droplets with single large crystals, a few smaller crystals, or many microcrystals.
Right: An image of big protein crystals ready to be put onto the beam-line for x-ray crystallography.

With everything all ready, Gyorgy and his colleague, Youngchang Kim, were able to test these crystals at the SBC beam-line of the Advanced Photon Source and hopefully soon I will have the structure of this protein!

I am so thankful to have had this opportunity to work in this lab with wonderful and knowledgeable people and I learned so much about 3D design and printing, experimental design, and proteins and their crystallization.

Students present their summer’s work at Argonne

This morning the students delivered compelling presentations about their research in a diverse set of areas that have been highlighted on this blog.  They did an excellent job communicating the complexities of their work to an audience made up of technical experts in a range of disciplines.  This event truly showcased their talents as researchers and communicators.   

Ugly Boxes + Experimental Sensors

Hello! I’m Jordan Fleming, a recent graduate of Northwestern University’s Mechanical Engineering and Environmental Engineering departments. I’ll soon be starting my Master’s with a focus in Water and Energy Engineering. I’ve spent this summer working on the Waggle sensor platform for Ugly Boxes and Ugly Kits (yes, that’s really what they’re called, and yes, they could stand to be slightly more attractive) with Peter Beckman and Rajesh Sankaran. Waggle is an integrated, intelligent, attentive sensor designed and developed at Argonne National Laboratory. The Waggle platform enables distributed sensing through edge computing and on-the-node data storage, a deviation from traditional sensors that simply collect data. While the Array of Things (AoT) node is the official design that is deployed in the city of Chicago, the Ugly Box allows for experimentation and development of the core Waggle platform as well as testing and integrating new sensors. One of the main advantages of the Ugly Boxes is that they are also conducive to use with smaller Arduino or embedded modules like Photon and Electron Particle boards, which can transfer important information via WiFi and cellular communication. The sensors are adaptable to fit the needs of the area and situation in which they are deployed. The nodes can gather a wide variety of data, from conventional environmental parameters like barometric pressure and sulfur dioxide concentrations to computed inferences through computer vision and machine learning algorithms deployed on the nodes. This makes it possible for the nodes to detect standing water to signal flooding, sky color and cloud cover, and the number of pedestrians in an intersection, among others.

I’ve been streamlining Ugly Box manufacture, plugin creation, and sensor testing and documentation. I’ve enjoyed learning new things like coding in Python and C. The sensors I’m working on will be deployed at Indian Boundary Prairie and the Chicago Botanic Gardens. I’m also working on sensors for the Center for Neighborhood Technology’s (CNT) RainReady initiative: ground and surface water level sensors, and sump pump detection sensors, for the basements of homes of individuals living in the Chatham area who are looking to prevent residential flooding. Flooding is an important problem to address because it causes serious health problems and property damage. I’ve been testing sensor interfaces with the Ugly Box setup to ensure reliable quality in a variety of environmental conditions. In order to ensure these sensors can be used in homes, or anywhere really, I’ve been working on a Python script for publishing sensor data from Ugly Box, Photon, and Electron to the Waggle cloud, “Beehive”, a platform where data can be manipulated and analyzed. Creating an open-source system for sensor deployment will further the goal of extensibility, ease of use, and adaptability. These endeavors align with the common principles of modularity in the intelligent, cloud computing, and urban-sensing Waggle platform.

The variety and quantity of data collected by the AoT nodes and Ugly Boxes have implications for policy improvements in many arenas, including public health, urban planning, urban heat island effect quantification, and flood mapping. Additionally, groups like Chi Hack Night, composed of civically-focused students and professionals in the community who work with the City of Chicago’s open source data, will be better able to serve the community as a result of locality-specific data. Capturing the pulse of the city through the AoT nodes and Ugly Boxes will increase Chicago’s operational efficiency and improve the quality of life of its residents.

I’ve had so much fun this summer. Getting to know everyone in the office, meeting interns from other schools, and exploring different research career paths has been great. When the Waggle team isn’t working, you can find us at Wrigley field or an Indian buffet. Employees who dine together, stay together.

Waggle interns at Wrigley Field.
Waggle Team enjoying lunch at an Indian buffet.

Fun and informative science communication workshop

Students are gearing up to present their work at a seminar this week, and Michelle Paulsen and Byron Stewart of Northwestern’s Ready Set Go (RSG) program, which trains students in science communication, visited Argonne to work with NAISE undergraduate researchers.

Students considered key elements of presentation preparation including knowing your audience, avoiding jargon, and framing the problem. They worked through some improvisations such as explaining to a partner, who posed as a time traveler, how and why to get through airport security. This exercise relates to explaining your research to a non-expert. They considered how to be persuasive as they convinced their cohorts to join them at a favorite lunch spot. And they practiced delivering their presentations one. word. at. a. time. They also received feedback on their upcoming presentations. We’re grateful to RSG for the visit and great insights, and we look forward to the student presentations this week at Argonne and in early September at Northwestern. Students have the opportunity to showcase their valuable contributions and build bridges to collaborations between Argonne and Northwestern.

Growth of Ferromagnetic Cobalt Nanotubes using Atomic Layer Deposition

Hello everyone! My name is Braxton Cody, and next year I will be in my third year at Northwestern University studying Mechanical Engineering. This summer I have been working with Jonathan Emery of the Materials Science Department at Northwestern University and Charudatta Phatak of the Materials Science Division at Argonne National Laboratory to figure out how to create cylindrical nanotubes of Cobalt metal and to characterize the magnetic domain walls based on the curvature of the metal. To guarantee even coatings of Cobalt metal, atomic layer deposition (ALD) techniques will be used as a means to gain extremely precise control of the thickness of the metal, on the order of angstroms.

During the first part of this summer, I collected data regarding a specific ALD process for Cobalt recently developed by the Winter Group at Wayne State University. After establishing the specific processing parameters, becoming familiar with the various characterization techniques, and ordering the chemicals required for the ALD process, we began implementing the process, with varying degrees of success. We did manage to grow Cobalt metal, as seen in the x-ray fluorescence data shown below. The data for the Pt/Co multilayer (shown in beige) was created using sputtering deposition as a standard for comparison, providing information about the measurement sensitivity and accuracy.

Despite growing Cobalt metal, we soon discovered that a preparation error resulted in exposing the precursor chemical to air, ruining it. We ran a test to confirm our suspicions, and in the x-ray fluorescence data there is no characteristic energy peak corresponding to Cobalt atoms, as seen in the figure below. Due to the expense of the chemical, we’ve begun pursuing alternate Co metal growth methods using plasma-enhanced ALD.

Although we have encountered numerous setbacks, we will continue working towards controlled ALD of Cobalt. We are now researching new options involving Plasma-enhanced ALD as a cheaper and potentially more effective option. Due to the timeframe of this project, we hope to test this new chemistry on flat substrates and confirm Co metal deposition before the end of the summer. If that endeavor proves successful, we will proceed to grow the nanotubes and characterize their magnetic structures. While the previous weeks’ results have not proven successful, they have provided us with new directions for this project and set the groundwork that is instrumental in the continuation of this project.

Computer Vision and Detecting Flooding in Chicago

Hi everyone! My name is Ethan Trokie and I’m a rising junior at Northwestern University studying computer engineering. I’m currently working with Pete Beckman, Zeeshan Nadir, and Nicola Ferrier as part of the Waggle research project in the Mathematics and Computer Science Division. The goal of the Waggle research project is to deploy an array of sensors all over Chicago to measure different things such as air quality, noise, and other factors. The data that is collected will become open source so that scientists and policy makers can work together to make new discoveries and informed policy. What Waggle is doing is a massive shift from previous environmental science data collection techniques. Previously, scientists used very large sensors that are very expensive and precise but very sparsely deployed. Waggle is trying to move towards small sensors that are much less expensive and slightly less precise, but there are a lot more of them. This new technique can give scientists much more localized data, which can lead to novel discoveries.

What I’m working on specifically is machine learning and computer vision that runs locally on the Waggle nodes, which are what we call the containers that hold all of the sensors. My task is to use a camera that is on the Waggle node to detect flooding in the streets using just the camera. This can help the city of Chicago get data on where flooding commonly happens and can help them clean up the flooding faster by knowing where the actual flooding is.

I’ve spent this summer so far learning what machine learning is and how to use it to detect water. What makes my project interesting is that water is difficult to detect: it doesn’t have a fixed shape or color, so it’s hard to tell the computer exactly what to look for. But there has been some research into detecting moving water, and I’ve created a good detector in Python by just looking at a short video. Below are two sample videos that my program has classified. The center image is a frame from the video, the leftmost image is the mask over the non-water that my program created, and the rightmost image is the mask over the actual image.

  

 

Next I am going to improve this classifier to become even more accurate. In addition, right now it only really works on moving water, but I hope to expand this machine learning to be able to classify standing water as well. I’m excited to get more acquainted with different types of machine learning algorithms, and hopefully to see my code run on a Waggle node in Chicago and see if it creates a positive impact on the city.

Data Preprocessing for Predictive Models

Greetings, I am Connor Moen! I’m a rising sophomore at Northwestern University studying computer science and environmental engineering. This summer I am working under Dr. Stefan Wild at Argonne National Laboratory, where I am assisting him with developing accurate flood prediction models for the City of Chicago. The goal of these models is to analyze weather conditions and soil moisture on a block-by-block basis (or, for the time being, where sensors are installed) and then determine if flooding will occur. This knowledge can be used to notify homeowners in flood-prone regions to prepare for flooding, thereby minimizing property damage and disruption after heavy storms.

I have spent much of the summer collecting vast amounts of data from the Chicago Data Portal and UChicago’s Thoreau Sensor Network, preprocessing it using the AWK programming language, and working to visualize it in MATLAB. Below is a MATLAB plot showing the Volumetric Water Content for all sensors in the Thoreau Network over the past few months.

The future of the project will involve qualitatively describing the trends we see in our data (for example, might the uncharacteristic behavior seen in a number of sensors after mid-June be caused by an outside factor such as sprinklers?), and then writing, testing, and refining the predictive models. Personally, I am most excited to dive into these predictive models; I am fascinated by the idea of combining environmental sensing with machine learning in order to directly help those living in my neighboring city.

Machine Learning and the Power Flow Equations

Hello! My name is Wesley Chan, and I’m a rising junior studying computer science and economics at Northwestern University. This summer I’m interning at Argonne in the CEEESA (Center for Energy, Environmental, and Economic Systems Analysis) division. I’m working with my PI, Dr. Daniel Molzahn, to research the topic of worst-case errors in linearizations of the power flow equations for electric power systems.

What does that even mean? Well to break it down, steady-state mathematical models of electric power grids are formulated as systems of nonlinear “power flow” equations. The power flow equations form the key constraints in many optimization problems used to ensure reliable and economically efficient operation of electric power grids. However, the nonlinearities in the power flow equations result in challenging optimization problems. In many practical applications, such as electricity markets, linear approximations of the power flow equations are used in order to obtain tractable optimization problems. These linear approximations induce errors in the solutions of the optimization problems relative to the nonlinear power flow equations. In addition, characterizing the accuracies of the power flow approximations has proved extremely challenging and test-case dependent.
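For reference, the standard textbook form of the AC power flow equations at each bus i, and the common "DC" linearization used in markets and other applications, are sketched below (this is the generic formulation, not necessarily the exact one used in Dr. Molzahn's work):

```latex
% AC power flow: active and reactive power injections at bus i,
% with Y = G + jB the bus admittance matrix and \theta_{ik} = \theta_i - \theta_k
P_i = \sum_{k} |V_i||V_k|\left( G_{ik}\cos\theta_{ik} + B_{ik}\sin\theta_{ik} \right)
Q_i = \sum_{k} |V_i||V_k|\left( G_{ik}\sin\theta_{ik} - B_{ik}\cos\theta_{ik} \right)

% "DC" approximation: assume |V_i| \approx 1, neglect losses, and use
% \sin\theta_{ik} \approx \theta_{ik}, giving a model that is linear in the angles
P_i \approx \sum_{k \neq i} \frac{\theta_i - \theta_k}{x_{ik}}
```

Here x_ik is the series reactance of the branch between buses i and k; the gap between the nonlinear AC model and this linear approximation is exactly the error the project aims to characterize.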

A depiction of electric power generation, transmission, and distribution in our grid system.

As a result, the research effort Dr. Molzahn is trying to carry out aims to develop new characterizations of the problem features that yield large errors in the power flow linearizations through utilizing a variety of data analysis and machine learning methods. If accurate and reliable characterizations can be made, it would allow power system operators to identify when the results of their optimization problems may be erroneous, thus improving the reliability and economic efficiency of electric power grids.

So what I’ve been working on is building and implementing a number of different machine learning algorithms in order to help accomplish that. One of those algorithms I’ve developed is a multilayer perceptron neural network using Python and TensorFlow. Using the IEEE 14-bus test case, we were able to generate actual optimization results for an AC and DC case using Matlab and Matpower. With the data from those results, I was able to create a dataset with enough samples and features to train on. I used the neural network model to predict the difference between the optimal cost generated from the AC model and the DC model. The neural network takes in the data, splits it into training and testing sets, and then, using forward and back propagation, iterates through a specified number of epochs, learning the data and minimizing error on each epoch using stochastic gradient descent.

Because I am still relatively new to machine learning and TensorFlow, I ran into some difficulties trying to build the model. For close to a full week, there was a bug in my code that was yielding an uncharacteristically large error no matter how many epochs I trained the model for. I tried countless different things in order to remedy this. Finally, I realized the bug lay in the fact that I was “normalizing” my input data (a technique I read somewhere online to help deal with varying scales in the features) when I should have been “scaling” it. A simple one-word fix changed my results drastically. With that change, my model went from making predictions with a mean squared error of 600 to a mean squared error of 0.8. Given that the range of optimal cost difference was between 300-600 dollars, a mean squared error of 0.8 was less than a 0.01% average error.
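The post doesn’t spell out the exact transforms, but as an illustration of the scaling-versus-normalization distinction, here is min-max scaling next to z-score standardization using scikit-learn (one common way to implement the two; the toy data is made up):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy feature matrix with two features on very different scales
X = np.array([[100.0, 0.01],
              [300.0, 0.05],
              [600.0, 0.09]])

# Min-max scaling: maps each feature into the [0, 1] range
X_minmax = MinMaxScaler().fit_transform(X)

# Z-score standardization: zero mean and unit variance per feature
X_standard = StandardScaler().fit_transform(X)

print(X_minmax)
print(X_standard)
```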

Following that, I’m now working on generalizing the neural network model to predict other relevant aspects such as the power generation from each bus, and the power generation cost of each bus. I’m excited to gain more hands on experience with machine learning, to work more on this topic, and to see what kind of results we can get!