Students are gearing up to present their work at a seminar this week, and Michelle Paulsen and Byron Stewart of Northwestern’s Ready Set Go (RSG) program, which trains students in science communication, visited Argonne to work with NAISE undergraduate researchers.
Students considered key elements of presentation preparation, including knowing your audience, avoiding jargon, and framing the problem. They worked through improvisations, such as explaining to a partner posing as a time traveler how and why to get through airport security, an exercise in explaining your research to a non-expert. They practiced persuasion by convincing their peers to join them at a favorite lunch spot, and they rehearsed delivering their presentations one. word. at. a. time. They also received feedback on their upcoming presentations. We’re grateful to RSG for the visit and the great insights, and we look forward to the student presentations this week at Argonne and in early September at Northwestern. Students will have the opportunity to showcase their valuable contributions and build bridges to collaboration between Argonne and Northwestern.
Hello everyone! My name is Braxton Cody, and next year I will be a third-year student at Northwestern University studying Mechanical Engineering. This summer I have been working with Jonathan Emery of the Materials Science Department at Northwestern University and Charudatta Phatak of the Materials Science Division at Argonne National Laboratory to create cylindrical nanotubes of cobalt metal and characterize how their magnetic domain walls depend on the curvature of the metal. To guarantee even coatings of cobalt, we will use atomic layer deposition (ALD) techniques, which give extremely precise control over the thickness of the metal, on the order of angstroms.
During the first part of this summer, I collected data on a specific ALD process for cobalt recently developed by the Winter Group at Wayne State University. After establishing the processing parameters, becoming familiar with the various characterization techniques, and ordering the chemicals required for the ALD process, we began implementing the process with varying degrees of success. We did manage to grow cobalt metal, as seen in the X-ray fluorescence data shown below. The data for the Pt/Co multilayer (shown in beige), created by sputter deposition, serves as a standard for comparison, providing information about the measurement sensitivity and accuracy.
Despite growing cobalt metal, we soon discovered that a preparation error had exposed the precursor chemical to air, ruining it. We ran a test to confirm our suspicions: in the X-ray fluorescence data, seen in the figure below, there is no characteristic energy peak corresponding to cobalt atoms. Due to the expense of the chemical, we’ve begun pursuing alternate cobalt growth methods using plasma-enhanced ALD.
Although we have encountered numerous setbacks, we will continue working toward controlled ALD of cobalt. Plasma-enhanced ALD offers a cheaper and potentially more effective option. Given the timeframe of this project, we hope to test this new chemistry on flat substrates and confirm cobalt deposition before the end of the summer. If that endeavor proves successful, we will proceed to grow the nanotubes and characterize their magnetic structures. While the previous weeks’ results were not what we hoped for, they have given us new directions and laid the groundwork for the continuation of this project.
Hi everyone! My name is Ethan Trokie, and I’m a rising junior at Northwestern University studying computer engineering. I’m currently working with Pete Beckman, Zeeshan Nadir, and Nicola Ferrier as part of the Waggle research project in the Mathematics and Computer Science Division. The goal of the Waggle project is to deploy an array of sensors all over Chicago to measure factors such as air quality and noise. The collected data will be openly available, so that scientists and policy makers can work together to make new discoveries and set informed policy. What Waggle is doing is a massive shift from previous environmental science data collection techniques. Previously, scientists used very large sensors that are expensive and precise but sparsely deployed. Waggle is moving toward small sensors that are much less expensive and slightly less precise, but far more numerous. This new technique can give scientists much more localized data, which can lead to novel discoveries.
What I’m working on specifically is machine learning and computer vision that runs locally on the Waggle nodes, which are what we call the containers that hold all of the sensors. My task is to use the camera on a Waggle node to detect flooding in the streets. This can give the city of Chicago data on where flooding commonly happens and can help them clean up flooding faster by knowing where it is actually occurring.
I’ve spent this summer so far learning what machine learning is and how to use it to detect water. What makes my project interesting is that water is difficult to detect: it has no fixed shape or color, so it’s hard to tell the computer exactly what to look for. There has been some research into detecting moving water, however, and I’ve created a good detector in Python that works from just a short video. Below are two sample videos that my program has classified. The center image is a frame from the video, the leftmost image is the mask over the non-water regions that my program created, and the rightmost image is the mask over the actual image.
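My detector itself is more involved than I can show here, but one common cue in the moving-water literature is that water pixels flicker: their brightness varies from frame to frame much more than static pavement does. A minimal sketch of that idea (the threshold value and array shapes are illustrative assumptions, not my actual pipeline):

```python
import numpy as np

def water_mask(frames, threshold=20.0):
    """Label pixels whose brightness fluctuates strongly over time.

    frames: (T, H, W) array of grayscale frames from a short video.
    Returns a boolean (H, W) mask that is True where the per-pixel
    standard deviation over time exceeds the threshold, a simple
    proxy for the shimmering of moving water.
    """
    temporal_std = frames.std(axis=0)
    return temporal_std > threshold
```

A real classifier would combine a temporal cue like this with color and texture features learned from labeled examples, which is where the machine learning comes in.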
Next, I am going to improve this classifier to become even more accurate. Right now it only works on moving water, but I hope to extend the machine learning to classify standing water as well. I’m excited to get more acquainted with different types of machine learning algorithms, and hopefully to see my code run on a Waggle node in Chicago and make a positive impact on the city.
Greetings, I am Connor Moen! I’m a rising sophomore at Northwestern University studying computer science and environmental engineering. This summer I am working under Dr. Stefan Wild at Argonne National Laboratory, where I am assisting him with developing accurate flood prediction models for the City of Chicago. The goal of these models is to analyze weather conditions and soil moisture on a block-by-block basis (or, for the time being, where sensors are installed) and then determine if flooding will occur. This knowledge can be used to notify homeowners in flood-prone regions to prepare for flooding, thereby minimizing property damage and disruption after heavy storms.
I have spent much of the summer collecting vast amounts of data from the Chicago Data Portal and UChicago’s Thoreau Sensor Network, preprocessing it using the AWK programming language, and working to visualize it in MATLAB. Below is a MATLAB plot showing the Volumetric Water Content for all sensors in the Thoreau Network over the past few months.
The future of the project will involve qualitatively describing the trends we see in our data (for example, might the uncharacteristic behavior seen in a number of sensors after mid-June be caused by an outside factor such as sprinklers?), and then writing, testing, and refining the predictive models. Personally, I am most excited to dive into these predictive models; I am fascinated by the idea of combining environmental sensing with machine learning in order to directly help those living in my neighboring city.
Hello! My name is Wesley Chan, and I’m a rising junior studying computer science and economics at Northwestern University. This summer I’m interning at Argonne in the CEEESA (Center for Energy, Environmental, and Economic Systems Analysis) division. I’m working with my PI, Dr. Daniel Molzahn, to research the topic of worst-case errors in linearizations of the power flow equations for electric power systems.
What does that even mean? Well, to break it down: steady-state mathematical models of electric power grids are formulated as systems of nonlinear “power flow” equations. The power flow equations form the key constraints in many optimization problems used to ensure reliable and economically efficient operation of electric power grids. However, the nonlinearities in the power flow equations result in challenging optimization problems. In many practical applications, such as electricity markets, linear approximations of the power flow equations are used in order to obtain tractable optimization problems. These linear approximations induce errors in the solutions of the optimization problems relative to the nonlinear power flow equations. In addition, characterizing the accuracy of the power flow approximations has proved extremely challenging and test-case dependent.
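To make that concrete, here is the nonlinear equation for the active power injected at each bus, along with the widely used “DC” linearization, in standard textbook notation (shown for illustration, not necessarily the exact formulation used in our work):

```latex
% AC power flow: active power injected at bus i
% (|V| = voltage magnitudes, \theta = voltage angles,
%  G + jB = bus admittance matrix)
P_i = \sum_{k} |V_i||V_k| \bigl( G_{ik}\cos(\theta_i - \theta_k)
                               + B_{ik}\sin(\theta_i - \theta_k) \bigr)

% DC approximation: assume |V_i| \approx 1, small angle differences
% (\sin\theta \approx \theta), and negligible resistance (G_{ik} \approx 0):
P_i \approx \sum_{k} B_{ik}\,(\theta_i - \theta_k)
```

The error the linear model induces grows as the assumptions behind it break down, with larger angle differences, voltage deviations, and line losses, which is part of why its worst case is so hard to characterize.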
As a result, Dr. Molzahn’s research effort aims to develop new characterizations of the problem features that yield large errors in the power flow linearizations, using a variety of data analysis and machine learning methods. If accurate and reliable characterizations can be made, power system operators could identify when the results of their optimization problems may be erroneous, improving the reliability and economic efficiency of electric power grids.
So what I’ve been working on is building and implementing a number of different machine learning algorithms to help accomplish that. One of the algorithms I’ve developed is a multilayer perceptron neural network using Python and TensorFlow. Using the IEEE 14-bus test case, we were able to generate actual optimization results for the AC and DC cases using Matlab and Matpower. From those results, I created a dataset with enough samples and features to train on. I use the neural network model to predict the difference between the optimal cost generated by the AC model and that of the DC model. The network takes in the data, splits it into training and testing sets, and then, using forward and back propagation, iterates through a specified number of epochs, learning the data and minimizing the error on each epoch with stochastic gradient descent.
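My actual model uses TensorFlow, but the training loop it performs can be sketched in plain NumPy. This toy version (one hidden layer, per-sample stochastic gradient descent, squared-error loss) is illustrative only; the layer size, learning rate, and epoch count are made-up values, not my tuned settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_mlp(X, y, hidden=16, epochs=300, lr=0.01):
    """Train a one-hidden-layer regressor with per-sample SGD.

    X: (n, d) feature matrix; y: (n,) targets (e.g. the AC-vs-DC
    optimal-cost difference for each operating point).
    Returns a function mapping new feature rows to predictions.
    """
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        for i in rng.permutation(n):          # stochastic: one sample at a time
            x, t = X[i:i+1], y[i]
            h = np.tanh(x @ W1 + b1)          # forward pass
            pred = (h @ W2 + b2)[0, 0]
            err = pred - t                    # d(loss)/d(pred) for 0.5*err^2
            dh = (err * W2.T) * (1.0 - h**2)  # backpropagate through tanh
            W2 -= lr * h.T * err; b2 -= lr * err
            W1 -= lr * (x.T @ dh); b1 -= lr * dh[0]
    return lambda Xq: (np.tanh(Xq @ W1 + b1) @ W2 + b2).ravel()
```

In the TensorFlow version the optimizer handles this loop for me; the point is just that each epoch shuffles the samples and nudges the weights down the error gradient.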
Because I am still relatively new to machine learning and TensorFlow, I ran into some difficulties building the model. For close to a full week, a bug in my code yielded an uncharacteristically large error no matter how many epochs I trained for. I tried countless fixes. Finally, I realized the bug lay in the fact that I was “normalizing” my input data (a technique I had read about online for dealing with varying scales across features) when I should have been “scaling” it. A simple one-word fix changed my results drastically: my model went from a mean squared error of 600 to a mean squared error of 0.8. Given that the optimal cost differences ranged from 300 to 600 dollars, that corresponds to a typical prediction error of under a dollar, well below 1% on average.
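For anyone curious about the distinction (the numbers here are made up for illustration): normalizing rescales each sample vector to unit length, while scaling standardizes each feature column, which is usually what gradient descent wants:

```python
import numpy as np

# Hypothetical rows of (a cost in dollars, a per-unit quantity):
X = np.array([[300.0, 0.01],
              [450.0, 0.05],
              [600.0, 0.03]])

# "Normalizing": divide each row by its own Euclidean norm. The large
# first column dominates the norm, so the second feature is almost erased.
X_normalized = X / np.linalg.norm(X, axis=1, keepdims=True)

# "Scaling" (standardizing): give each column zero mean and unit
# variance, so both features contribute comparably to the gradients.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
```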
Following that, I’m now working on generalizing the neural network model to predict other relevant quantities, such as the power generation at each bus and the generation cost at each bus. I’m excited to gain more hands-on experience with machine learning, to work more on this topic, and to see what kind of results we can get!
Hello, I am AnnElise Hardy, a biomedical engineering student at Northwestern University (’19), and I am working with Elena Rozhkova in the Nanoscience and Technology Division as part of the larger Artificial Neuron Group led by Chris Fry. The group is working toward creating an artificial neuron, a bio-inspired assembly. The proposed design will place light-activated transmembrane proton pumps, either taken from the archaeon Halobacterium halobium or created synthetically, on a compartmentalized gold structure in order to create an assembly that can mimic the low-voltage ion flow of a neuron. These “protocells” are the first step toward an artificial neuron for use in neuromorphic computing systems.
Currently, I am working to isolate the proton pumps; each attempt takes a few days, plus a couple more days to grow more archaea. Our first few attempts were not successful, but we are adapting our procedure to address what we think the problems are. For example, we have increased how much we distress the cells in order to break up the membranes more. If we cannot isolate the pumps directly from the archaea, we will move to producing them by cell-free synthesis, which Dr. Rozhkova has demonstrated here. The benefit of cell-free synthesis lies in removing the time- and labor-intensive culturing of the archaea, avoiding the issues we have seen in harvesting the pumps at the optimal point of cell growth.
Hello, my name is Beomsoo (Michael) Park (’19), and I am a biomedical engineering student at Northwestern University. I am part of a research group led by Chris Henry focused on developing KBase, a user-friendly software platform that lets researchers quickly run genomic analyses on their own data rather than relying on other computational scientists to do it for them. KBase allows users to assemble and annotate microbial genomes, build metabolic models, analyze RNA-seq data, and in general work with very large quantities of data at once using “apps” that my Argonne research group designed. KBase has also incorporated numerous publicly available tools, such as the IDBA, MegaHit, and MetaSPAdes genome assemblers and quality assessment tools like QUAST. As a new user, my job is to test KBase to see how compatible and effective the software is. To do this, I will use KBase to identify unknown species in a large number of soil samples, and I will eventually write a publication exploring their behaviors through metabolic modeling and showing how KBase was used to perform all of the analyses.
I am currently at the stage in my research where I have a solid understanding of every program and “app” that I will be using. More specifically, I have uploaded all of the data onto KBase, run three different genome assembly methods (IDBA, MegaHit, MetaSPAdes), and used QUAST to identify the possible species in each sample. The attached pie chart shows the distribution of species in one particular sample. For now, I have focused on a smaller group of samples, using the quality assessment within metaQUAST to decide which of the three assembly methods is the most accurate, so that I can extrapolate and use this “best method” for all of the samples. In the coming steps, I will use KBase to build metabolic models and further explore the microbial behaviors.
This summer I wish to learn more about computer programming, since I have very minimal background in programming languages other than Matlab. I also wish to learn more about metabolic modeling, since I am new to this area and intrigued by how we can use computers to predict microbial behaviors. I hope to use my time at Argonne in the coming weeks to dive deeper into research and further explore my possible career options.
This summer, 12 Northwestern University undergraduates will contribute to research projects across six Argonne divisions through the Northwestern-Argonne Institute of Science and Engineering (NAISE). Their projects span topics including synthetic biology, machine learning, environmental sensing, energy storage, and materials synthesis and characterization.
Students will use this blog to write about new experimental and modeling techniques they learn, new scientific insights, and their overall experience at Argonne this summer.