Doubling down on known protein families

Imagine researchers exploring a dark room with a flashlight, only able to clearly identify what falls within that single beam. When it comes to microbial communities, scientists have historically been unable to see beyond the beam — worse, they didn’t even know how big the room is.

Credit: Samantha Trieu/Berkeley Lab

A new study published online October 11, 2023 in Nature highlights the vast array of functional diversity of microbes through a novel approach to better understand microbial communities by looking at protein function within them. The work was led by a team of scientists at the U.S. Department of Energy (DOE) Joint Genome Institute (JGI), a DOE Office of Science User Facility located at Lawrence Berkeley National Laboratory (Berkeley Lab), and collaborators across multiple other research centers around the world.

“We’ve more than doubled the number of protein families known up until now, and identified many novel structure predictions,” said lead author on the paper Georgios Pavlopoulos, now a research director at the Biomedical Sciences Research Center Alexander Fleming. “This was a massive analysis of 1.3 billion proteins with massively parallel computations.”

Guided by JGI scientists, the team embarked on a mission to unveil the mysteries concealed within the “dark” functional realm. Their focus sharpened on deciphering the intricate world of protein functional diversity: the novel protein families and novel functions in as-yet-unrevealed microbes. Harnessing the collective power of more than 26,000 microbiome datasets, all accessible through the publicly available Integrated Microbial Genomes & Microbiomes (IMG/M) database, they successfully crafted the Novel Metagenome Protein Families (NMPF) Catalog.

“We can now analyze new datasets by comparing against these protein families, or further analyze the protein families in order to predict new functions,” said Nikos Kyrpides, senior author of the study and head of the JGI’s Microbiome Data Science group.

Shining a Light on Functional “Dark Matter”

Microbial communities living everywhere from soils and stomachs to the deep sea are capable of doing a lot of unique things when it comes to energy cycles — turning biomass into things like ethanol or hydrogen, or solar energy into hydrogen.

Microbial communities are also incredibly difficult to study. Many of the microbes within them cannot be cultivated in lab settings. Since each microbial community has its own unique makeup of microbe players and the functions they perform, artificially replicating a whole community is impossible.

Metagenomic sequencing allows researchers to study the entire genetic makeup of these communities via whole genome sequencing of the samples, though without being able to distinguish which gene belongs to each individual microbial species within a community. Therefore, the process hinges on referencing existing genome sequences.

Some of the genes recovered this way are what the scientists call “known knowns”: they are similar to genes with known function. Others are called “known unknowns”: they are similar to previously known genes from isolate organisms, but we still aren’t sure of their function.

However, if a gene in the community doesn’t match any of the previously known genes from isolates, there isn’t much scientists can tell about its function or its origin. As a result, these genes were typically discarded from any analysis as useless information. These represent the “unknown unknowns” because they aren’t similar to anything we’ve already defined.

“A huge percentage — around 30–50% of the protein families that we knew so far — still does not have any known function, but we knew the families,” Kyrpides said. Yet, “almost 20 years of metagenomic data and metagenomic analysis, and still there has been no real analysis of protein families from metagenomes per se.”

Recently, other research teams have leveraged the power of artificial intelligence to decode the language of protein sequences and obtain hints of their possible functions. Yet these efforts were limited to the realm of already-known protein sequences.

“In this endeavor, we have not only ventured into the uncharted territory of understanding the vast landscape of functional diversity, but we have also pushed the boundaries by applying AI methodologies to unravel their roles,” Pavlopoulos said. “Consequently, we have amassed an extensive repository of groundbreaking insights, significantly expanding the horizons of potential functions across various categories of proteins, including those with pivotal applications in biotechnology, such as DNA editing enzymes.”

Leveraging Protein Families in a New Way

The discovery of new protein families had started to plateau in recent years, perhaps suggesting that scientists had “captured” much of the diversity out there, even if they hadn’t yet defined what it did, exactly. But what kind of diversity might those “unknown unknowns” hold?

The team started with 8 billion metagenome genes from IMG (the study also references data from the JGI’s Genomes from Earth’s Microbiome, or GEM catalog). Then they removed any genes with even a remote similarity to previously known genes, leaving them with around 1.2 billion novel genes.

They took what they were left with and clustered them into families. From there they focused on families with at least 100 members.

“If you have 100 sequences, the quality of the cluster is significantly higher because it is very hard to have 100 sequences from different locations or habitats that align very well, randomly,” Kyrpides explained. “Replicating that 100 times would have been almost impossible.”
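The size filter described above can be illustrated with a minimal Python sketch. It assumes cluster assignments already exist as (protein ID, family ID) pairs; the toy data, the function name `large_families`, and the lowered threshold in the example call are hypothetical, and this is not the pipeline used in the study.

```python
from collections import defaultdict


def large_families(assignments, min_members=100):
    """Group protein IDs by family ID and keep families with enough members."""
    families = defaultdict(list)
    for protein_id, family_id in assignments:
        families[family_id].append(protein_id)
    return {
        family_id: members
        for family_id, members in families.items()
        if len(members) >= min_members
    }


# Toy input (hypothetical): family "F1" has 3 members, "F2" has 1.
toy = [("p1", "F1"), ("p2", "F1"), ("p3", "F1"), ("p4", "F2")]

# Threshold lowered to 3 so the toy data has something to keep.
kept = large_families(toy, min_members=3)
print(sorted(kept))
```

With the default threshold of 100, none of the toy families would survive, which is the point of the filter: small clusters are set aside until more sequences accumulate.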

When the team was finished with this phase, they found that the protein family diversity within this metagenomic space (the “unknown unknowns”) was vastly greater than that of the reference genomes — by at least double.

“As we keep on adding more samples, we’re getting more protein families,” Kyrpides said. “In a few years, as we keep on sequencing more metagenomes, some of the clusters that have currently 50 members or more will grow to 100 members or more as well. So, we’re saying diversity has doubled, but in reality it could be three or four or five or tenfold more out there.”

Digging Further into an Array of Diversity

While the team didn’t drill down into function, they were able to further characterize these families. They divided the protein families up by environment and found only 7% of protein families were shared across all eight environmental categories. Instead, families preferred a specific environment, whether that be soil, animal hosts, marine ecosystems, etc.

“So, they must be doing something interesting or important for that habitat,” Pavlopoulos explained. “That is definitely material that the scientific community now can use further. Let’s say somebody is working on soil environments or the human body — they may take some of those families and try to functionally characterize them because they are very specific to that habitat.”
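As a rough illustration of the environment breakdown described above, the sketch below computes the share of families seen in every environment category. The three category names, the toy family-to-environment mapping, and the function name `shared_fraction` are assumptions for illustration; the study itself used eight categories and its own annotation.

```python
def shared_fraction(family_envs, all_envs):
    """Return the fraction of families observed in every environment category."""
    if not family_envs:
        return 0.0
    required = set(all_envs)
    shared = sum(1 for envs in family_envs.values() if required <= set(envs))
    return shared / len(family_envs)


# Hypothetical categories and families, for illustration only.
environments = {"soil", "marine", "host"}
toy_families = {
    "F1": {"soil", "marine", "host"},  # found everywhere
    "F2": {"soil"},
    "F3": {"soil", "marine"},
    "F4": {"marine"},
}

fraction = shared_fraction(toy_families, environments)
print(f"{fraction:.0%} of toy families are shared across all categories")
```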

Taxonomic analysis found that the majority of these protein families belonged to bacteria and viruses, though 6 million of the sequences evaded classification. Researchers also tried to home in on the function of the genes via 3D modeling, comparing the structures of unknown proteins with those of known ones, since similar structure suggests a high likelihood of similar function. The team also identified protein families with completely novel structures.

The computational power to perform this level of analysis hinged on access to the National Energy Research Scientific Computing Center, another user facility at Berkeley Lab.

“It’s also a credit to Aydin Buluç’s team with Berkeley Lab’s Applied Mathematics and Computational Research Division,” Pavlopoulos said. “They developed parallel algorithms to perform ‘all-vs-all’ comparisons and graph clustering able to run in such highly parallel infrastructures.”

This is the first time protein structures have been used to help characterize the vast array of microbial dark matter. The study took roughly two years to complete, with only about 20,000 metagenomes sequenced at the time. Now, that number is closer to 60,000.

“There is still 70–80% of known microbial diversity out there that is not yet captured genomically,” Kyrpides said. “So, that diversity is definitely holding a lot of new secrets in terms of functional diversity as well.”

Researchers from Harvard University, Indiana University, University of Crete (Greece), Georgia Institute of Technology, Michigan State University, Lawrence Livermore National Laboratory, University of Washington, Centre for Research & Technology Hellas (Greece), Aristotle University of Thessaloniki (Greece), and the University of California, Berkeley were also involved in the work. Other authors on the paper are Fotis Baltoumas, Sirui Liu, Oguz Selvitopi, Antonio Camargo, Stephen Nayfach, Ariful Azad, Simon Roux, Lee Call, Natalia N. Ivanova, I-Min Chen, David Paez-Espino, Evangelos Karatzas, Novel Metagenome Protein Families Consortium, Ioannis Iliopoulos, Konstantinos Konstantinidis, James M. Tiedje, Jennifer Pett-Ridge, David Baker, Axel Visel, Christos A. Ouzounis, and Sergey Ovchinnikov.

 

Publication: Pavlopoulos G et al. Unraveling the functional dark matter through global metagenomics. Nature. 2023 October 11. doi: 10.1038/s41586-023-06583-7

 

***

The U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility at Lawrence Berkeley National Laboratory, is committed to advancing genomics in support of DOE missions related to clean energy generation and environmental characterization and cleanup. The JGI provides integrated high-throughput sequencing and computational analysis that enable systems-based scientific approaches to these challenges. Follow @jgi on Twitter.

###

Founded in 1931 on the belief that the biggest scientific challenges are best addressed by teams, Lawrence Berkeley National Laboratory and its scientists have been recognized with 16 Nobel Prizes. Today, Berkeley Lab researchers develop sustainable energy and environmental solutions, create useful new materials, advance the frontiers of computing, and probe the mysteries of life, matter, and the universe. Scientists from around the world rely on the Lab’s facilities for their own discovery science. Berkeley Lab is a multiprogram national laboratory, managed by the University of California for the U.S. Department of Energy’s Office of Science.

DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science.


February Employment Situation

By Paul Gomme and Peter Rupert

The establishment data from the BLS showed a 275,000 increase in payroll employment for February, outpacing the 230,000 average over the previous 12 months. The payroll data for January and December were revised down by a total of 167,000. The private sector added 223,000 new jobs, the largest gain since May of last year.

Temporary help services employment continues a steep decline after a sharp post-pandemic rise.

Average hours of work increased from 34.2 to 34.3. That increase, along with the 223,000 private employment increase, led to a hefty increase in total hours of 5.6% at an annualized rate, also the largest increase since May of last year.
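A back-of-the-envelope check of the annualized figure can be sketched as follows. The prior-month private employment level of 134 million is an assumption (it is not given in the text), and the published hours figures are rounded, so the result should only land near the 5.6% quoted above rather than match it exactly.

```python
def annualized_growth(monthly_factor):
    """Compound a one-month growth factor over 12 months; return a percentage."""
    return (monthly_factor ** 12 - 1) * 100


# Average weekly hours, as reported (rounded to one decimal).
hours_factor = 34.3 / 34.2

# ASSUMPTION: rough prior-month private employment level, not from the text.
private_jobs_before = 134_000_000
jobs_factor = (private_jobs_before + 223_000) / private_jobs_before

# Total hours = jobs x hours per job, so the growth factors multiply.
total_hours_factor = hours_factor * jobs_factor
annualized = annualized_growth(total_hours_factor)
print(f"Total hours growth: {annualized:.2f}% annualized")
```

Most of the increase comes from the hours term: a 0.1-hour rise in the workweek is a bigger percentage change than 223,000 jobs on a base of this size.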

The establishment report, once again, beat “expectations”; the WSJ survey of economists had forecast 198,000. Other than the downward revisions mentioned above, another bit of negative news was a smallish increase in average hourly earnings, from $34.52 to $34.57.

The household survey shows that the labor force increased by 150,000, the net of a drop in employment of 184,000 and an increase in the number of unemployed persons of 334,000. The labor force participation rate held steady at 62.5%, the employment-to-population ratio decreased from 60.2% to 60.1%, and the unemployment rate increased from 3.66% to 3.86%. Remember that the unemployment rate is the number of unemployed relative to the labor force (the number employed plus the number unemployed). Consequently, the unemployment rate can go up if the number of unemployed rises holding fixed the labor force, or if the labor force shrinks holding the number unemployed unchanged. An increase in the unemployment rate is not necessarily a bad thing: it may reflect a strong labor market drawing “marginally attached” individuals from outside the labor force. Indeed, there was a 96,000 decline in those workers.
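The accounting in the paragraph above can be made concrete with a short sketch. The reported changes come from the text; the base levels of 160,000 thousand employed and 6,000 thousand unemployed are hypothetical round numbers chosen only to show the two ways the rate can rise.

```python
def unemployment_rate(employed, unemployed):
    """Unemployed as a percentage of the labor force (employed + unemployed)."""
    return 100.0 * unemployed / (employed + unemployed)


# Changes reported in the household survey, in thousands.
d_employed, d_unemployed = -184, 334
d_labor_force = d_employed + d_unemployed  # labor force = employed + unemployed

# HYPOTHETICAL round base levels (thousands), for illustration only.
employed, unemployed = 160_000, 6_000
base = unemployment_rate(employed, unemployed)

# Case 1: unemployed rise by 300 while the labor force is held fixed.
more_unemployed = unemployment_rate(employed - 300, unemployed + 300)

# Case 2: labor force shrinks by 300 while the unemployed count is unchanged.
smaller_force = unemployment_rate(employed - 300, unemployed)

print(f"Labor force change: {d_labor_force:+,} thousand")
print(f"Base {base:.2f}%, case 1 {more_unemployed:.2f}%, case 2 {smaller_force:.2f}%")
```

Both cases push the rate above the base, but the first moves it far more, since it changes the numerator rather than only the denominator.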

Earlier in the week, the BLS announced JOLTS (Job Openings and Labor Turnover Survey) data for January. There isn’t much to report here: job openings changed little at 8.9 million, and the number of hires and total separations were little changed at 5.7 million and 5.3 million, respectively.

As has been the case for the last couple of years, the number of job openings remains higher than the number of unemployed persons.

Also earlier in the week the BLS announced that productivity increased 3.2% in the 4th quarter with output rising 3.5% and hours of work rising 0.3%.

The bottom line is that the labor market continues its surprisingly (to some) strong performance, once again proving stronger than many had expected. This strength makes it difficult to justify any interest rate cuts soon, particularly given the recent inflation spike.

Mortgage rates fall as labor market normalizes

Everyone was waiting to see if this week’s jobs report would send mortgage rates higher, as happened last month. Instead, the 10-year yield had a muted response: the headline number beat estimates, but previous months were revised lower. The Federal Reserve’s fear of wage growth spiraling out of control hasn’t materialized for over two years now, and the unemployment rate ticked up to 3.9%. For now, we can say the labor market isn’t tight anymore, but it’s also not breaking.

The key labor data line in this expansion is the weekly jobless claims report. Jobless claims show an expanding economy that has not lost jobs yet. We will only be in a recession once jobless claims exceed 323,000 on a four-week moving average.

From the Fed: In the week ended March 2, initial claims for unemployment insurance benefits were flat, at 217,000. The four-week moving average declined slightly by 750, to 212,250.
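Here is a minimal sketch of the four-week moving average and the 323,000 recession line described above. The last two weekly values are the flat 217,000 readings from the report; the first two are illustrative values picked so the average matches the reported 212,250, not official data.

```python
RECESSION_THRESHOLD = 323_000  # the trigger level stated in the article


def four_week_average(weekly_claims):
    """Average of the most recent four weekly initial-claims readings."""
    if len(weekly_claims) < 4:
        raise ValueError("need at least four weeks of claims")
    return sum(weekly_claims[-4:]) / 4


# First two values are ILLUSTRATIVE; last two are the reported 217,000.
claims = [213_000, 202_000, 217_000, 217_000]

average = four_week_average(claims)
breached = average > RECESSION_THRESHOLD
print(f"Four-week average: {average:,.0f}; above recession line: {breached}")
```

On these numbers the average would need to rise by more than 110,000 before the recession line comes into play.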


Below is an explanation of how we got here with the labor market, which all started during COVID-19.

1. I wrote the COVID-19 recovery model on April 7, 2020, and retired it on Dec. 9, 2020. By that time, the upfront recovery phase was done, and I needed to model out when we would get the jobs lost back.

2. Early in the labor market recovery, when we saw weaker job reports, I doubled and tripled down on my assertion that job openings would get to 10 million in this recovery. Job openings rose as high as 12 million and are currently over 9 million. Even with the massive miss on a job report in May 2021, I didn’t waver.

Currently, the job openings, quits percentage and hires data are below pre-COVID-19 levels, which means the labor market isn’t as tight as it once was, and this is why the employment cost index has been slowing along with the quits percentage.

[Chart: US job quits rate]

3. I wrote that we should get back all the jobs lost to COVID-19 by September of 2022. At the time this would be a speedy labor market recovery, and it happened on schedule, too.

Total employment data

4. This is the key one for right now: If COVID-19 hadn’t happened, we would have between 157 million and 159 million jobs today, which would have been in line with the job growth rate in February 2020. Today, we are at 157,808,000. This is important because job growth should be cooling down now. We are more in line with where the labor market should be when averaging 140K-165K monthly. So for now, the fact that we aren’t trending between 140K-165K means we still have a bit more recovery kick left before we get down to those levels. 




From BLS: Total nonfarm payroll employment rose by 275,000 in February, and the unemployment rate increased to 3.9 percent, the U.S. Bureau of Labor Statistics reported today. Job gains occurred in health care, in government, in food services and drinking places, in social assistance, and in transportation and warehousing.

[Chart: jobs created and lost in the previous month]

In this jobs report, the unemployment rate for education levels looks like this:

  • Less than a high school diploma: 6.1%
  • High school graduate and no college: 4.2%
  • Some college or associate degree: 3.1%
  • Bachelor’s degree or higher: 2.2%

Today’s report has continued the trend of the labor data beating my expectations, only because I am looking for the jobs data to slow down to a level of 140K-165K, which hasn’t happened yet. I wouldn’t categorize the labor market as being tight anymore because of the quits ratio and the hires data in the job openings report. This also shows in the employment cost index. These are key data lines for the Fed and the reason we are going to see three rate cuts this year.

Inside The Most Ridiculous Jobs Report In History: Record 1.2 Million Immigrant Jobs Added In One Month

Last month we thought that the January jobs report was the "most ridiculous in recent history" but, boy, were we wrong because this morning the Biden department of goalseeked propaganda (aka BLS) published the February jobs report, and holy crap was that something else. Even Goebbels would blush.

What happened? Let's take a closer look.

On the surface, it was (almost) another blockbuster jobs report, certainly one which nobody expected, or rather just one bank out of 76 expected. Starting at the top, the BLS reported that in February the US unexpectedly added 275K jobs, with just one research analyst (from Dai-Ichi Research) expecting a higher number.

Some context: after last month's record 4-sigma beat, today's print was "only" 3 sigma higher than estimates. Needless to say, two multiple sigma beats in a row used to only happen in the USSR... and now in the US, apparently.

Before we go any further, a quick note on what last month we said was "the most ridiculous jobs report in recent history": it appears the BLS read our comments and decided to stop beclowning itself. It did that by slashing last month's ridiculous print by over a third, and revising what was originally reported as a massive 353K beat to just 229K, a 124K revision, which was the biggest one-month negative revision in two years!
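The revision arithmetic above is easy to check. The two payroll prints come from the text; the variable names are ours.

```python
initial_print = 353_000  # January payrolls as first reported
revised_print = 229_000  # January payrolls after revision

revision = initial_print - revised_print
share_cut = revision / initial_print * 100

print(f"Revision: {revision:,} jobs ({share_cut:.1f}% of the initial print)")
```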

Of course, that does not mean that this month's jobs print won't be revised lower: it will be, and not just this month but every other month until the November election because that's the only tool left in the Biden admin's box: pretend the economy and jobs are strong, then revise them sharply lower the next month, something we pointed out first last summer and which has not failed to disappoint once.

To be fair, not every aspect of the jobs report was stellar (after all, the BLS had to give it some vague credibility). Take the unemployment rate: after flatlining between 3.4% and 3.8% for two years - and thus denying expectations from Sahm's Rule that a recession may have already started - in February the unemployment rate unexpectedly jumped to 3.9%, the highest since February 2022 (with Black unemployment spiking by 0.3% to 5.6%, an indicator which the Biden admin will quickly slam as widespread economic racism or something).

And then there were average hourly earnings, which after surging 0.6% MoM in January (since revised to 0.5%) and spooking markets that wage growth is so hot, the Fed will have no choice but to delay cuts, in February the number tumbled to just 0.1%, the lowest in two years...

... for one simple reason: last month's average wage surge had nothing to do with actual wages, and everything to do with the BLS estimate of hours worked (which is the denominator in the average wage calculation) which last month tumbled to just 34.1 (we were led to believe) the lowest since the covid pandemic...

... but has since been revised higher while the February print rose even more, to 34.3, hence why the latest average wage data was once again a product not of wages going up, but of how long Americans worked in any weekly period, in this case higher from 34.1 to 34.3, an increase which has a major impact on the average calculation.
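To see the mechanism the argument above relies on, here is a small sketch of how the hours denominator moves an hourly average. It assumes average hourly earnings behave like weekly pay divided by weekly hours; the weekly pay figure is hypothetical and held fixed, so only the hours change drives the result.

```python
def average_hourly_earnings(avg_weekly_pay, avg_weekly_hours):
    """Hourly average implied by a weekly pay figure and a weekly hours figure."""
    return avg_weekly_pay / avg_weekly_hours


# HYPOTHETICAL weekly pay, held constant to isolate the effect of hours.
weekly_pay = 1_180.00

at_low_hours = average_hourly_earnings(weekly_pay, 34.1)
at_high_hours = average_hourly_earnings(weekly_pay, 34.3)
pct_change = (at_high_hours / at_low_hours - 1) * 100

print(f"Hourly average at 34.1 hours: ${at_low_hours:.2f}")
print(f"Hourly average at 34.3 hours: ${at_high_hours:.2f}")
print(f"Change from hours alone: {pct_change:.2f}%")
```

With pay unchanged, a longer measured workweek lowers the hourly average and a shorter one raises it, which is why a swing in the hours estimate can swamp the underlying wage change in any one month.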

While the above data points were examples of some latent weakness in the latest report, perhaps meant to give it a sheen of veracity, it was everything else in the report that was a problem starting with the BLS's latest choice of seasonal adjustments (after last month's wholesale revision), which have gone from merely laughable to full clownshow, as the following comparison between the monthly change in BLS and ADP payrolls shows. The trend is clear: the Biden admin numbers are now clearly rising even as the impartial ADP (which directly logs employment numbers at the company level and is far more accurate), shows an accelerating slowdown.

But it's more than just the Biden admin hanging its "success" on seasonal adjustments: when one digs deeper inside the jobs report, all sorts of ugly things emerge... such as the growing, unprecedented divergence between the Establishment (payrolls) survey and much more accurate Household (actual employment) survey. To wit, while in February the BLS claims 275K payrolls were added, the Household survey found that the number of actually employed workers dropped for the third straight month (and 4 in the past 5), this time by 184K (from 161.152 million to 160.968 million).

This means that while the Payrolls series hits new all time highs every month since December 2020 (when according to the BLS the US had its last month of payrolls losses), the level of Employment has not budged in the past year. Worse, as shown in the chart below, such a gaping divergence has opened between the two series in the past 4 years, that the number of Employed workers would need to soar by 9 million (!) to catch up to what Payrolls claims is the employment situation.

There's more: shifting from a quantitative to a qualitative assessment reveals just how ugly the composition of "new jobs" has been. Consider this: the BLS reports that in February 2024, the US had 132.9 million full-time jobs and 27.9 million part-time jobs. Well, that's great... until you look back one year and find that in February 2023 the US had 133.2 million full-time jobs, or more than it does one year later! And yes, all the job growth since then has been in part-time jobs, which have increased by 921K since February 2023 (from 27.020 million to 27.941 million).
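The year-over-year comparison above works out as follows. The levels are the ones quoted in the text, expressed in thousands so the subtraction stays in whole numbers; the full-time figures are only given to one decimal place in millions, so that change is approximate.

```python
# Levels quoted above, in thousands of jobs.
full_time = {"2023-02": 133_200, "2024-02": 132_900}
part_time = {"2023-02": 27_020, "2024-02": 27_941}


def yearly_change(series):
    """Change from February 2023 to February 2024, in thousands."""
    return series["2024-02"] - series["2023-02"]


full_time_change = yearly_change(full_time)
part_time_change = yearly_change(part_time)

print(f"Full-time: {full_time_change:+,}K")
print(f"Part-time: {part_time_change:+,}K")
```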

Here is a summary of the labor composition in the past year: all the new jobs have been part-time jobs!

But wait there's even more, because now that the primary season is over and we enter the heart of election season and political talking points will be thrown around left and right, especially in the context of the immigration crisis created intentionally by the Biden administration which is hoping to import millions of new Democratic voters (maybe the US can hold the presidential election in Honduras or Guatemala, after all it is their citizens that will be illegally casting the key votes in November), what we find is that in February, the number of native-born workers tumbled again, sliding by a massive 560K to just 129.807 million. Add to this the December data, and we get a near-record 2.4 million plunge in native-born workers in just the past 3 months (only the covid crash was worse)!

The offset? A record 1.2 million foreign-born (read immigrants, both legal and illegal but mostly illegal) workers added in February!

Said otherwise, not only has all job creation in the past 6 years been exclusively for foreign-born workers...

Source: St Louis Fed FRED Native Born and Foreign Born

... but there has been zero job-creation for native born workers since June 2018!

This is a huge issue - especially at a time of an illegal alien flood at the southwest border...

... and is about to become a huge political scandal, because once the inevitable recession finally hits, there will be millions of furious unemployed Americans demanding a more accurate explanation for what happened - i.e., the illegal immigration floodgates that were opened by the Biden admin.

Which is also why Biden's handlers will do everything in their power to ensure there is no official recession before November... and why after the election is over, all economic hell will finally break loose. Until then, however, expect the jobs numbers to get even more ridiculous.

Tyler Durden Fri, 03/08/2024 - 13:30
