
Breakthrough Study: Inaccurate Labeling of Commercial Cannabis in U.S. & New Terpene Clusters

A summary of the PLOS ONE article: Understanding Phytochemical Diversity of Commercial Cannabis in the U.S., published May 2022

Cannabis products are commonly marketed to consumers using strain names and the categorical labels Indica, Sativa, or Hybrid to differentiate their effects. But how scientifically accurate is this labeling system, inherited from the black market? A breakthrough research project published in PLOS ONE analyzed a massive number of cannabis flower samples, making it one of the most significant cannabis analyses to date. In total, 89,923 commercial cannabis samples destined for sale across six states in the U.S. legal market were tested for terpene and cannabinoid content to understand the chemical diversity of retail cannabis. Here are some highlights from the report's findings:

Cannabinoid Composition

Not surprisingly, the most abundant cannabinoids were THC, CBD, and CBG. Very few samples had an abundance of minor cannabinoids, illustrating that commercial cannabis in the U.S. is considerably more homogeneous than it could be. This is likely due to breeding practices driven by marketing focused on high-THC products.

The cannabinoid analysis also found that CBD-dominant samples had higher than expected THC levels. Given the potential regulatory significance, it is worth noting that most CBD-dominant samples contained more THC than is legally allowed under current cannabis legislation in the U.S. "84.5% of the [CBD-dominant] samples had a total THC above 0.3%, which is the legal threshold for cannabis producers." CBD farms will face significant challenges bringing their low-THC products to market under the current legal framework established by the 2018 Farm Bill.

Terpene Composition

Of the almost 90,000 flower samples tested, the overall terpene content averaged 2% by weight, with individual terpenes rarely present at more than 0.5% and most testing at less than 0.2%. Myrcene, β-caryophyllene, and limonene were present at the highest levels. Humulene, β-pinene, linalool, and α-pinene were secondary. Bisabolol, camphene, terpinolene, ocimene, α-terpinene, γ-terpinene, and nerolidol were commonly present but not in large quantities.

Terpene Diversity

THC-dominant flowers displayed greater terpene diversity than CBD-dominant strains. The lack of terpene diversity in high-CBD cultivars is likely due to the historical focus on breeding THC-dominant cannabis for consumers.

An analysis of CBD-dominant and Balanced CBD:THC samples found a higher proportion of these products displayed myrcene-dominant terpene profiles compared to the high-THC flower samples.

Terpene Co-occurrence

The top fourteen cannabis terpenes listed above were mapped onto various graphs to look for distinct clusters of commonly co-occurring terpenes and understand their relationships. The idea was that strong positive correlations between terpenes, or distinctive groups, would emerge to confirm the validity of commercial categorizations: Indica, Sativa, and Hybrid. The graphs should visually distinguish chemical composition and terpene diversity if these classifications were scientifically accurate. When plotted on a correlation matrix, commonly recurring terpenes were discovered. However, the results did not verify the accuracy of commercial labels.
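To make the idea of a terpene correlation matrix more concrete, here is a minimal sketch in plain Python. It is not the study's code: the sample values, the three terpene columns, and the function names `pearson` and `correlation_matrix` are all hypothetical and chosen only for illustration. It computes the Pearson correlation between every pair of terpenes, so strongly positive values flag terpenes that tend to occur together.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(samples, terpenes):
    """Return a nested dict: matrix[a][b] is the correlation of terpenes a and b."""
    columns = {t: [sample[t] for sample in samples] for t in terpenes}
    return {
        a: {b: pearson(columns[a], columns[b]) for b in terpenes}
        for a in terpenes
    }

# Hypothetical terpene levels (% by weight); NOT data from the study.
TERPENES = ["myrcene", "pinene", "limonene"]
SAMPLES = [
    {"myrcene": 0.50, "pinene": 0.20, "limonene": 0.05},
    {"myrcene": 0.40, "pinene": 0.16, "limonene": 0.10},
    {"myrcene": 0.10, "pinene": 0.05, "limonene": 0.40},
    {"myrcene": 0.05, "pinene": 0.03, "limonene": 0.45},
]

matrix = correlation_matrix(SAMPLES, TERPENES)
for a in TERPENES:
    print(a, [round(matrix[a][b], 2) for b in TERPENES])
```

In this made-up data, myrcene and pinene rise and fall together (a correlation close to +1), while limonene moves in the opposite direction (close to -1). That is the kind of pattern a correlation matrix makes visible at a glance.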

All THC-dominant flower samples were plotted onto a graph using their primary terpenes and industry designations of Indica, Sativa, or Hybrid. The outcome was highly intermingled, with no evident segregation of data points by commercial label. This indicates that a sample labeled Indica will likely have a terpene profile indistinguishable from that of a Hybrid or Sativa sample. Industry labels are poorly aligned with the underlying chemistry of cannabis and are inconsistently assigned. In short, Indica, Sativa, and Hybrid marketing did not accurately represent the products.

Consistency of Strain Names

Currently, there are no enforced rules for naming cannabis varieties and no naming standards in the legal industry. Forty-one commercial strain names were analyzed, with a minimum of 5 samples each. When tested, they displayed different levels of chemical consistency, and the data showed considerable variability across all strains. On average, samples sharing a strain name did not have consistent terpene profiles from one producer to the next.

However, some cultivars were more consistent than others. In specific samples, the product name did indicate the dominant terpenes, potentially due to some cultivars being more often clonally cultivated than grown from seed.
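One way to picture a strain-name consistency check like the one described above is sketched below. This summary does not say which metric the researchers used, so the choice of mean pairwise cosine similarity is an assumption, as are the function names, the strain names, and every number in the data. Only the minimum of 5 samples per strain name comes from the text.

```python
from itertools import combinations
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two terpene profiles (1.0 = same proportions)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def strain_consistency(profiles_by_strain, min_samples=5):
    """Mean pairwise similarity per strain name, skipping under-sampled names."""
    scores = {}
    for strain, profiles in profiles_by_strain.items():
        if len(profiles) < min_samples:
            continue  # too few samples to judge consistency
        pairs = list(combinations(profiles, 2))
        scores[strain] = sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)
    return scores

# Hypothetical profiles as (myrcene, limonene, caryophyllene) in % by weight.
PROFILES = {
    "Strain A": [  # nearly identical samples, e.g. clonally propagated
        (0.50, 0.10, 0.10), (0.52, 0.09, 0.11), (0.48, 0.10, 0.10),
        (0.50, 0.11, 0.09), (0.51, 0.10, 0.10),
    ],
    "Strain B": [  # same name, very different chemistry
        (0.50, 0.05, 0.05), (0.05, 0.50, 0.05), (0.05, 0.05, 0.50),
        (0.40, 0.30, 0.10), (0.10, 0.30, 0.40),
    ],
    "Strain C": [  # only two samples, below the minimum of five
        (0.30, 0.30, 0.10), (0.31, 0.29, 0.10),
    ],
}

scores = strain_consistency(PROFILES)
for strain, score in scores.items():
    print(strain, round(score, 3))
```

A score near 1.0 means samples sold under that name share almost the same terpene proportions, while a lower score means the name says little about what is in the jar.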

Terpene Clusters

One of the more exciting findings of the study came from k-means clustering of the dominant terpenes across the almost 90,000 samples. K-means clustering is a computer algorithm for finding groups in thousands of data points. Given a predetermined number of clusters (k), it starts from k initial cluster centers, called centroids, and assigns every data point to its nearest centroid. Each centroid is then recalculated as the average of the points assigned to it, and the points are regrouped around whichever centroid is now closest. These two steps repeat until no point changes cluster. Because every point is always assigned to some cluster, k-means returns a result for any dataset, so the groupings it produces still need careful interpretation.
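The clustering loop described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the study's implementation: the sample vectors are hypothetical, and the starting centroids are picked by hand so the example is reproducible. Real analyses typically use a library implementation with randomized starting points and several restarts.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, initial_centroids, max_iter=100):
    """Basic k-means: alternate assignment and centroid updates until stable."""
    centroids = list(initial_centroids)
    k = len(centroids)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return assignment, centroids

# Hypothetical samples as (caryophyllene, limonene, myrcene, pinene, terpinolene)
# in % by weight; NOT data from the study.
POINTS = [
    (0.40, 0.35, 0.10, 0.05, 0.01),
    (0.38, 0.40, 0.12, 0.04, 0.02),
    (0.08, 0.06, 0.55, 0.25, 0.01),
    (0.10, 0.05, 0.50, 0.30, 0.02),
    (0.05, 0.08, 0.30, 0.06, 0.50),
    (0.06, 0.07, 0.28, 0.05, 0.55),
]

# Hand-picked starting centroids (an assumption made for reproducibility).
assignment, centroids = k_means(POINTS, [POINTS[0], POINTS[2], POINTS[4]])
print(assignment)
```

Note that the number of clusters is an input, not a discovery: the algorithm will split the data into however many groups it is asked for, which is why the choice of k and the interpretation of the resulting clusters matter.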

K-means clustering analysis identified three major terpene sets characterized by relatively high levels of specific terpene pairs.

Cluster 1: Caryophyllene + Limonene

Cluster 2: Myrcene + Pinene

Cluster 3: Terpinolene + Myrcene

These distinct categories of terpene clusters need to be further analyzed. Interestingly, Cluster 3 was also associated with modestly higher levels of CBG.


"Legal THC products are marketed to consumers as if there are clear cut associations between a product label and its psychoactive effects." By testing almost 90,000 samples, this research study confirms that Indica, Sativa, and Hybrid labels have a poor relationship to the underlying chemistry of retail cannabis. Strain names could be a better marker for product chemistry but were not consistent. Overall, existing commercial category labels are not reliable indicators to differentiate effects for cannabis consumers. These standard designations did not accurately represent the terpene patterns found in the massive dataset.

The research indicates labeling products with their primary terpenes would be a more accurate representation of the chemical diversity of retail cannabis. It is clear that terpene composition most effectively distinguishes differences when categorizing cannabis products for consumers. All commercial labels should include primary terpenes and cannabinoid content to guide consumers' purchases.

Further investigation will need to be conducted into the three distinct categories of terpene clusters that emerged through this research. Altogether, this is a groundbreaking study for the cannabis community.


Find the PLOS ONE research article here.

