
Data mining used to catalog watermelon compounds

Prior to the pandemic, the postharvest physiology lab of Dr. Penelope Perkins-Veazie was typically buzzing with activity, processing fruits and vegetables to assess how plant compounds change after harvest and/or storage. The past few years have seen research projects on watermelon, blackberry, muscadine, butternut squash and tomatoes, to name a few. When COVID-19’s initial stay-at-home order went into effect in late March 2020, lab staff who were largely accustomed to extracting data from lab instruments found themselves wondering what work they could do from home. What they learned was a new research technique, one Perkins-Veazie admits they likely would never have had time for between harvest seasons: data mining.

Erin Deaton | Guoying Ma

With abundant online resources available, computer-assisted searches of those resources (known as data mining) allow a large body of work to be reviewed in a relatively short period of time. The work requires little more than a computer, an internet connection and know-how, and research technician Erin Deaton and research specialist Guoying (Jenny) Ma were up for the task, contributing to an ongoing research effort from home. Dr. Amnon Levi, research geneticist for the USDA-Agricultural Research Service, spearheaded the project, and Dr. Larry Parnell, computational biologist at USDA-ARS, provided training and support. The end result was the recent publication, “A Catalog of Natural Products Occurring in Watermelon—Citrullus lanatus.”

Perkins-Veazie says, “It’s a bit more than a normal publication. It serves as a model paper for defining what’s in just one of the foods we eat–watermelon.” Many watermelon compounds have been previously identified, though some individual compounds have been identified multiple times and given multiple names. A search of a chemistry database might reveal 10 names for the same chemical structure, creating more confusion than clarity. By combing the literature for previously identified compounds in watermelon, the project aimed to reduce redundancy and gather all known products into one searchable catalog.

Graphical abstract of watermelon compounds

The effort resulted in nearly 1,700 small molecules being included in the final manuscript. Further, the catalog provides references to the original literature as well as the concentration of each compound in watermelon, specific to plant part if known. Each of these phytochemicals is tagged with chemical class, molecular weight and formula, chemical structure, and physical and chemical properties. All the information is housed online at

With this newly available catalog, pharmaceutical companies can identify compounds of interest for new drug development. Scientists can use the information to strengthen proposals, whether their research targets breeding new varieties with enhanced concentrations of certain phytochemicals or involves a controlled feeding study in the field of precision nutrition. A similar catalog exists for blueberries, and hopefully future efforts will tackle additional crops. Deaton and Ma concur, saying that while they enjoyed the project and the collaboration, they wouldn’t want it to entirely replace the hands-on work and wet chemistry that being in the lab affords.

Citation: Sorokina M, McCaffrey KS, Deaton EE, Ma G, Ordovás JM, Perkins-Veazie PM, Steinbeck C, Levi A and Parnell LD (2021) A Catalog of Natural Products Occurring in Watermelon—Citrullus lanatus. Front. Nutr. 8:729822. doi: 10.3389/fnut.2021.729822