Recipes Dataset
RecipeNLG dataset is available for download here. It contains 2.2 million recipes. The size is slightly less than 1 GB.
Download and Unpack the Datasetβ
- Go to the download page https://recipenlg.cs.put.poznan.pl/dataset.
- Accept Terms and Conditions and download zip file.
- Unpack the zip file with
unzip
. You will get thefull_dataset.csv
file.
Create a Tableβ
Run clickhouse-client and execute the following CREATE query:
CREATE TABLE recipes
(
title String,
ingredients Array(String),
directions Array(String),
link String,
source LowCardinality(String),
NER Array(String)
) ENGINE = MergeTree ORDER BY title;
Insert the Dataβ
Run the following command:
clickhouse-client --query "
INSERT INTO recipes
SELECT
title,
JSONExtract(ingredients, 'Array(String)'),
JSONExtract(directions, 'Array(String)'),
link,
source,
JSONExtract(NER, 'Array(String)')
FROM input('num UInt32, title String, ingredients String, directions String, link String, source LowCardinality(String), NER String')
FORMAT CSVWithNames
" --input_format_with_names_use_header 0 --format_csv_allow_single_quote 0 --input_format_allow_errors_num 10 < full_dataset.csv
This is a showcase how to parse custom CSV, as it requires multiple tunes.
Explanation:
- The dataset is in CSV format, but it requires some preprocessing on insertion; we use table function input to perform preprocessing;
- The structure of CSV file is specified in the argument of the table function
input
; - The field
num
(row number) is unneeded - we parse it from file and ignore; - We use
FORMAT CSVWithNames
but the header in CSV will be ignored (by command line parameter--input_format_with_names_use_header 0
), because the header does not contain the name for the first field; - File is using only double quotes to enclose CSV strings; some strings are not enclosed in double quotes, and single quote must not be parsed as the string enclosing - that's why we also add the
--format_csv_allow_single_quote 0
parameter; - Some strings from CSV cannot parse, because they contain
\M/
sequence at the beginning of the value; the only value starting with backslash in CSV can be\N
that is parsed as SQL NULL. We add--input_format_allow_errors_num 10
parameter and up to ten malformed records can be skipped; - There are arrays for ingredients, directions and NER fields; these arrays are represented in unusual form: they are serialized into string as JSON and then placed in CSV - we parse them as String and then use JSONExtract function to transform it to Array.
Validate the Inserted Dataβ
By checking the row count:
Query:
SELECT count() FROM recipes;
Result:
ββcount()ββ
β 2231141 β
βββββββββββ
Example Queriesβ
Top Components by the Number of Recipes:β
In this example we learn how to use arrayJoin function to expand an array into a set of rows.
Query:
SELECT
arrayJoin(NER) AS k,
count() AS c
FROM recipes
GROUP BY k
ORDER BY c DESC
LIMIT 50
Result:
ββkβββββββββββββββββββββ¬ββββββcββ
β salt β 890741 β
β sugar β 620027 β
β butter β 493823 β
β flour β 466110 β
β eggs β 401276 β
β onion β 372469 β
β garlic β 358364 β
β milk β 346769 β
β water β 326092 β
β vanilla β 270381 β
β olive oil β 197877 β
β pepper β 179305 β
β brown sugar β 174447 β
β tomatoes β 163933 β
β egg β 160507 β
β baking powder β 148277 β
β lemon juice β 146414 β
β Salt β 122557 β
β cinnamon β 117927 β
β sour cream β 116682 β
β cream cheese β 114423 β
β margarine β 112742 β
β celery β 112676 β
β baking soda β 110690 β
β parsley β 102151 β
β chicken β 101505 β
β onions β 98903 β
β vegetable oil β 91395 β
β oil β 85600 β
β mayonnaise β 84822 β
β pecans β 79741 β
β nuts β 78471 β
β potatoes β 75820 β
β carrots β 75458 β
β pineapple β 74345 β
β soy sauce β 70355 β
β black pepper β 69064 β
β thyme β 68429 β
β mustard β 65948 β
β chicken broth β 65112 β
β bacon β 64956 β
β honey β 64626 β
β oregano β 64077 β
β ground beef β 64068 β
β unsalted butter β 63848 β
β mushrooms β 61465 β
β Worcestershire sauce β 59328 β
β cornstarch β 58476 β
β green pepper β 58388 β
β Cheddar cheese β 58354 β
ββββββββββββββββββββββββ΄βββββββββ
50 rows in set. Elapsed: 0.112 sec. Processed 2.23 million rows, 361.57 MB (19.99 million rows/s., 3.24 GB/s.)
The Most Complex Recipes with Strawberryβ
SELECT
title,
length(NER),
length(directions)
FROM recipes
WHERE has(NER, 'strawberry')
ORDER BY length(directions) DESC
LIMIT 10
Result:
ββtitleβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ¬βlength(NER)ββ¬βlength(directions)ββ
β Chocolate-Strawberry-Orange Wedding Cake β 24 β 126 β
β Strawberry Cream Cheese Crumble Tart β 19 β 47 β
β Charlotte-Style Ice Cream β 11 β 45 β
β Sinfully Good a Million Layers Chocolate Layer Cake, With Strawb β 31 β 45 β
β Sweetened Berries With Elderflower Sherbet β 24 β 44 β
β Chocolate-Strawberry Mousse Cake β 15 β 42 β
β Rhubarb Charlotte with Strawberries and Rum β 20 β 42 β
β Chef Joey's Strawberry Vanilla Tart β 7 β 37 β
β Old-Fashioned Ice Cream Sundae Cake β 17 β 37 β
β Watermelon Cake β 16 β 36 β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ΄ββββββββββββββ΄βββββββββββββββββββββ
10 rows in set. Elapsed: 0.215 sec. Processed 2.23 million rows, 1.48 GB (10.35 million rows/s., 6.86 GB/s.)
In this example, we involve has function to filter by array elements and sort by the number of directions.
There is a wedding cake that requires the whole 126 steps to produce! Show that directions:
Query:
SELECT arrayJoin(directions)
FROM recipes
WHERE title = 'Chocolate-Strawberry-Orange Wedding Cake'
Result:
ββarrayJoin(directions)ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Position 1 rack in center and 1 rack in bottom third of oven and preheat to 350F. β
β Butter one 5-inch-diameter cake pan with 2-inch-high sides, one 8-inch-diameter cake pan with 2-inch-high sides and one 12-inch-diameter cake pan with 2-inch-high sides. β
β Dust pans with flour; line bottoms with parchment. β
β Combine 1/3 cup orange juice and 2 ounces unsweetened chocolate in heavy small saucepan. β
β Stir mixture over medium-low heat until chocolate melts. β
β Remove from heat. β
β Gradually mix in 1 2/3 cups orange juice. β
β Sift 3 cups flour, 2/3 cup cocoa, 2 teaspoons baking soda, 1 teaspoon salt and 1/2 teaspoon baking powder into medium bowl. β
β using electric mixer, beat 1 cup (2 sticks) butter and 3 cups sugar in large bowl until blended (mixture will look grainy). β
β Add 4 eggs, 1 at a time, beating to blend after each. β
β Beat in 1 tablespoon orange peel and 1 tablespoon vanilla extract. β
β Add dry ingredients alternately with orange juice mixture in 3 additions each, beating well after each addition. β
β Mix in 1 cup chocolate chips. β
β Transfer 1 cup plus 2 tablespoons batter to prepared 5-inch pan, 3 cups batter to prepared 8-inch pan and remaining batter (about 6 cups) to 12-inch pan. β
β Place 5-inch and 8-inch pans on center rack of oven. β
β Place 12-inch pan on lower rack of oven. β
β Bake cakes until tester inserted into center comes out clean, about 35 minutes. β
β Transfer cakes in pans to racks and cool completely. β
β Mark 4-inch diameter circle on one 6-inch-diameter cardboard cake round. β
β Cut out marked circle. β
β Mark 7-inch-diameter circle on one 8-inch-diameter cardboard cake round. β
β Cut out marked circle. β
β Mark 11-inch-diameter circle on one 12-inch-diameter cardboard cake round. β
β Cut out marked circle. β
β Cut around sides of 5-inch-cake to loosen. β
β Place 4-inch cardboard over pan. β
β Hold cardboard and pan together; turn cake out onto cardboard. β
β Peel off parchment.Wrap cakes on its cardboard in foil. β
β Repeat turning out, peeling off parchment and wrapping cakes in foil, using 7-inch cardboard for 8-inch cake and 11-inch cardboard for 12-inch cake. β
β Using remaining ingredients, make 1 more batch of cake batter and bake 3 more cake layers as described above. β
β Cool cakes in pans. β
β Cover cakes in pans tightly with foil. β
β (Can be prepared ahead. β
β Let stand at room temperature up to 1 day or double-wrap all cake layers and freeze up to 1 week. β
β Bring cake layers to room temperature before using.) β
β Place first 12-inch cake on its cardboard on work surface. β
β Spread 2 3/4 cups ganache over top of cake and all the way to edge. β
β Spread 2/3 cup jam over ganache, leaving 1/2-inch chocolate border at edge. β
β Drop 1 3/4 cups white chocolate frosting by spoonfuls over jam. β
β Gently spread frosting over jam, leaving 1/2-inch chocolate border at edge. β
β Rub some cocoa powder over second 12-inch cardboard. β
β Cut around sides of second 12-inch cake to loosen. β
β Place cardboard, cocoa side down, over pan. β
β Turn cake out onto cardboard. β
β Peel off parchment. β
β Carefully slide cake off cardboard and onto filling on first 12-inch cake. β
β Refrigerate. β
β Place first 8-inch cake on its cardboard on work surface. β
β Spread 1 cup ganache over top all the way to edge. β
β Spread 1/4 cup jam over, leaving 1/2-inch chocolate border at edge. β
β Drop 1 cup white chocolate frosting by spoonfuls over jam. β
β Gently spread frosting over jam, leaving 1/2-inch chocolate border at edge. β
β Rub some cocoa over second 8-inch cardboard. β
β Cut around sides of second 8-inch cake to loosen. β
β Place cardboard, cocoa side down, over pan. β
β Turn cake out onto cardboard. β
β Peel off parchment. β
β Slide cake off cardboard and onto filling on first 8-inch cake. β
β Refrigerate. β
β Place first 5-inch cake on its cardboard on work surface. β
β Spread 1/2 cup ganache over top of cake and all the way to edge. β
β Spread 2 tablespoons jam over, leaving 1/2-inch chocolate border at edge. β
β Drop 1/3 cup white chocolate frosting by spoonfuls over jam. β
β Gently spread frosting over jam, leaving 1/2-inch chocolate border at edge. β
β Rub cocoa over second 6-inch cardboard. β
β Cut around sides of second 5-inch cake to loosen. β
β Place cardboard, cocoa side down, over pan. β
β Turn cake out onto cardboard. β
β Peel off parchment. β
β Slide cake off cardboard and onto filling on first 5-inch cake. β
β Chill all cakes 1 hour to set filling. β
β Place 12-inch tiered cake on its cardboard on revolving cake stand. β
β Spread 2 2/3 cups frosting over top and sides of cake as a first coat. β
β Refrigerate cake. β
β Place 8-inch tiered cake on its cardboard on cake stand. β
β Spread 1 1/4 cups frosting over top and sides of cake as a first coat. β
β Refrigerate cake. β
β Place 5-inch tiered cake on its cardboard on cake stand. β
β Spread 3/4 cup frosting over top and sides of cake as a first coat. β
β Refrigerate all cakes until first coats of frosting set, about 1 hour. β
β (Cakes can be made to this point up to 1 day ahead; cover and keep refrigerate.) β
β Prepare second batch of frosting, using remaining frosting ingredients and following directions for first batch. β
β Spoon 2 cups frosting into pastry bag fitted with small star tip. β
β Place 12-inch cake on its cardboard on large flat platter. β
β Place platter on cake stand. β
β Using icing spatula, spread 2 1/2 cups frosting over top and sides of cake; smooth top. β
β Using filled pastry bag, pipe decorative border around top edge of cake. β
β Refrigerate cake on platter. β
β Place 8-inch cake on its cardboard on cake stand. β
β Using icing spatula, spread 1 1/2 cups frosting over top and sides of cake; smooth top. β
β Using pastry bag, pipe decorative border around top edge of cake. β
β Refrigerate cake on its cardboard. β
β Place 5-inch cake on its cardboard on cake stand. β
β Using icing spatula, spread 3/4 cup frosting over top and sides of cake; smooth top. β
β Using pastry bag, pipe decorative border around top edge of cake, spooning more frosting into bag if necessary. β
β Refrigerate cake on its cardboard. β
β Keep all cakes refrigerated until frosting sets, about 2 hours. β
β (Can be prepared 2 days ahead. β
β Cover loosely; keep refrigerated.) β
β Place 12-inch cake on platter on work surface. β
β Press 1 wooden dowel straight down into and completely through center of cake. β
β Mark dowel 1/4 inch above top of frosting. β
β Remove dowel and cut with serrated knife at marked point. β
β Cut 4 more dowels to same length. β
β Press 1 cut dowel back into center of cake. β
β Press remaining 4 cut dowels into cake, positioning 3 1/2 inches inward from cake edges and spacing evenly. β
β Place 8-inch cake on its cardboard on work surface. β
β Press 1 dowel straight down into and completely through center of cake. β
β Mark dowel 1/4 inch above top of frosting. β
β Remove dowel and cut with serrated knife at marked point. β
β Cut 3 more dowels to same length. β
β Press 1 cut dowel back into center of cake. β
β Press remaining 3 cut dowels into cake, positioning 2 1/2 inches inward from edges and spacing evenly. β
β Using large metal spatula as aid, place 8-inch cake on its cardboard atop dowels in 12-inch cake, centering carefully. β
β Gently place 5-inch cake on its cardboard atop dowels in 8-inch cake, centering carefully. β
β Using citrus stripper, cut long strips of orange peel from oranges. β
β Cut strips into long segments. β
β To make orange peel coils, wrap peel segment around handle of wooden spoon; gently slide peel off handle so that peel keeps coiled shape. β
β Garnish cake with orange peel coils, ivy or mint sprigs, and some berries. β
β (Assembled cake can be made up to 8 hours ahead. β
β Let stand at cool room temperature.) β
β Remove top and middle cake tiers. β
β Remove dowels from cakes. β
β Cut top and middle cakes into slices. β
β To cut 12-inch cake: Starting 3 inches inward from edge and inserting knife straight down, cut through from top to bottom to make 6-inch-diameter circle in center of cake. β
β Cut outer portion of cake into slices; cut inner portion into slices and serve with strawberries. β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
126 rows in set. Elapsed: 0.011 sec. Processed 8.19 thousand rows, 5.34 MB (737.75 thousand rows/s., 480.59 MB/s.)
Online Playgroundβ
The dataset is also available in the Online Playground.