In part 1, we wasted a lot of time traipsing through Hackage, goofing off with Emacs, Hoogling (yes, that is a word!), and writing undefined functions. We're going to have to pick up the pace if we want to get this done before dinner.
Let's take a look at Alex's next Racket function, extract-html5-value. In case you don't have it memorized:
(define (extract-html5-value element scope)
  (when (h:html-full? element)
    (cond
      [(html-itemscope? element) scope]
      [else
       (apply string-append
              (flatten
               (map (lambda (c)
                      (cond
                        [(x:pcdata? c) (list (x:pcdata-string c))]
                        [else (list)]))
                    (h:html-full-content element))))])))
It's fairly clear that this function takes an HTML element as its first parameter, but what is scope? Scope gets returned if html-itemscope? returns true. Great -- that tells us nothing. Oh, but type-wise, it must be the same type as the second case in the cond, and that looks like it returns a string. (Gee, if we had a typed language, this would be a whole lot simpler.) So we want a Haskell function like this:
extractHtml5Value :: Node -> Text -> Text
extractHtml5Value elt scope = undefined
Or do we? The thing is, if you examine the rest of the Racket program, you'll find that what gets passed to extract-html5-value is not a string, but the value of new-scope (from line 46), which is an itemscope structure (from line 36). How can we have an itemscope in one case, and a string in the other?
Answer: Alex is implementing this (or at least part of it). In one case (the first one) a value is an itemscope attribute, but in another (the last one) it's a piece of text. This is a job for algebraic data types!
An algebraic data type is a fancy name for a union. Not your Pipelayer's Local 57, but a type that has a number of different cases. In Haskell, we can define algebraic data types by defining a data type with more than one constructor. Let's create a MicrodataValue data type this way:
data MicrodataValue = ItemscopeValue Itemscope
                    | TextValue Text
                    deriving (Show, Eq)
Here we have 2 constructors: ItemscopeValue, which takes an Itemscope as a parameter, and TextValue, which takes Text as a parameter (the vertical bar separates them). You can think of these constructors as the tags in a tagged union... because that's exactly what they are. You can discriminate these 2 cases by pattern matching. In fact, we've seen this already with the Node data type.
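To make the "discriminate by pattern matching" bit concrete, here's a small sketch. The Itemscope record is a cut-down stand-in for the one from part 1 (no properties field), and describeValue is a hypothetical helper that isn't part of Alex's program or ours. It only shows how each constructor gets its own equation.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Cut-down stand-in for the Itemscope record from part 1 (assumption:
-- the properties field is left out to keep the sketch short).
data Itemscope = Itemscope
  { itemscopeType :: Maybe Text
  } deriving (Show, Eq)

data MicrodataValue
  = ItemscopeValue Itemscope
  | TextValue Text
  deriving (Show, Eq)

-- Hypothetical helper: one equation per constructor. The constructor name
-- in the pattern is the "tag" being checked.
describeValue :: MicrodataValue -> Text
describeValue (ItemscopeValue scope) =
  case itemscopeType scope of
    Just ty -> T.append "itemscope of type " ty
    Nothing -> "untyped itemscope"
describeValue (TextValue txt) = T.append "text: " txt
```

Loaded into ghci, describeValue (TextValue "hi") gives "text: hi", and describeValue (ItemscopeValue (Itemscope Nothing)) gives "untyped itemscope".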
But how did Racket get away with not needing something like this? It's because values in Racket (like all Lisp derivatives) have types at runtime. (In Haskell, types go away after the program is compiled.) Racket's runtime type system can distinguish structs from strings, so it's happy. However, if you wanted to distinguish strings used for different purposes, you'd have to resort to some sort of home-grown tagging (maybe using another structure type to wrap them). In Haskell we're forced to create wrappers around these different cases whenever we want to distinguish alternatives at runtime -- by using algebraic data types.
So I guess what we really want is an extractHtml5Value function that looks like this:
extractHtml5Value :: Node -> Itemscope -> MicrodataValue
extractHtml5Value elt scope = undefined
but without so much undefined stuff. The first case is easy:
extractHtml5Value :: Node -> Itemscope -> MicrodataValue
extractHtml5Value elt scope =
  if hasItemscope elt then ItemscopeValue scope else undefined
Since the first thing we're doing here is an if expression, we can convert this into a couple of top-level function cases that use a guard clause to restrict whether the first case triggers. If the guard fails, evaluation falls through to the next case. This is just a little cleaner:
extractHtml5Value :: Node -> Itemscope -> MicrodataValue
extractHtml5Value elt scope | hasItemscope elt = ItemscopeValue scope
extractHtml5Value elt scope = undefined
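If the if-versus-guard conversion feels like sleight of hand, here's the same move on a toy function that needs nothing beyond the Prelude. The Attrs type and the valueKind functions are made up for this sketch (the real code works on Node, not on a bare attribute list), and this hasItemscope is a simplified version of ours.

```haskell
-- Assumption for this sketch: an element is just its attribute list.
type Attrs = [(String, String)]

-- Simplified stand-in for the real hasItemscope, which takes a Node.
hasItemscope :: Attrs -> Bool
hasItemscope attrs = "itemscope" `elem` map fst attrs

-- One equation whose body is an if expression.
valueKindIf :: Attrs -> String
valueKindIf attrs =
  if hasItemscope attrs then "itemscope" else "text"

-- Two equations: the guard decides whether the first one applies;
-- when it fails, matching falls through to the second.
valueKindGuard :: Attrs -> String
valueKindGuard attrs | hasItemscope attrs = "itemscope"
valueKindGuard _ = "text"
```

Both versions give the same answer for every input. The guard form just keeps each case on its own line, which pays off once there are more than two.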
In the first case we've just wrapped the ItemscopeValue constructor around the Itemscope value we were given. The second case in the Racket code is doing some fancy footwork to concatenate the PCDATA of the element together. To do this in Haskell, we can map over the element's children, and if the child is a TextNode (from the Text.XmlHtml module -- part of the Node algebraic data type) return the text. If it isn't a TextNode, just return some empty text. This will give us a list of Text values that we can concatenate:
extractHtml5Value :: Node -> Itemscope -> MicrodataValue
extractHtml5Value elt scope | hasItemscope elt = ItemscopeValue scope
extractHtml5Value elt scope =
  TextValue $ concat $ map getText $ elementChildren elt
  where getText elt@(TextNode text) = text
        getText _ = ""
Unfortunately, this doesn't work. What's going on?
/Users/warrenharris/projects/racket-hs/microdata.hs:57:29:
    Couldn't match expected type `[a0]' with actual type `Text'
    Expected type: [[a0]]
      Actual type: [Text]
    In the second argument of `($)', namely
      `map getText $ elementChildren elt'
    In the second argument of `($)', namely
      `concat $ map getText $ elementChildren elt'
Failed, modules loaded: none.
We're looking at you, map. Actually, map's return type is what we're expecting, [Text]. It seems that the expected type isn't what we want -- a list of lists of things. (a0 is some type variable that isn't (yet) instantiated. Type variables always start with a lower-case letter, whereas concrete types always start with an upper-case one.) So the problem must be concat. What's its type?
Prelude> :t concat
concat :: [[a]] -> [a]
So this isn't at all what we're expecting. We want something that converts [Text] -> Text. Let's ask Hoogle: [Text] -> Text. (BTW, you can enable Hoogle in ghci if you like.) Doh. We want the concat in Data.Text. Let's add another import:
import Data.Text (Text, concat)
Fail.
/Users/warrenharris/projects/racket-hs/microdata.hs:57:20:
    Ambiguous occurrence `concat'
    It could refer to either `Prelude.concat', imported from Prelude
                          or `Data.Text.concat',
                             imported from Data.Text at /Users/warrenharris/projects/racket-hs/microdata.hs:6:25-30
Failed, modules loaded: none.
Let's hide the version of concat coming from the Prelude module (the stuff that's there if you don't ask for anything special):
import Prelude hiding (concat)
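As an aside, hiding isn't the only way out of an ambiguous name. A common alternative is to import Data.Text qualified, so both versions of concat stay available under different names. The sketch below is not what this post's code does, and flattenLists and joinTexts are names made up for the illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Prelude's concat is untouched: it flattens a list of lists.
flattenLists :: [[Int]] -> [Int]
flattenLists = concat

-- Data.Text's concat is reached through the T. prefix: it joins Text values.
joinTexts :: [Text] -> Text
joinTexts = T.concat
```

Here flattenLists [[1,2],[3]] is [1,2,3] and joinTexts ["ab","cd"] is "abcd", with no ambiguity error. We'll stick with the hiding import shown above.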
With that, it works. But we can do better. Concatenating the results of a map is a common pattern, and there's probably a better way to do this. Let's ask Hoogle again (by giving the overall type of the concat $ map combined functions): (a -> Text) -> [a] -> Text. Hmmm... foldMap looks vaguely like something that would fit:
foldMap :: (Foldable t, Monoid m) => (a -> m) -> t a -> m
but it requires a Foldable t type function applied to a (what does that mean?) whereas we asked for [a], a list of whatever. Could it be that the list type is a Foldable type function? Could be. And we asked for the result to be Text, but Hoogle gave us something that returns a Monoid (wtf?). Is Text a Monoid? I guess we can give it a shot. Let's import:
import Data.Foldable (foldMap)
and try it out:
extractHtml5Value :: Node -> Itemscope -> MicrodataValue
extractHtml5Value elt scope | hasItemscope elt = ItemscopeValue scope
extractHtml5Value elt scope =
  TextValue $ foldMap getText $ elementChildren elt
  where getText elt@(TextNode text) = text
        getText _ = ""
Yeah! Hoogle is awesome.
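So why did that typecheck? Lists are Foldable and Text is a Monoid (its "append" is text concatenation and its "empty" is ""), so foldMap getText does the same job as concat composed with map getText. Here's a sketch that puts the two side by side. The Node type below is a two-constructor stand-in for the real one in Text.XmlHtml, and the function names are made up for the comparison.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Foldable (foldMap)
import Data.Text (Text)
import qualified Data.Text as T

-- Simplified stand-in for xmlhtml's Node (assumption: only these two cases,
-- and Element carries just a tag name and children).
data Node
  = TextNode Text
  | Element Text [Node]
  deriving (Show, Eq)

-- Text of a text node, empty text for anything else.
getText :: Node -> Text
getText (TextNode t) = t
getText _ = ""

-- Map first, then concatenate the resulting list of Text values.
viaConcatMap :: [Node] -> Text
viaConcatMap = T.concat . map getText

-- Same result in one step: map each node to Text and combine the
-- results with the Text Monoid as we go.
viaFoldMap :: [Node] -> Text
viaFoldMap = foldMap getText

-- foldMap isn't list-specific: Maybe is Foldable too.
maybeText :: Maybe Node -> Text
maybeText = foldMap getText
```

For the children [TextNode "Hello, ", Element "b" [TextNode "bold"], TextNode "world"], both viaConcatMap and viaFoldMap give "Hello, world": only direct text children count, just like in the Racket original. And maybeText Nothing is "".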
Review
Where are we now?
{-# LANGUAGE OverloadedStrings #-}

import Data.Foldable (foldMap)
import Data.List (find)
import Data.Maybe (isJust)
import Data.Text (Text, concat)
import Prelude hiding (concat)
import Text.XmlHtml

data Itemprop = Itemprop {
  itempropName :: Text,
  itempropValue :: Text
} deriving (Show, Eq)

data Itemscope = Itemscope {
  itemscopeType :: Maybe Text,
  itemscopeProperties :: [Itemprop]
} deriving (Show, Eq)

hasItemscope :: Node -> Bool
hasItemscope elt@(Element _ _ _) =
  isJust $ lookup "itemscope" $ elementAttrs elt
hasItemscope _ = False

data MicrodataValue = ItemscopeValue Itemscope
                    | TextValue Text
                    deriving (Show, Eq)

extractHtml5Value :: Node -> Itemscope -> MicrodataValue
extractHtml5Value elt scope | hasItemscope elt = ItemscopeValue scope
extractHtml5Value elt scope =
  TextValue $ foldMap getText $ elementChildren elt
  where getText elt@(TextNode text) = text
        getText _ = ""
Ready for part 3?