
Why is there "data" and "newtype" in Haskell? [duplicate]

This question already has answers here: Difference between `data` and `newtype` in Haskell (2 answers) Closed 8 years ago.

It seems that a newtype definition is just a data definition that obeys some restrictions (e.g., only one constructor), and that due to these restrictions the runtime system can handle newtypes more efficiently. And the handling of pattern matching for undefined values is slightly different.

But suppose Haskell only knew data definitions and had no newtypes: couldn't the compiler find out for itself whether a given data definition obeys these restrictions, and automatically treat it more efficiently?

I'm sure I'm missing something; there must be some deeper reason for this.


Both newtype and the single-constructor data introduce a single value constructor, but the value constructor introduced by newtype is strict and the value constructor introduced by data is lazy. So if you have

data D = D Int
newtype N = N Int

Then N undefined is equivalent to undefined and causes an error when evaluated. But D undefined is not equivalent to undefined, and it can be evaluated as long as you don't try to peek inside.
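A minimal sketch of that difference, assuming GHC and the standard Control.Exception module. The helper throwsWhenForced and the names dataCase and newtypeCase are made up for illustration; they are not part of the original answer.

```haskell
import Control.Exception (SomeException, evaluate, try)

data D = D Int
newtype N = N Int

-- | Force a value to weak head normal form and report whether that threw.
-- (Hypothetical helper, introduced only for this illustration.)
throwsWhenForced :: a -> IO Bool
throwsWhenForced x = either failed (const False) <$> try (evaluate x)
  where
    failed :: SomeException -> Bool
    failed _ = True

-- D undefined is already a constructor application, so forcing it
-- succeeds; the undefined inside is never looked at.
dataCase :: IO Bool
dataCase = throwsWhenForced (D undefined)

-- N undefined is the same thing as undefined, so forcing it throws.
newtypeCase :: IO Bool
newtypeCase = throwsWhenForced (N undefined)
```

Running dataCase should give False and newtypeCase should give True.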

Couldn't the compiler handle this for itself?

No, not really—this is a case where as the programmer you get to decide whether the constructor is strict or lazy. To understand when and how to make constructors strict or lazy, you have to have a much better understanding of lazy evaluation than I do. I stick to the idea in the Report, namely that newtype is there for you to rename an existing type, like having several different incompatible kinds of measurements:

newtype Feet = Feet Double
newtype Cm   = Cm   Double

both behave exactly like Double at run time, but the compiler promises not to let you confuse them.
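To make the "promises not to let you confuse them" part concrete, here is a small sketch. The functions feetToCm and addCm and the value ok are hypothetical names added for illustration; only Feet and Cm come from the declarations above.

```haskell
newtype Feet = Feet Double deriving (Show, Eq)
newtype Cm   = Cm   Double deriving (Show, Eq)

-- | Explicit conversion between the two units (1 ft = 30.48 cm).
feetToCm :: Feet -> Cm
feetToCm (Feet f) = Cm (f * 30.48)

-- | Only accepts centimetres on both sides.
addCm :: Cm -> Cm -> Cm
addCm (Cm a) (Cm b) = Cm (a + b)

-- Accepted: the Feet value is converted explicitly first.
ok :: Cm
ok = addCm (Cm 10) (feetToCm (Feet 2))

-- Rejected at compile time: Feet is not Cm, even though both wrap Double.
-- bad = addCm (Cm 10) (Feet 2)
```

Uncommenting the last line should make the module fail to type-check, which is the whole point of the wrapper.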


According to Learn You a Haskell:

Instead of the data keyword, the newtype keyword is used. Now why is that? Well for one, newtype is faster. If you use the data keyword to wrap a type, there's some overhead to all that wrapping and unwrapping when your program is running. But if you use newtype, Haskell knows that you're just using it to wrap an existing type into a new type (hence the name), because you want it to be the same internally but have a different type. With that in mind, Haskell can get rid of the wrapping and unwrapping once it resolves which value is of what type.

So why not just use newtype all the time instead of data then? Well, when you make a new type from an existing type by using the newtype keyword, you can only have one value constructor and that value constructor can only have one field. But with data, you can make data types that have several value constructors and each constructor can have zero or more fields:

data Profession = Fighter | Archer | Accountant  

data Race = Human | Elf | Orc | Goblin  

data PlayerCharacter = PlayerCharacter Race Profession 

When using newtype, you're restricted to just one constructor with one field.

Now consider the following type:

data CoolBool = CoolBool { getCoolBool :: Bool } 

It's your run-of-the-mill algebraic data type that was defined with the data keyword. It has one value constructor, which has one field whose type is Bool. Let's make a function that pattern matches on a CoolBool and returns the value "hello" regardless of whether the Bool inside the CoolBool was True or False:

helloMe :: CoolBool -> String  
helloMe (CoolBool _) = "hello"  

Instead of applying this function to a normal CoolBool, let's throw it a curveball and apply it to undefined!

ghci> helloMe undefined  
"*** Exception: Prelude.undefined  

Yikes! An exception! Now why did this exception happen? Types defined with the data keyword can have multiple value constructors (even though CoolBool only has one). So in order to see if the value given to our function conforms to the (CoolBool _) pattern, Haskell has to evaluate the value just enough to see which value constructor was used when we made the value. And when we try to evaluate an undefined value, even a little, an exception is thrown.

Instead of using the data keyword for CoolBool, let's try using newtype:

newtype CoolBool = CoolBool { getCoolBool :: Bool }   

We don't have to change our helloMe function, because the pattern matching syntax is the same if you use newtype or data to define your type. Let's do the same thing here and apply helloMe to an undefined value:

ghci> helloMe undefined  
"hello"

It worked! Hmmm, why is that? Well, like we've said, when we use newtype, Haskell can internally represent the values of the new type in the same way as the original values. It doesn't have to add another box around them, it just has to be aware of the values being of different types. And because Haskell knows that types made with the newtype keyword can only have one constructor, it doesn't have to evaluate the value passed to the function to make sure that it conforms to the (CoolBool _) pattern because newtype types can only have one possible value constructor and one field!

This difference in behavior may seem trivial, but it's actually pretty important because it helps us realize that even though types defined with data and newtype behave similarly from the programmer's point of view because they both have value constructors and fields, they are actually two different mechanisms. Whereas data can be used to make your own types from scratch, newtype is for making a completely new type out of an existing type. Pattern matching on newtype values isn't like taking something out of a box (like it is with data), it's more about making a direct conversion from one type to another.
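The two mechanisms can be put side by side in one module. The sketch below is not from the book: the type names Lazy, Strict and Wrap and the helper attempt are invented here, and the data type with a strict field (!Bool) is an extra case added for comparison. It assumes GHC and Control.Exception.

```haskell
import Control.Exception (SomeException, evaluate, try)

data    Lazy   = Lazy   Bool    -- ordinary data, lazy field
data    Strict = Strict !Bool   -- data with a strict field
newtype Wrap   = Wrap   Bool    -- newtype

matchLazy :: Lazy -> String
matchLazy (Lazy _) = "lazy"

matchStrict :: Strict -> String
matchStrict (Strict _) = "strict"

matchWrap :: Wrap -> String
matchWrap (Wrap _) = "wrap"

-- | Right with the result if evaluation succeeds, Left () if it throws.
-- (Hypothetical helper, introduced only for this illustration.)
attempt :: String -> IO (Either () String)
attempt s = either failed Right <$> try (evaluate s)
  where
    failed :: SomeException -> Either () String
    failed _ = Left ()

-- Each matcher applied to a wrapped undefined, then to a bare undefined.
results :: IO [Either () String]
results = mapM attempt
  [ matchLazy   (Lazy undefined)    -- Right "lazy": the field is never forced
  , matchLazy   undefined           -- Left (): the constructor must be inspected
  , matchStrict (Strict undefined)  -- Left (): the strict field forces undefined
  , matchStrict undefined           -- Left ()
  , matchWrap   (Wrap undefined)    -- Right "wrap"
  , matchWrap   undefined           -- Right "wrap": there is nothing to inspect
  ]
```

All three declarations have one constructor with one field, yet each row behaves differently. So a compiler that silently turned a single-constructor data into a newtype would change what the program means, not just how fast it runs.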

Here's another source. According to this Newtype article:

A newtype declaration creates a new type in much the same way as data. The syntax and usage of newtypes is virtually identical to that of data declarations - in fact, you can replace the newtype keyword with data and it'll still compile, indeed there's even a good chance your program will still work. The converse is not true, however - data can only be replaced with newtype if the type has exactly one constructor with exactly one field inside it.

Some Examples:

newtype Fd = Fd CInt
-- data Fd = Fd CInt would also be valid

-- newtypes can have deriving clauses just like normal types
newtype Identity a = Identity a
  deriving (Eq, Ord, Read, Show)

-- record syntax is still allowed, but only for one field
newtype State s a = State { runState :: s -> (s, a) }

-- this is *not* allowed:
-- newtype Pair a b = Pair { pairFst :: a, pairSnd :: b }
-- but this is:
data Pair a b = Pair { pairFst :: a, pairSnd :: b }
-- and so is this:
newtype Pair' a b = Pair' (a, b)

Sounds pretty limited! So why does anyone use newtype?

The short version: the restriction to one constructor with one field means that the new type and the type of the field are in direct correspondence:

State :: (s -> (s, a)) -> State s a
runState :: State s a -> (s -> (s, a))

or in mathematical terms they are isomorphic. This means that after the type is checked at compile time, at run time the two types can be treated essentially the same, without the overhead or indirection normally associated with a data constructor. So if you want to declare different type class instances for a particular type, or want to make a type abstract, you can wrap it in a newtype and it'll be considered distinct to the type-checker, but identical at runtime. You can then use all sorts of deep trickery like phantom or recursive types without worrying about GHC shuffling buckets of bytes for no reason.
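As a sketch of the "different type class instances for a particular type" use case: Int can be combined by addition or by multiplication, and a newtype per choice lets each have its own Monoid instance. The names Add, Mul, total and productOf are hypothetical; base already ships Sum and Product in Data.Monoid for the same purpose. This assumes GHC 8.4 or later, where Semigroup is exported from the Prelude.

```haskell
newtype Add = Add Int deriving (Show, Eq)
newtype Mul = Mul Int deriving (Show, Eq)

-- Int combined by addition, with 0 as the identity.
instance Semigroup Add where
  Add a <> Add b = Add (a + b)

instance Monoid Add where
  mempty = Add 0

-- The same underlying Int combined by multiplication, with 1 as the identity.
instance Semigroup Mul where
  Mul a <> Mul b = Mul (a * b)

instance Monoid Mul where
  mempty = Mul 1

-- The wrapper chosen selects the instance; mconcat itself is unchanged.
total :: [Int] -> Add
total = mconcat . map Add

productOf :: [Int] -> Mul
productOf = mconcat . map Mul
```

With these definitions, total [1, 2, 3, 4] is Add 10 and productOf [1, 2, 3, 4] is Mul 24.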

See the article for the messy bits...


Simple version for folks obsessed with bullet lists (I failed to find one, so I had to write it myself):

data - creates new algebraic type with value constructors

  • Can have several value constructors
  • Value constructors are lazy
  • Values can have several fields
  • Affects both compilation and runtime, has runtime overhead
  • Created type is a distinct new type
  • Can have its own type class instances
  • When pattern matching against a value constructor, the value WILL be evaluated at least to weak head normal form (WHNF) *
  • Used to create a new data type (example: Address { zip :: String, street :: String })

newtype - creates new “decorating” type with value constructor

  • Can have only one value constructor
  • Value constructor is strict
  • Value can have only one field
  • Affects only compilation, no runtime overhead
  • Created type is a distinct new type
  • Can have its own type class instances
  • When pattern matching against the value constructor, the value need NOT be evaluated at all *
  • Used to create a higher-level concept based on an existing type, with a distinct set of supported operations or not interchangeable with the original type (example: Meter, Cm and Feet are each a Double)

type - creates an alternative name (synonym) for a type (like typedef in C)

  • No value constructors
  • No fields
  • Affects only compilation, no runtime overhead
  • No new type is created (only a new name for existing type)
  • Can NOT have its own type class instances
  • When pattern matching against a data constructor, behaves the same as the original type
  • Used to create higher level concept based on existing type with the same set of supported operations (example: String is [Char])
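A short sketch contrasting the last two entries, type and newtype. All names here (Label, UserId, shout, loudLabel, loudUser) are made up for illustration.

```haskell
import Data.Char (toUpper)

type    Label  = String          -- synonym: just another name for String
newtype UserId = UserId String   -- distinct type that wraps a String

shout :: String -> String
shout = map toUpper

-- A Label is a String, so any String function accepts it unchanged.
loudLabel :: Label
loudLabel = shout ("draft" :: Label)

-- A UserId is not a String; it has to be unwrapped explicitly.
loudUser :: UserId -> String
loudUser (UserId s) = shout s

-- Rejected by the type checker:
-- oops = shout (UserId "bob")

-- The newtype can carry its own instance, separate from the one String has.
instance Show UserId where
  show (UserId s) = "user:" ++ s
```

An instance written against Label would really be an instance for String itself (and needs GHC extensions to be accepted at all), which is why the list says a synonym cannot have instances of its own.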

[*] On pattern matching laziness:

data DataBox a = DataBox Int
newtype NewtypeBox a = NewtypeBox Int

-- Both types take a type parameter, so the signatures must supply one.
dataMatcher :: DataBox a -> String
dataMatcher (DataBox _) = "data"

newtypeMatcher :: NewtypeBox a -> String
newtypeMatcher (NewtypeBox _) = "newtype"

ghci> dataMatcher undefined
"*** Exception: Prelude.undefined

ghci> newtypeMatcher undefined
"newtype"


Off the top of my head: the fields of a data constructor are lazy by default, so a data value is a box whose contents are evaluated on demand, whereas a newtype adds no box of its own. A newtype also does not inherit the type class instances of the type it wraps, so instances have to be derived or declared again for the new type, which helps hide its implementation.

I tend to use newtypes when avoiding boilerplate code in complex data types where I don't necessarily need access to the internals when using them. This speeds up both compilation and execution, and reduces code complexity where the new type is used.

When first reading about this, I found this chapter of A Gentle Introduction to Haskell rather intuitive.
