Serialization


"Serialization is the process of transforming data stored in objects into a stream of bytes to be used for storage or communication."

Serialization-Deserialization Process

Serialization is the process of converting an in-memory object into a stream of bytes. Deserialization is the reverse process, where the data stream is reconstituted into an in-memory object. Three common uses of serialization are:

  1. As a persistence mechanism - to save an object's data values at a particular point in time.
  2. As a copy mechanism - to save an object's data and structure so the object can be recreated.
  3. As a communication mechanism - to transmit an object's data between processes or over a network.

The serialized stream of bytes can be formatted in different ways, depending upon the intended usage and systems involved. For example, Binary Serialization creates a stream of binary values and results in a compact and fast method of communication. The binary byte stream is not human-readable and is not recommended as a communications method when going across platforms, or between systems that have different type systems. The Binary Serializer was used frequently in .NET's Remoting framework; however, .NET Remoting has been deprecated in favor of Windows Communication Foundation (WCF).

When serializing/deserializing across different platforms, the serialized stream of bytes needs to be formatted in a way that both systems can understand. However, even when not going across platforms, it may be desirable to have a standardized, or human-readable, way of formatting the data stream. For example, when communicating with web services, a standard, human-readable format is desired. Two common standards for creating human-readable data formats are XML and JSON. XML (Extensible Markup Language) is the older specification, while JSON (JavaScript Object Notation) is a newer format. JSON does not have the years of adoption or vendor support of XML, but it is catching up quickly. XML and JSON have different characteristics, making each more suitable for different scenarios. While .NET currently provides more built-in support for XML, there are third-party libraries for JSON support within the .NET Framework (e.g. Json.NET).

  1. XML (Extensible Markup Language)
    • Verbose syntax, larger size, slower parsing.
    • Uses tree data structure to store data.
    • Used for storing data in a file/database and data delivery between browsers and servers.
    • SOAP-based web services format the data in an XML payload wrapped within a SOAP envelope.
    • Data stored in XML is significantly larger than JSON. However when zipped, the difference in size is less drastic.
  2. JSON (JavaScript Object Notation)
    • Terse syntax, smaller size, faster parsing.
    • Uses name/value pairs and ordered arrays to store data.
    • Enables fast exchanges of small amounts of data between client browsers and AJAX-enabled Web services.
    • JSON's primary usage is data delivery between browsers and servers.
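
The size difference described in the list above can be observed with a small sketch. It serializes one object with XmlSerializer and with DataContractJsonSerializer and prints both results. The Reading type, its values, and the Formats helper are hypothetical names invented for this illustration, and the exact character counts will vary by runtime.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Json;
using System.Text;
using System.Xml.Serialization;

namespace FormatComparison
{
    // Hypothetical type used only for this illustration.
    public class Reading
    {
        public string Sensor { get; set; }
        public double Value { get; set; }
    }

    public static class Formats
    {
        // Serialize with XmlSerializer and return the XML text.
        public static string ToXml(Reading r)
        {
            var xs = new XmlSerializer(typeof(Reading));
            using (var sw = new StringWriter())
            {
                xs.Serialize(sw, r);
                return sw.ToString();
            }
        }

        // Serialize with DataContractJsonSerializer and return the JSON text.
        public static string ToJson(Reading r)
        {
            var js = new DataContractJsonSerializer(typeof(Reading));
            using (var ms = new MemoryStream())
            {
                js.WriteObject(ms, r);
                return Encoding.UTF8.GetString(ms.ToArray());
            }
        }
    }

    class Program
    {
        static void Main()
        {
            var r = new Reading { Sensor = "T1", Value = 21.5 };
            string xml = Formats.ToXml(r);
            string json = Formats.ToJson(r);
            Console.WriteLine("XML  ({0} chars):\n{1}\n", xml.Length, xml);
            Console.WriteLine("JSON ({0} chars):\n{1}", json.Length, json);
        }
    }
}
```

For a small object like this, most of the XML text is the declaration, namespace attributes, and closing tags, which is the overhead the list refers to.
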
.NET Serialization Mechanisms

Earlier versions of .NET contained two serialization engines: the Binary Serializer for Remoting and the XmlSerializer for Web Services. Windows Communication Foundation (WCF) created the DataContractSerializer as part of its goal of unifying Remoting and Web Services. The DataContractSerializer did not totally replace the Binary Serializer and XmlSerializer, as there are still situations where the two older engines find appropriate usage.

The serialization process can be explicitly invoked by coding calls to the serialization engines or methods. The framework can also implicitly call serialization processes when using features that involve WCF or Web Services. When explicitly calling the serialization engines, attributes are used to indicate how the data should be serialized. The current serialization mechanisms in the .NET Framework include:

  1. Data Contract Serializer
    • Produces XML; for JSON, use DataContractJsonSerializer.
    • Newest engine.
    • Most versatile engine, designed for both remoting and web services.
    • Choice of tight or loose coupling.
    • Slower than other engines.
    • Can serialize nonpublic fields.
    • Must opt-in to include fields.
  2. Binary Serializer
    • Produces binary output (BinaryFormatter) or SOAP XML (SoapFormatter).
    • Originally designed for use with Microsoft's Remoting framework. Remoting is deprecated in favor of using WCF.
    • Two formatters are used with the binary engine (BinaryFormatter and SoapFormatter).
    • Tight coupling; preserves type fidelity.
    • Faster than other engines.
    • Easy to use, highly automated.
    • Poorest version tolerance.
    • Must opt out to exclude fields; by default includes all private and all public fields (but not properties).
  3. XML Serializer
    • Can only produce XML.
    • Originally designed for web services and XML storage.
    • Loose coupling.
    • Most flexibility in reading/writing XML files.
    • Most flexible in following an arbitrary XML structure.
    • Better version tolerance.
    • Must opt out to exclude fields; by default includes all public fields and properties.

Note 1: The NetDataContractSerializer is a serializing engine similar to DataContractSerializer, however it includes CLR type information in the serialized XML, whereas the DataContractSerializer does not. Therefore, the NetDataContractSerializer can be used only if both the serializing and deserializing ends share the same CLR types.

Note 2: The JavaScriptSerializer class provides serialization and deserialization functionality for AJAX-enabled applications. The class is used when working with JSON in managed code.

Serialization Attributes

The simplest way to serialize an object is to code attributes which provide instructions to the serialization engine. All the serialization engines can use attributes, but there are differences in their naming and usage. DataContractSerializer requires data to be marked with an attribute to be included in the stream, while the older Binary and XML serializers require the data to be marked with an attribute to be excluded from the stream. For example, simply adding the [Serializable] attribute to a class instructs the Binary Serialization engine to include all the fields in the class (both public and private fields, but not properties). If certain fields are not to be included in the binary serialization, the fields are marked with the [NonSerialized] attribute. Attributes can also provide instructions to the serialization engine for renaming fields, ordering the fields, emitting the data as an XML element or an XML attribute, omitting fields with null values, etc. Below is an example using XmlSerializer that excludes a field, renames a field, and emits a field as an XML attribute.

XmlSerialize Example Program

using System;
using System.IO;
using System.Xml.Serialization;

namespace XmlSerialize
{
    /****************************************************************
     * XmlSerialize - Attributes: ignore, rename, convert attribute *
     ****************************************************************/
    public class Car
    {
        public string make;
        public string Model { get; set; }
        [XmlIgnore]
        public int year;
        [XmlElement("NickName")]
        public string name;
        [XmlAttribute]
        public string type;
    }

    class Program
    {
        static void Main()
        {
            Car hooptie = new Car()
            { make = "Toyota", Model = "SUV", year = 1996, name = "Slurpy", type = "Beaters" };
            XmlSerializer xs = new XmlSerializer(typeof(Car));

            using (Stream s = File.Create("car.xml"))
                xs.Serialize(s, hooptie);
        }
    }
}
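
The effect of the three attributes can be checked with a round trip. The following is a minimal sketch, not part of the original program: the Truck type and the RoundTrip helper are hypothetical, and the XML is kept in a string instead of a file so that the ignored field can be inspected after deserialization.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

namespace XmlRoundTrip
{
    // Hypothetical type that uses the same attribute choices as the Car class.
    public class Truck
    {
        public string make;
        [XmlIgnore]
        public int year;              // never written to the XML
        [XmlElement("NickName")]
        public string name;           // written as <NickName>
        [XmlAttribute]
        public string type;           // written as an attribute of the root element
    }

    public static class RoundTrip
    {
        // Serialize the object to an XML string.
        public static string Serialize(Truck t)
        {
            var xs = new XmlSerializer(typeof(Truck));
            using (var sw = new StringWriter())
            {
                xs.Serialize(sw, t);
                return sw.ToString();
            }
        }

        // Rebuild an object from the XML string.
        public static Truck Deserialize(string xml)
        {
            var xs = new XmlSerializer(typeof(Truck));
            using (var sr = new StringReader(xml))
                return (Truck)xs.Deserialize(sr);
        }
    }

    class Program
    {
        static void Main()
        {
            var original = new Truck { make = "Ford", year = 1996, name = "Pickles", type = "Farm" };
            string xml = RoundTrip.Serialize(original);
            Truck copy = RoundTrip.Deserialize(xml);

            Console.WriteLine(xml);
            // The ignored field was not in the XML, so it comes back as default(int).
            Console.WriteLine("year after round trip: {0}", copy.year);
        }
    }
}
```
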

Serialization Interfaces

Some situations may require more flexibility than can be provided by only using attributes, such as a need to dynamically control which data is serialized. In situations which require more control over the serialization process, the serialization methods can be overridden as part of implementing the serialization interface. The serialization engine interfaces include:

  1. ISerializable - Binary Serializer (Methods: GetObjectData()).
  2. IXmlSerializable - Data Contract and XmlSerializer (Methods: GetSchema(), ReadXml(), WriteXml()).
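
As an illustration of the dynamic control mentioned above, here is a minimal sketch of IXmlSerializable used with XmlSerializer. The Temperature type, its IncludeFahrenheit switch, and the XmlText helper are hypothetical and exist only for this example. The ReadXml method assumes the element was produced by the WriteXml method shown here, so it always expects a Celsius child element.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

namespace CustomXml
{
    // Hypothetical type: decides at run time whether to emit a derived value.
    public class Temperature : IXmlSerializable
    {
        public double Celsius { get; set; }
        public bool IncludeFahrenheit { get; set; }

        // Returning null is the usual implementation for this method.
        public XmlSchema GetSchema() { return null; }

        // Called after the serializer has written the wrapper start tag.
        public void WriteXml(XmlWriter writer)
        {
            writer.WriteElementString("Celsius", XmlConvert.ToString(Celsius));
            if (IncludeFahrenheit)
                writer.WriteElementString("Fahrenheit", XmlConvert.ToString(Celsius * 9 / 5 + 32));
        }

        // Called with the reader positioned on the wrapper start tag;
        // the method must consume the whole element, including its end tag.
        public void ReadXml(XmlReader reader)
        {
            reader.MoveToContent();
            reader.ReadStartElement();                  // wrapper start tag
            reader.MoveToContent();                     // skip any whitespace
            Celsius = reader.ReadElementContentAsDouble();
            if (reader.IsStartElement("Fahrenheit"))    // derived value, not stored
                reader.Skip();
            reader.ReadEndElement();                    // wrapper end tag
        }
    }

    public static class XmlText
    {
        public static string Serialize(Temperature t)
        {
            var xs = new XmlSerializer(typeof(Temperature));
            using (var sw = new StringWriter())
            {
                xs.Serialize(sw, t);
                return sw.ToString();
            }
        }

        public static Temperature Deserialize(string xml)
        {
            var xs = new XmlSerializer(typeof(Temperature));
            using (var sr = new StringReader(xml))
                return (Temperature)xs.Deserialize(sr);
        }
    }

    class Program
    {
        static void Main()
        {
            var t = new Temperature { Celsius = 100, IncludeFahrenheit = true };
            string xml = XmlText.Serialize(t);
            Console.WriteLine(xml);
            Console.WriteLine("Celsius after round trip: {0}", XmlText.Deserialize(xml).Celsius);
        }
    }
}
```
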
Running Custom Serialization Methods

Custom methods can also be run to correct data as part of the serialize/deserialize process. The methods are marked with one of the following attributes which indicate at what point in the serialization process the method executes:

  1. OnSerializedAttribute - specifies that the method is called after serialization of an object.
  2. OnSerializingAttribute - specifies that the method is called during serialization of an object.
  3. OnDeserializedAttribute - specifies that the method is called immediately after deserialization of an object.
  4. OnDeserializingAttribute - specifies that the method is called during deserialization of an object.
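
A minimal sketch of one of these callbacks follows, using DataContractSerializer. The Person type and the Callbacks helper are hypothetical. The method marked [OnDeserialized] rebuilds a value that is not part of the serialized data; callback methods take a single StreamingContext parameter and return void.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

namespace SerializationCallbacks
{
    // Hypothetical type used only for this illustration.
    [DataContract]
    public class Person
    {
        [DataMember] public string First { get; set; }
        [DataMember] public string Last { get; set; }

        // Not a data member, so it is not written to the stream.
        public string FullName { get; private set; }

        // Runs after the data members have been restored.
        [OnDeserialized]
        private void Rebuild(StreamingContext context)
        {
            FullName = First + " " + Last;
        }
    }

    public static class Callbacks
    {
        // Serialize to memory and read the object back.
        public static Person RoundTrip(Person p)
        {
            var dcs = new DataContractSerializer(typeof(Person));
            using (var ms = new MemoryStream())
            {
                dcs.WriteObject(ms, p);
                ms.Position = 0;
                return (Person)dcs.ReadObject(ms);
            }
        }
    }

    class Program
    {
        static void Main()
        {
            Person copy = Callbacks.RoundTrip(new Person { First = "Ada", Last = "Lovelace" });
            Console.WriteLine(copy.FullName);
        }
    }
}
```
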



DataContractSerializer


"DataContractSerializer is the latest .NET serialization engine which serializes and deserializes object data into XML based on a defined data contract. Data contracts are created through the specification of attributes."

DataContractSerializer is a class which serializes and deserializes object data into an XML stream or document using a supplied data contract. Data contracts precisely define each data item that is serialized, making a formal agreement between processes which serialize and deserialize the data.

The DataContractSerializer class is located in the System.Runtime.Serialization namespace (System.Runtime.Serialization.dll). The following program includes these features:

  1. DataContract Attribute - implements a data contract for serializing data.
  2. DataMember Attribute - identifies data which are part of the data contract.
  3. Order Attribute Option - property for specifying the order of data members.
  4. Name Attribute Option - property for creating an XML tag name for data members.
  5. EmitDefaultValue Attribute Option - property that specifies whether data members holding their default value (such as null) are serialized.
  6. Namespace Attribute Option - namespace of the data contract; for data to be transmitted, the name and namespace of the data contract must be the same in both the client and the server.
  7. ToString Method Override - create an output line for each object.
  8. Average() LINQ Extension Method - average a group of integer values.
  9. String.Format() method - formats a floating-point number to two decimal digits.
  10. Sort Method using Lambda Expression - uses a lambda expression to define the comparison used by Sort.
  11. File Opens with Exception Handling - displays the exception's message if the file cannot be opened.
  12. List Collection of Objects - generic list of custom objects.
  13. Ternary Conditional Operator (?:) - returns one of two values depending on the value of a Boolean expression.


DataContractSerializer Example Program - XML Serialization

using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.Serialization;
using System.IO;

namespace XmlDataContractSerialize
{
    [DataContract (Name="Student", Namespace="http://www.kcshadow.net")]
    class Student
    {
        [DataMember (Order = 0)]
        public string Name { get; set; }

        [DataMember (Order = 2, Name = "ExamScores")]
        public int[] Scores { get; set; }

        [DataMember (Order = 1, Name = "Style", EmitDefaultValue=false)]
        public string Code { get; set; }

        public override string ToString()
        {
            return "Name: " + Name +
                   " Average Score: " + String.Format("{0:F2}", Scores.Average()) +
                   " Code: " + (Code == null ? "B#": Code);
        }
    }

    class Program
    {
        static void Main()
        {
            // Data to serialize
            List<Student> students = new List<Student>()
            {
               new Student { Name = "Myra", Scores = new int[] {98, 96, 97} },
               new Student { Name = "Mike",  Code = "B0", Scores = new int[] {23, 26, 13} },
               new Student { Name = "Matt",  Code = "K9", Scores = new int[] {85, 49, 11} }
            };

            // Serialize to file
            var dcs = new DataContractSerializer(typeof(List<Student>));
            try
            {
                using (Stream s = File.Create("sfile1.xml"))
                    dcs.WriteObject(s, students);
            }
            catch (Exception e)
            {
                Console.WriteLine("{0}\n", e.Message);
                return;
            }

            // Deserialize from file
            List<Student> dsStudents = new List<Student>();
            try
            {
                using (Stream s = File.OpenRead("sfile1.xml"))
                    dsStudents = (List<Student>)dcs.ReadObject(s);
            }
            catch (Exception e)
            {
                Console.WriteLine("{0}\n", e.Message);
            }

            // Print Deserialized results by descending average score
            dsStudents.Sort((x,y) => y.Scores.Average().CompareTo(x.Scores.Average()));
            foreach (Student theStudent in dsStudents)
            {
                Console.WriteLine(theStudent.ToString());
            }
            Console.WriteLine();
        }
    }
}
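
The effect of the Name and EmitDefaultValue options can be seen without writing a file. The sketch below is an illustration under stated assumptions: the Pupil type, the Contract helper, and the http://example.com/students namespace are hypothetical stand-ins for the Student contract above.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;

namespace DataContractOptions
{
    // Hypothetical contract mirroring the options used by the Student class.
    [DataContract(Name = "Student", Namespace = "http://example.com/students")]
    public class Pupil
    {
        [DataMember(Order = 0)]
        public string Name { get; set; }

        // Renamed to <Style>, and omitted entirely when the value is null.
        [DataMember(Order = 1, Name = "Style", EmitDefaultValue = false)]
        public string Code { get; set; }
    }

    public static class Contract
    {
        // Serialize to memory and return the XML as text.
        public static string ToXml(Pupil p)
        {
            var dcs = new DataContractSerializer(typeof(Pupil));
            using (var ms = new MemoryStream())
            {
                dcs.WriteObject(ms, p);
                return Encoding.UTF8.GetString(ms.ToArray());
            }
        }
    }

    class Program
    {
        static void Main()
        {
            // Code is null here, so no <Style> element is expected.
            Console.WriteLine(Contract.ToXml(new Pupil { Name = "Myra" }));
            // Code has a value here, so it appears under its contract name.
            Console.WriteLine(Contract.ToXml(new Pupil { Name = "Mike", Code = "B0" }));
        }
    }
}
```
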


XML

"XML was designed to facilitate the sharing of data across different systems. In particular, systems connected by the Internet. Today XML has found widespread usage in the exchanging and storing of data."

Extensible Markup Language (XML) is a general-purpose markup language for documents which contain structured information. XML is a specification which provides a standard way to define markup tags and their structural relationship. XML has no predefined tag set. The tags, and their meanings, are defined by the applications that process the XML, or by stylesheets. The XML specification was developed by W3C and is a simplified subset of Standard Generalized Markup Language (SGML). The primary purpose of XML is to facilitate the sharing of data across different systems, particularly systems connected via the Internet.

Prior to the late 1960s, publishers had electronic manuscripts which contained control codes (e.g. 'format-17') that caused the documents to be formatted in a particular way. During the late 1960s, a movement was started among publishers to separate the information content of documents from their format. In 1969 an IBM research project invented the Generalized Markup Language (GML), which was implemented in mainframe publishing systems. In 1980 the American National Standards Institute (ANSI) used GML as a basis for its development of SGML as an international standard for defining generalized markup languages for documents. To make use of SGML on the Internet, W3C formed a working group in 1996. Many organizations and interest groups were involved, and the resulting XML 1.0 was released in 1998.

XML and HTML were both derived from SGML, but for different purposes. Tim Berners-Lee developed the HTML prototype in 1992 to display data on the Internet. XML was developed by a W3C group to transport and store data. Today XML is the most common tool for data transmissions between all sorts of online and offline applications. XML's popularity is a result of its human-readable format (plain text) and its software- and hardware-independent way of storing data. An alternative to XML which is gaining popularity is JavaScript Object Notation (JSON). JSON is promoted as a low-overhead and less verbose alternative to XML, resulting in faster transmission of data. See article on JSON for more information.

Today XML is used extensively for exchanging and storing data. XML is used in exchanging data on the Web (RSS, SOAP, Atom, XHTML), as a default format for various office-productivity tools, and as a format for configuration files in many computer-related systems. Generality was a design goal of XML, so the development of XML tags was left open to the implementers. Many groups have created standard XML tags for data used in their disciplines. A listing of these standards can be found at XML Standards Library; a small sample of them includes:

  1. X3D - a markup language used by a 3-D Real Time Communications Consortium.
  2. VoiceXML - markup language for creating voice user interfaces that use automatic speech recognition (ASR) and text-to-speech synthesis (TTS).
  3. Spacecraft Markup Language (SML) - standard definition of XML tags and concepts of structure to allow the definition of spacecraft and other support data objects.
  4. LandXML - non-proprietary data standard driven by a consortium of partners for the inter-operability of data utilized within the Land Development industry.
  5. SportsML - global XML standard for the interchange of sports data.
  6. golfml - XML Specifications for Golf Courses and Score Data Exchange.




XML Syntax

XML has a few simple syntax rules. XML documents that conform to the syntax rules are said to be "well-formed".

  1. XML documents must contain a root element, which is the parent of all the other elements.
  2. All XML elements must have a closing tag (or be written as a self-closing tag).
  3. XML tags are case sensitive. Open and close tags must be written with the same case.
  4. XML tags must be properly nested within each other.
  5. XML elements can have attributes in name/value pairs. The attribute values must always be in quotes.
Correct: <message date="03/14/2014"> </message>
Incorrect: <message date=03/14/2014> </message>
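
The rules above can be checked in code by letting a parser try to load the text. This is a minimal sketch: the WellFormed helper is a hypothetical name, and it relies on XDocument.Parse throwing an XmlException for text that is not well-formed.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

namespace WellFormedCheck
{
    public static class WellFormed
    {
        // Returns true when the parser accepts the text as well-formed XML.
        public static bool IsWellFormed(string xml)
        {
            try
            {
                XDocument.Parse(xml);
                return true;
            }
            catch (XmlException)
            {
                return false;
            }
        }
    }

    class Program
    {
        static void Main()
        {
            string[] samples =
            {
                "<message date=\"03/14/2014\"> </message>",   // quoted attribute value
                "<message date=03/14/2014> </message>",       // unquoted attribute value
                "<Message> </message>",                       // case mismatch
                "<a><b></a></b>"                              // improper nesting
            };

            foreach (string s in samples)
                Console.WriteLine("{0,-45} well-formed: {1}", s, WellFormed.IsWellFormed(s));
        }
    }
}
```
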

XML uses some characters for special purposes. When these special characters are used inside an XML document, they must be replaced with an entity reference.

XML Entity References
Entity Reference    Special Character    Description
&lt;                <                    less than
&gt;                >                    greater than
&amp;               &                    ampersand
&apos;              '                    apostrophe
&quot;              "                    quotation mark
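
In .NET code the replacement is normally done by the XML API and not by hand. The sketch below shows this with LINQ to XML; the Escaping helper and the message element name are hypothetical. Writing the element produces the entity references, and reading it back returns the original characters.

```csharp
using System;
using System.Xml.Linq;

namespace EntityEscaping
{
    public static class Escaping
    {
        // Wrap raw text in an element; the writer inserts entity references as needed.
        public static string ToElementText(string raw)
        {
            return new XElement("message", raw).ToString();
        }

        // Parse the element text; the reader turns entity references back into characters.
        public static string FromElementText(string xml)
        {
            return XElement.Parse(xml).Value;
        }
    }

    class Program
    {
        static void Main()
        {
            string raw = "5 < 7 & 7 > 5";
            string xml = Escaping.ToElementText(raw);
            Console.WriteLine(xml);
            Console.WriteLine(Escaping.FromElementText(xml));
        }
    }
}
```
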


XML Comments

Comments in XML are specified with the following syntax:

<!-- This is a comment --> 


Element Names

The XML rules for naming the elements (tag names) include:

  1. Names can contain letters, numbers, and other characters.
  2. Names cannot start with a number or punctuation character.
  3. Names cannot contain spaces.
  4. Names cannot start with the letters xml (in any combination of upper/lower case).
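
The syntactic naming rules can be tested with XmlConvert.VerifyName, which throws an XmlException for a string that is not a valid XML name. The Names helper below is a hypothetical wrapper written for this illustration, and it assumes a non-empty input string.

```csharp
using System;
using System.Xml;

namespace ElementNames
{
    public static class Names
    {
        // Returns true when the (non-empty) string is a syntactically valid XML name.
        public static bool IsValidName(string name)
        {
            try
            {
                XmlConvert.VerifyName(name);
                return true;
            }
            catch (XmlException)
            {
                return false;
            }
        }
    }

    class Program
    {
        static void Main()
        {
            foreach (string candidate in new[] { "customer", "1customer", "first name", "first_name" })
                Console.WriteLine("{0,-12} valid: {1}", candidate, Names.IsValidName(candidate));
        }
    }
}
```
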

Recommendations for creating element names include:

  1. Make names descriptive.
  2. Avoid using a dash (-), dot (.), or colon (:) in the name.
  3. Only use English letters in the names.
Prolog

The prolog is an optional component of the XML document which consists of two parts: the XML declaration and the document type declaration (DOCTYPE), which references or contains the Document Type Definition (DTD). Either or both can be used. If both are used, the XML declaration must come before the document type declaration. If used, the XML declaration must be at the very start of the document; no other content or white space can precede it.

  1. XML declaration
    • If an XML declaration is used, the version number is required.
    • The encoding declaration is optional, but if used, it must appear immediately after the version information.
    • Also optional is a standalone pseudo-attribute, which is only relevant if an external DTD exists. If an internal DTD, or no DTD, exists, the "standalone" attribute is irrelevant. The default is standalone="no" (i.e. it is never necessary to use standalone="no" explicitly). If standalone="yes" is used, it signals the XML processor that the DTD is only used for validation, in which case the DTD is not used to look up anything, to normalize attribute values, or to remove ignorable whitespace. This option is only useful when the DTD is large or the network is slow, and the DTD is only used for validation.
  2. Document Type Declaration
    • The DTD contains the rules the XML code must follow. The DTD has a syntax that declares the location and contents of elements, attributes, and entities which may appear in the XML document. Programs which parse the XML then decide what to do about DTD violations: they may try to fix them, or they may reject the invalid element.
    • A DTD can be stored in a separate file from the XML document it describes (.dtd), or it can be coded inside the XML document. It is also possible to have a mixture of some declarations stored directly in the XML document and others stored in an external file.
    • Standard DTDs can be stored at websites and accessed by specifying the URI and using the keyword PUBLIC (as opposed to SYSTEM) in the DTD.
    • DTDs can define entities, so that an entity reference is replaced with text. For example:
      <!ENTITY supercal "supercalifragilisticexpialidocious">
      allows the reference &supercal; to be replaced by the text supercalifragilisticexpialidocious.

Typical XML Prolog
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">



Note1: Newer namespace-aware XML schema languages (W3C XML Schema {XSD}, ISO RELAX NG) are used to replace the DTD function of declaring elements and attributes. However, the DTD remains necessary for various purposes which cannot be performed by XML schemas, such as declaring entities.

Note2: An XML DTD can be used to create a denial of service (DoS) attack by defining nested entities that expand exponentially, or by sending the XML parser to an external resource that never returns. For this reason, .NET Framework provides a property that allows prohibiting or skipping DTD parsing, and recent versions of Microsoft Office applications (Microsoft Office 2010 and higher) refuse to open XML files that contain DTD declarations.
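
The property mentioned in Note2 can be sketched with XmlReaderSettings.DtdProcessing. In the example below the DtdPolicy helper and the sample document are hypothetical. With DtdProcessing.Parse the internal entity is expanded, and MaxCharactersFromEntities caps how much text entity expansion may produce; with DtdProcessing.Prohibit the reader throws an XmlException as soon as it meets the DOCTYPE.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

namespace DtdHandling
{
    public static class DtdPolicy
    {
        // Hypothetical sample: an internal DTD that declares one entity.
        public const string Sample =
            "<!DOCTYPE note [<!ENTITY supercal \"supercalifragilisticexpialidocious\">]>" +
            "<note>&supercal;</note>";

        // Returns the root element's text, or null when the reader rejects the document.
        public static string TryLoad(string xml, DtdProcessing mode)
        {
            var settings = new XmlReaderSettings
            {
                DtdProcessing = mode,
                MaxCharactersFromEntities = 1024   // limit on expanded entity text
            };

            try
            {
                using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
                    return XDocument.Load(reader).Root.Value;
            }
            catch (XmlException)
            {
                return null;
            }
        }
    }

    class Program
    {
        static void Main()
        {
            Console.WriteLine("Parse:    {0}", DtdPolicy.TryLoad(DtdPolicy.Sample, DtdProcessing.Parse));
            Console.WriteLine("Prohibit: {0}", DtdPolicy.TryLoad(DtdPolicy.Sample, DtdProcessing.Prohibit) ?? "(rejected)");
        }
    }
}
```
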

XML Attributes

Attributes should be used judiciously in XML as they are difficult to read and maintain. It is recommended that elements be used for data, instead of using attributes. Attributes should instead be used for metadata, when needing to provide information about the data itself. Some of the problems cited with using attributes include:

  1. Attributes cannot contain multiple values, while elements can.
  2. Attributes cannot contain tree structures, while elements can.
  3. Attributes are not easily expandable for future modifications.

An example of avoiding the use of attributes would be to restructure this XML, which uses attributes:

<customer name="Kevin" age="140" sex="M"/>

in this fashion, with only elements (no attributes):

<customer>
   <name>Kevin</name>
   <age>140</age>
   <sex>M</sex>
</customer>


Additional Information
  1. XML preserves embedded white-space (unlike HTML, where multiple white-space characters are collapsed to a single white-space character).

  2. W3C provides a tool to check the validity of XML at W3C Markup Validation Service.





DTD vs XML Schemas

XML provides a platform-independent way of communicating data. However both the sending and receiving computers must agree upon the structure of the data which is being communicated. This information about the structure and format of an XML document defines how both the sending and receiving computer will interpret the document's contents. Typically one of the computers will be in charge of the format for a particular XML document and this information is shared with the other computer in the form of a header or a separate document (DTD or XML Schema).

DTD – Document Type Definition

The first method used to provide the information about the structure and format of the XML data in a document was the Document Type Definition (DTD). The DTD defines the elements that may be included in the document, what attributes these elements have, and the ordering and nesting of the elements. The DTD is associated with an XML document through the use of a document type declaration (DOCTYPE), which is coded near the beginning of the XML document. The DTD declarations can be coded inline with the XML document or as a separate external file. Additionally, it is possible to use a combination of declarations from both inline coding and an external file.

Inline DTD
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html [
  <!-- internal DTD declarations embedded here -->
]>
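
To show what an inline DTD does in practice, the following sketch validates two small documents against their own internal DTD. The DtdValidator helper and the note/to/body vocabulary are hypothetical. Validation is requested through XmlReaderSettings, and each violation is reported to the ValidationEventHandler.

```csharp
using System;
using System.IO;
using System.Xml;

namespace DtdValidation
{
    public static class DtdValidator
    {
        // Hypothetical inline DTD: a note must contain a "to" element followed by a "body" element.
        const string Dtd =
            "<!DOCTYPE note [" +
            "<!ELEMENT note (to, body)>" +
            "<!ELEMENT to (#PCDATA)>" +
            "<!ELEMENT body (#PCDATA)>" +
            "]>";

        public const string Valid = Dtd + "<note><to>Ann</to><body>Hi</body></note>";
        public const string Invalid = Dtd + "<note><body>Hi</body></note>";   // missing <to>

        // Reads the whole document and returns the number of validation errors reported.
        public static int CountErrors(string xml)
        {
            int errors = 0;
            var settings = new XmlReaderSettings
            {
                DtdProcessing = DtdProcessing.Parse,
                ValidationType = ValidationType.DTD
            };
            settings.ValidationEventHandler += (sender, e) => errors++;

            using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
            {
                while (reader.Read()) { }
            }
            return errors;
        }
    }

    class Program
    {
        static void Main()
        {
            Console.WriteLine("valid document errors:   {0}", DtdValidator.CountErrors(DtdValidator.Valid));
            Console.WriteLine("invalid document errors: {0}", DtdValidator.CountErrors(DtdValidator.Invalid));
        }
    }
}
```
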



External File DTD
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">