Get XML Namespace Elements Using PHP SimpleXML


PHP has a great SimpleXML extension that converts XML into an object that can be processed with normal property selectors and array iterators. I've been using it quite a bit lately to process XML documents.
The documentation is thin when it comes to processing namespaced elements within an XML document. A common case is parsing an RSS feed whose items contain elements from other XML namespaces.
Consider the following example:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
....
  <item>
    <title>My Title</title>
    <description>My Item</description>
    <dc:publisher>ABC</dc:publisher>
    <dc:creator>DEF</dc:creator>
    <dc:date>2009-02-12T16:53:25Z</dc:date>
  </item>
  ...
</channel>
</rss>

Accessing elements such as title and description is as simple as:

$feed = file_get_contents("http://linkto.my.feed");
$xml = new SimpleXmlElement($feed);
foreach ($xml->channel->item as $entry){
  echo $entry->title;
  echo $entry->description;
}
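
To try this without a live feed, here is a minimal sketch that parses an inline copy of the sample XML. The trimmed feed string and the $titles array are illustrative assumptions, not part of the original code. It also casts each value to a string, since SimpleXML properties are objects rather than plain strings.

```php
<?php
// Inline, trimmed copy of the sample feed (an assumption for illustration),
// so the snippet runs without a network call.
$feed = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
  <item>
    <title>My Title</title>
    <description>My Item</description>
    <dc:publisher>ABC</dc:publisher>
    <dc:creator>DEF</dc:creator>
  </item>
</channel>
</rss>
XML;

$titles = [];
try {
    // The constructor throws an Exception when the XML cannot be parsed.
    $xml = new SimpleXMLElement($feed);
    foreach ($xml->channel->item as $entry) {
        // Each property is a SimpleXMLElement; cast it to get the text content.
        $titles[] = (string) $entry->title;
    }
} catch (Exception $e) {
    echo 'Could not parse feed: ' . $e->getMessage();
}

print_r($titles);
```

With a real feed, it is probably also worth checking that file_get_contents did not return false before handing the result to SimpleXMLElement.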

But what if I want to access my namespace elements such as dc:publisher or dc:creator? You would think it ‘could’ be as simple as this:

//This doesn't work
...
foreach ($xml->channel->item as $entry){
  echo $entry->publisher;
  echo $entry->creator;
  ...
}
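
To see what the failing access actually returns, the following sketch inspects one item from an inline copy of the sample feed. The variable names ($plain, $plainCount, $names) are assumptions made for illustration. Plain property access does not raise an error here; it just yields an empty element.

```php
<?php
// Inline, trimmed copy of the sample feed (an assumption for illustration).
$feed = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
  <item>
    <title>My Title</title>
    <description>My Item</description>
    <dc:publisher>ABC</dc:publisher>
    <dc:creator>DEF</dc:creator>
  </item>
</channel>
</rss>
XML;

$xml = new SimpleXMLElement($feed);
$entry = $xml->channel->item[0];

// Plain property access does not match the prefixed dc:publisher element.
$plain = (string) $entry->publisher;     // empty string, no error raised
$plainCount = count($entry->publisher);  // number of matching elements

// children() without arguments also skips the dc:* elements.
$names = [];
foreach ($entry->children() as $child) {
    $names[] = $child->getName();
}

var_dump($plain, $plainCount);
print_r($names);
```

Because nothing fails loudly, a missing namespace lookup tends to show up as silently blank output.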

The code above doesn't work because the publisher and creator elements belong to the dc namespace, and plain property access only sees un-prefixed elements. So how do we do this? If you recall, the second line of our feed had this:

.... xmlns:dc="http://purl.org/dc/elements/1.1/">

So we know that the dc prefix maps to this namespace URI: http://purl.org/dc/elements/1.1/ (the trailing slash is part of the URI and must match exactly). Now that we know this, we can do this:

$feed = file_get_contents("http://linkto.my.feed");
$xml = new SimpleXmlElement($feed);
foreach ($xml->channel->item as $entry){
  echo $entry->title;
  echo $entry->description;
  //Use that namespace
  $dc = $entry->children('http://purl.org/dc/elements/1.1/');
  echo $dc->publisher;
  echo $dc->creator;
}
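
As a variation on the same idea, children() also accepts a prefix when its second argument is true, and the result can be iterated to collect every element in the namespace. The sketch below assumes an inline copy of the sample feed; $dcFields and $creator are illustrative names.

```php
<?php
// Inline, trimmed copy of the sample feed (an assumption for illustration).
$feed = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
  <item>
    <title>My Title</title>
    <dc:publisher>ABC</dc:publisher>
    <dc:creator>DEF</dc:creator>
    <dc:date>2009-02-12T16:53:25Z</dc:date>
  </item>
</channel>
</rss>
XML;

$xml = new SimpleXMLElement($feed);
$entry = $xml->channel->item[0];

// Collect every element in the dc namespace, keyed by its local name.
$dcFields = [];
foreach ($entry->children('http://purl.org/dc/elements/1.1/') as $name => $value) {
    $dcFields[$name] = (string) $value;
}

// Same lookup by prefix: the second argument marks 'dc' as a prefix, not a URI.
$creator = (string) $entry->children('dc', true)->creator;

print_r($dcFields);
echo $creator, PHP_EOL;
```

The prefix form is shorter, but it depends on whichever prefix the feed author chose. The URI form keeps working if the feed switches to a different prefix for the same namespace.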

Hard-coding the URI works, but a cleaner way is to read the namespace URI from the document itself using the getNamespaces method:

...
foreach ($xml->channel->item as $entry){
  ...
  //Use that namespace
  $namespaces = $entry->getNamespaces(true);
  //Now we don't have the URL hard-coded
  $dc = $entry->children($namespaces['dc']);
  echo $dc->publisher;
  echo $dc->creator;
}
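
One possible refinement, sketched below under the assumption of an inline copy of the sample feed, is to look up the namespaces once outside the loop and guard against a feed that does not declare the dc prefix at all. It uses getDocNamespaces, which lists namespaces declared on the root element, alongside getNamespaces(true), which lists the ones actually used. The names $declared, $used and $publishers are illustrative.

```php
<?php
// Inline, trimmed copy of the sample feed (an assumption for illustration).
$feed = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
  <item>
    <title>My Title</title>
    <dc:publisher>ABC</dc:publisher>
    <dc:creator>DEF</dc:creator>
  </item>
</channel>
</rss>
XML;

$xml = new SimpleXMLElement($feed);

// Namespaces declared on the root element, as prefix => URI.
$declared = $xml->getDocNamespaces();
// Namespaces actually used by the root element or its descendants.
$used = $xml->getNamespaces(true);

$publishers = [];
// Guard the lookup so a feed without a dc declaration does not trigger a warning.
if (isset($declared['dc'])) {
    foreach ($xml->channel->item as $entry) {
        $dc = $entry->children($declared['dc']);
        $publishers[] = (string) $dc->publisher;
    }
}

print_r($publishers);
```

Either lookup still keys on the dc prefix, so this removes the hard-coded URI but not the dependency on the prefix the feed uses.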

That's it! I found this useful when getting an RSS feed using SimpleXML and wanting to parse the XML Namespace elements.

