欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页  >  IT编程

PHP cdata 处理(详细介绍)

程序员文章站 2023-01-02 19:13:19
当时在网上找了一个cdata的转换器, 修改之后, 将cdata标签给过滤掉。如下复制代码 代码如下: // states:   &...
当时在网上找了一个cdata的转换器, 修改之后, 将cdata标签给过滤掉。如下
复制代码 代码如下:

 // states:
        //
        //     'out'
        //     '<'
        //     '<!'
        //     '<!['
        //     '<![c'
        //     '<![cd'
        //     '<![cdat'
        //     '<![cdata'
        //     'in'
        //     ']'
        //     ']]'
        //
        // (yes, the states a represented by strings.)
        //
        $state = 'out';
        $a = str_split($xml);
        $new_xml = '';
        foreach ($a as $k => $v) {
            // deal with "state".
            switch ( $state ) {
                case 'out':
                    if ( '<' == $v ) {
                        $state = $v;
                    } else {
                        $new_xml .= $v;
                    }
                break;
                case '<':
                    if ( '!' == $v  ) {
                        $state = $state . $v;
                    } else {
                        $new_xml .= $state . $v;
                        $state = 'out';
                    }
                break;
                 case '<!':
                    if ( '[' == $v  ) {
                        $state = $state . $v;
                    } else {
                        $new_xml .= $state . $v;
                        $state = 'out';
                    }
                break;
                case '<![':
                    if ( 'c' == $v  ) {
                        $state = $state . $v;
                    } else {
                        $new_xml .= $state . $v;
                        $state = 'out';
                    }
                break;
                case '<![c':
                    if ( 'd' == $v  ) {
                        $state = $state . $v;
                    } else {
                        $new_xml .= $state . $v;
                        $state = 'out';
                    }
                break;
                case '<![cd':
                    if ( 'a' == $v  ) {
                        $state = $state . $v;
                    } else {
                        $new_xml .= $state . $v;
                        $state = 'out';
                    }
                break;
                case '<![cda':
                    if ( 't' == $v  ) {
                        $state = $state . $v;
                    } else {
                        $new_xml .= $state . $v;
                        $state = 'out';
                    }
                break;
                case '<![cdat':
                    if ( 'a' == $v  ) {
                        $state = $state . $v;
                    } else {
                        $new_xml .= $state . $v;
                        $state = 'out';
                    }
                break;
                case '<![cdata':
                    if ( '[' == $v  ) {
                        $cdata = '';
                        $state = 'in';
                    } else {
                        $new_xml .= $state . $v;
                        $state = 'out';
                    }
                break;
                case 'in':
                    if ( ']' == $v ) {
                        $state = $v;
                    } else {
                        $cdata .= $v;
                    }
                break;
                case ']':
                    if (  ']' == $v  ) {
                        $state = $state . $v;
                    } else {
                        $cdata .= $state . $v;
                        $state = 'in';
                    }
                break;
                case ']]':
   if (  '>' == $v  ) {
    $new_xml .= htmlentities($cdata);
#       $new_xml.= $cdata;
//                        $new_xml .= str_replace('>','>',
  //                                  str_replace('>','<',
    //                                str_replace('"','"',
      //                              str_replace('&','&',
        //                            $cdata))));
                        $state = 'out';
                    } else {
                        $cdata .= $state . $v;
                        $state = 'in';
                    }
                break;
            } // switch
        }
        //
        // return.
        //
            return $new_xml;

最近发现,总是有alert发出来, 说是simplexml解析出错。

发现是原来有xml的数据是<![cdata[domain[test]]] >. 出现了连续的3个], 造成上面的解析函数不能处理。

而且这个问题很难修正, 你不知道下次会不会有4, 5个]出现。

所以决定还是将这段解析 的代码换成dom xml,本身 dom的处理还是比较简单的,

包含domelement, domdocument, domnodelist, domnode几个 component.

对于 domnode有nodevalue, nodetype, nodename的成员函数。

首先先用loadxml将string转化为domdocument对像, 再用getelementsbytagname转化为domnodelist对像, 再使用->item(0)转化为domnode, 然后就可以使用上面的三种方法了。

对于 <aa color='red'>test</aa>这种xml标签, 要使用 attribute函数。