 
 - RU.PERL ----------------------------------------------------------------------
 From : Valentin Ermolaev                    2:463/544.12   24 Jun 2002  00:40:07
 To : Ivan V. Klepikov
 Subject : strings
 -------------------------------------------------------------------------------- 
 
 
  IVK> Here's the task: I have a small HTML file. How do I strip all
  IVK> the tags out of it? I can't quite figure out how these regexps
  IVK> are put together... please help!
 
 == [perldoc -q 'remove html'] ==
 =head1 Found in /usr/libdata/perl/5.00503/pod/perlfaq9.pod
 
 =head2 How do I remove HTML from a string?
 
 The most correct way (albeit not the fastest) is to use HTML::Parse
 from CPAN (part of the HTML-Tree package on CPAN).
 
 Many folks attempt a simple-minded regular expression approach, like
 C<s/E<lt>.*?E<gt>//g>, but that fails in many cases because the tags
 may continue over line breaks, they may contain quoted angle-brackets,
 or HTML comments may be present.  Plus folks forget to convert
 entities, like C<&lt;> for example.
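
To make the failure modes above concrete, here is a small sketch that runs the naive substitution on three inputs. The function name naive_strip and the sample strings are made up for illustration; only the pattern itself comes from the text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Naive approach from the text: delete everything from '<' to the next '>'.
# naive_strip is a hypothetical helper name used only in this sketch.
sub naive_strip {
    my ($html) = @_;
    $html =~ s/<.*?>//g;
    return $html;
}

# 1. A quoted '>' inside an attribute ends the match too early.
my $quoted = naive_strip('<IMG SRC="foo.gif" ALT="A > B">text');

# 2. Without /s, '.' does not match a newline, so a tag that
#    continues on the next line is not matched at all.
my $multiline_in = "<IMG SRC=\"foo.gif\"\n     ALT=\"x\">text";
my $multiline    = naive_strip($multiline_in);

# 3. Tags are removed, but the entity is left undecoded.
my $entity = naive_strip('<B>1 &lt; 2</B>');

print "[$quoted]\n[$multiline]\n[$entity]\n";
```

The first result still contains the tail of the tag ( B">text), the second input comes back unchanged, and the third keeps the literal &lt; in the output.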
 
 Here's one "simple-minded" approach, that works for most files:
 
     #!/usr/bin/perl -p0777
     s/<(?:[^>'"]*|(['"]).*?\1)*>//gs
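
The one-liner above relies on -p0777 to slurp the whole file, so that tags spanning several lines are seen as one string. A minimal sketch of the same substitution applied to a string in memory could look like this; strip_tags is a hypothetical name, and the sample input is invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same substitution as in the FAQ text, wrapped in a function.
# strip_tags is a hypothetical name, not part of any module.
sub strip_tags {
    my ($html) = @_;
    # Inside a tag, accept either runs of characters that are not
    # '>' or a quote, or a complete quoted string (which may contain '>').
    $html =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;
    return $html;
}

# A tag that spans two lines and has a quoted '>' in an attribute.
my $out = strip_tags(qq{<P>See <IMG SRC = "foo.gif"\n     ALT = "A > B"> here.</P>});
print "[$out]\n";
```

Here the whole IMG tag is removed even though it spans two lines and contains a quoted >, leaving only the surrounding text.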
 
 If you want a more complete solution, see the 3-stage striphtml
 program in
 http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/striphtml.gz
 .
 
 Here are some tricky cases that you should think about when picking
 a solution:
 
     <IMG SRC = "foo.gif" ALT = "A > B">
 
     <IMG SRC = "foo.gif" 
          ALT = "A > B">
 
     <!-- <A comment> -->
 
     <script>if (a<b && a>c)</script>
 
     <# Just data #>
 
     <![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
 
 If HTML comments include other tags, those solutions would also break
 on text like this:
 
     <!-- This section commented out.
         <B>You can't see me!</B>
 
     -->
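
As a rough illustration of how the commented-out section and the script case behave, the sketch below runs them through the substitution quoted earlier and then tries a comment-removing pre-pass. The pre-pass is an assumption added here, not something the FAQ text prescribes, and it simplifies a comment to "everything up to the first -->". Both function names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tag-stripping substitution quoted in the FAQ text.
sub strip_tags {
    my ($html) = @_;
    $html =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;
    return $html;
}

# Hypothetical pre-pass: drop <!-- ... --> blocks before stripping tags.
# Simplifying assumption: a comment ends at the first "-->".
sub strip_comments_then_tags {
    my ($html) = @_;
    $html =~ s/<!--.*?-->//gs;
    return strip_tags($html);
}

my $commented = "<!-- This section commented out.\n    <B>You can't see me!</B>\n\n-->";

# Tags only: the hidden text and the closing "-->" leak into the output.
my $leaky = strip_tags($commented);

# Comments first: the whole block disappears.
my $clean = strip_comments_then_tags($commented);

# Script bodies are not markup, so the comparison operators get eaten.
my $script = strip_tags('<script>if (a<b && a>c)</script>');

print "[$leaky]\n[$clean]\n[$script]\n";
```

With tags alone, the hidden sentence survives; with the pre-pass, the result is empty. The script case still comes out mangled as if (ac), which is one reason the text points to a parser-based or multi-stage solution.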
 
 == [perldoc -q 'remove html'] ==
 
 --- [VE2-UANIC]
  * Origin:  (2:463/544.12)
 
 
