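YB3.cc hosts a good number of decent novels. This post shares a working collection rule for the site.
First, a short introduction to the tokens used in GuanGuan collector (关关采集器) rules: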
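\d* matches digits; \s* matches whitespace, including line breaks; .+? matches any characters (at least one); .* matches any characters (possibly none)
() marks the part we want to capture; ((.|\n)*) captures the chapter body, including line breaks.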
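The tokens above can be tried out on small strings before they go into a rule. The sketch below uses Python's `re` module as a stand-in; GuanGuan's own regex engine may differ in details, and the HTML fragments are invented for illustration.

```python
import re

# Hypothetical chapter-list fragment, made up for this example.
html = '<dd><a href="/5200/123/456.html">Chapter 1</a></dd>'

# \d+ matches the numeric ids; (.+?) captures the non-empty chapter title.
m = re.search(r'<a href="/5200/(\d+)/(\d+)\.html">(.+?)</a>', html)
novel_id, chapter_id, title = m.groups()

# ((.|\n)*?) captures text that spans line breaks,
# because "." on its own does not match a newline.
page = '<div id="content">line one\nline two</div>'
body = re.search(r'<div id="content">((.|\n)*?)</div>', page).group(1)

# \s* absorbs spaces and line breaks around the digits captured by (\d*).
spaced = re.search(r'<b>\s*(\d*)\s*</b>', '<b>\n  42 </b>').group(1)

print(novel_id, chapter_id, title)
print(repr(body))
print(spaced)
```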
=====Correspondence with Jieqi (杰奇) backend tags=====
!!!! is equivalent to ([^><]*) ~~~~ is equivalent to ([^><'"]*) ^^^^ is equivalent to ([^><\d]*)
$$$$ is equivalent to ([\d]*)
**** is equivalent to (.*)
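As a rough illustration of the correspondences above, the following sketch expands a Jieqi-style rule into a regular expression. The function name `jieqi_to_regex` and the sample strings are hypothetical, and the `~~~~` class is written with straight quotes on the assumption that typographic quotes in a copied rule are a paste artifact.

```python
import re

# Correspondences from the list above; the dictionary name is hypothetical.
JIEQI_TO_REGEX = {
    "!!!!": r"([^><]*)",
    "~~~~": r'''([^><'"]*)''',
    "^^^^": r"([^><\d]*)",
    "$$$$": r"([\d]*)",
    "****": r"(.*)",
}

# Matches any one of the five four-character Jieqi tags.
TOKEN = re.compile("|".join(re.escape(t) for t in JIEQI_TO_REGEX))


def jieqi_to_regex(rule: str) -> str:
    """Escape the literal text of a rule and expand each Jieqi tag."""
    parts, pos = [], 0
    for m in TOKEN.finditer(rule):
        parts.append(re.escape(rule[pos:m.start()]))  # literal text before the tag
        parts.append(JIEQI_TO_REGEX[m.group(0)])      # the tag's regex equivalent
        pos = m.end()
    parts.append(re.escape(rule[pos:]))               # trailing literal text
    return "".join(parts)


# Made-up rule and page fragment for demonstration.
pattern = jieqi_to_regex('<a href="/book/$$$$/">!!!!</a>')
match = re.search(pattern, '<a href="/book/93/">Some Title</a>')
print(match.groups())
```

Escaping the literal text keeps characters such as `.` or `?` in the rule from being read as regex operators.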
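If the rule does not work, adjust it according to the messages GuanGuan reports. To use it, copy the code below, save it as an .xml file, place the file in GuanGuan's rules folder, and select it inside GuanGuan. The rule targets GuanGuan V1.20.7.9 (GuanGuan folder dated 2016.4.28).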
<?xml version="1.0"?> <RuleConfigInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <RuleVersion> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern>xsz.tw.小说网TXT</Pattern> <RegexName>RuleVersion</RegexName> </RuleVersion> <RuleID> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern>93</Pattern> <RegexName>RuleID</RegexName> </RuleID> <GetSiteName> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern>yb3</Pattern> <RegexName>GetSiteName</RegexName> </GetSiteName> <GetSiteCharset> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern>utf-8</Pattern> <RegexName>GetSiteCharset</RegexName> </GetSiteCharset> <GetSiteUrl> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern>https://www.yb3.cc/</Pattern> <RegexName>GetSiteUrl</RegexName> </GetSiteUrl> <NovelSearchUrl> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern /> <RegexName>NovelSearchUrl</RegexName> </NovelSearchUrl> <NovelSearchData> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern /> <RegexName>NovelSearchData</RegexName> </NovelSearchData> <NovelSearch_GetNovelKey> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern /> <RegexName>NovelSearch_GetNovelKey</RegexName> </NovelSearch_GetNovelKey> <NovelListUrl> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern>https://www.yb3.cc/</Pattern> <RegexName>NovelListUrl</RegexName> </NovelListUrl> <NovelListFilter> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern /> <RegexName>NovelListFilter</RegexName> </NovelListFilter> <NovelList_GetNovelKey> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern><span class="s2"><a href="/5200/(\d+)/" target="_blank"></Pattern> <RegexName>NovelList_GetNovelKey</RegexName> </NovelList_GetNovelKey> <NovelUrl> <FilterPattern /> 
<Method>Match</Method> <Options>None</Options> <Pattern>http://www.yb3.cc/5200/{NovelKey}/</Pattern> <RegexName>NovelUrl</RegexName> </NovelUrl> <NovelErr> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern>对不起,该文章不存在!</Pattern> <RegexName>NovelErr</RegexName> </NovelErr> <NovelName> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern>og:novel:book_name" content="(.+?)"</Pattern> <RegexName>NovelName</RegexName> </NovelName> <NovelAuthor> <FilterPattern><a.+?> </a> &nbsp;</FilterPattern> <Method>Match</Method> <Options>None</Options> <Pattern>og:novel:author" content="(.+?)"</Pattern> <RegexName>NovelAuthor</RegexName> </NovelAuthor> <LagerSort> <FilterPattern><a.+?> </a> &nbsp;</FilterPattern> <Method>Match</Method> <Options>None</Options> <Pattern>og:novel:category" content="(.+?)"</Pattern> <RegexName>LagerSort</RegexName> </LagerSort> <SmallSort> <FilterPattern><a.+?> </a> &nbsp;</FilterPattern> <Method>Match</Method> <Options>None</Options> <Pattern>og:novel:category" content="(.+?)"</Pattern> <RegexName>SmallSort</RegexName> </SmallSort> <NovelIntro> <FilterPattern><script((.|\n)*?)</script> &lt;♂< &gt;♂> <a.+?</a> </div> </p></FilterPattern> <Method>Match</Method> <Options>None</Options> <Pattern>og:description" content="((.|\n)*?)"</Pattern> <RegexName>NovelIntro</RegexName> </NovelIntro> <NovelKeyword> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern>og:novel:book_name" content="(.+?)"</Pattern> <RegexName>NovelKeyword</RegexName> </NovelKeyword> <NovelDegree> <FilterPattern>a♂已完结 b♂连载中</FilterPattern> <Method>Match</Method> <Options>None</Options> <Pattern>og:novel:status" content="(.+?)"</Pattern> <RegexName>NovelDegree</RegexName> </NovelDegree> <NovelCover> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern>og:image" content="(.+?)"</Pattern> <RegexName>NovelCover</RegexName> </NovelCover> <NovelDefaultCoverUrl> <FilterPattern /> <Method>Match</Method> 
<Options>None</Options> <Pattern>noimg.jpg</Pattern> <RegexName>NovelDefaultCoverUrl</RegexName> </NovelDefaultCoverUrl> <NovelInfo_GetNovelPubKey> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern><meta property="og:novel:read_url" content="(.+?)"/></Pattern> <RegexName>NovelInfo_GetNovelPubKey</RegexName> </NovelInfo_GetNovelPubKey> <PubCookies> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern /> <RegexName>PubCookies</RegexName> </PubCookies> <PubIndexUrl> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern>{NovelPubKey}</Pattern> <RegexName>PubIndexUrl</RegexName> </PubIndexUrl> <PubIndexErr> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern>获得目录页错误</Pattern> <RegexName>PubIndexErr</RegexName> </PubIndexErr> <PubVolumeContent> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern /> <RegexName>PubVolumeContent</RegexName> </PubVolumeContent> <PubVolumeSplit> <FilterPattern /> <Method>Spilt</Method> <Options>None</Options> <Pattern><h3</Pattern> <RegexName>PubVolumeSplit</RegexName> </PubVolumeSplit> <PubVolumeName> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern>>(.+?)</h3></Pattern> <RegexName>PubVolumeName</RegexName> </PubVolumeName> <PubChapterName> <FilterPattern>~伪后记~|伪后记</FilterPattern> <Method>Match</Method> <Options>None</Options> <Pattern><dd><a href="/5200/\d+/\d+.html">(.+?)</a></Pattern> <RegexName>PubChapterName</RegexName> </PubChapterName> <PubChapter_GetChapterKey> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern><dd><a href="(/5200/\d+/\d+.html)">.+?</a></dd></Pattern> <RegexName>PubChapter_GetChapterKey</RegexName> </PubChapter_GetChapterKey> <PubContentUrl> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern>{ChapterKey}</Pattern> <RegexName>PubContentUrl</RegexName> </PubContentUrl> <PubContentErr> <FilterPattern /> <Method>Match</Method> 
<Options>None</Options> <Pattern>获得章节内容页错误</Pattern> <RegexName>PubContentErr</RegexName> </PubContentErr> <PubTextUrl> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern /> <RegexName>PubTextUrl</RegexName> </PubTextUrl> <PubContentText> <FilterPattern><span.+?>|<font.+?>|<[Ss][Cc][Rr][Ii][Pp][Tt](.|\n)+?</[Ss][Cc][Rr][Ii][Pp][Tt]>|<[Ff][Oo][Nn][Tt](.|\n)*?</[Ff][Oo][Nn][Tt]>|<[Ii][Ff][Rr][Aa][Mm][Ee](.|\n)+?</[Ii][Ff][Rr][Aa][Mm][Ee]>|<[Aa].+?</[Aa]>|<[Dd][Ii][Vv].+?>|</[Dd][Ii][Vv]>|<!--.+?-->|<[Ss>][Pp][Aa][Nn](.|\n)*?</[Ss>][Pp][Aa][Nn]>|0.{0,10}0.{0,10}小.{0,10}说|</br>|<br>|本書首发于看書罔|未完待续|</span>|</>|</font>|\[\$|妙\]|\[笔|\$|i\]|\[-阁\]|com|\(。\)|U8\?小说|\?.\?|U\?8\?X\?S|\?U\?|8\?小说|U\?8\?X|S\?|\?U8|小说|U|8|\?X\s*\?|\?\?U|8小|说\?|X|S`|[WwMm]+\.[0-9a-zA-Z]*\.[CcOoMmIiNnEeTtLlAa]|手机用户|请浏览|m.114zw.la|阅读|更优质的阅读体验|天才壹秒記住|114|中文网|』|ф|①|④ω|z|la|呅網|為您|提供精彩|小說閱讀|『|起点读书|最快更新|无弹窗请|&nbsp;&nbsp;&nbsp;&nbsp;</FilterPattern> <Method>Match</Method> <Options>IgnoreCase</Options> <Pattern><div id="content">((.|\n)+?)</div></Pattern> <RegexName>PubContentText</RegexName> </PubContentText> <PubContentPageUrl> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern /> <RegexName>PubContentPageUrl</RegexName> </PubContentPageUrl> <PubContentPageKey> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern /> <RegexName>PubContentPageKey</RegexName> </PubContentPageKey> <PubContentReplace> <FilterPattern /> <Method>Match</Method> <Options>None</Options> 
<Pattern>[WwWwωщщψшШ].{0,3}[WwWwωщщψшШ].{0,3}[WwWwωщщψшШ].{0,3}[00OoOoο].{0,3}[00OoOoο].{0,3}[XxXxχ].{0,3}[SsSs].{0,7}[CcCcСΓ].{0,3}[00OoOoοó].{0,3}[MmMmМ]|[00OoOoο].{0,3}[00OoOoο].{0,3}[XxXxχ].{0,3}[SsSs].{0,7}[CcCcСΓ].{0,3}[00OoOoοó].{0,3}[MmMmМ]|[HhHΗh].{0,3}[TtTt].{0,3}[TtTt].{0,3}[PpPpρр]://|[WwWwωщщψ].{0,3}[WwWwωщщψ].{0,3}[WwWwωщщψ]|[WwWwωщщψ].{0,3}[AaàAaαа].{0,3}[PpPpρр]|[CcCcС].{0,3}[00OoOoο].{0,3}[MmMmМ]|[NnNnΠ∩η].{0,3}[EeEeε].{0,3}[TtTt]|[00OoOoο].{0,3}[RrRr].{0,3}[GgGg]|[CcCcС].{0,3}[NnNnΠ∩η]</Pattern> <RegexName>PubContentReplace</RegexName> </PubContentReplace> <PubContentChapterName> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern /> <RegexName>PubContentChapterName</RegexName> </PubContentChapterName> <PubContentChapterNum> <FilterPattern /> <Method>Match</Method> <Options>None</Options> <Pattern /> <RegexName>PubContentChapterNum</RegexName> </PubContentChapterNum> </RuleConfigInfo>
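The novel-info patterns in the rule read the page's `og:` meta tags. A minimal sketch for checking them outside GuanGuan is shown here, again using Python's `re` as a stand-in for GuanGuan's engine. The patterns are copied from the rule; the `extract` helper and the sample markup (including its URLs and names) are invented for illustration and were not fetched from the site.

```python
import re

# Patterns copied from the rule, keyed by their RegexName.
PATTERNS = {
    "NovelName": r'og:novel:book_name" content="(.+?)"',
    "NovelAuthor": r'og:novel:author" content="(.+?)"',
    "NovelCover": r'og:image" content="(.+?)"',
    "NovelInfo_GetNovelPubKey": r'<meta property="og:novel:read_url" content="(.+?)"/>',
}

# Hypothetical page head; every value here is a placeholder.
SAMPLE = '''
<meta property="og:novel:book_name" content="Example Book"/>
<meta property="og:novel:author" content="Example Author"/>
<meta property="og:image" content="https://example.invalid/cover.jpg"/>
<meta property="og:novel:read_url" content="https://www.yb3.cc/5200/93/"/>
'''


def extract(html: str) -> dict:
    """Apply each pattern and keep its first capture group, or None if it fails."""
    result = {}
    for name, pattern in PATTERNS.items():
        m = re.search(pattern, html)
        result[name] = m.group(1) if m else None
    return result


info = extract(SAMPLE)
for name, value in info.items():
    print(name, "->", value)
```

A `None` value points at the pattern that no longer matches the page, which is usually the first thing to adjust when the site changes its markup.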