-
43251. Approximate Graph Mining with Label Costs
[Information Transmission, Software and Information Technology Services] [2013-12-05]
Many real-world graphs have complex labels on their nodes and edges. Mining only exact patterns yields limited insight, since exact matches may be hard to find. However, in many domains it is relatively easy to compute a cost (or distance) between different labels. Using this information, it becomes possible to mine a much richer set of approximate subgraph patterns, which preserve the topology but allow bounded label mismatches. We present novel and scalable methods to efficiently solve the approximate isomorphism problem. We show that mining approximate patterns uncovers interesting patterns in several real-world graphs, ranging from IT and protein interaction networks to protein structures.
Keywords: data mining; graph analysis; approximation techniques
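To make the notion of an approximate match concrete, here is a minimal Python sketch. It is not the authors' algorithm: it only checks one candidate embedding, whereas the paper's contribution is finding such embeddings at scale. It assumes undirected graphs whose node and edge labels are stored in dictionaries, a user-supplied label distance `cost(a, b)`, and a hypothetical cost bound `alpha`; all function and variable names are illustrative. An embedding counts as an approximate match when it preserves every pattern edge exactly and its total label cost stays within `alpha`.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Optional

Labels = Dict[Hashable, str]                  # node -> label
EdgeLabels = Dict[FrozenSet[Hashable], str]   # undirected edge {u, v} -> label


def embedding_cost(p_nodes: Labels, p_edges: EdgeLabels,
                   g_nodes: Labels, g_edges: EdgeLabels,
                   mapping: Dict[Hashable, Hashable],
                   cost: Callable[[str, str], float]) -> Optional[float]:
    """Total label cost of mapping the pattern into the graph, or None if the
    mapping is not injective or drops a pattern edge (topology broken)."""
    images = [mapping[u] for u in p_nodes]
    if len(set(images)) != len(images):      # two pattern nodes on one graph node
        return None
    total = sum(cost(lbl, g_nodes[mapping[u]]) for u, lbl in p_nodes.items())
    for edge, lbl in p_edges.items():
        u, v = tuple(edge)                     # assumes no self-loops
        image = frozenset((mapping[u], mapping[v]))
        if image not in g_edges:               # topology must match exactly
            return None
        total += cost(lbl, g_edges[image])     # edge labels may mismatch
    return total


def is_approx_match(p_nodes, p_edges, g_nodes, g_edges, mapping, cost, alpha):
    """True when the embedding preserves topology and costs at most alpha."""
    c = embedding_cost(p_nodes, p_edges, g_nodes, g_edges, mapping, cost)
    return c is not None and c <= alpha


# Example: labels A and B are "close" (distance 0.5); any other mismatch costs 1.
NEAR = {("A", "B"): 0.5, ("B", "A"): 0.5}


def label_cost(a: str, b: str) -> float:
    return 0.0 if a == b else NEAR.get((a, b), 1.0)


pattern_nodes = {1: "A", 2: "A"}
pattern_edges = {frozenset({1, 2}): "x"}
graph_nodes = {"a": "A", "b": "B", "c": "C"}
graph_edges = {frozenset({"a", "b"}): "x", frozenset({"b", "c"}): "x"}
```

Here mapping the pattern onto nodes `a` and `b` costs 0.5 and is accepted with `alpha = 1.0`, while mapping it onto `a` and `c` is rejected outright because those nodes are not adjacent.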
-
43252. Denoising and the Best of Two Denoisers
[Information Transmission, Software and Information Technology Services] [2013-12-05]
Given two arbitrary sequences of denoisers for block lengths tending to infinity, we ask whether it is possible to construct a third sequence of denoisers whose excess expected loss, relative to the better of the two given denoisers' expected losses, vanishes asymptotically in the block length for all clean channel input sequences. As in the setting of DUDE, which solves this problem when the given denoisers are sliding-block denoisers, the construction may depend on the two given denoisers and on the channel transition probabilities. We show that under certain restrictions on the two given denoisers, the problem can be solved by a straightforward application of a known loss estimation paradigm. We then show, by way of a counterexample, that the loss estimation approach fails in the general case. Finally, we show that for the binary symmetric channel, combining loss estimation with a randomization step solves the stated problem with no restrictions on the given denoisers.
Keywords: loss estimation paradigm; channel transition probabilities; straightforward application
-
43253. Generic Crash-Resilient Storage for Indigo and Beyond
[Information Transmission, Software and Information Technology Services] [2013-12-05]
Computer crashes threaten application data integrity. The threat is particularly acute for expensive industrial equipment such as high-volume printing presses, whose stringent uptime requirements demand both high performance during normal operation and rapid recovery following crashes. We have designed and implemented a novel general-purpose persistent memory buffer that protects application data from crashes and is easy to integrate into existing software. Our solution slid with remarkable ease beneath the mature, complex, highly tuned software that controls HP Indigo printing presses, reducing recovery times from days to minutes while adding negligible overhead to failure-free operation. The new system has been in successful production use at dozens of beta test sites for several months and will eventually ship with all new Indigo presses. Our novel crash resilience strategy is not specific to Indigo presses and is likely applicable in a wide range of HP products, so we have developed a portable implementation that is available upon request.
Keywords: fault tolerance; Indigo; storage; persistent heap
-
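43254. IBM Accelerator for Machine Data Analytics, Part 6: Speeding Up the Troubleshooting of InfoSphere BigInsights Applications
[Information Transmission, Software and Information Technology Services] [2013-12-05]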
Expedite the troubleshooting of InfoSphere BigInsights applications by using the IBM Accelerator for Machine Data Analytics with IBM InfoSphere BigInsights to perform Hadoop log analysis.
Keywords: accelerator; machine data analytics; troubleshooting
-
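43255. Integrated DWDM Silicon Photonic Transceivers with Adaptive CMOS Circuits for Chip-to-Chip Optical Interconnects
[Information Transmission, Software and Information Technology Services] [2013-12-05]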
The rapid scaling of microprocessors has shifted the critical bottleneck of high‐performance computing systems from the computational units to the communication infrastructure. By taking advantage of the parallelism and capacity of dense wavelength‐division‐multiplexed (DWDM) technology, optical interconnects using nanophotonics offer a high‐bandwidth, low‐latency, and energy‐efficient solution within a small footprint, when compared with their electrical counterparts.
Keywords: optical interconnects; adaptive; integrated silicon photonic transceivers
-
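43256. Corybantic: Towards the Modular Composition of SDN Control Programs
[Information Transmission, Software and Information Technology Services] [2013-12-05]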
Software-Defined Networking (SDN) promises to enable vigorous innovation, through separation of the control plane from the data plane, and to enable novel forms of network management, through a controller that uses a global view to make globally valid decisions. The design of SDN controllers creates novel challenges; much previous work has focused on making them scalable, reliable, and efficient. However, prior work has ignored the problem that multiple controller functions may compete for resources (e.g., link bandwidth or switch table slots). Our Corybantic design supports modular composition of independent controller modules, which manage different aspects of the network while competing for resources. Each module tries to optimize one or more objective functions; we address the challenge of how to coordinate between these modules to maximize the overall value delivered by the controllers' decisions while still achieving modularity.
Keywords: SDN; resource management; composition; policy; auctions
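The coordination problem described in the abstract above can be illustrated with a deliberately simple Python sketch. It assumes, hypothetically, that each module both proposes an allocation of a shared resource (here, switch table slots) and assigns a value, in a common unit, to any proposal; the coordinator then picks the feasible proposal with the highest total value across all modules. The `Module` and `coordinate` names, the value functions, and the slot counts are illustrative assumptions, not Corybantic's actual interfaces or protocol.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Allocation = Dict[str, int]  # hypothetical: module name -> switch table slots granted


@dataclass
class Module:
    name: str
    propose: Callable[[], Allocation]        # this module's preferred allocation
    value: Callable[[Allocation], float]     # its value for any allocation (shared unit)


def coordinate(modules: List[Module], slots: int) -> Optional[Allocation]:
    """Return the feasible proposal with the highest total value across all
    modules, or None if no proposal fits within the available slots."""
    best: Optional[Allocation] = None
    best_total = float("-inf")
    for proposal in (m.propose() for m in modules):
        if sum(proposal.values()) > slots:   # respect the shared resource limit
            continue
        total = sum(m.value(proposal) for m in modules)  # every module scores it
        if total > best_total:
            best, best_total = proposal, total
    return best


# Example: a firewall module and a traffic-engineering module compete for slots.
firewall = Module("firewall",
                  lambda: {"firewall": 80, "te": 20},
                  lambda a: 2.0 * min(a.get("firewall", 0), 80))
traffic_eng = Module("te",
                     lambda: {"firewall": 40, "te": 60},
                     lambda a: 1.5 * min(a.get("te", 0), 60))
```

With 100 slots, the firewall's proposal wins (total value 160 + 30 = 190, versus 80 + 90 = 170 for the alternative). Having every module score every proposal keeps modules independent: none needs to know how the others compute value.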
-
43257. Real-Time Anomaly Detection Using the InfoSphere Streams TimeSeries Toolkit
[Information Transmission, Software and Information Technology Services] [2013-12-05]
InfoSphere Streams, which processes data in real time, includes the TimeSeries Toolkit for building real-time analytical solutions. With the TimeSeries Toolkit operators for preprocessing, analyzing, and modeling multidimensional time series data in real time, you can create an anomaly detection application to monitor systems across the domains of cybersecurity, infrastructure, data center management, healthcare, and environment.
Keywords: anomaly detection; applications; time series
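The TimeSeries Toolkit is programmed through InfoSphere Streams operators, and the sketch below does not use them. As a hedged, toolkit-independent illustration of the kind of per-value check an anomaly detection application performs, this Python class flags a value lying more than a chosen number of sample standard deviations from the mean of a sliding window of recent values. The class name, window size, and threshold are hypothetical; a rolling z-score is a simple stand-in technique, not the toolkit's own modeling.

```python
import math
from collections import deque
from typing import Deque


class RollingZScoreDetector:
    """Flag a value as anomalous if it is more than `threshold` sample standard
    deviations away from the mean of the previous `window` values."""

    def __init__(self, window: int = 20, threshold: float = 3.0) -> None:
        self.history: Deque[float] = deque(maxlen=window)
        self.threshold = threshold

    def update(self, x: float) -> bool:
        anomalous = False
        n = len(self.history)
        if n >= 2:  # need at least two points for a sample standard deviation
            mean = sum(self.history) / n
            std = math.sqrt(sum((v - mean) ** 2 for v in self.history) / (n - 1))
            if std == 0:
                anomalous = x != mean  # any change from a flat signal stands out
            else:
                anomalous = abs(x - mean) / std > self.threshold
        self.history.append(x)  # the window slides; the oldest value drops out
        return anomalous
```

Appending every value, anomalous or not, keeps the sketch short, but a large spike then inflates the window's spread until it slides out; a production detector might exclude flagged values from the window.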
-
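43258. Improving Restore Speed for Backup Systems That Use Inline Chunk-Based Deduplication
[Information Transmission, Software and Information Technology Services] [2013-12-05]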
Slow restoration due to chunk fragmentation is a serious problem facing inline chunk-based data deduplication systems: restore speeds for the most recent backup can drop by orders of magnitude over the lifetime of a system. We study three techniques for alleviating this problem: increasing cache size, container capping, and using a forward assembly area. Container capping is an ingest-time operation that reduces chunk fragmentation at the cost of forfeiting some deduplication. Using a forward assembly area is a new restore-time caching and prefetching technique that exploits the perfect knowledge of future chunk accesses available when restoring a backup, reducing the amount of RAM required for a given level of caching at restore time. We show that using a larger cache per stream (we see continuing benefits even up to 8 GB) can produce up to a 5-16X improvement, that giving up as little as 8% deduplication with capping can yield a 2-6X improvement, and that using a forward assembly area is strictly superior to LRU, yielding a 2-4X improvement while holding the RAM budget constant.
Keywords: deduplication; fragmentation; restore; caching; offline caching
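The forward assembly area idea, exploiting the restore recipe's perfect knowledge of upcoming chunks, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the recipe is a hypothetical list of `(chunk_id, container_id, size)` tuples, containers are read through a caller-supplied function returning a dict of chunk bytes, and `faa_bytes` is the assembly area size. For each slice of the recipe that fits in the area, every container the slice needs is read exactly once and its chunks are copied into all of their slots.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical recipe entry: (chunk_id, container_id, chunk_size_in_bytes)
RecipeEntry = Tuple[str, str, int]


def restore_with_faa(recipe: List[RecipeEntry],
                     read_container: Callable[[str], Dict[str, bytes]],
                     faa_bytes: int) -> Tuple[bytes, int]:
    """Restore a backup stream using a forward assembly area of `faa_bytes`.
    Returns the restored bytes and the number of container reads performed."""
    out = bytearray()
    reads = 0
    i = 0
    while i < len(recipe):
        # 1. Take the next recipe slice that fits in the assembly area
        #    (always at least one chunk, even if it alone exceeds faa_bytes).
        j, used = i, 0
        while j < len(recipe) and (j == i or used + recipe[j][2] <= faa_bytes):
            used += recipe[j][2]
            j += 1
        window = recipe[i:j]
        # 2. Lay out the slots: the offset of each chunk in the assembly area.
        offsets, off = [], 0
        for _, _, size in window:
            offsets.append(off)
            off += size
        area = bytearray(off)
        # 3. Read each needed container once, in order of first use, and copy
        #    every chunk it supplies into all of that chunk's slots in the slice.
        for cid in dict.fromkeys(c for _, c, _ in window):
            container = read_container(cid)
            reads += 1
            for (chunk_id, c, size), o in zip(window, offsets):
                if c == cid:
                    area[o:o + size] = container[chunk_id]
        out += area  # the filled area is written out in stream order
        i = j
    return bytes(out), reads


# Example: chunk "a" from container C1 appears twice in the stream.
containers = {"C1": {"a": b"AA", "b": b"BB"}, "C2": {"c": b"CC"}}
recipe = [("a", "C1", 2), ("c", "C2", 2), ("b", "C1", 2), ("a", "C1", 2)]
```

For this example recipe, an 8-byte area restores `b"AACCBBAA"` with two container reads; a 4-byte area needs three, since `C1` is needed by both recipe slices.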
-
43259. Calling R from SPSS: An Introduction to the R Plug-in for SPSS
[Information Transmission, Software and Information Technology Services] [2013-12-05]
Starting with version 16, IBM SPSS provides a free plug-in that enables you to run R syntax from within SPSS. The plug-in connects R to the active database. You can write results that are obtained from R into a new SPSS database for further manipulation in SPSS. This article is for the reader who is familiar with R and SPSS but who has not yet tried to use them in tandem.
Keywords: free plug-in; active database; in tandem
-
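43260. Big Data Architecture and Patterns, Part 3: Understanding the Architectural Layers of a Big Data Solution
[Information Transmission, Software and Information Technology Services] [2013-12-05]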
The logical layers of a big data solution help define and categorize the various components required for a big data solution that must address the functional and non-functional requirements of a given business case. This set of logical layers outlines the critical components of a big data solution, from the point where data is acquired from various data sources, through the analysis required to derive business insight, to the processes, devices, and humans who need that insight.
Keywords: logical layers; analysis; critical components
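The flow described in the abstract above, from data sources through analysis to the consumers of the insight, can be sketched as three composable Python functions. This is a hypothetical simplification: the function names, the record format, and the per-status count used as the "insight" are illustrative assumptions, and the article's own breakdown of layers may be finer-grained.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
Source = Callable[[], List[Record]]
Consumer = Callable[[Dict[str, int]], None]


def acquire(sources: Iterable[Source]) -> List[Record]:
    """Acquisition layer: pull raw records from every data source."""
    records: List[Record] = []
    for source in sources:
        records.extend(source())
    return records


def analyze(records: List[Record], key: str) -> Dict[str, int]:
    """Analysis layer: derive a simple insight (record count per value of `key`)."""
    counts: Dict[str, int] = {}
    for record in records:
        counts[record[key]] = counts.get(record[key], 0) + 1
    return counts


def deliver(insight: Dict[str, int], consumers: Iterable[Consumer]) -> None:
    """Consumption layer: hand the insight to processes, devices, or people."""
    for consume in consumers:
        consume(insight)


# Example: two hypothetical sources feed one dashboard-like consumer.
web_logs: Source = lambda: [{"status": "ok"}, {"status": "error"}]
sensor_feed: Source = lambda: [{"status": "error"}]
received: List[Dict[str, int]] = []
deliver(analyze(acquire([web_logs, sensor_feed]), "status"), [received.append])
```

Keeping each layer behind a narrow function boundary lets a source, an analytic, or a consumer be swapped without touching the other layers.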